Unsupervised Taxonomy Learning / (Record no. 8778)
000 -LEADER | |
---|---|
fixed length control field | 07861nam a22002537a 4500 |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION | |
fixed length control field | 210111b2014 a|||f mb|| 00| 0 eng d |
040 ## - CATALOGING SOURCE | |
Original cataloging agency | EG-CaNU |
Transcribing agency | EG-CaNU |
041 0# - Language Code | |
Language code of text | eng |
Language code of abstract | eng |
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
Classification number | 627 |
100 0# - MAIN ENTRY--PERSONAL NAME | |
Personal name | Mahmoud Mostafa Hosny |
245 1# - TITLE STATEMENT | |
Title | Unsupervised Taxonomy Learning / |
Statement of responsibility, etc. | Mahmoud Mostafa Hosny |
260 ## - PUBLICATION, DISTRIBUTION, ETC. | |
Date of publication, distribution, etc. | 2014 |
300 ## - PHYSICAL DESCRIPTION | |
Extent | 57 p. |
Other physical details | ill. |
Dimensions | 21 cm. |
500 ## - GENERAL NOTE | |
Materials specified | Supervisor: Mahmoud Allam |
502 ## - Dissertation Note | |
Dissertation type | Thesis (M.A.)--Nile University, Egypt, 2014. |
504 ## - Bibliography | |
Bibliography | "Includes bibliographical references" |
505 0# - Contents | |
Formatted contents note | Contents:<br/>Dedication, iv<br/>Acknowledgments, v<br/>List of Tables, viii<br/>List of Figures, ix<br/>Abstract, x<br/>Introduction, 1<br/>Motivation, 1<br/>Contribution, 2<br/>The proposed unsupervised taxonomy learning approach, 3<br/>Structure of the thesis, 4<br/>Background and Related work, 5<br/>Text Mining, 5<br/>Data collection, 6<br/>Language Identification, 6<br/>Data preprocessing, 7<br/>Tokenization, 7<br/>Normalization of data, 7<br/>Stop words removal, 8<br/>Data representation, 8<br/>Text Mining, 9<br/>Text classification, 9<br/>Unsupervised taxonomy learning, 11<br/>Wikipedia, 14<br/>Wikipedia knowledge inclusion in text mining, 17<br/>Unsupervised Taxonomy Learning, 20<br/>Document preparation, 21<br/>Keyphrase extraction, 21<br/>Category extraction, 22<br/>Querying Wikipedia<br/>Category parsing, 24<br/>Category refining, 25<br/>Taxonomy building, 27<br/>Case Study, 29<br/>Unsupervised taxonomy scheme generation from Arabic dataset, 29<br/>Wikipedia Dataset, 33<br/>Local Wikipedia article dataset, 33<br/>Building local Wikipedia category tree, 35<br/>Implementation, 37<br/>The Document preparation module, 37<br/>The Keyphrase extraction module, 38<br/>The Category extraction module, 39<br/>The Taxonomy building module, 39<br/>Experimental results, 40<br/>Conclusion and future work, 46<br/>Future work, 46<br/>Results improvements, 46<br/>Conformation improvements, 47<br/>References, 48<br/>Appendix: Sample category list and associated keyphrases, 54 |
520 3# - Abstract | |
Abstract | Abstract:<br/>Organizing textual information effectively is one of the great challenges in intelligent text processing, and it becomes more pressing as the amount of continuously generated data grows. A key technique for organizing information is automated text classification. Automated methods have achieved classification accuracy comparable to that of human classifiers, which makes text classification an attractive technique for information organization. However, automated text classification techniques depend on predefined classification schemes and training datasets to accomplish their goals. Building classification schemes and taxonomies manually is prone to bias and error, and it is extremely costly and time-consuming, especially with large amounts of data. This justifies the need for methodologies and approaches that automate the process. In this thesis we present an unsupervised computer-aided tool that automatically builds classification schemes and taxonomies to enhance the process of automated text classification. The tool utilizes the Wikipedia knowledge base and its categorization system. The tool was validated on a subset of a large language dataset obtained from the Google Moderator series (Egypt 2.0) idea bank. Its output was evaluated by comparing the results obtained automatically from the tool with those set manually by three different human evaluators. The tool achieved a precision of 88.6% and a recall of 81.2%. |
546 ## - Language Note | |
Language Note | Text in English, abstracts in English. |
650 #4 - Subject | |
Subject | Software Engineering |
655 #7 - Index Term-Genre/Form | |
Source of term | NULIB |
focus term | Dissertation, Academic |
690 ## - Subject | |
School | Software Engineering |
942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
Source of classification or shelving scheme | Dewey Decimal Classification |
Koha item type | Thesis |
| Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Not for loan | Home library | Current library | Date acquired | Total Checkouts | Full call number | Date last seen | Price effective from | Koha item type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Dewey Decimal Classification | | Not For Loan | Main library | Main library | 01/11/2021 | | 627/ M.H.U 2014 | 01/11/2021 | 01/11/2021 | Thesis |
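
Note on the evaluation metrics: the abstract reports precision and recall measured against categories set by human evaluators. The record does not describe the exact procedure, so the following is only a minimal sketch of how set-based precision and recall are commonly computed. The function name `precision_recall` and the category names are hypothetical illustrations, not data from the thesis.

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall of predicted labels against a reference set."""
    predicted, reference = set(predicted), set(reference)
    # Labels that the tool and the human evaluator agree on.
    true_positives = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data (not taken from the thesis).
tool_categories = {"Education", "Health", "Transport", "Economy"}
human_categories = {"Education", "Health", "Transport", "Energy", "Tourism"}

p, r = precision_recall(tool_categories, human_categories)
print(f"precision={p:.2f} recall={r:.2f}")
```

With several evaluators, one option would be to compute the two values per evaluator and average them; whether the thesis does this is not stated in the record.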