Unsupervised Taxonomy Learning / (Record no. 8778)

MARC details
000 -LEADER
fixed length control field 07861nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 210111b2014 a|||f mb|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 627
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Mahmoud Mostafa Hosny
245 1# - TITLE STATEMENT
Title Unsupervised Taxonomy Learning /
Statement of responsibility, etc. Mahmoud Mostafa Hosny
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2014
300 ## - PHYSICAL DESCRIPTION
Extent 57 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
Materials specified Supervisor: Mahmoud Allam
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2014.
504 ## - Bibliography
Bibliography "Includes bibliographical references"
505 0# - Contents
Formatted contents note Contents:
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Introduction
  Motivation
  Contribution
  The proposed unsupervised taxonomy learning approach
  Structure of the thesis
Background and Related work
  Text Mining
    Data collection
      Language Identification
    Data preprocessing
      Tokenization
      Normalization of data
      Stop words removal
    Data representation
    Text Mining
      Text classification
  Unsupervised taxonomy learning
  Wikipedia
  Wikipedia knowledge inclusion in text mining
Unsupervised Taxonomy Learning
  Document preparation
  Keyphrase extraction
  Category extraction
    Querying Wikipedia
    Category parsing
    Category refining
  Taxonomy building
Case Study
  Unsupervised taxonomy scheme generation from Arabic dataset
  Wikipedia Dataset
    Local Wikipedia article dataset
    Building local Wikipedia category tree
  Implementation
    The Document preparation module
    The Keyphrase extraction module
    The Category extraction module
    The Taxonomy building module
  Experimental results
Conclusion and future work
  Future work
    Results improvements
    Conformation improvements
References
Appendix: Sample category list and associated keyphrases
520 3# - Abstract
Abstract Organizing textual information effectively is one of the great challenges in intelligent text processing, and it is becoming more essential as the amount of continuously generated data grows. A key technique in the organization of information is automated text classification. Automated methods have achieved classification accuracy comparable to that of human classifiers, which makes text classification an attractive technique for information organization. However, automated text classification techniques depend on predefined classification schemes and training datasets to accomplish their goals. Building classification schemes and taxonomies manually is prone to biases and errors and is extremely costly and time consuming, especially with large amounts of data. This justifies the need for methodologies and approaches that automate the process. In this thesis we present an unsupervised computer-aided tool that automatically builds classification schemes and taxonomies to enhance the process of automated text classification. The tool utilizes the Wikipedia knowledge base and its categorization system. The tool was validated using a subset of a large language dataset obtained from the Google Moderator series (Egypt 2.0) idea bank. Its output was evaluated by comparing the results obtained automatically from the tool with those set manually by three different human evaluators. The tool achieved a precision of 88.6% and a recall of 81.2%.
546 ## - Language Note
Language Note Text in English, abstracts in English.
650 #4 - Subject
Subject Software Engineering
655 #7 - Index Term-Genre/Form
Source of term NULIB
focus term Dissertation, Academic
690 ## - Subject
School Software Engineering
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
650 #4 - Subject
-- 211
655 #7 - Index Term-Genre/Form
-- 187
690 ## - Subject
-- 211
Holdings
Withdrawn status
Lost status
Source of classification or shelving scheme Dewey Decimal Classification
Damaged status
Not for loan Not For Loan
Home library Main library
Current library Main library
Date acquired 01/11/2021
Total Checkouts
Full call number 627/ M.H.U 2014
Date last seen 01/11/2021
Price effective from 01/11/2021
Koha item type Thesis