Text Auto-Tagging Using Wikipedia / Shaimaa Abdelber Shamseldin Ali

By:

Shaimaa Abdelber Shamseldin Ali

Material type: Text

Language: English
Summary language: English
Publication details: 2018
Description: 95 p. ill. 21 cm
Subject(s):

Genre/Form:

Dissertation, Academic

DDC classification:

Dissertation note: Thesis (M.A.), Nile University, Egypt, 2018.

Holdings
Item type	Current library	Call number	Status	Date due	Barcode
Thesis	Main library	610 / S.A.T / 2018	Not For Loan

Supervisor: Samhaa El-Beltagy

"Includes bibliographical references"

Contents:
Chapter 1: Introduction
1.1 Motivation
1.2 Problem definition
1.3 Contributions
1.4 Thesis outline
Chapter 2: Background
2.1 Wikipedia
2.2 Text mining
2.2.1 Text mining application
2.2.2 Text mining pre-processing
2.2.3 Information Retrieval (IR)
2.2.4 Word Sense Disambiguation (WSD)
2.3 Measuring semantic relatedness
2.3.1 Cosine Similarity
2.3.2 The Jaccard Coefficient
2.3.3 Milne and Witten’s Wikipedia Link-based Measure (WLM)
2.4 Information retrieval evaluation measures
Chapter 3: Related Work
3.1 Wikify! Linking Documents to Encyclopedia knowledge
3.2 Learning to Link with Wikipedia
3.3 Fast and accurate annotation of short text with Wikipedia pages
3.3.1 Information Stored
3.3.2 Algorithm Applied
3.4 A model for Auto-Tagging of Research Papers based on Keyphrase Extraction Methods
Chapter 4: Design and Implementation
4.1 Design objective
4.2 The proposed approach
4.2.1 Phase 1: Building the concept dictionary
4.2.1.1 Extract needed information from Wikipedia and carry out processing on it
4.2.1.2 Perform entry filtration
4.2.1.3 Measure semantic relatedness
4.2.1.4 Build an inverted index of dictionary entries
4.2.1.5 Perform entry partitioning
4.2.2 Phase 2: Tagging input text
Chapter 5: Evaluation
5.1 Building the evaluation dataset
5.2 Result
5.3 Conclusion
Chapter 6: Conclusion and Future Work
6.1 Summary and Conclusion
6.2 Future Work
List of Abbreviations
References

Abstract:
Because of the large amounts of unstructured text data generated on the Internet, text mining is believed to hold great potential for significant development. An important goal of text mining is to sift through large volumes of text to extract patterns and models that can then be incorporated into intelligent applications, such as automatic text categorizers and named entity recognizers. This dissertation proposes an efficient method for automatically annotating Arabic news stories with tags using Wikipedia. The idea of the system is to use Wikipedia article names, properties, and redirects to build a pool of meaningful tags. Sophisticated and efficient matching methods are then used to detect text fragments in input news stories that correspond to entries in the constructed tag pool. Generated tags represent real-life entities or concepts, such as the names of popular places, well-known organizations, and celebrities. These tags can be used indirectly by a news site for indexing, clustering, classification, and statistics generation, or directly to give a news reader an overview of a news story's contents. Evaluation of the system has shown that the tags it generates are better than those generated by MSN Arabic news.
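
The tag-pool idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `build_tag_pool` and `tag_text`, the whitespace tokenization, the greedy longest-match strategy, and the English toy data are all assumptions made for illustration. A real system for Arabic news would also need text normalization and disambiguation, which are omitted here.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: maps every known surface form (article title or
# redirect) to the canonical article title that will serve as the tag.
def build_tag_pool(titles: List[str],
                   redirects: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    pool: Dict[Tuple[str, ...], str] = {}
    for title in titles:
        pool[tuple(title.lower().split())] = title
    for alias, target in redirects.items():
        # A redirect points an alternative name at its target article.
        pool[tuple(alias.lower().split())] = target
    return pool

# Hypothetical helper: scans the text left to right and, at each position,
# takes the longest token sequence that appears in the pool.
def tag_text(text: str, pool: Dict[Tuple[str, ...], str]) -> List[str]:
    tokens = text.lower().split()  # assumption: simple whitespace tokens
    max_len = max((len(key) for key in pool), default=0)
    tags: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            canonical = pool.get(tuple(tokens[i:i + n]))
            if canonical is not None:
                if canonical not in tags:  # report each tag once
                    tags.append(canonical)
                i += n
                break
        else:
            i += 1  # no pool entry starts at this token
    return tags

# Toy data, invented for this example only.
pool = build_tag_pool(
    titles=["Nile University", "Cairo University", "Cairo", "Egypt"],
    redirects={"NU": "Nile University"},
)
print(tag_text("Researchers at NU in Cairo presented work on Egypt", pool))
```

Preferring the longest match means that "Cairo University" is tagged as one entity rather than as "Cairo" followed by an unmatched word, which is one simple way to favor specific entries over general ones.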

Text in English, abstracts in English.
