000 07865nam a22002537a 4500
008 210112b2019 a|||f mb|| 00| 0 eng d
040 _aEG-CaNU
_cEG-CaNU
041 0 _aeng
_beng
082 _a610
100 0 _aMohab Youssef Shawky
_9280
245 1 _aMoArLex:
_bAn Arabic Sentiment Lexicon Built Through Automatic Lexicon Expansion /
_cMohab Youssef Shawky
260 _c2019
300 _a88 p.
_bill.
_c21 cm.
500 _3Supervisor: Mohamed A. El-Helw
502 _aThesis (M.A.)--Nile University, Egypt, 2019.
504 _aIncludes bibliographical references.
505 0 _aContents: CHAPTER I Introduction -- 1.1 Problem Definition -- 1.2 Objectives -- 1.3 Motivation -- 1.4 Thesis organization -- 1.5 Publication related to this work -- CHAPTER II Literature Review -- 2.1 The beginning of Arabic Sentiment Lexicons -- 2.2 Current alternative Arabic lexicons -- 2.2.1 Arabic sentiment analysis: Lexicon-based and Corpus-based [27] -- 2.2.2 Automatic expandable large-scale sentiment lexicon of Modern Standard Arabic and Colloquial [36] -- 2.2.3 SANA: A Large-Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis [18] -- 2.2.4 A Proposed Sentiment Analysis Tool for Modern Arabic Using Human-Based Computing [40] -- 2.2.5 Mining Arabic Business Reviews [41] -- 2.2.6 Idioms-Proverbs Lexicon for Modern Standard Arabic and Colloquial Sentiment Analysis [44] -- 2.2.7 BiSAL - A bilingual sentiment analysis lexicon to analyse Dark Web forums for cybersecurity [46] -- 2.2.8 NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic [10] -- 2.2.9 Sentiment Lexicons for Arabic Social Media [16] -- 2.2.9 A web-based tool for Arabic sentiment analysis [58] -- 2.2.10 Lexicon-Based Sentiment Analysis of Arabic Tweets [65] -- 2.2.11 Improving Sentiment Analysis in Arabic Using Word Representation [67] -- 2.2 Summary of Lexicon Expansion Works -- 2.3 Conclusion -- CHAPTER III The Proposed System: Arabic Sentiment Lexicon Construction and Sentiment Analysis Tool -- 3.1 Methodology -- 3.1.1 Lexicon Expansion -- 3.1.1.1 Generation of candidate terms -- 3.1.1.2 Filtering of candidate terms -- 3.1.1.3 Polarity determination -- 3.1.1.4 Time taken for building the lexicon -- 3.1.1.5 Hardware & software used for building lexicon -- 3.2 Main features of the presented lexicon -- 3.3 A Simple Sentiment Analysis Tool for Lexicon Evaluation -- CHAPTER IV Evaluation -- Introduction -- 4.1 Manual Evaluation of the Lexicon -- 4.2 Comparison between MoArLex and other lexicons -- 4.3 Supervised Learning Experiment -- 4.4 Evaluating the Lexicon through Sentiment Analysis -- 4.5 Conclusion -- CHAPTER V Conclusion and Future Work -- 5.1 Conclusion -- 5.2 Future Work -- References
520 3 _aAbstract: Research on sentiment analysis has received great attention over the last decade, especially after the huge increase in social media usage. Social networks like Facebook and Twitter generate an enormous amount of data daily, with posts covering topics ranging from sports and products to politics and current events. Because this data is created by users from all over the world, it is multilingual in nature. Arabic is one of the important languages recently targeted by many sentiment analysis efforts. However, Arabic is considered under-resourced in terms of lexicons and datasets when compared to English. This work presents a novel technique for automatically expanding an Arabic sentiment lexicon using word embeddings. The main aim of this work is to build a high-coverage Arabic lexicon automatically, and its main objective is to use that lexicon in a sentiment analysis task. Moreover, the proposed system is designed to overcome the low accuracy of other automatically expanded Arabic lexicons. The quality of the automatically added terms was evaluated in multiple ways, all of which showed that lexicon entries added using the presented method are more accurate than sentiment lexicon entries obtained using machine learning or distant supervision methods.
546 _aText in English, abstracts in English.
650 4 _aInformatics-IFM
_9266
655 7 _2NULIB
_aDissertation, Academic
_9187
690 _aInformatics-IFM
_9266
942 _2ddc
_cTH
999 _c8818
_d8818