TY  - BOOK
AU  - Omar Khaled Enayet
TI  - Rumour Mining On Social Media
U1  - 610
PY  - 2018///
KW  - Informatics-IFM
KW  - NULIB
KW  - Dissertation, Academic
N1  - Thesis (M.A.)--Nile University, Egypt, 2018; "Includes bibliographical references"; Contents:
Chapter 1 Introduction; 1.1 Rumours on Social Media; 1.2 Twitter’s Terminology; 1.3 Rumour Definition; 1.4 Machine Learning and Data Mining; 1.5 Natural Language Processing; 1.6 Text Mining; 1.7 Rumour Mining; 1.8 SemEval Task; 1.9 Applications and Motivation; 1.9.1 Security; 1.9.2 Professional Journalism; 1.10 Thesis Organization;
Chapter 2 Introduction to Rumour Mining; 2.1 Active Research topics; 2.1.1 Rumour Detection; 2.1.2 Analysing Replies to Rumours; 2.1.3 Estimating Veracity of Rumours; 2.1.4 Identifying the Rumour Source; 2.1.5 Analysing the Diffusion of Rumours; 2.2 Data Collection and Annotation; 2.2.1 Detecting Rumorous Topic(s); 2.2.2 Collecting Related Microblogs; 2.2.3 Annotating Microblogs; 2.3 Conclusion;
Chapter 3 Features for Rumour Mining; 3.1 Content-Based Features; 3.1.1 Denial Terms; 3.1.2 Support Terms; 3.1.3 N-Grams; 3.1.4 Emoticons; 3.1.5 Tweet Sentiment; 3.1.6 Questions & Queries; 3.1.7 URL Existence; 3.1.8 Hashtag Existence; 3.1.9 Multimedia Existence; 3.1.10 Tweet is Reply/Retweet; 3.1.11 Number of Replies/Retweets; 3.1.12 Similarity to Replied-to Tweet; 3.1.13 Retweet Ratio; 3.1.14 Tweet Complexity; 3.1.15 Part of Speech Features; 3.1.16 Consistency with External Sources; 3.1.17 Named Entities and Events; 3.1.18 Post Time; 3.2 User Based Features; 3.2.1 User is Verified; 3.2.2 User Being Replied to is Verified; 3.2.3 User who Tweeted a Rumour’s Source Tweet is Verified; 3.2.4 Number of Followers; 3.2.5 Ratio of User’s Original Tweets; 3.2.6 Number of User’s Friends; 3.2.7 Days since User Account’s Creation; 3.2.8 Client used; 3.2.9 User Controversiality; 3.3 Rumour Network Based Features; 3.3.1 Rumour Replies Analysis; 3.3.2 Rumour Network Graph Analysis; 3.3.3 Temporal Changes; 3.4 Conclusion;
Chapter 4 Proposed System Overview; 4.1 System Overview; 4.2 Data Set Description; 4.3 Pre-processing; 4.4 Speech Act Classification of Replies to Rumours; 4.4.1 Problem Statement; 4.4.2 Data Set Example; 4.4.3 Used Features; 4.5 Veracity Prediction of Rumours; 4.5.1 Problem Statement; 4.5.2 Data Set Example; 4.5.3 Used Features; 4.6 Feature Selection; 4.7 Machine Learning Classifiers; 4.8 Conclusion;
Chapter 5 System Evaluation; 5.1 Speech Act Classification of Replies to Rumours; 5.1.1 Evaluation; 5.1.2 RumourEval Evaluation; 5.1.3 Conclusion; 5.2 Veracity Prediction of Rumours; 5.2.1 Evaluation; 5.2.2 RumourEval Evaluation; 5.2.3 Conclusion;
Chapter 6 Conclusion and Future Work; 6.1 Summary; 6.2 Future Directions; 6.2.1 Finding Hidden Meanings in Text and Multimedia; 6.2.2 Exploiting external resources; 6.2.3 Analysing Big Picture of Rumour Network;
References
N2  - Abstract: False news or rumours may greatly affect the social, economic and political stability of any society around the world. In this thesis, we provide a comprehensive survey of the latest research on rumour analysis on social media, categorizing its research topics and detailing the features, resources and techniques used in the literature. We shed some light on applications of rumour analysis such as security and professional journalism. A rumour analysis system is proposed for the speech act categorization of replies to rumorous tweets and the estimation of a rumour’s veracity. A number of content-based, user-based and rumour-network-based features are extracted from a Twitter data set and experimented with. We present our cross-validation results, in which an ensemble classifier for speech act categorization of replies to rumorous tweets achieved an accuracy of 77.12%, while a neural network classifier for rumour veracity detection on Twitter achieved an accuracy of 57%. We analyse the evaluation results of the system, which was used to address the task “RumourEval: Determining rumour veracity and support for rumours” in SemEval 2017. The evaluation of the system in SemEval 2017 revealed that the developed system can achieve relatively good results, especially for rumour verification, where it ranked first.
ER  -