A Multi-Embeddings Approach Coupled with Deep Learning for Arabic Named Entity Recognition / (Record no. 9080)

MARC details
000 -LEADER
fixed length control field 10703nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 210831s2021 ||||f mb|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Abeer Youssef Mohamed
245 1# - TITLE STATEMENT
Title A Multi-Embeddings Approach Coupled with Deep Learning for Arabic Named Entity Recognition /
Statement of responsibility, etc. Abeer Youssef Mohamed
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2021
300 ## - PHYSICAL DESCRIPTION
Extent 89 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
General note Supervisor: Samhaa El-Beltagy
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2021.
504 ## - Bibliography
Bibliography "Includes bibliographical references"
505 0# - Contents
Formatted contents note Contents:
ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
PUBLICATION RELATED TO THIS WORK
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 PROBLEM DEFINITION
1.2 OBJECTIVES
1.3 MOTIVATION
1.4 CONTRIBUTION
1.5 THESIS ORGANIZATION
CHAPTER 2 BACKGROUND
2.1 ARABIC DATASETS
2.2 ARTIFICIAL NEURAL NETWORKS (ANN)
2.2.1 Recurrent Neural Networks (RNN)
2.3 CONDITIONAL RANDOM FIELD (CRF)
2.4 WORD EMBEDDINGS
2.4.1 Static Embeddings
2.4.2 Contextual Embeddings
CHAPTER 3 LITERATURE REVIEW ON NAMED ENTITY RECOGNITION
3.1 PREAMBLE
3.2 RULE-BASED APPROACHES FOR NER USING GAZETTEER AND POS
3.3 MACHINE LEARNING APPROACHES FOR NER
3.3.1 A Machine Learning Approach using Brown Clustering Technique
3.3.2 Machine Learning Approach using Bayesian Classifier Combination
3.4 HYBRID APPROACHES FOR NER
3.4.1 A Hybrid Approach using Naïve Bayes Classifier and Dictionary
3.4.2 A Hybrid Approach using Decision Tree and POS
3.5 DEEP LEARNING APPROACHES FOR NER
3.5.1 A Deep Learning Approaches using Conditional Random Field
3.5.2 A Deep Learning Approaches using Attention Mechanism
3.5.3 A Deep Learning Approach using Pooled Contextualized Embeddings
3.5.4 A Deep Learning Approach using Semi-Supervised Co-Training (SVM and BiLSTM-CRF)
3.6 CHAPTER SUMMARY
CHAPTER 4 RESEARCH METHODOLOGY
4.1 PREAMBLE
4.2 DATASET
4.3 PROPOSED MODEL
4.3.1 Multi-Embeddings
4.3.2 Encoding
4.3.3 Decoding
4.4 TESTING OF OTHER MULTI-EMBEDDINGS MODELS
4.4.1 Pooled Contextual Embedding with Word2Vec
4.4.2 Pooled Contextual Embedding with fastText
4.4.3 Pooled Contextual Embedding with Multilingual BERT
4.4.4 Pooled Contextual Embedding with Arabic BERT
4.4.5 Pooled Contextual Embedding with fastText and Word2Vec
4.4.6 Pooled Contextual Embedding with Arabic BERT, and Word2Vec
4.4.7 Pooled Contextual Embedding with XLNet, and fastText
4.4.8 Pooled Contextual Embedding with XLM-RoBERTa, and fastText
4.4.9 Pooled Contextual Embedding with Arabic BERT, and fastText
CHAPTER 5 EVALUATION
5.1 PREAMBLE
5.2 EXPERIMENTATION SETUP AND IMPLEMENTATION
5.3 QUALITY METRICS
5.4 RESULTS AND DISCUSSION
5.5 ERROR ANALYSIS
5.6 CHAPTER SUMMARY
CHAPTER 6 CONCLUSION AND FUTURE WORK
6.1 CONCLUSION
6.2 FUTURE WORK
REFERENCES
520 3# - Abstract
Abstract Abstract:
Named Entity Recognition (NER) is an essential task in many natural language processing (NLP) applications. Extracting crucial information using NER is a primary phase in most NLP downstream tasks; NER identifies entities and assigns them to predefined classes. Several studies have focused on NER for the English language. However, applying current methodologies directly to Arabic text has limitations. Recent studies have shown that contextual embedding representations yield significant improvements on English NER tasks.
The aim of this thesis is to examine the hypothesis that a multi-embeddings approach coupled with a deep learning network is capable of extracting named entities from Arabic sentences and thereby reduces manual costs. It also examines the hypothesis that an end-to-end deep learning NER approach is capable of outperforming traditional models based on rule-based or machine learning approaches.
To evaluate these hypotheses, an end-to-end deep learning model is presented that extracts Arabic named entities and reduces human effort such as preprocessing work and feature engineering. To reach the best model, different types of classical and contextual embeddings were combined and evaluated. The presented model thus uses a combination of several types of embeddings (multi-embeddings), traditional word embeddings as well as contextual embeddings, to gain the advantages of both. For encoding and decoding, we chose models that are heavily used in sequential data modeling. The proposed model outperformed all previously published results of deep learning and non-deep learning models on the same dataset. The presented results also surpassed those of the winning systems for the same task on the same data in the Topcoder.com competition. Since this approach does not rely on any external resources or handcrafted features, the model can be easily extended to other Arabic NER domains.
546 ## - Language Note
Language Note Text in English, abstracts in English.
650 #4 - Subject
Subject Informatics-IFM
655 #7 - Index Term-Genre/Form
Source of term NULIB
focus term Dissertation, Academic
690 ## - Subject
School Informatics-IFM
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Withdrawn status
Lost status
Source of classification or shelving scheme Dewey Decimal Classification
Damaged status
Not for loan
Home library Main library
Current library Main library
Date acquired 08/31/2021
Total Checkouts
Full call number 610 / A.Y.M / 2021
Date last seen 08/31/2021
Price effective from 08/31/2021
Koha item type Thesis