A Multi-Embeddings Approach Coupled with Deep Learning for Arabic Named Entity Recognition / (Record no. 9080)

MARC details
000 -LEADER
fixed length control field	10703nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	210831s2021 ||||f mb|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency	EG-CaNU
Transcribing agency	EG-CaNU
041 0# - Language Code
Language code of text	eng
Language code of abstract	eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name	Abeer Youssef Mohamed
245 1# - TITLE STATEMENT
Title	A Multi-Embeddings Approach Coupled with Deep Learning for Arabic Named Entity Recognition /
Statement of responsibility, etc.	Abeer Youssef Mohamed
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc.	2021
300 ## - PHYSICAL DESCRIPTION
Extent	89 p.
Other physical details	ill.
Dimensions	21 cm.
500 ## - GENERAL NOTE
General note	Supervisor: Samhaa El-Beltagy
502 ## - Dissertation Note
Dissertation type	Thesis (M.A.)--Nile University, Egypt, 2021.
504 ## - Bibliography
Bibliography	"Includes bibliographical references"
505 0# - Contents
Formatted contents note	Contents:<br/>ABSTRACT<br/>DEDICATION<br/>ACKNOWLEDGEMENTS<br/>PUBLICATION RELATED TO THIS WORK<br/>LIST OF TABLES<br/>LIST OF FIGURES<br/>LIST OF ABBREVIATIONS<br/>CHAPTER 1 INTRODUCTION<br/>1.1 PROBLEM DEFINITION<br/>1.2 OBJECTIVES<br/>1.3 MOTIVATION<br/>1.4 CONTRIBUTION<br/>1.5 THESIS ORGANIZATION<br/>CHAPTER 2 BACKGROUND<br/>2.1 ARABIC DATASETS<br/>2.2 ARTIFICIAL NEURAL NETWORKS (ANN)<br/>2.2.1 Recurrent Neural Networks (RNN)<br/>2.3 CONDITIONAL RANDOM FIELD (CRF)<br/>2.4 WORD EMBEDDINGS<br/>2.4.1 Static Embeddings<br/>2.4.2 Contextual Embeddings<br/>CHAPTER 3 LITERATURE REVIEW ON NAMED ENTITY RECOGNITION<br/>3.1 PREAMBLE<br/>3.2 RULE-BASED APPROACHES FOR NER USING GAZETTEER AND POS<br/>3.3 MACHINE LEARNING APPROACHES FOR NER<br/>3.3.1 A Machine Learning Approach using Brown Clustering Technique<br/>3.3.2 Machine Learning Approach using Bayesian Classifier Combination<br/>3.4 HYBRID APPROACHES FOR NER<br/>3.4.1 A Hybrid Approach using Naïve Bayes Classifier and Dictionary<br/>3.4.2 A Hybrid Approach using Decision Tree and POS<br/>3.5 DEEP LEARNING APPROACHES FOR NER<br/>3.5.1 A Deep Learning Approaches using Conditional Random Field<br/>3.5.2 A Deep Learning Approaches using Attention Mechanism<br/>3.5.3 A Deep Learning Approach using Pooled Contextualized Embeddings<br/>3.5.4 A Deep Learning Approach using Semi-Supervised Co-Training (SVM and BiLSTM-CRF)<br/>3.6 CHAPTER SUMMARY<br/>CHAPTER 4 RESEARCH METHODOLOGY<br/>4.1 PREAMBLE<br/>4.2 DATASET<br/>4.3 PROPOSED MODEL<br/>4.3.1 Multi-Embeddings<br/>4.3.2 Encoding<br/>4.3.3 Decoding<br/>4.4 TESTING OF OTHER MULTI-EMBEDDINGS MODELS<br/>4.4.1 Pooled Contextual Embedding with Word2Vec<br/>4.4.2 Pooled Contextual Embedding with fastText<br/>4.4.3 Pooled Contextual Embedding with Multilingual BERT<br/>4.4.4 Pooled Contextual Embedding with Arabic BERT<br/>4.4.5 Pooled Contextual Embedding with fastText and Word2Vec<br/>4.4.6 Pooled Contextual Embedding with Arabic BERT, and Word2Vec<br/>4.4.7 Pooled Contextual Embedding with XLNet, and fastText<br/>4.4.8 Pooled Contextual Embedding with XLM-RoBERTa, and fastText<br/>4.4.9 Pooled Contextual Embedding with Arabic BERT, and fastText<br/>CHAPTER 5 EVALUATION<br/>5.1 PREAMBLE<br/>5.2 EXPERIMENTATION SETUP AND IMPLEMENTATION<br/>5.3 QUALITY METRICS<br/>5.4 RESULTS AND DISCUSSION<br/>5.5 ERROR ANALYSIS<br/>5.6 CHAPTER SUMMARY<br/>CHAPTER 6 CONCLUSION AND FUTURE WORK<br/>6.1 CONCLUSION<br/>6.2 FUTURE WORK<br/>REFERENCES
520 3# - Abstract
Abstract	Abstract:<br/>Named Entity Recognition (NER) is an essential task in many natural language processing (NLP) applications. Extracting crucial information using NER is a primary phase in most NLP downstream tasks; it identifies entities and assigns them to predefined classes. Several studies have focused on NER for the English language. However, there are limitations when applying current methodologies directly to Arabic text. Recent studies have shown that contextual embedding representations yield significant improvements in English NER tasks.<br/>The aim of this thesis is to examine the hypothesis that a multi-embeddings approach coupled with a deep learning network is capable of extracting named entities from Arabic sentences and thereby reduces manual costs. It also examines the hypothesis that an end-to-end deep learning NER approach can outperform traditional models based on rule-based or machine learning approaches.<br/>To evaluate these hypotheses, an end-to-end deep learning model is presented that extracts Arabic named entities and reduces human effort such as preprocessing and feature engineering. To reach the best model, different types of classical and contextual embeddings were combined and evaluated. The presented model thus uses a combination of several types of embeddings (multi-embeddings), including traditional word embeddings and contextual embeddings, to gain the advantages of both. For encoding and decoding, we chose models that are heavily used in sequential data modeling. The proposed model outperformed all previously published results of deep learning and non-deep learning models on the same dataset. Its results also surpassed those of the winning systems for the same task on the same data in the Topcoder.com competition.
Since this approach does not rely on any external resources or handcrafted features, the model can be easily extended to other Arabic NER domains.
546 ## - Language Note
Language Note	Text in English, abstracts in English.
650 #4 - Subject
Subject	Informatics-IFM
655 #7 - Index Term-Genre/Form
Source of term	NULIB
focus term	Dissertation, Academic
690 ## - Subject
School	Informatics-IFM
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	Thesis
650 #4 - Subject
--	266
655 #7 - Index Term-Genre/Form
--	187
690 ## - Subject
--	266

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Home library	Current library	Date acquired	Total Checkouts	Full call number	Date last seen	Price effective from	Koha item type
		Dewey Decimal Classification			Main library	Main library	08/31/2021		610 / A.Y.M / 2021	08/31/2021	08/31/2021	Thesis