MARC details
| 000 -LEADER |
| fixed length control field |
08793nam a22002657a 4500 |
| 008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
| fixed length control field |
201210b2025 a|||f bm|| 00| 0 eng d |
| 024 7# - Author Identifier |
| Standard number or code |
0009-0002-9308-0291 |
| Source of number or code |
ORCID |
| 040 ## - CATALOGING SOURCE |
| Original cataloging agency |
EG-CaNU |
| Transcribing agency |
EG-CaNU |
| 041 0# - Language Code |
| Language code of text |
eng |
| Language code of abstract |
eng |
| -- |
ara |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER |
| Classification number |
610 |
| 100 0# - MAIN ENTRY--PERSONAL NAME |
| Personal name |
Salma Khaled Ali Mohamed Ali |
| 245 1# - TITLE STATEMENT |
| Title |
UniTextFusion: |
| Remainder of title |
A Unified Early Fusion Framework for Arabic Multimodal Sentiment Analysis with LLMs |
| Statement of responsibility, etc. |
/ Salma Khaled Ali Mohamed Ali |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. |
| Date of publication, distribution, etc. |
2025 |
| 300 ## - PHYSICAL DESCRIPTION |
| Extent |
86 p. |
| Other physical details |
ill. |
| Dimensions |
21 cm. |
| 500 ## - GENERAL NOTE |
| Materials specified |
Supervisors: <br/>Dr. Walaa Medhat<br/>Dr. Ensaf Hussein Mohamed<br/> |
| 502 ## - Dissertation Note |
| Dissertation type |
Thesis (M.A.)--Nile University, Egypt, 2025. |
| 504 ## - Bibliography |
| Bibliography |
Includes bibliographical references |
| 505 0# - Contents |
| Formatted contents note |
Contents:<br/>Acknowledgments<br/>Abstract<br/>List of Figures<br/>List of Tables<br/>List of Equations<br/>Chapters:<br/>1. Introduction<br/>1.1 Problem Statement<br/>1.2 Challenges in Arabic Multimodal Sentiment Analysis<br/>1.3 Thesis Contributions<br/>1.4 Organization of the Thesis<br/>1.4.1 Background and Concepts<br/>1.4.2 Related Work<br/>1.4.3 Methodology<br/>1.4.4 Experimental Evaluation<br/>1.4.5 Conclusions and Future Work<br/>2. Background and Concepts<br/>2.1 Multi-modal Sentiment Analysis<br/>2.2 Multi-modal Data Fusion Techniques<br/>2.2.1 Feature Level (early-stage) Fusion Technique<br/>2.2.2 Model Level (mid-stage) Fusion Technique<br/>2.2.3 Decision Level (late-stage) Fusion Technique<br/>2.2.4 Comparison of Fusion Techniques<br/>2.3 MuSA Techniques and Approaches<br/>2.3.1 Traditional Machine Learning Approaches<br/>2.3.2 Deep Learning Approaches<br/>2.3.3 Large Language Model (LLM)-based Generative Approaches<br/>3. Related Work<br/>3.1 Existing Multimodal Datasets for Sentiment Analysis<br/>3.1.1 Comparison of English and Arabic Datasets<br/>3.1.2 Gaps in Existing Multimodal Datasets<br/>3.2 Approaches to Multimodal Sentiment Analysis<br/>3.3 Research Gaps<br/>4. Arabic Multimodal Sentiment Analysis Methodology<br/>4.1 Ar-MuSA: Arabic Multimodal Sentiment Analysis Dataset<br/>4.1.1 Data Acquisition<br/>4.1.2 Data Preparation<br/>4.1.3 Data Labeling Procedure<br/>4.1.4 Ethical Considerations<br/>4.2 Multimodal Sentiment Analysis Fusion and Models<br/>4.2.1 Pre-trained Models<br/>4.2.2 Generative LLM Models<br/>4.2.3 UniText Fusion<br/>5. Experimental Evaluation<br/>5.1 Performance Metrics<br/>5.1.1 Weighted Precision<br/>5.1.2 Weighted Recall<br/>5.1.3 Weighted F1-Score<br/>5.1.4 Accuracy<br/>5.2 Pre-trained Models: Setup and Results<br/>5.2.1 Text-Based Transformer: MarBERT Model<br/>5.2.2 Audio-Based Transformer: Egyptian HuBERT<br/>5.2.3 Image-Based Transformer: MobileNet V2<br/>5.2.4 Multi-Modal Fusion<br/>5.3 Generative LLM Models: Setup and Results<br/>5.3.1 Text-Based LLM: Qwen2<br/>5.3.2 Audio-Based LLM: Qwen2-Audio<br/>5.3.3 Image-Based LLM: Qwen2-VL<br/>5.3.4 Multi-modal Fusion<br/>5.4 UniText Fusion Approach: Setup and Results<br/>5.4.1 Experimental Setup<br/>5.4.2 Experimental Results<br/>5.5 Comparative Analysis of UniText Fusion and State-of-the-Art Techniques<br/>6. Conclusion and Future Work<br/>6.1 Conclusion<br/>6.2 Future Work<br/>Appendices:<br/>A. Publications<br/>Bibliography |
| 520 3# - Abstract |
| Abstract |
Abstract:<br/>Multimodal Sentiment Analysis (MuSA) combines text, audio, and visual inputs to detect and classify emotions. Despite its growing relevance, Arabic MuSA research is limited due to the lack of high-quality annotated datasets and the complexity of Arabic language processing. This work presents Ar-MuSA, an open-source Arabic MuSA dataset containing aligned text, audio, and visual data. Unlike existing unimodal resources, Ar-MuSA supports sentiment analysis across multiple modalities. The dataset is evaluated using MarBERT (text), HuBERT (audio), MobileNet (vision), Qwen2 (multimodal), and ensemble methods. Results indicate improved performance through modality fusion; MarBERT achieved a 71% F1-score for text-only classification, while audio and image modalities performed lower individually. Fusion with text improved performance from 39% to 67%, an absolute gain of 28 percentage points. To further improve results, the UniTextFusion framework is proposed. It performs early fusion by converting audio and visual signals into text descriptions, which are combined with transcripts and used as input to large language models (LLMs). Fine-tuning Arabic-compatible LLMs, LLaMA 3.1-8B Instruct and SILMA AI 9B, using LoRA (Low-Rank Adaptation) yielded F1-scores of 68% and 71%, surpassing unimodal baselines of 34% and 41% by 34 and 30 percentage points, respectively.<br/>Keywords:<br/>Arabic Multimodal Sentiment Analysis, LoRA, Fine-tuning, Arabic MuSA Dataset, Multimodal Generative LLMs, Fusion |
| 546 ## - Language Note |
| Language Note |
Text in English, abstracts in English and Arabic |
| 650 #4 - Subject |
| Subject |
Informatics |
| 655 #7 - Index Term-Genre/Form |
| Source of term |
NULIB |
| Focus term |
Dissertations, Academic |
| 690 ## - Subject |
| School |
Informatics (IFM) |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) |
| Source of classification or shelving scheme |
Dewey Decimal Classification |
| Koha item type |
Thesis |