Speech emotion recognition system/ (Record no. 9781)

MARC details
000 -LEADER
fixed length control field 09322nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 201210b2022 a|||f bm|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency EG-CaNU
Transcribing agency EG-CaNU
041 0# - Language Code
Language code of text eng
Language code of abstract eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name Mai Mohamed Magdy Abd ElSalam El Seknedy
245 1# - TITLE STATEMENT
Title Speech emotion recognition system/
Statement of responsibility, etc. Mai Mohamed Magdy Abd ElSalam El Seknedy
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2022
300 ## - PHYSICAL DESCRIPTION
Extent 135 p.
Other physical details ill.
Dimensions 21 cm.
500 ## - GENERAL NOTE
Materials specified Supervisor: <br/>Sahar Ali Fawzi
502 ## - Dissertation Note
Dissertation type Thesis (M.A.)--Nile University, Egypt, 2022.
504 ## - Bibliography
Bibliography "Includes bibliographical references"
505 0# - Contents
Formatted contents note Contents:<br/>Dedication<br/>Acknowledgments<br/>List of Tables<br/>List of Figures<br/>List of Abbreviations<br/>Abstract<br/>Chapter 1: Introduction<br/>1.1 Motivation<br/>1.2 Problem Definition<br/>1.3 Research Objective<br/>1.4 Research Structure<br/>Chapter 2: Background and Literature Review<br/>2.1 Chapter overview<br/>2.2 Emotions<br/>2.2.1 Discrete emotional model<br/>2.2.2 Continuous emotional model<br/>2.3 Features<br/>2.3.1 Acoustic feature types<br/>2.3.2 Feature selection techniques<br/>2.3.3 Feature Normalization<br/>2.4 Datasets<br/>2.5 Model classification<br/>2.6 Literature Review<br/>Chapter 3: Materials and Methods<br/>3.1 SER System Architecture<br/>3.2 Datasets<br/>3.2.1 Datasets overview<br/>3.2.2 Arabic Survey<br/>3.3 Feature sets<br/>3.3.1 Feature Scaling (Data preprocessing)<br/>3.3.2 Feature Importance<br/>3.4 Classifiers<br/>3.4.1 Model’s hyper-parameters<br/>3.4.2 Evaluation metrics<br/>3.5 Experimentation Tools<br/>Chapter 4: Results and Discussions<br/>4.1 Single corpus SER<br/>4.1.1 Arabic corpus SER<br/>4.1.1.1 Arabic corpus survey<br/>4.1.1.2 Arabic corpus SER Results<br/>4.1.2 Urdu corpus SER<br/>4.1.3 English corpus SER<br/>4.1.4 German corpus SER<br/>4.1.5 French corpus SER<br/>4.1.6 Baseline single corpus SER<br/>4.2 Cross corpus SER<br/>4.2.1 Latin based cross-corpus SER<br/>4.2.2 Arabic vs Urdu cross-corpus SER<br/>4.2.3 Cross-corpus SER – including five languages<br/>4.3 SER computational performance evaluation<br/>4.3.1 Classifiers computational performance<br/>4.3.2 Feature sets computational performance<br/>Chapter 5: Conclusion and Future work<br/>References<br/>Appendix A: Publications<br/>Appendix B: Datasets samples<br/>Appendix C: Experimental results – Extra Diagrams
520 3# - Abstract
Abstract Abstract:<br/>Speech Emotion Recognition (SER) is considered one of the most important applications of human-computer interaction. It creates a new means of communication between humans and machines by interpreting the speech signal and extracting its emotional content. SER systems have proved to play a crucial part in daily-life applications such as call centers, e-learning, medical therapies (for example, physiological disease analysis) and autonomous driver emotion detection. Despite the great evolution of technology and the wide research scope in this area, there is still a gap between SER research applications and those needed in everyday life. Most research focuses on new methodologies, such as deep learning models or new feature extraction techniques, without giving enough attention to the system's computational performance on real-time data in a way that is suitable for commercial applications. Furthermore, a question that research frequently raises remains open: “what is the best speech feature set to use to achieve the best SER performance?” So far, there is no precise generic feature set that gives the best performance. In addition, most existing research focuses on SER performance in a single-corpus domain, where the model is trained and tested on the same language. Cross-corpus SER, on the other hand, is still an ongoing challenge, as few studies have addressed it. The main motivation of this work is to investigate the best feature set, with high performance and low computational cost, compared to the benchmark “Interspeech 2009 – 2010” feature sets. Two new feature sets were developed from a combination of spectral and prosodic features and tested in a cross-corpus domain, where they outperformed other feature sets evaluated on the same datasets. 
The proposed SER system was successfully evaluated on 5 datasets in 5 different languages (English, German, French, Arabic and Urdu): the RAVDESS, EMO-DB, CaFE, EYASE and Urdu datasets, respectively.<br/>Furthermore, this research introduced the Arabic language into SER systems, given its scarcity in the research domain. Its performance in the cross-corpus domain with Latin-based and Urdu languages was very promising. The research studied SER performance using different models: Multi-Layer Perceptron, Support Vector Machine, Random Forest, Logistic Regression and Ensemble Learning using majority voting. Results were analyzed and the most suitable classifier for each language was identified. Improvements over previous work of 16% in Urdu, 6.25% in English, 9.36% in German and 13.42% in French SER systems were achieved. Furthermore, featureset-2 showed very promising results compared to the benchmark Interspeech feature sets in both recognition rate and computational time. Cross-corpus SER showed results close to the baseline single-corpus SER, and in Arabic an enhancement of 2.73% was achieved. Improvements were also achieved for Urdu and Arabic in the cross-corpus domain compared to previous work.
546 ## - Language Note
Language Note Text in English, abstracts in English.
650 #4 - Subject
Subject Informatics-IFM
655 #7 - Index Term-Genre/Form
Source of term NULIB
focus term Dissertation, Academic
690 ## - Subject
School Informatics-IFM
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Koha item type Thesis
Holdings
Source of classification or shelving scheme Dewey Decimal Classification
Home library Main library
Current library Main library
Date acquired 09/14/2022
Full call number 610 /M.S.S/ 2022
Date last seen 09/14/2022
Price effective from 09/14/2022
Koha item type Thesis