
Speech emotion recognition system/ (Record no. 9781)

MARC details
000 -LEADER
fixed length control field	09322nam a22002537a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	201210b2022 a|||f bm|| 00| 0 eng d
040 ## - CATALOGING SOURCE
Original cataloging agency	EG-CaNU
Transcribing agency	EG-CaNU
041 0# - Language Code
Language code of text	eng
Language code of abstract	eng
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	610
100 0# - MAIN ENTRY--PERSONAL NAME
Personal name	Mai Mohamed Magdy Abd ElSalam El Seknedy
245 1# - TITLE STATEMENT
Title	Speech emotion recognition system/
Statement of responsibility, etc.	Mai Mohamed Magdy Abd ElSalam El Seknedy
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc.	2022
300 ## - PHYSICAL DESCRIPTION
Extent	135 p.
Other physical details	ill.
Dimensions	21 cm.
500 ## - GENERAL NOTE
Materials specified	Supervisor: Sahar Ali Fawzi
502 ## - Dissertation Note
Dissertation type	Thesis (M.A.)--Nile University, Egypt, 2022.
504 ## - Bibliography
Bibliography	"Includes bibliographical references"
505 0# - Contents
Formatted contents note	Contents:<br/>Dedication<br/>Acknowledgments<br/>List of Tables<br/>List of Figures<br/>List of Abbreviations<br/>Abstract<br/>Chapter 1: Introduction<br/>1.1 Motivation<br/>1.2 Problem Definition<br/>1.3 Research Objective<br/>1.4 Research Structure<br/>Chapter 2: Background and Literature Review<br/>2.1 Chapter overview<br/>2.2 Emotions<br/>2.2.1 Discrete emotional model<br/>2.2.2 Continuous emotional model<br/>2.3 Features<br/>2.3.1 Acoustic feature types<br/>2.3.2 Feature selection techniques<br/>2.3.3 Feature Normalization<br/>2.4 Datasets<br/>2.5 Model classification<br/>2.6 Literature Review<br/>Chapter 3: Materials and Methods<br/>3.1 SER System Architecture<br/>3.2 Datasets<br/>3.2.1 Datasets overview<br/>3.2.2 Arabic Survey<br/>3.3 Feature sets<br/>3.3.1 Feature Scaling (Data preprocessing)<br/>3.3.2 Feature Importance<br/>3.4 Classifiers<br/>3.4.1 Model's hyper-parameters<br/>3.4.2 Evaluation metrics<br/>3.5 Experimentation Tools<br/>Chapter 4: Results and Discussions<br/>4.1 Single corpus SER<br/>4.1.1 Arabic corpus SER<br/>4.1.1.1 Arabic corpus survey<br/>4.1.1.2 Arabic corpus SER Results<br/>4.1.2 Urdu corpus SER<br/>4.1.3 English corpus SER<br/>4.1.4 German corpus SER<br/>4.1.5 French corpus SER<br/>4.1.6 Baseline single corpus SER<br/>4.2 Cross corpus SER<br/>4.2.1 Latin based cross-corpus SER<br/>4.2.2 Arabic vs Urdu cross-corpus SER<br/>4.2.3 Cross-corpus SER including five languages<br/>4.3 SER computational performance evaluation<br/>4.3.1 Classifiers computational performance<br/>4.3.2 Feature sets computational performance<br/>Chapter 5: Conclusion and Future work<br/>References<br/>Appendix A: Publications<br/>Appendix B: Datasets samples<br/>Appendix C: Experimental results - Extra Diagrams
520 3# - Abstract
Abstract	Abstract:<br/>Speech Emotion Recognition (SER) is now considered one of the most important applications of human-computer interaction. It creates a new means of communication between humans and machines by interpreting the speech signal and extracting its emotional content. SER systems have proved to play a crucial part in everyday applications such as call centers, e-learning, medical therapies (for example, physiological disease analysis) and autonomous driver emotion detection. Despite the great evolution of technology and the wide research scope in this area, there is still a gap between SER research applications and those needed in everyday life. Most research focuses on new methodologies, such as deep learning models or new feature extraction techniques, without giving enough attention to the system's computational performance on real-time data, which commercial applications require. Furthermore, one question remains open and is frequently addressed in research: "What is the best speech feature set for achieving the best SER performance?" So far, there is no precise generic feature set that gives the best performance. In addition, most existing research focuses on SER performance in a single-corpus domain, where the model is trained and tested on the same language. Cross-corpus SER, on the other hand, is still an ongoing challenge, as few studies have addressed cross-corpus emotion recognition. The main motivation of this work is to find the feature set that offers high performance at low computational cost compared to the benchmark Interspeech 2009 and 2010 feature sets. Two new feature sets were developed from a combination of spectral and prosodic features and tested in a cross-corpus domain, where they outperformed the other feature sets evaluated on the same datasets. The proposed SER system was evaluated on five datasets in five languages (English, German, French, Arabic and Urdu): RAVDESS, EMO-DB, CaFE, EYASE and the Urdu dataset, respectively.<br/>Furthermore, this research introduced the Arabic language into SER systems because of its scarcity in the research domain. Its performance in the cross-corpus domain with Latin-based and Urdu languages was very promising. This research studied the performance of the SER system using different models: Multi-Layer Perceptron, Support Vector Machine, Random Forest, Logistic Regression and Ensemble Learning using majority voting. The results were analyzed and the most suitable classifier for each language was identified. Compared to previous work, performance improved by 16% in Urdu, 6.25% in English, 9.36% in German and 13.42% in French SER systems. Furthermore, featureset-2 showed very promising results compared to the benchmark Interspeech feature sets in both recognition rate and computational time. Cross-corpus SER showed results close to the baseline single-corpus SER; in Arabic, an enhancement of 2.73% was achieved. Improvements over previous work were also achieved for Urdu and Arabic in the cross-corpus domain.
546 ## - Language Note
Language Note	Text in English, abstracts in English.
650 #4 - Subject
Subject	Informatics-IFM
655 #7 - Index Term-Genre/Form
Source of term	NULIB
focus term	Dissertation, Academic
690 ## - Subject
School	Informatics-IFM
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Koha item type	Thesis

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Home library	Current library	Date acquired	Total Checkouts	Full call number	Date last seen	Price effective from	Koha item type
		Dewey Decimal Classification			Main library	Main library	09/14/2022		610 /M.S.S/ 2022	09/14/2022	09/14/2022	Thesis