Performance Comparison of Deep Learning Algorithm for Speech Emotion Recognition

  • I Gusti Bagus Arya Pradnja Paramitha Universitas Nusa Mandiri
  • Hendra Budi Kusnawan Universitas Nusa Mandiri Jakarta
  • Muji Ernawati Universitas Nusa Mandiri Jakarta
DOI: https://doi.org/10.29303/jcosine.v6i2.443

Abstract

One of the challenges in speech emotion recognition is that speech is time-series data, whereas the feedforward process in a neural network is unidirectional: the output of one layer is passed directly to the next. Such a feedforward process cannot retain past information. Consequently, when a Deep Neural Network (DNN) is used for speech emotion recognition, problems arise, such as handling the speaker's speech rate: a DNN cannot analyze the underlying acoustic patterns and therefore cannot map different speech rates. A method that can process sequential input while retaining relevant information from previous steps is the Recurrent Neural Network (RNN). This paper presents the characteristics of the RNN method, comprising the LSTM and GRU techniques, for speech emotion recognition on the Berlin EMODB dataset. The dataset is split into 80% for training and 20% for testing. The feature extraction methods used are Zero Crossing Rate (ZCR), Mel Frequency Cepstral Coefficients (MFCC), Root Mean Square Energy (RMSE), Mel Spectrogram, and Chroma. This study compares the CNN, LSTM, and GRU algorithms. The classification results show that the CNN algorithm performs best, reaching an accuracy of 79.13%, while LSTM and GRU achieve only 55.76% and 55.14%, respectively.

Published
2022-12-21
Section
Embedded System and Data Communications