Research Article | Open Access
Speech Emotion Recognition Based on CNN
RAKSHA
Pages: 129-140
Abstract
The ability to interact naturally with computers has driven the growing popularity of automatic speech emotion recognition, in which emotions are identified by analyzing the voice. Speech, however, contains silent segments that carry no emotional information. One way to improve performance is to remove the silence; another is to leave it in place and direct the model's attention toward the speech itself. In this research, we propose combining silence removal with an attention model to improve speech emotion recognition performance. Our findings show that the combination of silence removal and attention outperforms either technique alone. Speech emotion recognition is an important and challenging problem in human-computer interaction, and many models and feature sets have been proposed for training such systems. In this study, we conduct extensive experiments with convolutional neural networks that learn from multiple views of the input. We evaluate the system with input signals of varying durations, different acoustic feature types, and different emotional speech styles (improvised/scripted). Our experimental results on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) demonstrate that recognition performance depends not on the input feature type but on the type of speech data. We achieve state-of-the-art results on the improvised speech data in RAVDESS.
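To make the pipeline in the abstract concrete, the sketch below shows one plausible realization: trim silent intervals from the waveform, extract log-mel acoustic features, and classify them with a small CNN. This is an illustrative assumption, not the authors' published configuration; the hyperparameters (top_db, n_mels, the network shape) and function names are hypothetical choices.

```python
# Minimal sketch of the described pipeline: silence removal -> log-mel
# features -> small CNN classifier. All settings here are illustrative
# assumptions, not the paper's actual configuration.
import numpy as np
import librosa
import torch
import torch.nn as nn

def remove_silence(y, top_db=30):
    """Keep only the non-silent intervals detected by librosa."""
    intervals = librosa.effects.split(y, top_db=top_db)
    return np.concatenate([y[s:e] for s, e in intervals])

def log_mel(y, sr, n_mels=64):
    """Log-scaled mel spectrogram, a common acoustic feature for SER."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

class SmallSERCNN(nn.Module):
    """Toy 2-D CNN over (batch, 1, n_mels, time) inputs; 8 emotion
    classes, matching the RAVDESS label set."""
    def __init__(self, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pools away the variable time axis
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Usage (hypothetical file name):
# y, sr = librosa.load("speech.wav", sr=16000)
# x = torch.tensor(log_mel(remove_silence(y), sr)).unsqueeze(0).unsqueeze(0)
# logits = SmallSERCNN()(x.float())
```

The adaptive pooling layer is one simple way to handle the varying input durations the abstract mentions, since it collapses the time axis to a fixed size regardless of clip length.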
Keywords
CNN, Emotion Recognition, Speech Emotion