Feature selection in affective speech classification

Abstract

The increasing role of spoken language interfaces in human-computer interaction has created the conditions for a new area of research: recognizing the emotional state of a speaker from the speech signal. This paper proposes a text-independent method for classifying the emotional content of speech signals. Different feature selection criteria are explored and analyzed, namely the Mutual Information Maximization (MIM) feature scoring criterion and its derivatives, to measure how useful a feature or feature subset is likely to be when used in a classifier. The proposed method represents the speech signals with several groups of low-level features, such as energy, zero-crossing rate, Mel-scale frequency bands, fundamental frequency (pitch), and delta and delta-delta regression, together with statistical functionals such as regression coefficients, extrema, and moments, and uses a neural network classifier for the classification task. The experiments are carried out on the EMO-DB dataset with seven primary emotions, including neutral. Results show that the proposed system achieves an average accuracy of over 85% in recognizing the 7 emotions with the 5 best-performing feature selection algorithms.
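
As a rough illustration of MIM-style feature scoring as described in the abstract (not the authors' implementation), the sketch below ranks features by their estimated mutual information with the emotion label. The placeholder data, feature count, and the use of scikit-learn's mutual_info_classif are assumptions made only for this example.

```python
# Illustrative sketch of MIM feature scoring: each feature is scored
# independently by its mutual information with the class label, and the
# top-k features are kept for a downstream classifier.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical data: X holds per-utterance acoustic descriptors (e.g. energy,
# zero-crossing rate, Mel-band energies, pitch, deltas, statistical moments),
# y holds integer emotion labels (7 classes, as in EMO-DB).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))       # placeholder feature matrix
y = rng.integers(0, 7, size=200)     # placeholder emotion labels

# MIM: estimate mutual information between each feature and the label.
scores = mutual_info_classif(X, y, random_state=0)

# Select the k highest-scoring features.
k = 10
selected = np.argsort(scores)[::-1][:k]
print("Top-k feature indices:", selected)
```

MIM ignores redundancy between features; the derivative criteria mentioned in the abstract typically add redundancy or conditional-relevance terms on top of this per-feature score.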

Authors

  • Anguel Manolov
  • Ognian Boumbarov
  • Agata Manolova
  • Vladimir Poulkov
  • Krasimir Tonchev

Venue

International Conference on Telecommunications and Signal Processing (TSP), 2017.

Links

https://ieeexplore.ieee.org/document/8076004
