Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition

Abstract: Automatic Speech Recognition can be viewed as the transcription of spoken utterances into text that can be used to monitor or command a specific system. In this paper, we propose a general end-to-end approach to sequence learning that uses Long Short-Term Memory (LSTM) to deal with the non-uniform sequence length of speech utterances. The neural architecture recognizes the spoken Arabic digit corresponding to an isolated Arabic word using a classification approach, with the aim of enabling natural human-machine interaction. The proposed system first extracts the relevant features from the input speech signal using Mel Frequency Cepstral Coefficients (MFCC); these features are then processed by a deep neural network able to handle sequences of non-uniform length. A recurrent LSTM or GRU architecture encodes each sequence of MFCC features as a fixed-size vector, which feeds a multilayer perceptron network that performs the classification. The whole neural network classifier is trained in an end-to-end manner. The proposed system outperforms previously published results on the same database by a large margin.
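
The pipeline described in the abstract (variable-length MFCC sequence, bi-directional recurrent encoder, fixed-size vector, multilayer perceptron) can be illustrated with a minimal NumPy sketch of the forward pass only. This is not the authors' implementation: the layer sizes (13 MFCC coefficients, 16 hidden units, 32 MLP units), the choice of a GRU cell, the use of the final forward and backward states as the fixed-size vector, and the random untrained weights are all assumptions made for illustration, and random arrays stand in for real MFCC features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration only; the abstract does not state them.
N_MFCC, HIDDEN, MLP_HIDDEN, N_CLASSES = 13, 16, 32, 10


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(n_in, n_hid):
    """Random (untrained) GRU parameters: update (z), reset (r), candidate (h)."""
    params = {}
    for gate in "zrh":
        params["W" + gate] = rng.normal(0.0, 0.1, size=(n_hid, n_in))
        params["U" + gate] = rng.normal(0.0, 0.1, size=(n_hid, n_hid))
        params["b" + gate] = np.zeros(n_hid)
    return params


def gru_encode(frames, p):
    """Run a GRU over a (T, n_in) sequence and return its final hidden state."""
    h = np.zeros(p["bz"].shape[0])
    for x in frames:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
        h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_cand
    return h


def encode_bidirectional(frames, fwd, bwd):
    """Concatenate the final states of a forward pass and a time-reversed pass."""
    return np.concatenate([gru_encode(frames, fwd), gru_encode(frames[::-1], bwd)])


def mlp_classify(v, W1, b1, W2, b2):
    """One hidden layer followed by a softmax over the digit classes."""
    hidden = np.tanh(W1 @ v + b1)
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return e / e.sum()


fwd, bwd = init_gru(N_MFCC, HIDDEN), init_gru(N_MFCC, HIDDEN)
W1 = rng.normal(0.0, 0.1, size=(MLP_HIDDEN, 2 * HIDDEN))
b1 = np.zeros(MLP_HIDDEN)
W2 = rng.normal(0.0, 0.1, size=(N_CLASSES, MLP_HIDDEN))
b2 = np.zeros(N_CLASSES)

# Two utterances of different lengths (25 and 40 frames) yield vectors of equal size.
for n_frames in (25, 40):
    utterance = rng.normal(size=(n_frames, N_MFCC))  # stand-in for MFCC features
    vector = encode_bidirectional(utterance, fwd, bwd)
    probs = mlp_classify(vector, W1, b1, W2, b2)
    print(n_frames, vector.shape, probs.shape, int(probs.argmax()))
```

Whatever the number of input frames, the encoder output always has 2 x HIDDEN entries, which is what lets a fixed-size classifier sit on top of it. In an end-to-end setup, the encoder and classifier weights would be learned jointly by backpropagating the classification loss, which this forward-only sketch does not show.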
Document type:
Conference paper
ICNSLP 2018 - 2nd International Conference on Natural Language and Speech Processing, Apr 2018, Algier, Algeria. IEEE, pp.1-6, 〈10.1109/ICNLSP.2018.8374374〉

https://hal.inria.fr/hal-01835440
Contributor: Christian Raymond
Submitted on: Wednesday, July 11, 2018 - 13:41:06
Last modified on: Wednesday, October 24, 2018 - 14:22:13
Document(s) archived on: Saturday, October 13, 2018 - 01:25:09

File

ICNLSP2018.pdf
Files produced by the author(s)

Citation

Naima Zerari, Samir Abdelhamid, Hassen Bouzgou, Christian Raymond. Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition. ICNSLP 2018 - 2nd International Conference on Natural Language and Speech Processing, Apr 2018, Algier, Algeria. IEEE, pp.1-6, 〈10.1109/ICNLSP.2018.8374374〉. 〈hal-01835440〉
