Conference paper. Year: 2018

Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition

Abstract

Automatic Speech Recognition can be viewed as the transcription of spoken utterances into text, which can then be used to monitor or command a specific system. In this paper, we propose a general end-to-end approach to sequence learning that uses Long Short-Term Memory (LSTM) to handle the non-uniform length of speech utterances. The neural architecture recognizes the spoken Arabic digit in an isolated Arabic word using a classification methodology, with the aim of enabling natural human-machine interaction. The proposed system first extracts the relevant features from the input speech signal as Mel Frequency Cepstral Coefficients (MFCC); these features are then processed by a deep neural network able to handle sequences of non-uniform length. A recurrent LSTM or GRU (Gated Recurrent Unit) architecture encodes each sequence of MFCC features as a fixed-size vector, which feeds a multilayer perceptron network that performs the classification. The whole neural network classifier is trained end-to-end. The proposed system outperforms previously published results on the same database by a large margin.
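The pipeline described in the abstract (variable-length MFCC sequence, bidirectional recurrent encoder, fixed-size vector, multilayer perceptron) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a simplified GRU cell without bias terms, random untrained weights, and runs the forward pass only. The names `GRUCell` and `BiRecurrentClassifier`, the 13 MFCC coefficients per frame, the hidden sizes, and the 10 output classes are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Simplified GRU cell (no biases); each gate sees the concatenation [x, h]."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        self.Wz = rand_matrix(n_hid, n_in + n_hid, rng)  # update gate
        self.Wr = rand_matrix(n_hid, n_in + n_hid, rng)  # reset gate
        self.Wh = rand_matrix(n_hid, n_in + n_hid, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]
        gated = x + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, frames):
        """Consume a sequence of any length and return the last hidden state."""
        h = [0.0] * self.n_hid
        for x in frames:
            h = self.step(x, h)
        return h


class BiRecurrentClassifier:
    """Bidirectional encoder followed by a one-hidden-layer perceptron."""

    def __init__(self, n_mfcc=13, n_hid=8, n_mlp=16, n_classes=10, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(n_mfcc, n_hid, rng)  # reads frames first to last
        self.bwd = GRUCell(n_mfcc, n_hid, rng)  # reads frames last to first
        self.W1 = rand_matrix(n_mlp, 2 * n_hid, rng)
        self.W2 = rand_matrix(n_classes, n_mlp, rng)

    def encode(self, frames):
        # The fixed-size vector is the concatenation of both final states.
        return self.fwd.encode(frames) + self.bwd.encode(frames[::-1])

    def predict_proba(self, frames):
        hidden = [math.tanh(a) for a in matvec(self.W1, self.encode(frames))]
        logits = matvec(self.W2, hidden)
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Two synthetic "utterances" of different lengths stand in for real MFCC frames.
model = BiRecurrentClassifier()
rng = random.Random(1)
short_utt = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]
long_utt = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(45)]
for utt in (short_utt, long_utt):
    probs = model.predict_proba(utt)
    print(len(utt), "frames ->", len(model.encode(utt)), "features,",
          len(probs), "class probabilities")
```

Both utterances produce an encoding of the same size regardless of their number of frames, which is what lets a standard perceptron sit on top of the recurrent layers. Training the two parts jointly, as the abstract describes, would require backpropagating the classification loss through both the perceptron and the recurrent cells; that step is omitted here.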
Main file
ICNLSP2018.pdf (215.37 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01835440, version 1 (11-07-2018)

Cite

Naima Zerari, Samir Abdelhamid, Hassen Bouzgou, Christian Raymond. Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition. ICNLSP 2018 - 2nd International Conference on Natural Language and Speech Processing, Apr 2018, Algiers, Algeria. pp.1-6, ⟨10.1109/ICNLSP.2018.8374374⟩. ⟨hal-01835440⟩