Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition

Naima Zerari; Samir Abdelhamid; Hassen Bouzgou; Christian Raymond

doi:10.1109/ICNLSP.2018.8374374

Conference paper. Year: 2018

Naima Zerari

Role: Author

Université Mustapha Ben Boulaid de Batna 2

Samir Abdelhamid

Role: Author

Université Mustapha Ben Boulaid de Batna 2

Hassen Bouzgou

Role: Author

Université Mustapha Ben Boulaid de Batna 2

Christian Raymond

Role: Author
PersonId: 1778
IdHAL: christian-raymond
IdRef: 099236486

Creating and exploiting explicit links between multimedia fragments

Abstract

Automatic Speech Recognition can be considered as the transcription of spoken utterances into text that can be used to monitor or command a specific system. In this paper, we propose a general end-to-end approach to sequence learning that uses Long Short-Term Memory (LSTM) to deal with the non-uniform sequence length of speech utterances. The neural architecture recognizes the spoken Arabic digit in an isolated Arabic word using a classification methodology, with the aim of enabling natural human-machine interaction. The proposed system first extracts the relevant features from the input speech signal using Mel Frequency Cepstral Coefficients (MFCC); these features are then processed by a deep neural network able to deal with the non-uniform length of the sequences. A recurrent LSTM or GRU architecture encodes each sequence of MFCC features as a fixed-size vector, which feeds a multilayer perceptron network that performs the classification. The whole neural network classifier is trained in an end-to-end manner. The proposed system outperforms previously published results on the same database by a large margin.
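
The encoding step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the weights are random and untrained, a simplified GRU cell without bias terms stands in for the LSTM or GRU layer, and the class names (`GRUCell`, `BiRecurrentClassifier`) and all sizes (13 MFCC coefficients per frame, 8 hidden units per direction, 16 MLP units) are assumptions made for illustration. The only point shown is that utterances with different numbers of frames map to a vector of the same size, which a small perceptron then turns into 10 class probabilities.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Simplified GRU cell (no biases): update gate z, reset gate r, candidate."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # Each gate has one matrix applied to the concatenation [x, h].
        self.Wz = make_matrix(n_hidden, n_in + n_hidden, rng)
        self.Wr = make_matrix(n_hidden, n_in + n_hidden, rng)
        self.Wh = make_matrix(n_hidden, n_in + n_hidden, rng)

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]
        h_reset = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, x + h_reset)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, frames):
        """Run over all frames and return the final hidden state."""
        h = [0.0] * self.n_hidden
        for x in frames:
            h = self.step(x, h)
        return h


class BiRecurrentClassifier:
    """Bi-directional recurrent encoder followed by a one-hidden-layer MLP."""

    def __init__(self, n_mfcc=13, n_hidden=8, n_mlp=16, n_classes=10, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(n_mfcc, n_hidden, rng)
        self.bwd = GRUCell(n_mfcc, n_hidden, rng)
        self.W1 = make_matrix(n_mlp, 2 * n_hidden, rng)
        self.W2 = make_matrix(n_classes, n_mlp, rng)

    def encode(self, frames):
        # Forward pass reads the frames in order, backward pass in reverse;
        # the two final states are concatenated into one fixed-size vector.
        return self.fwd.encode(frames) + self.bwd.encode(frames[::-1])

    def predict_proba(self, frames):
        hidden = [math.tanh(a) for a in matvec(self.W1, self.encode(frames))]
        logits = matvec(self.W2, hidden)
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Two synthetic "utterances" of different lengths (random stand-ins for MFCC frames).
data_rng = random.Random(1)
short_utt = [[data_rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]
long_utt = [[data_rng.gauss(0, 1) for _ in range(13)] for _ in range(57)]

model = BiRecurrentClassifier()
for utt in (short_utt, long_utt):
    encoding = model.encode(utt)
    probs = model.predict_proba(utt)
    print(len(utt), "frames ->", len(encoding), "encoding values,", len(probs), "class probabilities")
```

In an end-to-end setup such as the one the abstract describes, the encoder and perceptron weights would be learned jointly from labelled utterances rather than drawn at random; that training loop is omitted here.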

Keywords

Long Short-Term Memory; Speech recognition; Arabic digits; Mel Frequency Cepstral Coefficients; Auto-encoder; Multilayer perceptron network

Domains

Computation and Language [cs.CL]; Human-Computer Interaction [cs.HC]; Neural and Evolutionary Computing [cs.NE]

Main file

ICNLSP2018.pdf (215.37 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01835440

Submitted on: Wednesday, 11 July 2018, 13:41:06

Last modified on: Friday, 24 March 2023, 14:53:07

Long-term archiving on: Saturday, 13 October 2018, 01:25:09

Dates and versions

hal-01835440, version 1 (11 July 2018)

Identifiers

HAL Id: hal-01835440, version 1
DOI: 10.1109/ICNLSP.2018.8374374

Cite

Naima Zerari, Samir Abdelhamid, Hassen Bouzgou, Christian Raymond. Bi-directional Recurrent End-to-End Neural Network Classifier for Spoken Arab Digit Recognition. ICNLSP 2018 - 2nd International Conference on Natural Language and Speech Processing, Apr 2018, Algiers, Algeria. pp.1-6, ⟨10.1109/ICNLSP.2018.8374374⟩. ⟨hal-01835440⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-INSA-R CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
