
Reconnaissance de la parole pour l’aide à la communication pour les sourds et malentendants

Luiza Orosanu 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This thesis is part of the RAPSODIE project, which aims to develop a speech recognition device tailored to the needs of deaf and hearing-impaired people. Two aspects are studied: optimizing the lexical models and extracting para-lexical information.

Regarding lexical modeling, we focused on optimizing the choice of lexical units that define the vocabulary and the associated language model. We evaluated various lexical units, such as phonemes and words, and proposed the use of syllables. We also proposed a new approach that combines words and syllables in a hybrid language model. This kind of model aims to ensure proper recognition of the most frequent words and to offer sequences of syllables for speech segments corresponding to out-of-vocabulary words. Another focus was on adding new words to the language model, in order to ensure proper recognition of words specific to a given domain. We proposed and evaluated a new approach based on a principle of similarity between words: two words are similar if they have similar neighbor distributions. The approach involves three steps: collecting a few example sentences that include the new word, looking for in-vocabulary words similar to the new word, and defining the n-grams associated with the new word based on the n-grams of its similar in-vocabulary words.

Regarding the extraction of para-lexical information, we focused mainly on detecting questions and statements, in order to inform deaf and hearing-impaired people when a question is addressed to them. In our study, several approaches were analyzed: using only prosodic features (extracted from the audio signal), using only linguistic features (extracted from word sequences and sequences of POS tags), or using both types of information. The classifiers are evaluated using linguistic and prosodic features (alone or in combination) extracted from automatic transcriptions (to study performance under real conditions) and from manual transcriptions (to study performance under ideal conditions).
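The three-step procedure for adding a new word can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis. The abstract does not say how neighbor distributions are compared, so the Jensen-Shannon divergence used here is an assumption, as are the toy corpus, the function names, and the restriction to immediate left and right neighbors.

```python
from collections import Counter
from math import log2


def _normalize(counts):
    """Turn raw counts into a relative-frequency distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def neighbor_distributions(sentences, word):
    """Distributions of the words immediately left and right of `word`."""
    left, right = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            if tokens[i] == word:
                left[tokens[i - 1]] += 1
                right[tokens[i + 1]] += 1
    return _normalize(left), _normalize(right)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed measure, range 0 to 1)."""
    if not p or not q:
        return 1.0  # no evidence for one of the words: treat as maximally different
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw:
            total += 0.5 * pw * log2(pw / m)
        if qw:
            total += 0.5 * qw * log2(qw / m)
    return total


def most_similar(new_word, example_sentences, corpus, vocabulary, k=2):
    """Step 2: rank in-vocabulary words by closeness of neighbor distributions."""
    new_left, new_right = neighbor_distributions(example_sentences, new_word)
    scored = []
    for word in vocabulary:
        left, right = neighbor_distributions(corpus, word)
        score = js_divergence(new_left, left) + js_divergence(new_right, right)
        scored.append((score, word))
    return [word for _, word in sorted(scored)[:k]]


def transfer_ngrams(corpus, similar_words, new_word, n=2):
    """Step 3: copy the n-grams of the similar words, substituting the new word."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            if any(t in similar_words for t in ngram):
                counts[tuple(new_word if t in similar_words else t for t in ngram)] += 1
    return counts


# Hypothetical toy data, for illustration only.
corpus = [
    "i like tea in the morning",
    "i like coffee in the morning",
    "the cat sleeps in the house",
]
examples = ["i like matcha in the evening"]  # step 1: a few example sentences

similar = most_similar("matcha", examples, corpus, {"tea", "coffee", "cat"})
counts = transfer_ngrams(corpus, set(similar), "matcha")
```

On this toy data, "tea" and "coffee" share the left neighbor "like" and the right neighbor "in" with the new word, so they are selected and their bigrams are reused for "matcha". A real system would also have to turn the transferred counts into smoothed n-gram probabilities, which this sketch leaves out.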

Cited literature: 145 references
Contributor: Luiza Orosanu
Submitted on: Tuesday, January 5, 2016 - 4:30:11 PM
Last modification on: Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on: Thursday, April 7, 2016 - 3:38:25 PM


  • HAL Id: tel-01251128, version 1


Luiza Orosanu. Reconnaissance de la parole pour l’aide à la communication pour les sourds et malentendants. Signal and Image Processing [eess.SP]. Université de Lorraine, 2015. French. ⟨NNT: 2015LORR0172⟩. ⟨tel-01251128⟩


