
Low latency and tight resources viseme recognition from speech using an artificial neural network

Nathan Souviraà-Labastie¹, Frédéric Bimbot¹
¹ METISS – Speech and sound data modeling and processing, IRISA – Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: We present a speech-driven, real-time viseme recognition system based on an Artificial Neural Network (ANN). A Multi-Layer Perceptron (MLP) provides a lightweight and responsive framework suited to the target application: animating the lips of an avatar on multi-task platforms under embedded-resource and latency constraints. Several improvements to this system are studied, including data selection, network size, training set size, and the choice of the best acoustic unit to recognize. All variants are compared against a baseline system, and the combined improvements achieve a recognition rate of 64.3% for a set of 18 visemes and 70.8% for 9 visemes. We then propose a system that trades off recognition performance against resource requirements and latency constraints, and describe a scalable method.
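The abstract describes frame-level viseme classification with a single MLP. The sketch below illustrates the general idea only: a small feed-forward network mapping one acoustic feature frame to posterior probabilities over visemes. The feature dimension, hidden-layer size, and activation choices are assumptions for illustration, not the authors' actual configuration; the viseme inventory size (18) comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, except the viseme count):
N_FEATURES = 39   # e.g. 13 MFCCs + deltas + delta-deltas (assumed input)
N_HIDDEN = 64     # hidden layer size (assumed)
N_VISEMES = 18    # viseme inventory size, as in the abstract

def init_mlp(n_in, n_hidden, n_out):
    """Randomly initialise a single-hidden-layer MLP."""
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) * 0.1,
        "b1": np.zeros(n_hidden),
        "W2": rng.standard_normal((n_hidden, n_out)) * 0.1,
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Map one acoustic feature frame to viseme posterior probabilities."""
    h = np.tanh(x @ params["W1"] + params["b1"])   # hidden activations
    logits = h @ params["W2"] + params["b2"]
    e = np.exp(logits - logits.max())              # numerically stable softmax
    return e / e.sum()

params = init_mlp(N_FEATURES, N_HIDDEN, N_VISEMES)
frame = rng.standard_normal(N_FEATURES)            # one synthetic feature frame
posteriors = forward(params, frame)
viseme_id = int(np.argmax(posteriors))             # recognised viseme index
```

Processing one frame at a time, as above, is what keeps per-frame latency low: the cost is two small matrix-vector products, which fits the embedded-resource constraints the report targets.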
Document type: Research Report

Cited literature: 8 references
Contributor: Nathan Souviraà-Labastie
Submitted on: Friday, July 26, 2013 - 4:30:08 PM
Last modification on: Friday, February 4, 2022 - 3:24:10 AM
Long-term archiving on: Sunday, October 27, 2013 - 3:20:11 AM
  • HAL Id: hal-00848629, version 1


Nathan Souviraà-Labastie, Frédéric Bimbot. Low latency and tight resources viseme recognition from speech using an artificial neural network. [Research Report] RR-8338, INRIA. 2013. ⟨hal-00848629⟩


