Low latency and tight resources viseme recognition from speech using an artificial neural network

Nathan Souviraà-Labastie ¹, Frédéric Bimbot ¹
¹ METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: We present a speech-driven, real-time viseme recognition system based on an Artificial Neural Network (ANN). A Multi-Layer Perceptron (MLP) provides a light and responsive framework adapted to the target application, namely the animation of an avatar's lips on multi-task platforms with embedded-resource and latency constraints. Several improvements to this system are studied, such as data selection, network size, training-set size, and the choice of the best acoustic unit to recognize. All variants are compared to a baseline system, and the combined improvements achieve a recognition rate of 64.3% for a set of 18 visemes and 70.8% for 9 visemes. We then propose a system that trades off recognition performance against resource requirements and latency constraints. A scalable method is also described.
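
For illustration only, the sketch below shows what a frame-level MLP viseme classifier of the kind described in the abstract could look like. It is not the authors' implementation: the scikit-learn model, the 39-dimensional acoustic feature vectors, the single 64-unit hidden layer and the random placeholder data are all assumptions standing in for the report's actual features, network sizes and corpus.

# Illustrative sketch, not the authors' implementation.
# Assumes frame-level acoustic feature vectors X (e.g. MFCCs with deltas)
# and one viseme label per frame, drawn from an 18-viseme set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

N_VISEMES = 18     # assumption: 18-viseme set, as in the reported results
FEATURE_DIM = 39   # assumption: 13 MFCCs + deltas + delta-deltas per frame

# Random placeholder data standing in for real acoustic frames and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, FEATURE_DIM))
y = rng.integers(0, N_VISEMES, size=5000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A single small hidden layer keeps the model light, in the spirit of the
# embedded-resource and latency constraints mentioned in the abstract.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=0)
mlp.fit(X_train, y_train)

# At run time each incoming frame is classified independently, so the
# per-frame cost is one forward pass through the small network.
print("frame-level accuracy:", mlp.score(X_test, y_test))

With such a per-frame classifier, the achievable latency is essentially the acoustic analysis window plus one forward pass; the 18-viseme and 9-viseme figures quoted in the abstract would then correspond to different choices of label set.
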
Document type: Report

Cited literature: 8 references

https://hal.inria.fr/hal-00848629
Contributor: Nathan Souviraà-Labastie
Submitted on: Friday, July 26, 2013 - 4:30:08 PM
Last modification on: Friday, November 16, 2018 - 1:25:15 AM
Archived on: Sunday, October 27, 2013 - 3:20:11 AM

File

RR-8338.pdf (file produced by the author(s))

Identifiers

  • HAL Id: hal-00848629, version 1

Citation

Nathan Souviraà-Labastie, Frédéric Bimbot. Low latency and tight resources viseme recognition from speech using an artificial neural network. [Research Report] RR-8338, INRIA. 2013. 〈hal-00848629〉

Metrics

Record views: 509
File downloads: 221