Expression-preserving face frontalization improves visually assisted speech processing

Zhiqi Kang; Mostafa Sadeghi; Radu Horaud; Xavier Alameda-Pineda

doi:10.1007/s11263-022-01742-1

Journal article, International Journal of Computer Vision, Year: 2023

Zhiqi Kang

Function : Author

Learning models from massive data

Mostafa Sadeghi

Function : Author

Speech Modeling for Facilitating Oral-Based Communication

Radu Horaud

Function : Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Towards socially intelligent robots through learning, perception and control

Xavier Alameda-Pineda

Function : Author
PersonId : 16186
IdHAL : xavier-alameda-pineda
ORCID : 0000-0002-5354-1084
IdRef : 18450919X

Towards socially intelligent robots through learning, perception and control

Abstract

Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations in order to boost the performance of visually assisted speech communication. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model. The method has two important merits: it can deal with non-Gaussian errors in the data and it incorporates a dynamical face deformation model. For that purpose, we use the generalized Student t-distribution in combination with a linear dynamic system in order to account for both rigid head motions and time-varying facial deformations caused by speech production. We propose to use the zero-mean normalized cross-correlation (ZNCC) score to evaluate the ability of the method to preserve facial expressions. The method is thoroughly evaluated and compared with several state-of-the-art methods, based either on traditional geometric models or on deep learning. Moreover, we show that the method, when incorporated into deep learning pipelines, namely lip reading and speech enhancement, improves word recognition and speech intelligibility scores by a considerable margin. Supplemental material is accessible at https://team.inria.fr/robotlearn/research/facefrontalization/
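
As an illustration of the ZNCC score mentioned in the abstract, here is a minimal sketch under stated assumptions; it is not the authors' evaluation code. It assumes the two images (for example, a frontalized face and a ground-truth frontal face) have already been cropped to the same size and flattened into equal-length lists of pixel intensities. The function name `zncc` is hypothetical, and the abstract does not specify which image regions the paper actually compares.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences.

    Returns a value in [-1, 1]; 1 means the two signals are identical up to
    a positive gain and an additive offset.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n

    # Remove the mean of each signal (the "zero-mean" part).
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]

    # Correlate the centered signals and normalize by their energies.
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0.0:
        raise ValueError("ZNCC is undefined when an input is constant")
    return numerator / denominator
```

Because both signals are centered and normalized, the score is unaffected by a global brightness offset or a positive contrast gain, which is one reason ZNCC is a common choice for comparing image content rather than raw intensities.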

Keywords

Face frontalization; Student’s t-distribution; Robust point registration; Bayesian filtering; Lip reading; Audio-visual speech enhancement; Variational auto-encoders

Domains

Computer Vision and Pattern Recognition [cs.CV]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

Kang-IJCV2022.pdf (8 MB)

Origin : Publisher files allowed on an open archive

https://hal.science/hal-03902610

Submitted on : Thursday, January 12, 2023, 10:26:02 AM

Last modification on : Saturday, April 27, 2024, 3:13:18 AM

Dates and versions

hal-03902610 , version 1 (16-12-2022)

hal-03902610 , version 2 (12-01-2023)

Identifiers

HAL Id : hal-03902610 , version 2
ARXIV : 2204.02810
DOI : 10.1007/s11263-022-01742-1

Cite

Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda. Expression-preserving face frontalization improves visually assisted speech processing. International Journal of Computer Vision, 2023, 131 (5), pp.1122-1140. ⟨10.1007/s11263-022-01742-1⟩. ⟨hal-03902610v2⟩

Collections

UGA CNRS INRIA INSMI LJK LJK_GI UNIV-LORRAINE INRIA2 LJK-GI-THOTH LORIA LORIA-NLPKD MIAI ANR LJK-GI-ROBOTLEARN
