Finding Speaker Face Region by Audiovisual Correlation - Inria (Institut national de recherche en sciences et technologies du numérique)
Conference paper, Year: 2008

Finding Speaker Face Region by Audiovisual Correlation

Abstract

The ability to find the speaker face region in a video is important in various application areas. In this work, we develop a novel technique to find this region robustly under different views and complex backgrounds, using gray-scale images only. The main thrust of this technique is to integrate audiovisual correlation analysis into an image segmentation framework to extract the speaker face region. We first analyze the video in a time window and evaluate the audiovisual correlation locally at each pixel position using a novel statistical measure based on Quadratic Mutual Information. As only local visual information is used in this stage, the analysis is robust to view changes of the human face. The computed correlation is then incorporated into Graph Cut-based image segmentation, which optimizes an energy function defined over multiple video frames. Because this process finds the globally optimal segmentation while balancing image information across frames, we can extract a reliable region aligned with real visual boundaries. Experimental results demonstrate the effectiveness and robustness of our method.
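The abstract does not spell out the correlation measure, but Quadratic Mutual Information in its Euclidean-distance form, estimated with Parzen windows and Gaussian kernels, admits a well-known closed-form pairwise sum (V_J + V_M − 2·V_C). The sketch below is a minimal, generic implementation of that estimator for two 1-D sample sequences; the function names, kernel width, and choice of per-pixel features are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gauss(d, var):
    """1-D Gaussian kernel G(d; var) evaluated at pairwise differences d."""
    return np.exp(-d**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def qmi_ed(x, y, sigma=0.5):
    """Euclidean-distance Quadratic Mutual Information between two 1-D
    sample sequences, with Parzen density estimation (Gaussian kernels).

    I_ED = integral of (p(x, y) - p(x) p(y))^2, which for Gaussian Parzen
    windows reduces to the closed form V_J + V_M - 2 * V_C over sample pairs.
    Kernel width and features are illustrative choices, not the paper's.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var = 2.0 * sigma**2  # convolving two Gaussians doubles the variance
    gx = gauss(x[:, None] - x[None, :], var)  # pairwise kernel matrix for x
    gy = gauss(y[:, None] - y[None, :], var)  # pairwise kernel matrix for y
    v_j = np.mean(gx * gy)                    # joint-density term
    v_m = np.mean(gx) * np.mean(gy)           # product-of-marginals term
    v_c = np.mean(gx.mean(axis=1) * gy.mean(axis=1))  # cross term
    return v_j + v_m - 2.0 * v_c              # squared L2 distance, >= 0
```

In the paper's setting, `x` would be a local visual feature at one pixel over the analysis window (e.g. frame-to-frame intensity change) and `y` the synchronized per-frame audio feature; pixels with high QMI are candidates for the speaker face region fed into the graph-cut stage.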
Main file: 1569139970.pdf (437 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00326761, version 1 (05-10-2008)

Identifiers

  • HAL Id: inria-00326761, version 1

Cite

Yuyu Liu, Yoichi Sato. Finding Speaker Face Region by Audiovisual Correlation. Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications - M2SFA2 2008, Andrea Cavallaro and Hamid Aghajan, Oct 2008, Marseille, France. ⟨inria-00326761⟩

Collections

M2SFA2
101 views
177 downloads
