Finding Speaker Face Region by Audiovisual Correlation - Inria (Institut national de recherche en sciences et technologies du numérique)
Conference paper, Year: 2008

Finding Speaker Face Region by Audiovisual Correlation

Abstract

The ability to find the speaker face region in a video is important in various application areas. In this work, we develop a novel technique to find this region robustly under different views and complex backgrounds, using gray-scale images only. The main thrust of this technique is to integrate audiovisual correlation analysis into an image segmentation framework to extract the speaker face region. We first analyze the video in a time window and evaluate the audiovisual correlation locally at each pixel position using a novel statistical measure based on Quadratic Mutual Information. As only local visual information is used in this stage, the analysis is robust to view changes of the human face. The computed correlation is then incorporated into Graph Cut-based image segmentation, which optimizes an energy function defined over multiple video frames. Because this process finds the globally optimal segmentation while balancing image information across frames, we can extract a reliable region aligned with real visual boundaries. Experimental results demonstrate the effectiveness and robustness of our method.
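The abstract does not spell out the correlation measure, but Quadratic Mutual Information in its Euclidean-distance form, estimated with Parzen windows and Gaussian kernels, admits a well-known closed-form pairwise sum (V_J + V_M − 2·V_C). The sketch below is a minimal, generic implementation of that estimator for two 1-D sample sequences; the function names, kernel width, and choice of per-pixel features are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gauss(d, var):
    """1-D Gaussian kernel G(d; var) evaluated at pairwise differences d."""
    return np.exp(-d**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def qmi_ed(x, y, sigma=0.5):
    """Euclidean-distance Quadratic Mutual Information between two 1-D
    sample sequences, with Parzen density estimation (Gaussian kernels).

    I_ED = integral of (p(x, y) - p(x) p(y))^2, which for Gaussian Parzen
    windows reduces to the closed form V_J + V_M - 2 * V_C over sample pairs.
    Kernel width and features are illustrative choices, not the paper's.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var = 2.0 * sigma**2  # convolving two Gaussians doubles the variance
    gx = gauss(x[:, None] - x[None, :], var)  # pairwise kernel matrix for x
    gy = gauss(y[:, None] - y[None, :], var)  # pairwise kernel matrix for y
    v_j = np.mean(gx * gy)                    # joint-density term
    v_m = np.mean(gx) * np.mean(gy)           # product-of-marginals term
    v_c = np.mean(gx.mean(axis=1) * gy.mean(axis=1))  # cross term
    return v_j + v_m - 2.0 * v_c              # squared L2 distance, >= 0
```

In the paper's setting, `x` would be a local visual feature at one pixel over the analysis window (e.g. frame-to-frame intensity change) and `y` the synchronized per-frame audio feature; pixels with high QMI are candidates for the speaker face region fed into the graph-cut stage.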
Main file: 1569139970.pdf (437 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00326761, version 1 (05-10-2008)

Identifiers

  • HAL Id: inria-00326761, version 1

Cite

Yuyu Liu, Yoichi Sato. Finding Speaker Face Region by Audiovisual Correlation. Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications - M2SFA2 2008, Andrea Cavallaro and Hamid Aghajan, Oct 2008, Marseille, France. ⟨inria-00326761⟩

Collections

M2SFA2
101 views
177 downloads
