Learning Multi-Modal Dictionaries

Gianluca Monaci 1, Philippe Jost 1, Pierre Vandergheynst 1, Boris Mailhé 2, Sylvain Lesage 2, Rémi Gribonval 2
2 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrating, at each instant, perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when the signals are considered independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. We also propose an algorithm for iteratively learning multimodal generating functions that can be shifted to all positions in the signal. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences, where it is able to discover underlying structures in the data. Detecting such audio-video patterns in audiovisual clips makes it possible to effectively localize the sound source in the video in the presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.
Cited literature: 35 references

https://hal.inria.fr/inria-00544772
Contributor : Rémi Gribonval
Submitted on : Monday, February 7, 2011 - 4:35:27 PM
Last modification on : Thursday, June 27, 2019 - 12:22:15 PM
Long-term archiving on : Sunday, May 8, 2011 - 2:32:50 AM

File

2007_TIP_MonaciEtAl.pdf
Files produced by the author(s)

Citation

Gianluca Monaci, Philippe Jost, Pierre Vandergheynst, Boris Mailhé, Sylvain Lesage, et al. Learning Multi-Modal Dictionaries. IEEE Transactions on Image Processing, Institute of Electrical and Electronics Engineers, 2007, 16 (9), pp.2272-2283. ⟨10.1109/TIP.2007.901813⟩. ⟨inria-00544772⟩
