F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, Multimodality image registration by maximization of mutual information, IEEE Transactions on Medical Imaging, vol.16, issue.2, pp.187-198, 1997.
DOI : 10.1109/42.563664

T. Butz and J. Thiran, From error probability to information theoretic (multi-modal) signal processing, Signal Processing, vol.85, issue.5, pp.875-902, 2005.
DOI : 10.1016/j.sigpro.2004.11.027

I. R. Farah, M. B. Ahmed, and M. R. Boussema, Multispectral satellite image analysis based on the method of blind separation and fusion of sources, Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2003), pp.3638-3640, 2003.
DOI : 10.1109/IGARSS.2003.1294879

K. C. Partington, A data fusion algorithm for mapping sea-ice concentrations from Special Sensor Microwave/Imager data, IEEE Transactions on Geoscience and Remote Sensing, vol.38, issue.4, pp.1947-1958, 2000.
DOI : 10.1109/36.851776

E. Martínez-Montes, P. A. Valdés-Sosa, F. Miwakeichi, R. I. Goldman, and M. S. Cohen, Concurrent EEG/fMRI analysis by multiway Partial Least Squares, NeuroImage, vol.22, issue.3, pp.1023-1034, 2004.
DOI : 10.1016/j.neuroimage.2004.03.038

C. Carmona-Moreno, A. Belward, J. Malingreau, M. Garcia-Alegre, A. Hartley et al., Characterizing interannual variations in global fire calendar using data from Earth observing satellites, Global Change Biology, vol.11, issue.9, pp.1537-1555, 2005.
DOI : 10.1080/014311698216035

G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, Recent advances in the automatic recognition of audiovisual speech, Proc. IEEE, pp.1306-1326, 2003.

S. Lucey, T. Chen, S. Sridharan, and V. Chandran, Integration strategies for audio-visual speech processing: applied to text-dependent speaker recognition, IEEE Transactions on Multimedia, vol.7, issue.3, pp.495-506, 2005.
DOI : 10.1109/TMM.2005.846777

E. Cosatto, J. Ostermann, H. Graf, and J. Schroeter, Lifelike talking faces for interactive services, Proc. IEEE, pp.1406-1429, 2003.
DOI : 10.1109/JPROC.2003.817141

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.295.2514

J. Hershey and J. Movellan, Audio-vision: Using audio-visual synchrony to locate sounds, Proc. of NIPS, 1999.

M. Slaney and M. Covell, FaceSync: A linear operator for measuring synchronization of video facial images and audio tracks, Proc. of NIPS, 2000.

P. Smaragdis and M. Casey, Audio/visual independent components, Proc. of ICA, pp.709-714, 2003.

J. W. Fisher III and T. Darrell, Speaker Association With Signal-Level Audiovisual Fusion, IEEE Transactions on Multimedia, vol.6, issue.3, pp.406-413, 2004.
DOI : 10.1109/TMM.2004.827503

E. Kidron, Y. Schechner, and M. Elad, Pixels that Sound, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.88-95, 2005.
DOI : 10.1109/CVPR.2005.274

H. J. Nock, G. Iyengar, and C. Neti, Speaker localisation using audiovisual synchrony: an empirical study, Proc. Int. Conf. on Image and Video Retrieval (CIVR), pp.488-499, 2003.

G. Monaci and P. Vandergheynst, Audiovisual Gestalts, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 2006.
DOI : 10.1109/CVPRW.2006.34

G. Monaci, O. D. Escoda, and P. Vandergheynst, Analysis of multimodal sequences using geometric video representations, Signal Processing, vol.86, issue.12, pp.3534-3548, 2006.
DOI : 10.1016/j.sigpro.2006.02.044

J. Driver, Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading, Nature, vol.381, issue.6577, pp.66-68, 1996.
DOI : 10.1038/381066a0

M. T. Wallace, G. E. Roberson, W. D. Hairston, B. E. Stein, J. W. Vaughan et al., Unifying multisensory signals across time and space, Experimental Brain Research, vol.158, issue.2, pp.252-258, 2004.
DOI : 10.1007/s00221-004-1899-9

S. Watkins, L. Shams, S. Tanaka, J. Haynes, and G. Rees, Sound alters activity in human V1 in association with illusory visual perception, NeuroImage, vol.31, issue.3, pp.1247-1256, 2006.
DOI : 10.1016/j.neuroimage.2006.01.016

A. Violentyev, S. Shimojo, and L. Shams, Touch-induced visual illusion, NeuroReport, vol.16, issue.10, pp.1107-1110, 2005.
DOI : 10.1167/5.8.754

J. Bresciani, F. Dammeier, and M. O. Ernst, Vision and touch are automatically integrated for the perception of sequences of events, Journal of Vision, vol.6, pp.554-564, 2006.

E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus, EURASIP Journal on Applied Signal Processing, vol.2002, issue.11, 2002.

G. Monaci, O. D. Escoda, and P. Vandergheynst, Analysis of multimodal signals using redundant representations, IEEE International Conference on Image Processing 2005, pp.46-49, 2005.
DOI : 10.1109/ICIP.2005.1530349

J. Tropp, A. Gilbert, and M. J. Strauss, Simultaneous sparse approximation via greedy pursuit, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), pp.721-724, 2005.
DOI : 10.1109/ICASSP.2005.1416405

URL : http://authors.library.caltech.edu/9043/1/TROicassp05.pdf

M. Lewicki and T. Sejnowski, Learning Overcomplete Representations, Neural Computation, vol.12, issue.2, pp.337-365, 2000.
DOI : 10.1109/18.119725

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.164.7690

S. Abdallah and M. Plumbley, If edges are the independent components of natural images, what are the independent components of natural sounds, Proc. of ICA, pp.534-539, 2001.

P. Jost, P. Vandergheynst, S. Lesage, and R. Gribonval, MoTIF: An Efficient Algorithm for Learning Translation Invariant Dictionaries, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp.857-860, 2006.
DOI : 10.1109/ICASSP.2006.1661411

URL : https://hal.archives-ouvertes.fr/inria-00544911

A. Bell and T. Sejnowski, The "independent components" of natural scenes are edge filters, Vision Research, vol.37, issue.23, pp.3327-3338, 1997.
DOI : 10.1016/S0042-6989(97)00121-1

B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, issue.23, pp.3311-3327, 1997.
DOI : 10.1016/S0042-6989(97)00169-7

M. Lewicki and B. Olshausen, Probabilistic framework for the adaptation and comparison of image codes, Journal of the Optical Society of America A, vol.16, issue.7, 1999.
DOI : 10.1364/JOSAA.16.001587

K. Kreutz-Delgado, J. Murray, B. Rao, K. Engan, T. Lee et al., Dictionary Learning Algorithms for Sparse Representation, Neural Computation, vol.15, issue.2, pp.349-396, 2003.
DOI : 10.1162/089976601300014385

B. A. Olshausen, Learning sparse, overcomplete representations of time-varying natural images, Proceedings of the 2003 International Conference on Image Processing, pp.41-44, 2003.
DOI : 10.1109/ICIP.2003.1246893

D. Dong and J. Atick, Temporal decorrelation: a theory of lagged and nonlagged responses in the lateral geniculate nucleus, Network: Computation in Neural Systems, vol.6, issue.2, pp.159-178, 1995.
DOI : 10.1088/0954-898X_6_2_003

O. D. Escoda, Toward sparse and geometry adapted video approximations, Ph.D. dissertation, EPFL, 2005.