M. Anthimopoulos, B. Gatos, and I. Pratikakis, A two-stage scheme for text detection in video images, Image and Vision Computing, vol.28, issue.9, pp.1413-1426, 2010.
DOI : 10.1016/j.imavis.2010.03.004

S. Ayache and G. Quénot, Evaluation of active learning strategies for video indexing, Signal Processing: Image Communication, pp.692-704, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00953887

M. Bäuml, Multi-pose Face Recognition for Person Retrieval in Camera Networks, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, pp.441-447, 2010.
DOI : 10.1109/AVSS.2010.42

S. Chen and P. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the Bayesian information criterion, Proc. DARPA Broadcast News Transcription and Understanding Workshop, p.8, 1998.

T. Choudhury, B. Clarkson, T. Jebara, and A. Pentland, Multimodal person recognition using unconstrained audio and video, Proceedings, International Conference on Audio- and Video-Based Person Authentication, pp.176-181, 1999.

A. Giraudel, The REPERE corpus: a multimodal corpus for person recognition, LREC, pp.1102-1107, 2012.

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, Face Recognition from Caption-Based Supervision, International Journal of Computer Vision, vol.96, issue.1, pp.64-82, 2012.
DOI : 10.1007/s11263-011-0447-x
URL : https://hal.archives-ouvertes.fr/inria-00522185

W. Hu, W. Hu, N. Xie, and S. Maybank, Unsupervised active learning based on hierarchical graph-theoretic clustering, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.39, issue.5, pp.1147-1161, 2009.

H. T. Nguyen and A. Smeulders, Active learning using pre-clustering, ICML, p.79, 2004.

G. Zonta-Pastorello, J. Daltio, and C. B. Medeiros, Multimedia semantic annotation propagation, Tenth IEEE International Symposium on Multimedia, pp.509-514, 2008.

P. T. Pham, T. Tuytelaars, and M.-F. Moens, Naming People in News Videos with Label Propagation, IEEE Multimedia, vol.18, issue.3, pp.44-55, 2011.
DOI : 10.1109/MMUL.2011.22

J. Poignant, Identification non-supervisée de personnes dans les flux télévisés, 2013.

J. Poignant, L. Besacier, G. Quénot, and F. Thollard, From Text Detection in Videos to Person Identification, 2012 IEEE International Conference on Multimedia and Expo, pp.854-859, 2012.
DOI : 10.1109/ICME.2012.119
URL : https://hal.archives-ouvertes.fr/hal-00767383

J. Poignant, Towards a better integration of written names for unsupervised speakers identification in videos, SLAM, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953089

B. Safadi and G. Quénot, Active learning with multiple classifiers for multimedia indexing, Multimedia Tools and Applications, pp.403-417, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00767025

G. Schohn and D. Cohn, Less is more: Active learning with support vector machines, ICML, pp.839-846, 2000.