News image annotation on a large parallel text-image corpus

Pierre Tirilly 1 Vincent Claveau 1, * Patrick Gros 1
* Corresponding author
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: In this paper, we present a multimodal parallel text-image corpus and propose an image annotation method that exploits the textual information associated with images. Our corpus contains news articles composed of a text, images, and image captions, and is significantly larger than the other news corpora proposed in image annotation papers (27,041 articles and 42,568 captioned images). In our experiments, we use the text of the articles as the textual information source to annotate images, and the image captions as ground truth to evaluate our annotation algorithm. Our annotation method identifies relevant named entities in the texts and associates them with high-level visual concepts detected in the images (in this paper, faces and logos). The named entities best suited to image annotation are selected using an unsupervised score based on their statistics, inspired by the weights used in information retrieval. Our experiments show that, although it is very simple, our annotation method achieves acceptable accuracy on our real-world news corpus.
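The entity selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual score, so the tf-idf-style weight below is an assumption standing in for it, and the function names (score_entities, annotate) and the boolean concept_detected flag are hypothetical. Named entity extraction and face or logo detection are assumed to have been done beforehand.

```python
import math
from collections import Counter


def score_entities(article_entities, corpus_entities):
    """Score each named entity of one article with a tf-idf-style weight.

    article_entities: list of entity mentions found in the article text.
    corpus_entities: one list of entity mentions per article in the corpus.
    NOTE: this weighting is an assumption; the paper's score may differ.
    """
    n_docs = len(corpus_entities)
    # Document frequency: number of articles mentioning each entity.
    doc_freq = Counter()
    for entities in corpus_entities:
        doc_freq.update(set(entities))
    # Term frequency: number of mentions within this article.
    term_freq = Counter(article_entities)
    return {
        entity: term_freq[entity]
        * math.log((1 + n_docs) / (1 + doc_freq[entity]))
        for entity in term_freq
    }


def annotate(article_entities, corpus_entities, concept_detected):
    """Return the top-scoring entity if a visual concept (for example a
    face or a logo) was detected in the image, otherwise None."""
    if not concept_detected or not article_entities:
        return None
    scores = score_entities(article_entities, corpus_entities)
    return max(scores, key=scores.get)
```

With such a score, an entity mentioned often in one article but rarely elsewhere in the corpus ranks above an entity that appears in nearly every article, which is the behavior one would expect from an information-retrieval-inspired weight.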
Document type:
Conference paper
ELRA. 7th Language Resources and Evaluation Conference, LREC'10, May 2010, Valletta, Malta. 2010, 〈http://www.lrec-conf.org/proceedings/lrec2010/pdf/772_Paper.pdf〉
https://hal.inria.fr/inria-00561763
Contributor: Patrick Gros
Submitted on: Tuesday, 1 February 2011 - 17:56:22
Last modified on: Thursday, 11 January 2018 - 06:20:10

Identifiers

  • HAL Id : inria-00561763, version 1

Citation

Pierre Tirilly, Vincent Claveau, Patrick Gros. News image annotation on a large parallel text-image corpus. ELRA. 7th Language Resources and Evaluation Conference, LREC'10, May 2010, Valletta, Malta. 2010, 〈http://www.lrec-conf.org/proceedings/lrec2010/pdf/772_Paper.pdf〉. 〈inria-00561763〉
