Meta-Data Extraction from Bibliographic Documents for Digital Library

Abdel Belaïd 1 Dominique Besagni 2
1 READ, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This chapter addresses the automatic extraction of metadata from digitized documents using retro-conversion techniques. The focus is on bibliographic documents, which are by nature a source of such metadata. Because they provide much of the structure of a digital library (DL), recognizing them automatically is of obvious interest. However, as their origins vary widely (references, citations, tables of contents, index cards), a generic methodology is proposed for recovering their structure. Starting from an initial morphological labeling of the text, it looks for syntactic elements (syntagmas) that reveal the nature of each bibliographic field (title, authors, date, publication source, etc.). Depending on the case, the syntax is validated either by a given grammar or by occurrence analysis across the different document elements (e.g., several references in a bibliography, or articles in a table of contents). In the latter case, a bottom-up procedure generates a structure model from the well-recognized elements and applies it to the rest. The modeling must take into account inter- and intra-field relationships. Experiments performed on different types of documents confirm the value of this approach.
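The pipeline outlined in the abstract (morphological labeling, then occurrence analysis to derive a structure model from the elements that agree with each other) can be illustrated with a minimal sketch. This is not the authors' implementation: the label set (YEAR, NUM, INITIAL, CAP, WORD, PUNCT), the function names, and the use of a collapsed label sequence as the "structure model" are all assumptions made for illustration only.

```python
import re
from collections import Counter

def classify(token):
    """Assign a coarse morphological label to one token (hypothetical label set)."""
    if re.fullmatch(r"(1[89]|20)\d{2}", token):
        return "YEAR"
    if token.isdigit():
        return "NUM"
    if re.fullmatch(r"[A-Z]\.", token):
        return "INITIAL"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "WORD"
    if re.fullmatch(r"[^\w\s]", token):
        return "PUNCT"
    return "OTHER"

def label_tokens(text):
    """Split text into initials, words/numbers, and punctuation, then label each token."""
    tokens = re.findall(r"[A-Z]\.|\w+|[^\w\s]", text)
    return [(tok, classify(tok)) for tok in tokens]

def signature(text):
    """Collapse runs of identical labels into one, giving a coarse structure signature."""
    labels = [lab for _, lab in label_tokens(text)]
    return tuple(lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1])

def induce_model(references):
    """Occurrence analysis (assumed form): the most frequent signature becomes the model."""
    counts = Counter(signature(ref) for ref in references)
    return counts.most_common(1)[0][0]

def conforms(reference, model):
    """Check whether a reference follows the induced structure model."""
    return signature(reference) == model

# Made-up reference strings, used only to exercise the sketch.
refs = [
    "A. Belaid, Title words here, 2007.",
    "D. Besagni, Another title, 2005.",
    "Unknown format 12",
]
model = induce_model(refs)
flagged = [r for r in refs if not conforms(r, model)]
```

In this sketch, the two references that share the same label sequence define the model, and the third is flagged for further processing. A real system would need richer labels and field-level relationships, as the abstract notes.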
Document type: Book chapter
B. Chaudhuri. Digital Document Processing - Major Directions and Recent Advances, Springer, pp.329-350, 2007, Advances in Pattern Recognition, 978-1-84628-501-1. 〈10.1007/978-1-84628-726-8_15〉

Cited literature: 14 references

https://hal.inria.fr/inria-00579640
Contributor: Abdel Belaid
Submitted on: Friday, March 25, 2011 - 08:28:50
Last modified on: Thursday, January 11, 2018 - 06:19:59
Document(s) archived on: Sunday, June 26, 2011 - 02:31:39

File

belaid-besagni-chaudhuri.pdf
Files produced by the author(s)

Citation

Abdel Belaïd, Dominique Besagni. Meta-Data Extraction from Bibliographic Documents for Digital Library. B. Chaudhuri. Digital Document Processing - Major Directions and Recent Advances, Springer, pp.329-350, 2007, Advances in Pattern Recognition, 978-1-84628-501-1. 〈10.1007/978-1-84628-726-8_15〉. 〈inria-00579640〉
