Book chapter. Year: 2007

Meta-Data Extraction from Bibliographic Documents for Digital Library

Abdel Belaïd
  • Role: Author
  • PersonId : 830137

Abstract

This chapter addresses the automatic extraction of metadata from digitized documents using retro-conversion techniques. The focus is on bibliographic documents, which are by nature a source of such metadata. Because they provide much of the structure of a digital library (DL), their automatic recognition is of clear interest. However, since they come from very different sources (references, citations, tables of contents, index cards), a generic methodology is proposed for recovering their structure. Starting from an initial morphological labeling of the text, the method looks for syntactic elements (syntagmas) that reveal the nature of each bibliographic field (title, authors, date, publication source, etc.). Depending on the case, the syntax is validated either by a given grammar or by analyzing occurrences across the elements of the document (e.g., several references in a bibliography, or articles in a table of contents). In the latter case, a bottom-up procedure generates a structure model from the well-recognized elements and applies it to the rest. The modeling has to take both inter-field and intra-field relationships into account. Experiments on different types of documents confirm the value of this approach.
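The bottom-up idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the label set, the field cues, the `;` field separator, and every function name below are assumptions made for illustration only. Tokens first receive a coarse morphological label, each field is then guessed from its labels, the most frequent field order among fully recognized references becomes the structure model, and that model is applied to references that contain an ambiguous field.

```python
import re
from collections import Counter

# Hypothetical tokenizer: initials such as "A." stay together,
# other punctuation marks become separate tokens.
TOKEN_RE = re.compile(r"[A-Z]\.|\w+|[^\w\s]")


def label_token(token):
    """Assign a coarse morphological label (assumed label set)."""
    if re.fullmatch(r"(1[89]|20)\d{2}", token):
        return "YEAR"
    if token.isdigit():
        return "NUM"
    if re.fullmatch(r"[A-Z]\.", token):
        return "INITIAL"
    if token.isalpha() and token[0].isupper():
        return "CAP"
    if token.isalpha():
        return "LOWER"
    return "PUNCT"


def classify_segment(segment):
    """Guess the field type of one segment, or None when ambiguous."""
    labels = [label_token(t) for t in TOKEN_RE.findall(segment)]
    if labels == ["YEAR"]:
        return "date"
    if "INITIAL" in labels:
        return "authors"
    if labels.count("LOWER") >= 2:
        return "title"
    return None


def split_fields(reference):
    # Assumption: fields are separated by ";" in this toy format.
    return [part.strip() for part in reference.split(";")]


def induce_model(references):
    """Most frequent field order among fully recognized references."""
    orders = Counter()
    for reference in references:
        fields = tuple(classify_segment(s) for s in split_fields(reference))
        if None not in fields:
            orders[fields] += 1
    return orders.most_common(1)[0][0] if orders else None


def apply_model(reference, model):
    """Fill ambiguous fields by position if nothing contradicts the model."""
    segments = split_fields(reference)
    if len(segments) != len(model):
        return None
    guesses = [classify_segment(s) for s in segments]
    if any(g is not None and g != m for g, m in zip(guesses, model)):
        return None
    return dict(zip(model, segments))


references = [
    "A. Belaid; Metadata extraction from documents; 2007",
    "D. Besagni; A method for citation analysis; 2004",
    "B. Smith; Citation Recognition; 2005",  # title is ambiguous on its own
]
model = induce_model(references)
parsed = apply_model(references[2], model)
```

In this toy data the third title has no lowercase words, so it is not recognized on its own; the field order learned from the first two references assigns it to the title position. A grammar-based validation, the other option mentioned in the abstract, would instead check each label sequence against a predefined rule set.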
Main file
belaid-besagni-chaudhuri.pdf (613.26 KB)
Origin: files produced by the author(s)

Dates and versions

inria-00579640, version 1 (25-03-2011)

Cite

Abdel Belaïd, Dominique Besagni. Meta-Data Extraction from Bibliographic Documents for Digital Library. Balarko Chaudhuri. Digital Document Processing - Major Directions and Recent Advances, Springer, pp.329-350, 2007, Advances in Pattern Recognition, 978-1-84628-501-1. ⟨10.1007/978-1-84628-726-8_15⟩. ⟨inria-00579640⟩