Citation recognition for scientific publications in digital libraries - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2004

Citation recognition for scientific publications in digital libraries

Abdel Belaïd
  • Role: Author
  • PersonId : 830137

Abstract

In this paper, a method based on part-of-speech (PoS) tagging is used to recover the structure of bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structures are heterogeneous, the method works bottom-up, without an a priori model, assembling structural elements from basic tags into sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their categories and then reduced to canonical forms corresponding to record fields: authors, title, conference name, date, etc. Unlabeled tokens are assigned to a field either by applying PoS correction rules or by using an inter- or intra-field model generated from well-detected records. The prototype performs satisfactorily across different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of the 2,575 references are completely segmented.
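
The bottom-up flow described in the abstract (tag tokens, map tags to fields, resolve unlabeled tokens, then group tokens into fields) can be illustrated with a minimal sketch. This is not the authors' implementation: the tag names, the regular expressions, the single correction rule, and the fallback to a "title" field are all assumptions made for illustration, and the fallback is only a crude stand-in for the inter- or intra-field model mentioned above.

```python
import re
from itertools import groupby

# Hypothetical tag patterns; the actual tag set is not given in the abstract.
TAG_PATTERNS = [
    ("INITIAL", re.compile(r"[A-Z]\.,?")),        # e.g. "D." or "D.,"
    ("YEAR", re.compile(r"(19|20)\d{2}[.,]?")),   # e.g. "2004,"
    ("PAGES", re.compile(r"\d+-\d+[.,]?")),       # e.g. "244-252."
]

# Hypothetical mapping from basic tags to record fields.
TAG_TO_FIELD = {"INITIAL": "authors", "YEAR": "date", "PAGES": "pages"}


def tag_token(token):
    """Return the first tag whose pattern matches the whole token, else None."""
    for tag, pattern in TAG_PATTERNS:
        if pattern.fullmatch(token):
            return tag
    return None


def segment_reference(reference):
    """Split a reference string into (field, text) segments, bottom-up."""
    tokens = reference.split()
    tags = [tag_token(token) for token in tokens]
    fields = [TAG_TO_FIELD.get(tag) for tag in tags]

    # Assumed correction rule: a capitalized, unlabeled token that directly
    # follows an initial is treated as a surname and joins the authors field.
    for i in range(1, len(tokens)):
        if fields[i] is None and tags[i - 1] == "INITIAL" and tokens[i][0].isupper():
            fields[i] = "authors"

    # Assumed fallback: any token still unlabeled is assigned to the title.
    fields = [field or "title" for field in fields]

    # Merge runs of consecutive tokens that share a field into one segment.
    segments = []
    for field, run in groupby(zip(fields, tokens), key=lambda pair: pair[0]):
        text = " ".join(token for _, token in run).strip(" .,")
        segments.append((field, text))
    return segments


if __name__ == "__main__":
    example = ("D. Besagni, A. Belaïd. Citation recognition for "
               "scientific publications. 2004, 244-252.")
    for field, text in segment_reference(example):
        print(f"{field}: {text}")
```

In this sketch the grouping step is deliberately simple: it only merges adjacent tokens with the same field, so a reference whose fields are interleaved or mis-tagged by OCR noise would need the kind of learned field model the abstract refers to.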
Main file
Besagni2004.pdf (190.17 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00100181, version 1 (15-02-2024)

License

Attribution

Identifiers

Cite

Dominique Besagni, Abdel Belaïd. Citation recognition for scientific publications in digital libraries. First International Workshop on Document Image Analysis for Libraries - DIAL'04, Jan 2004, Palo Alto, United States. pp. 244-252, ⟨10.1109/DIAL.2004.1263253⟩. ⟨inria-00100181⟩
