
Citation recognition for scientific publications in digital libraries

Dominique Besagni, Abdel Belaïd
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this paper, a method based on part-of-speech (PoS) tagging is used to structure bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structures are heterogeneous, the method works bottom-up, without an a priori model, gathering structural elements from basic tags into sub-fields and then fields. Significant tags are first grouped into homogeneous classes according to their categories and then reduced to canonical forms corresponding to record fields: authors, title, conference name, date, etc. Unlabeled tokens are integrated into one field or another either by applying PoS correction rules or by using an inter- or intra-field model generated from well-detected records. The prototype performs satisfactorily on different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of the 2,575 references are completely segmented.
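The bottom-up idea in the abstract (basic tags, then resolution of unlabeled tokens, then grouping into fields) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the tag names, the regular expressions, the single correction rule, and the function names `basic_tag` and `segment` are hypothetical and are not taken from the paper, whose actual tag set and rules are not given in the abstract.

```python
import re
from itertools import groupby

# Hypothetical tokenizer: author initials, page ranges, then plain words.
TOKEN_RE = re.compile(r"[A-Z]\.|\d+-\d+|\w+")

# Hypothetical mapping from basic tags to record fields.
TAG_TO_FIELD = {"INITIAL": "authors", "YEAR": "date", "PAGES": "pages"}


def basic_tag(token):
    """Assign a coarse tag to one token; anything unrecognized is UNKNOWN."""
    if re.fullmatch(r"[A-Z]\.", token):
        return "INITIAL"
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "YEAR"
    if re.fullmatch(r"\d+-\d+", token):
        return "PAGES"
    return "UNKNOWN"


def segment(reference):
    """Tag tokens, resolve unlabeled ones, and merge neighbours into fields."""
    tokens = TOKEN_RE.findall(reference)
    tags = [basic_tag(tok) for tok in tokens]

    fields = []
    for i, tag in enumerate(tags):
        if tag in TAG_TO_FIELD:
            fields.append(TAG_TO_FIELD[tag])
        elif i > 0 and tags[i - 1] == "INITIAL":
            # Toy correction rule: a word right after an initial is a surname.
            fields.append("authors")
        else:
            # Toy default: remaining unlabeled words are assumed to be title.
            fields.append("title")

    # Bottom-up grouping: consecutive tokens with the same field are merged.
    return [
        (field, " ".join(tok for tok, _ in group))
        for field, group in groupby(zip(tokens, fields), key=lambda p: p[1])
    ]


print(segment("D. Besagni, A. Belaïd. Citation recognition for "
              "scientific publications. 2004, 244-252"))
# [('authors', 'D. Besagni A. Belaïd'),
#  ('title', 'Citation recognition for scientific publications'),
#  ('date', '2004'), ('pages', '244-252')]
```

The sketch uses one hand-written rule where the abstract describes PoS correction rules and inter- or intra-field models learned from well-detected records, so it only conveys the shape of the pipeline, not its accuracy.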
Document type : Conference papers

Contributor : Publications Loria
Submitted on : Tuesday, September 26, 2006 - 10:15:13 AM
Last modification on : Monday, May 10, 2021 - 4:22:03 PM


  • HAL Id : inria-00100181, version 1



Dominique Besagni, Abdel Belaïd. Citation recognition for scientific publications in digital libraries. First International Workshop on Document Image Analysis for Libraries - DIAL'04, Jan 2004, Palo Alto, California, United States. pp.244-252. ⟨inria-00100181⟩


