Conference papers

A segmentation method for bibliographic references by contextual tagging of fields

Dominique Besagni 1, Abdel Belaïd 2, Nelly Benet 2
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper presents a method based on part-of-speech (PoS) tagging for structuring bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structures are heterogeneous, the method works bottom-up, without an a priori model, assembling structural elements from basic tags into sub-fields and then fields. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to record fields: authors, title, conference name, date, etc. Unlabelled tokens are integrated into one field or another either by applying PoS correction rules or by using a structure model generated from well-detected records. The prototype performs satisfactorily on different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of references are completely segmented, on a set of 2500 references.
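The bottom-up idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the tag set (`YEAR`, `INITIAL`, `PUNCT`, `WORD`), the "surname, initial" rule, and the fallback that labels leftover tokens as `title` are all assumptions made for illustration, since the abstract does not give the actual tags or correction rules.

```python
import re
from itertools import groupby


def tokenize(reference):
    # Initials such as "D." stay one token; other punctuation is split off.
    return re.findall(r"[A-Z]\.|\w+|[^\w\s]", reference)


def basic_tag(token):
    """Assign a coarse, hypothetical tag to a single token."""
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "YEAR"
    if re.fullmatch(r"[A-Z]\.", token):
        return "INITIAL"
    if re.fullmatch(r"[^\w\s]", token):
        return "PUNCT"
    return "WORD"


# Tags considered significant enough to seed a field (an assumption).
SEED_FIELD = {"INITIAL": "authors", "YEAR": "date"}


def segment(reference):
    tokens = tokenize(reference)
    tags = [basic_tag(t) for t in tokens]
    fields = [SEED_FIELD.get(tag) for tag in tags]

    # Contextual rule: in "Surname , Initial", the surname and comma are authors.
    for i in range(len(tokens) - 2):
        if tags[i] == "WORD" and tokens[i + 1] == "," and tags[i + 2] == "INITIAL":
            fields[i] = fields[i + 1] = "authors"

    # Integrate the remaining unlabelled tokens from their labelled neighbours.
    for i, field in enumerate(fields):
        if field is not None:
            continue
        left = next((f for f in reversed(fields[:i]) if f), None)
        right = next((f for f in fields[i + 1:] if f), None)
        if tags[i] == "PUNCT" and left:
            fields[i] = left       # punctuation closes the field on its left
        elif left and left == right:
            fields[i] = left       # token enclosed by a single field
        else:
            fields[i] = "title"    # crude fallback, purely illustrative

    # Merge consecutive tokens that share a field into one segment.
    return [(field, [tok for tok, _ in group])
            for field, group in groupby(zip(tokens, fields), key=lambda p: p[1])]


example = "Besagni, D. and Belaid, A. A segmentation method for references. 2003."
for field, toks in segment(example):
    print(field, toks)
```

On this example the sketch yields three segments, labelled authors, title and date in that order. A real system would need a far richer tag set and rules that tolerate OCR noise, which this toy does not attempt.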
Document type :
Conference papers


Contributor : Publications Loria
Submitted on : Thursday, October 19, 2006 - 9:04:46 AM
Last modification on : Monday, May 10, 2021 - 4:22:03 PM
Long-term archiving on : Wednesday, March 29, 2017 - 12:53:51 PM


  • HAL Id : inria-00107677, version 1



Dominique Besagni, Abdel Belaïd, Nelly Benet. A segmentation method for bibliographic references by contextual tagging of fields. Seventh International Conference on Document Analysis and Recognition, Aug 2003, Edinburgh, Scotland, United Kingdom. 5 p. ⟨inria-00107677⟩


