Conference papers

A Framework for Multi-level Linguistic Annotation

Patrice Lopez 1 Laurent Romary 1
1 LANGUE ET DIALOGUE - Human-machine dialogue with a significant language component
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This article presents a 3-step model for multi-layer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view on the same document. This principle can be expressed first with a general relational model dedicated to the organisation of LR. This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. The exploitation of this kind of annotated corpus requires efficient manipulation processes and reversive access. We propose to use a third-step representation based on a set of optimised FSA resulting from the parsing of the XML documents. These propositions have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.
Document type :
Conference papers


Contributor : Laurent Romary
Submitted on : Monday, October 11, 2010 - 2:30:23 PM
Last modification on : Friday, February 4, 2022 - 3:17:21 AM
Long-term archiving on : Thursday, June 30, 2011 - 1:32:36 PM


Files produced by the author(s)


  • HAL Id : inria-00525227, version 1



Patrice Lopez, Laurent Romary. A Framework for Multi-level Linguistic Annotation. LREC Workshop on Large Corpus Annotation and Software Standards, Data Architectures and Software Support for Large Corpora, May 2000, Athens, Greece. ⟨inria-00525227⟩


