A Framework for Multi-level Linguistic Annotation

Patrice Lopez (1), Laurent Romary (1)
(1) LANGUE ET DIALOGUE - Human-machine dialogue with a significant language component, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This article presents a three-step model for multi-layer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view on the same document. This principle is first expressed with a general relational model dedicated to the organisation of language resources (LR). This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. Exploiting such annotated corpora requires efficient manipulation processes and reversible access. We propose a third-step representation based on a set of optimised finite-state automata (FSA) obtained by parsing the XML documents. These proposals have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.
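The three steps in the abstract can be illustrated with a minimal Python sketch. It assumes a stand-off design in which every annotation layer stores character offsets into one shared text, so each layer is a separate view on the same document and a text position can be mapped back to all annotations covering it. The names `AnnotatedDocument`, `build_trie` and `lookup`, the XML element names (`document`, `text`, `layer`, `span`) and the layer names are hypothetical illustrations, not the paper's schema. A plain trie (an unminimised deterministic acyclic automaton) stands in for the paper's optimised FSA; a real workbench would likely use a more compact, minimised structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

FINAL = ""  # trie key marking an accepting state; never a real character


@dataclass(frozen=True)
class Span:
    layer: str  # annotation layer (view), e.g. "token", "pos", "chunk" (hypothetical)
    start: int  # character offset into the shared text (inclusive)
    end: int    # character offset (exclusive)
    label: str


class AnnotatedDocument:
    """Step 1 (sketch): one primary text shared by any number of stand-off layers."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.spans: list[Span] = []

    def annotate(self, layer: str, start: int, end: int, label: str) -> None:
        self.spans.append(Span(layer, start, end, label))

    def view(self, layer: str) -> list[Span]:
        """One view on the document: the spans of a single layer, in text order."""
        return sorted((s for s in self.spans if s.layer == layer), key=lambda s: s.start)

    def at(self, offset: int) -> list[Span]:
        """Reverse access: every span, in any layer, covering a text position."""
        return [s for s in self.spans if s.start <= offset < s.end]

    def to_xml(self) -> str:
        """Step 2 (sketch): serialise the text and each layer as stand-off XML."""
        root = ET.Element("document")
        ET.SubElement(root, "text").text = self.text
        for layer in sorted({s.layer for s in self.spans}):
            layer_el = ET.SubElement(root, "layer", name=layer)
            for s in self.view(layer):
                ET.SubElement(layer_el, "span", start=str(s.start),
                              end=str(s.end), label=s.label)
        return ET.tostring(root, encoding="unicode")


def build_trie(xml_string: str, layer: str = "token") -> dict:
    """Step 3 (sketch): parse the XML and compile one layer's surface forms
    into a trie mapping each form to the offsets where it occurs."""
    root = ET.fromstring(xml_string)
    text = root.findtext("text") or ""
    trie: dict = {}
    for span in root.findall(f"layer[@name='{layer}']/span"):
        start, end = int(span.get("start")), int(span.get("end"))
        node = trie
        for ch in text[start:end]:
            node = node.setdefault(ch, {})  # one transition per character
        node.setdefault(FINAL, []).append(start)
    return trie


def lookup(trie: dict, form: str) -> list[int]:
    """Follow one transition per character; succeed only at an accepting state."""
    node = trie
    for ch in form:
        node = node.get(ch)
        if node is None:
            return []
    return node.get(FINAL, [])
```

For example, annotating "Le Monde publie" with token, part-of-speech and chunk layers, `doc.at(4)` returns the token, the `NOUN` tag and the `NP` chunk covering "Monde", and `lookup(build_trie(doc.to_xml()), "Monde")` returns `[3]`. Keeping offsets rather than nested markup is what lets overlapping layers coexist; the trade-off is that the XML alone is less human-readable than inline tagging.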
Document type: Conference paper

Cited literature: 8 references

https://hal.inria.fr/inria-00525227
Contributor: Laurent Romary
Submitted on: Monday, October 11, 2010 - 2:30:23 PM
Last modification on: Friday, March 22, 2019 - 2:22:12 PM
Long-term archiving on: Thursday, June 30, 2011 - 1:32:36 PM

File

lopez-romary.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00525227, version 1

Citation

Patrice Lopez, Laurent Romary. A Framework for Multi-level Linguistic Annotation. LREC Workshop on Large Corpus Annotation and Software Standards, Data Architectures and Software Support for Large Corpora, May 2000, Athens, Greece. ⟨inria-00525227⟩