CES/XML : An XML-based Standard for Linguistic Corpora

Abstract: The Corpus Encoding Standard (CES) is part of the EAGLES Guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). It provides a set of encoding standards for corpus-based work in natural language processing applications. We have instantiated the CES as an XML application called XCES, based on the same data architecture, which consists of a primary encoded text and "standoff" annotation held in separate documents. Conversion to XML enables the use of some of the more powerful mechanisms provided in the XML framework, including the XSLT Transformation Language, XML Schemas, and support for inter-resource reference together with an extensive path syntax for pointers. In this paper, we describe the differences between the CES and XCES DTDs and demonstrate how XML mechanisms can be used to select from and manipulate annotated corpora encoded according to XCES specifications. We also provide a general overview of XML and of the XML mechanisms most relevant to language engineering research and applications.
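To make the "standoff" architecture concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`text`, `s`, `tok`, `annotations`, `pos`), the `href` attribute and the part-of-speech tags are hypothetical illustrations, not the actual XCES DTD. XCES relies on XML linking with a richer pointer syntax, whereas this sketch only resolves simple `#id` fragments.

```python
import xml.etree.ElementTree as ET

# Hypothetical primary document: tokens carry IDs so that
# annotations stored elsewhere can point at them.
PRIMARY = """<text>
  <s id="s1"><tok id="t1">Corpora</tok><tok id="t2">need</tok><tok id="t3">standards</tok></s>
</text>"""

# Hypothetical standoff annotation document: it contains no text itself,
# only tags that point back into PRIMARY through href="#id".
ANNOTATION = """<annotations>
  <pos href="#t1" tag="NNS"/>
  <pos href="#t2" tag="VBP"/>
  <pos href="#t3" tag="NNS"/>
</annotations>"""


def index_by_id(root):
    """Map every id attribute in the primary document to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve_standoff(primary_xml, annotation_xml):
    """Return (token text, tag) pairs by following each annotation's pointer."""
    ids = index_by_id(ET.fromstring(primary_xml))
    pairs = []
    for ann in ET.fromstring(annotation_xml).iter("pos"):
        # Strip the '#' fragment marker to obtain the target id.
        target = ids[ann.get("href").lstrip("#")]
        pairs.append((target.text, ann.get("tag")))
    return pairs


if __name__ == "__main__":
    for word, tag in resolve_standoff(PRIMARY, ANNOTATION):
        print(f"{word}/{tag}")
```

Running the script prints `Corpora/NNS`, `need/VBP` and `standards/NNS`, one per line. Because the annotations live in a separate document, the primary text is never modified, and several independent annotation layers can point into the same text.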
Document type:
Conference paper
LREC Conference, May 2000, Athens, Greece. 2000

Literature cited: 14 references

Contributor: Laurent Romary
Submitted on: Tuesday, 12 October 2010, 15:11:21
Last modified on: Thursday, 11 January 2018, 06:19:48
Archived on: Thursday, 13 January 2011, 02:31:15


Files produced by the author(s)


  • HAL Id: inria-00525250, version 1



Nancy Ide, Patrice Bonhomme, Laurent Romary. CES/XML : An XML-based Standard for Linguistic Corpora. LREC Conference, May 2000, Athens, Greece. 2000. 〈inria-00525250〉