Représentation des données en XML pour l'analyse d'images de documents (XML data representation for document image analysis)

Abdel Belaïd 1 Yves Rangoni 1 Ingrid Flak 2
1 READ - READ
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 TALARIS - Natural Language Processing: representation, inference and semantics
Inria Nancy - Grand Est, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper presents the use of XML formats for modelling documents and for describing the results of the document analysis and recognition steps. We chose ALTO for the physical structure produced by OCR, TEI for the logical structures, and METS for the relationships between the two. Because the tools in the system do not use homogeneous representations, we propose a series of XSL transforms to harmonize them. Experiments on two kinds of documents, scientific documents with a macro-structure and historical documents with micro-structures, show how this choice of standards maintains the coherence of the data along the whole processing chain.
Document type :
Conference papers

https://hal.inria.fr/inria-00618529
Contributor : Abdel Belaid
Submitted on : Friday, September 2, 2011 - 9:34:44 AM
Last modification on : Thursday, January 11, 2018 - 6:21:35 AM

Identifiers

  • HAL Id : inria-00618529, version 1

Citation

Abdel Belaïd, Yves Rangoni, Ingrid Flak. Représentation des données en XML pour l'analyse d'images de documents. Conférence Internationale sur l'Ecrit et le Document, Jul 2007, Nancy, France. ⟨inria-00618529⟩
