A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents

Bart Lamiroy 1, 2, Daniel Lopresti 3
1 QGAR - Querying Graphics through Analysis and Recognition
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: The goal of document image analysis is to produce interpretations that match those of a fluent and knowledgeable human when viewing the same input. Because computer vision techniques are not perfect, the text that results when processing scanned pages is frequently noisy. Building on previous work, we propose a new paradigm for handling the inevitable incomplete, partial, erroneous, or slightly orthogonal interpretations that commonly arise in document datasets. Starting from the observation that interpretations are dependent on application context or user viewpoint, we describe a platform now under development that is capable of managing multiple interpretations for a document and offers an unprecedented level of interaction so that users can freely build upon, extend, or correct existing interpretations. In this way, the system supports the creation of a continuously expanding and improving document analysis repository which can be used to support research in the field.
Document type:
Conference paper
Fourth Workshop on Analytics for Noisy Unstructured Text Data - AND'10, Oct 2010, Toronto, Canada. ACM, 2010, ACM International Conference Proceeding Series. 〈10.1145/1871840.1871844〉

Cited literature: 20 references

https://hal.inria.fr/inria-00516678
Contributor: Bart Lamiroy
Submitted on: Friday, September 10, 2010 - 18:09:03
Last modified on: Thursday, January 11, 2018 - 06:23:16
Document(s) archived on: Saturday, December 11, 2010 - 02:51:14

File

and16-lamiroy-HAL.pdf
Files produced by the author(s)

Citation

Bart Lamiroy, Daniel Lopresti. A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents. Fourth Workshop on Analytics for Noisy Unstructured Text Data - AND'10, Oct 2010, Toronto, Canada. ACM, 2010, ACM International Conference Proceeding Series. 〈10.1145/1871840.1871844〉. 〈inria-00516678〉
