A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents

Bart Lamiroy 1, 2, Daniel Lopresti 3
1 QGAR - Querying Graphics through Analysis and Recognition
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : The goal of document image analysis is to produce interpretations that match those of a fluent and knowledgeable human when viewing the same input. Because computer vision techniques are not perfect, the text that results when processing scanned pages is frequently noisy. Building on previous work, we propose a new paradigm for handling the inevitable incomplete, partial, erroneous, or slightly orthogonal interpretations that commonly arise in document datasets. Starting from the observation that interpretations are dependent on application context or user viewpoint, we describe a platform now under development that is capable of managing multiple interpretations for a document and offers an unprecedented level of interaction so that users can freely build upon, extend, or correct existing interpretations. In this way, the system supports the creation of a continuously expanding and improving document analysis repository which can be used to support research in the field.

https://hal.inria.fr/inria-00516678
Contributor : Bart Lamiroy
Submitted on : Friday, September 10, 2010 - 6:09:03 PM
Last modification on : Saturday, February 9, 2019 - 12:54:06 PM
Long-term archiving on : Saturday, December 11, 2010 - 2:51:14 AM

Citation

Bart Lamiroy, Daniel Lopresti. A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents. Fourth Workshop on Analytics for Noisy Unstructured Text Data - AND'10, IAPR, Oct 2010, Toronto, Canada. ⟨10.1145/1871840.1871844⟩. ⟨inria-00516678⟩