Document Information Extraction and its Evaluation based on Client's Relevance

Santosh K.C. (1), Abdel Belaïd (1)
(1) READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: In this paper, we present a model-based approach to extracting information content from documents and evaluate it in depth based on clients' relevance. Real-world users, i.e., clients, first provide a set of key fields from the document image that they consider important. These fields are represented as a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. This attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time such graphs are extracted, they are used to reinforce or update the initial graph, iteratively producing a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction was found to be the best-suited application.
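As an illustration only, the attributed relational graph described in the abstract might be sketched as follows. This is not the authors' implementation: the `Field` class, the four coarse relation names, and the centre-point comparison are all assumptions made for the example, since the abstract does not specify how node attributes or spatial relations are computed.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """Hypothetical node: a client-selected field and its position."""
    label: str  # semantic label, e.g. "price" (assumed attribute)
    x: float    # centre x of the field's bounding box
    y: float    # centre y, growing downward as in image coordinates


def spatial_relation(a: Field, b: Field) -> str:
    """Coarse position of b relative to a, chosen by the dominant axis.

    The four relation names are assumptions for this sketch.
    """
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Build a toy ARG: nodes keyed by label, directed edges keyed by label pair."""
    nodes = {f.label: f for f in fields}
    edges = {
        (a.label, b.label): spatial_relation(a, b)
        for a, b in permutations(fields, 2)
    }
    return nodes, edges


# Three fields a client might select on an invoice-like table.
fields = [
    Field("description", 10, 50),
    Field("price", 200, 52),
    Field("total", 200, 300),
]
nodes, edges = build_arg(fields)
```

Here `edges[("description", "price")]` records that the price field lies to the right of the description field. A graph of this kind could then serve as the pattern to be matched against candidate graphs in a document image, which is the mining step the abstract mentions.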
Document type: Conference papers
Cited literature: 12 references

https://hal.inria.fr/hal-00822479
Contributor: Santosh K.C.
Submitted on: Tuesday, May 14, 2013 - 5:09:11 PM
Last modification on: Friday, January 15, 2021 - 5:42:02 PM
Long-term archiving on: Monday, August 19, 2013 - 4:05:16 PM

File

kc_ICDAR2013_5.pdf
Files produced by the author(s)

Citation

Santosh K.C., Abdel Belaïd. Document Information Extraction and its Evaluation based on Client's Relevance. ICDAR - International Conference on Document Analysis and Recognition - 2013, Aug 2013, Washington DC, United States. ⟨10.1109/ICDAR.2013.16⟩. ⟨hal-00822479⟩
