
Document Information Extraction and its Evaluation based on Client's Relevance

Santosh K.C. 1, Abdel Belaïd 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this paper, we present a model-based approach to extracting information content from documents and evaluate it in depth based on clients' relevance. Real-world users, i.e., clients, first provide a set of key fields from the document image that they consider important. These fields are used to build a graph in which nodes (i.e., fields) are labelled with dynamic semantics, along with other features, and edges are attributed with spatial relations. Such an attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time similar graphs are extracted, they are used to reinforce or update the initial graph, iteratively, in order to produce a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
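As a rough illustration of the attributed relational graph described in the abstract, the following minimal Python sketch builds a graph whose nodes are client-selected fields and whose edges carry a coarse spatial relation. It is not the authors' implementation: the `Field` attributes, the four-way relation vocabulary (`left_of`, `right_of`, `above`, `below`), the centre-based comparison, and the sample field names are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Field:
    """A key field selected by a client (hypothetical attributes)."""
    label: str   # semantic label, e.g. "item" or "total"
    x: float     # left edge of the bounding box
    y: float     # top edge of the bounding box (y grows downward)
    w: float     # box width
    h: float     # box height


def spatial_relation(a: Field, b: Field) -> str:
    """Coarse position of b relative to a, from box centres (assumed vocabulary)."""
    dx = (b.x + b.w / 2) - (a.x + a.w / 2)
    dy = (b.y + b.h / 2) - (a.y + a.h / 2)
    # The dominant axis decides between horizontal and vertical relations.
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Return (nodes, edges): nodes keyed by label, edges keyed by label pair."""
    nodes = {f.label: f for f in fields}
    edges = {(a.label, b.label): spatial_relation(a, b)
             for a, b in combinations(fields, 2)}
    return nodes, edges


# Hypothetical fields from a table-like document region.
fields = [
    Field("item", 10, 100, 80, 12),
    Field("quantity", 120, 100, 40, 12),
    Field("total", 120, 300, 40, 12),
]
nodes, edges = build_arg(fields)
for pair, relation in edges.items():
    print(pair, relation)
```

Mining similar graphs and iteratively updating the model, as the abstract describes, would operate on structures of this kind; those steps are left out here because the abstract does not specify them.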
Document type :
Conference papers
Contributor : Santosh K.C.
Submitted on : Tuesday, May 14, 2013 - 5:09:11 PM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on : Monday, August 19, 2013 - 4:05:16 PM






Santosh K.C., Abdel Belaïd. Document Information Extraction and its Evaluation based on Client's Relevance. ICDAR - International Conference on Document Analysis and Recognition - 2013, Aug 2013, Washington DC, United States. ⟨10.1109/ICDAR.2013.16⟩. ⟨hal-00822479⟩


