Document Information Extraction and its Evaluation based on Client's Relevance

Santosh K.C. (1), Abdel Belaïd (1)
(1) READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: In this paper, we present a model-based approach to extracting information content from documents and evaluate it in depth based on clients' relevance. Real-world users, i.e., clients, first provide a set of key fields from the document image that they consider important. These fields are represented as a graph in which nodes (i.e., fields) are labelled with dynamic semantics and other features, and edges are attributed with spatial relations. Such an attributed relational graph (ARG) is then used to mine similar graphs from a document image; each time similar graphs are extracted, they are used to reinforce or update the initial graph iteratively, in order to produce a model. Models can therefore be employed in the absence of clients. We have validated the concept and evaluated its scientific impact on a real-world industrial problem, where table extraction is found to be the best-suited application.
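As a rough illustration of the attributed relational graph described in the abstract, the sketch below builds a small graph whose nodes are client-selected fields and whose edges carry a coarse spatial relation. Everything in it is an assumption made for illustration: the `Field` class, the bounding-box attributes, the four-relation vocabulary, and the sample field labels do not come from the paper, which does not specify its representation here.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Field:
    """A key field selected by the client (hypothetical representation)."""
    label: str   # semantic label, e.g. "quantity"
    x: float     # left edge of the bounding box
    y: float     # top edge of the bounding box (y grows downward)
    w: float     # width
    h: float     # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def spatial_relation(a: Field, b: Field) -> str:
    """Coarse position of b relative to a, based on bounding-box centres.

    Assumed vocabulary: left_of, right_of, above, below.
    """
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "below" if dy > 0 else "above"


def build_arg(fields):
    """Return (nodes, edges): nodes keyed by label, directed edges
    attributed with the spatial relation between each ordered pair."""
    nodes = {f.label: f for f in fields}
    edges = {
        (a.label, b.label): spatial_relation(a, b)
        for a, b in permutations(fields, 2)
    }
    return nodes, edges


# Hypothetical fields from a table-like region of a document image.
fields = [
    Field("reference", x=10, y=100, w=80, h=20),
    Field("quantity", x=200, y=100, w=40, h=20),
    Field("total", x=200, y=300, w=40, h=20),
]
nodes, edges = build_arg(fields)
```

Here `edges[("reference", "quantity")]` records how the quantity field sits relative to the reference field. Matching such a graph against candidate graphs mined from a new document image, and updating it iteratively, is the part of the approach this sketch does not attempt.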
Document type:
Conference paper
ICDAR - International Conference on Document Analysis and Recognition - 2013, Aug 2013, Washington DC, United States. IEEE, 2013, 〈10.1109/ICDAR.2013.16〉

https://hal.inria.fr/hal-00822479
Contributor: Santosh K.C.
Submitted on: Tuesday, 14 May 2013 - 17:09:11
Last modified on: Thursday, 11 January 2018 - 06:25:25
Document(s) archived on: Monday, 19 August 2013 - 16:05:16

File

kc_ICDAR2013_5.pdf
Files produced by the author(s)

Citation

Santosh K.C., Abdel Belaïd. Document Information Extraction and its Evaluation based on Client's Relevance. ICDAR - International Conference on Document Analysis and Recognition - 2013, Aug 2013, Washington DC, United States. IEEE, 2013, 〈10.1109/ICDAR.2013.16〉. 〈hal-00822479〉
