Inexact graph matching for entity recognition in OCRed documents

Nihel Kooli (1), Abdel Belaid (1)
(1) READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper proposes a system for recognizing entities in document images processed by OCR. The system is based on a graph matching technique and is guided by a database whose records describe the entities. The input of the system is a document labeled with the entity attributes. A first grouping of these labels, based on a score function, yields a set of candidate entities. Entity labels that are locally close are modeled by a structure graph, which is then matched against model graphs learned for this purpose. The graph matching technique relies on a specific cost function that integrates the feature dissimilarities. The matching results are used to correct mislabeling errors and then to validate the entity recognition. Evaluated on three datasets covering different kinds of entities, the system achieves recall between 88.3% and 95% and precision between 94.3% and 95.7%.
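The matching step described in the abstract can be illustrated with a small sketch. It is not the authors' cost function or search strategy, which this record does not give. It assumes unit costs for node-label and edge-label mismatches, a hypothetical address entity with the attributes name, street, zip and city, hypothetical spatial relations "next_line" and "same_line", and an exhaustive search over node assignments that is only practical for very small graphs. All function and variable names are illustrative.

```python
from itertools import permutations

# A graph is a pair (node_labels, edges); edges maps an index pair to a
# relation label. Both the representation and the labels are assumptions.

def lookup_edge(edges, a, b):
    """Return the label of the undirected edge between a and b, or None."""
    return edges.get((a, b), edges.get((b, a)))

def mapping_cost(struct, model, mapping, missing_penalty=1.0):
    """Cost of assigning structure node i to model node mapping[i].

    Unit cost per node-label mismatch and per edge-label mismatch
    (assumed values), plus a penalty per unmatched model node.
    """
    s_nodes, s_edges = struct
    m_nodes, m_edges = model
    cost = sum(0.0 if s_nodes[i] == m_nodes[j] else 1.0
               for i, j in enumerate(mapping))
    for i in range(len(mapping)):
        for k in range(i + 1, len(mapping)):
            s_label = lookup_edge(s_edges, i, k)
            m_label = lookup_edge(m_edges, mapping[i], mapping[k])
            if s_label != m_label:
                cost += 1.0
    cost += missing_penalty * (len(m_nodes) - len(mapping))
    return cost

def best_match(struct, model):
    """Exhaustive inexact matching; needs len(struct nodes) <= len(model nodes)."""
    s_nodes, _ = struct
    m_nodes, _ = model
    best_cost, best_mapping = float("inf"), None
    for mapping in permutations(range(len(m_nodes)), len(s_nodes)):
        c = mapping_cost(struct, model, mapping)
        if c < best_cost:
            best_cost, best_mapping = c, mapping
    return best_cost, best_mapping

# Hypothetical model graph for an address entity.
relations = {(0, 1): "next_line", (1, 2): "next_line", (2, 3): "same_line"}
model = (["name", "street", "zip", "city"], dict(relations))

# Structure graph built from a labeled document in which the zip code
# was mislabeled as a city.
struct = (["name", "street", "city", "city"], dict(relations))

cost, mapping = best_match(struct, model)
# Relabel each structure node with the label of its matched model node.
corrected = [model[0][j] for j in mapping]
print(cost, mapping, corrected)
```

In this toy case the lowest-cost assignment keeps the spatial layout intact and pays only for the one mislabeled node, so reading the labels back from the matched model nodes repairs the mislabeling. A real system would need a cheaper search than enumerating all assignments.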
Contributor: Nihel Kooli
Submitted on: Thursday, April 27, 2017 - 17:25:47
Last modified on: Tuesday, December 18, 2018 - 16:38:02
Document(s) archived on: Friday, July 28, 2017 - 13:07:52






Nihel Kooli, Abdel Belaid. Inexact graph matching for entity recognition in OCRed documents. ICPR, Dec 2016, Mexico, Mexico. pp.4071 - 4076, 2016, 〈10.1109/ICPR.2016.7900271〉. 〈hal-01515412〉


