Automatic and interactive rule inference without ground truth

Abstract: Designing a document recognition system from non-annotated documents is not an easy task. In general, statistical methods cannot learn without an annotated ground truth, whereas syntactical methods can. However, the ability of syntactical methods to deal with non-annotated data comes from the fact that the description is written manually by a user. Adapting to a new kind of document is therefore tedious, as the whole manual knowledge-extraction process has to be redone. In this paper, we propose a method to extract knowledge and generate rules without any ground truth. Using a large volume of non-annotated documents, it is possible to study the redundancy of certain elements extracted from the document images. This redundancy is exploited through an automatic clustering algorithm, and interaction with the user brings semantics to the detected clusters. In this work, the extracted elements are keywords obtained by word spotting. The approach has been applied to field detection in old marriage records from the FamilySearch HIP2013 competition database. The results demonstrate that rules can be inferred automatically from non-annotated documents by exploiting the redundancy of the elements extracted from them.
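The idea summarized in the abstract (cluster redundant keyword positions, then let a user name the clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, so a simple greedy distance-threshold clustering is swapped in here, and every name (`cluster_positions`, `infer_rules`, `label_rules`, `radius`, `min_support`) and the normalized-coordinate input format are assumptions made for illustration only.

```python
from collections import defaultdict
from math import hypot


def cluster_positions(points, radius):
    """Greedy clustering (illustrative stand-in, not the paper's algorithm):
    a point joins the first cluster whose centroid lies within `radius`,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster: running coordinate sums plus its members
    for x, y in points:
        for c in clusters:
            n = len(c["members"])
            if hypot(x - c["sum_x"] / n, y - c["sum_y"] / n) <= radius:
                c["members"].append((x, y))
                c["sum_x"] += x
                c["sum_y"] += y
                break
        else:
            clusters.append({"sum_x": x, "sum_y": y, "members": [(x, y)]})
    return [c["members"] for c in clusters]


def infer_rules(detections, n_pages, radius=0.05, min_support=0.5):
    """Turn redundant keyword positions into candidate position rules.

    `detections` is a hypothetical word-spotting output: (keyword, x, y)
    tuples with page-normalized coordinates. A cluster becomes a rule only
    if it is seen on at least `min_support` of the pages."""
    by_keyword = defaultdict(list)
    for keyword, x, y in detections:
        by_keyword[keyword].append((x, y))

    rules = []
    for keyword, points in by_keyword.items():
        for members in cluster_positions(points, radius):
            support = len(members) / n_pages
            if support >= min_support:
                xs = [p[0] for p in members]
                ys = [p[1] for p in members]
                rules.append({
                    "keyword": keyword,
                    "box": (min(xs), min(ys), max(xs), max(ys)),
                    "support": support,
                    "label": None,  # semantics are supplied by the user
                })
    return rules


def label_rules(rules, ask):
    """Interactive step: `ask` is any callable that returns a label for a
    rule (for example a prompt shown to the user)."""
    for rule in rules:
        rule["label"] = ask(rule)
    return rules
```

In such a sketch, `ask` could be as simple as `lambda rule: input(f"Label for {rule['keyword']} at {rule['box']}? ")`; the support threshold is what filters out isolated, non-redundant detections before the user is consulted.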
Document type:
Conference paper
International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Nancy, France. 2015
Cited literature: 13 references

https://hal.inria.fr/hal-01197470
Contributor: Cérès Carton
Submitted on: Friday, September 11, 2015 - 17:33:17
Last modified on: Friday, November 16, 2018 - 01:35:40
Document(s) archived on: Tuesday, December 29, 2015 - 00:45:34

File

icdar_2015_ccarton_hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01197470, version 1

Citation

Cérès Carton, Aurélie Lemaitre, Bertrand Coüasnon. Automatic and interactive rule inference without ground truth. International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Nancy, France. 2015. 〈hal-01197470〉

Metrics

Record views: 339

File downloads: 161