Automatic and interactive rule inference without ground truth - Archive ouverte HAL
Conference paper, year: 2015

Automatic and interactive rule inference without ground truth

Cérès Carton, Aurélie Lemaitre, Bertrand Coüasnon

Abstract

Designing a document recognition system from non-annotated documents is not an easy task. In general, statistical methods cannot learn without annotated ground truth, whereas syntactical methods can do without it. However, their ability to handle non-annotated data comes from the fact that the document description is written manually by a user. Adapting such a system to a new kind of document is therefore tedious, since the whole manual knowledge-extraction process must be redone. In this paper, we propose a method that extracts knowledge and generates rules without any ground truth. Using a large volume of non-annotated documents, we study the redundancy of certain elements extracted from the document images. This redundancy is exploited by an automatic clustering algorithm, and interaction with the user then assigns semantics to the detected clusters. In this work, the extracted elements are keywords located by word spotting. We applied the approach to field detection in old marriage records, using the FamilySearch HIP2013 competition database. The results show that rules can be inferred automatically from non-annotated documents by exploiting the redundancy of the elements extracted from them.
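The abstract does not name the clustering algorithm or the rule format, so the following is only a minimal sketch of the general idea under stated assumptions: word-spotting hits are given as (page, keyword, normalized x, normalized y), a simple leader clustering (swapped in here, not necessarily the authors' method) groups hits of the same keyword that land at similar positions, clusters that recur on enough pages are kept as "redundant", and a user-supplied label turns each kept cluster into a positional rule. The names `Detection`, `cluster_detections`, `redundant_clusters`, `infer_rules`, the `radius` and `min_support` thresholds, and the toy data are all hypothetical.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Detection:
    """One keyword hit from a hypothetical word-spotting step."""
    page: int
    keyword: str
    x: float  # assumed normalized horizontal position in [0, 1]
    y: float  # assumed normalized vertical position in [0, 1]


@dataclass
class Cluster:
    keyword: str
    members: list = field(default_factory=list)

    @property
    def centroid(self):
        n = len(self.members)
        return (sum(d.x for d in self.members) / n,
                sum(d.y for d in self.members) / n)

    @property
    def pages(self):
        # Distinct pages on which this cluster occurs.
        return {d.page for d in self.members}


def cluster_detections(detections, radius=0.05):
    """Leader clustering: attach each hit to the first cluster of the same
    keyword whose centroid lies within `radius`, otherwise start a new one."""
    clusters = []
    for det in detections:
        for c in clusters:
            if c.keyword == det.keyword and dist(c.centroid, (det.x, det.y)) <= radius:
                c.members.append(det)
                break
        else:
            clusters.append(Cluster(det.keyword, [det]))
    return clusters


def redundant_clusters(clusters, n_pages, min_support=0.5):
    """Keep clusters that recur on at least `min_support` of the pages."""
    return [c for c in clusters if len(c.pages) / n_pages >= min_support]


def infer_rules(clusters, labels):
    """Turn user-labeled clusters into (field, keyword, cx, cy) rules.
    `labels` maps a cluster index to a field name chosen by the user."""
    rules = []
    for i, c in enumerate(clusters):
        if i in labels:
            cx, cy = c.centroid
            rules.append((labels[i], c.keyword, round(cx, 3), round(cy, 3)))
    return rules


# Toy data (invented for illustration): one keyword recurring at the same
# place on three pages, plus one isolated hit on a single page.
dets = [
    Detection(0, "married", 0.10, 0.20),
    Detection(1, "married", 0.11, 0.21),
    Detection(2, "married", 0.09, 0.19),
    Detection(1, "witness", 0.80, 0.90),
]
kept = redundant_clusters(cluster_detections(dets), n_pages=3)
# The user interaction is simulated by a dict mapping cluster index to a label.
print(infer_rules(kept, {0: "spouse_name"}))
```

The isolated hit forms a cluster seen on only one page out of three, so it falls below `min_support` and is discarded; only the recurring position survives to be labeled. A real system would replace the leader clustering and the tuple-based rule with whatever clustering and rule grammar the paper actually uses.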
Main file: icdar_2015_ccarton_hal.pdf (2.02 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01197470, version 1 (11 September 2015)

Identifiers

  • HAL Id: hal-01197470, version 1

Cite

Cérès Carton, Aurélie Lemaitre, Bertrand Coüasnon. Automatic and interactive rule inference without ground truth. International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Nancy, France. ⟨hal-01197470⟩
