Iterative analysis of document collections enables efficient human-initiated interaction

Abstract: Document analysis and recognition systems often fail to produce results of sufficient quality when processing old and damaged document sets, and require manual corrections to improve results. This paper presents how, using the iterative analysis of document pages we recently proposed, we can implement a spontaneous interaction model suitable for mass document processing. It enables human operators to detect and correct errors made by the automatic system, and reintegrates their corrections into subsequent steps of the iterative analysis process. Thus, a page analyzer can reprocess erroneous parts and those which depend on them, avoiding the need to manually fix, during post-processing, all the consequences of errors made by the automatic system. After presenting the global system architecture and a prototype implementation of our proposal, we show that the document model can be simply enriched to enable the spontaneous interaction model we propose. We present how to use it in a practical example to correct under-segmentation issues during the localization of numbers in documents from the 18th century. Evaluations we conducted on this example case show, on 50 pages containing 1637 numbers to localize, that the interaction model we propose can reduce human workload (29.8% fewer elements to provide) for a given target quality level when compared to manual post-processing.
Document type:
Conference paper
DRR - Document Recognition and Retrieval XIX, Part of the IS&T/SPIE 24th Annual Symposium on Electronic Imaging, Jan 2012, San Francisco, United States. 8297, pp.82970L, 2012, 〈10.1117/12.911995〉

https://hal.inria.fr/hal-00686858
Contributor: Joseph Chazalon
Submitted on: Wednesday, April 11, 2012 - 14:13:35
Last modified on: Wednesday, May 16, 2018 - 11:23:35
Document(s) archived on: Monday, November 26, 2012 - 13:20:16

Citation

Joseph Chazalon, Bertrand Couasnon. Iterative analysis of document collections enables efficient human-initiated interaction. DRR - Document Recognition and Retrieval XIX, Part of the IS&T/SPIE 24th Annual Symposium on Electronic Imaging, Jan 2012, San Francisco, United States. 8297, pp.82970L, 2012, 〈10.1117/12.911995〉. 〈hal-00686858〉
