
Tools for Semi-automatic Preparation of Training Data for OCR

Abstract : This work addresses data preparation for OCR systems based on recurrent neural networks. Precisely annotated data are necessary both for training a network and for evaluating OCR methods. Such data can be synthesized, but synthetic data are less realistic than real data. Manual annotation is therefore still needed in many cases, especially for the historical documents we focus on. Although several complex systems for historical document processing exist, to the best of our knowledge a simple annotation tool for OCR data is missing. We therefore propose and implement a set of tools that use artificial intelligence to simplify the annotation process. These tools create ground truths for the line images used to train current OCR systems. A further contribution of this paper is that the tools are made freely available for research purposes.
Document type :
Conference papers

Cited literature [15 references]

https://hal.inria.fr/hal-02331328
Contributor : Hal Ifip
Submitted on : Thursday, October 24, 2019 - 12:51:25 PM
Last modification on : Thursday, October 24, 2019 - 12:54:38 PM
Long-term archiving on : Saturday, January 25, 2020 - 3:27:24 PM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2022-01-01


Licence


Distributed under a Creative Commons Attribution 4.0 International License


Citation

Ladislav Lenc, Jiří Martínek, Pavel Král. Tools for Semi-automatic Preparation of Training Data for OCR. 15th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2019, Hersonissos, Greece. pp.351-361, ⟨10.1007/978-3-030-19823-7_29⟩. ⟨hal-02331328⟩
