Training Strategies for OCR Systems for Historical Documents

Abstract: This paper presents an overview of training strategies for optical character recognition of historical documents. The main issue is the lack of annotated data and its quality. We summarize several ways of preparing synthetic data. The main goal of this paper is to show and compare ways to train a convolutional recurrent neural network classifier using synthetic data and its combination with a real annotated dataset.
Document type: Conference papers
Cited literature: 15 references

https://hal.inria.fr/hal-02331288
Contributor: Hal Ifip
Submitted on: Thursday, October 24, 2019 - 12:49:35 PM
Last modification on: Thursday, October 24, 2019 - 12:54:48 PM
Long-term archiving on: Saturday, January 25, 2020 - 3:20:16 PM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until 2022-01-01.


Licence
Distributed under a Creative Commons Attribution 4.0 International License

Citation

Jiří Martínek, Ladislav Lenc, Pavel Král. Training Strategies for OCR Systems for Historical Documents. 15th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2019, Hersonissos, Greece. pp.362-373, ⟨10.1007/978-3-030-19823-7_30⟩. ⟨hal-02331288⟩
