Training Strategies for OCR Systems for Historical Documents - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2019

Training Strategies for OCR Systems for Historical Documents

Abstract

This paper presents an overview of training strategies for optical character recognition (OCR) of historical documents. The main issue is the lack of annotated data and its quality. We summarize several ways of preparing synthetic data. The main goal of this paper is to show and compare ways of training a convolutional recurrent neural network classifier using synthetic data, and synthetic data combined with a real annotated dataset.
Main file
483292_1_En_30_Chapter.pdf (968.48 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02331288, version 1 (24-10-2019)

License

Attribution

Identifiers

Cite

Jiří Martínek, Ladislav Lenc, Pavel Král. Training Strategies for OCR Systems for Historical Documents. 15th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2019, Hersonissos, Greece. pp.362-373, ⟨10.1007/978-3-030-19823-7_30⟩. ⟨hal-02331288⟩
