Handwritten/printed text separation Using pseudo-lines for contextual re-labeling

Ahmad-Montaser Awal 1 Abdel Belaïd 1 Vincent Poulain d'Andecy 2
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper addresses the problem of separating machine-printed and handwritten text in real noisy documents. In previous work, we proposed a robust separation system relying on a proximity string segmentation algorithm. The extracted pseudo-lines and pseudo-words are used as basic blocks for classification. A multi-class support vector machine (SVM) with a Gaussian kernel first assigns a label to each pseudo-word. The local neighborhood of each pseudo-word is then studied in order to propagate the context and correct classification errors. In this work, we first propose to model the separation problem with conditional random fields that consider the horizontal neighborhood. Because this neighborhood is too local to resolve certain error cases, we enhance the method with a more global context based on class dominance in the pseudo-line. The method has been evaluated on business documents. It separates handwritten and printed text with high scores (99.1% and 99.2%, respectively); noise, which is very random in these documents, is recognized less well (90.1%).
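The idea of re-labeling from class dominance in a pseudo-line can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `relabel_by_dominance`, the `min_share` threshold, and the plain majority rule are assumptions made for illustration only, and the abstract does not specify how dominance is actually computed.

```python
from collections import Counter


def relabel_by_dominance(labels, min_share=0.7):
    """Relabel every pseudo-word in one pseudo-line with the dominant class.

    `labels` holds the initial per-pseudo-word labels of a single pseudo-line.
    `min_share` is a hypothetical threshold (not a value from the paper):
    the line is relabeled only when one class covers at least that
    fraction of its pseudo-words.
    """
    if not labels:
        return []
    dominant, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= min_share:
        return [dominant] * len(labels)
    return list(labels)  # no clear dominance: keep the initial labels


# One pseudo-line in which a single pseudo-word was probably misclassified.
line = ["printed", "printed", "handwritten", "printed", "printed"]
print(relabel_by_dominance(line))
```

Under these assumptions, an isolated label that disagrees with a clearly dominant class is overwritten, while a pseudo-line with no dominant class keeps its initial labels.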
Document type:
Conference paper
International Conference on Frontiers on Handwriting Recognition, Sep 2014, Crete, Greece. 〈10.1109/ICFHR.2014.13〉

https://hal.inria.fr/hal-01111715
Contributor: Abdel Belaid
Submitted on: Friday, January 30, 2015 - 18:25:35
Last modified on: Thursday, January 11, 2018 - 02:01:49
Document(s) archived on: Saturday, April 15, 2017 - 23:53:32

File

4334a029.pdf
Publisher files allowed on an open archive

Citation

Ahmad-Montaser Awal, Abdel Belaïd, Vincent Poulain d'Andecy. Handwritten/printed text separation Using pseudo-lines for contextual re-labeling. International Conference on Frontiers on Handwriting Recognition, Sep 2014, Crete, Greece. 〈10.1109/ICFHR.2014.13〉. 〈hal-01111715〉