Handwritten and Printed Text Separation in Real Document

Abdel Belaïd 1, Santosh K.C. 1, Vincent Poulain d'Andecy 2
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This paper aims to separate handwritten from printed text in real documents that contain noise and graphics, including annotations. Pseudo-lines and pseudo-words extracted with the run-length smoothing algorithm (RLSA) serve as the basic blocks for classification. A multi-class support vector machine (SVM) with a Gaussian kernel first labels each pseudo-word, taking its local neighbourhood into account. Context is then propagated between neighbours to correct possible labelling errors. To keep the running time low, we propose linear-complexity methods based on constrained k-NN; with a kd-tree, the running time is almost linearly proportional to the number of pseudo-words. The performance of our system is close to 90%, even when a very small learning dataset is used, with samples consisting mainly of complex administrative documents.
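The RLSA step named in the abstract can be illustrated with a minimal sketch of horizontal run-length smoothing on a binary image. This is not the authors' implementation: the function names `rlsa_row` and `rlsa_horizontal`, the threshold value, and the convention that a gap is filled when its length is less than or equal to the threshold are all assumptions made for illustration.

```python
from typing import List


def rlsa_row(row: List[int], threshold: int) -> List[int]:
    """Horizontal RLSA on one binary row (1 = ink, 0 = background).

    Background runs of length <= threshold that lie between two ink
    pixels are filled with ink. Leading and trailing background is kept.
    The <= convention and the threshold are illustrative assumptions.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= threshold:
                    # Merge the two ink pixels by filling the short gap.
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Apply horizontal RLSA independently to every row of the image."""
    return [rlsa_row(row, threshold) for row in image]


# Toy example (hypothetical data): short gaps merge, long gaps survive.
image = [
    [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
]
smoothed = rlsa_horizontal(image, threshold=2)
for row in smoothed:
    print(row)
```

With a small threshold, nearby characters merge into word-like blobs; a larger threshold would merge words into line-like blobs, which is one plausible way to obtain the pseudo-words and pseudo-lines the abstract refers to.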
Document type:
Conference paper
The Thirteenth IAPR International Conference on Machine Vision Applications - 2013, May 2013, Kyoto, Japan. 2013
https://hal.inria.fr/hal-00799331
Contributor: Santosh K.C.
Submitted on: Tuesday, 19 March 2013 - 14:24:39
Last modified on: Tuesday, 24 April 2018 - 13:36:37
Document(s) archived on: Sunday, 2 April 2017 - 14:44:13

Files

kc_vincent_belaidMVA.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00799331, version 2
  • ARXIV : 1303.4614

Citation

Abdel Belaïd, Santosh K.C., Vincent Poulain d'Andecy. Handwritten and Printed Text Separation in Real Document. The Thirteenth IAPR International Conference on Machine Vision Applications - 2013, May 2013, Kyoto, Japan. 2013. 〈hal-00799331v2〉

Metrics

Record views

271

File downloads

1360