Handwritten and Printed Text Separation in Real Document

Abdel Belaïd; Santosh K.C.; Vincent Poulain d'Andecy

Conference paper, Year: 2013



Abdel Belaïd

Role: Author
PersonId: 856537

Recognition of writing and analysis of documents

Santosh K.C.

Role: Author
PersonId: 856231

Recognition of writing and analysis of documents

Vincent Poulain d'Andecy

Role: Author

Itesoft R&D

Abstract

This paper aims to separate handwritten from printed text in real documents that contain noise and graphics, including annotations. Pseudo-lines and pseudo-words, extracted with the run-length smoothing algorithm (RLSA), serve as the basic blocks for classification. A multi-class support vector machine (SVM) with a Gaussian kernel performs a first labelling of each pseudo-word, taking its local neighbourhood into account. The context is then propagated between neighbours to correct possible labelling errors. To address running-time complexity, we propose linear-complexity methods based on k-NN with constraints; with a kd-tree, the running time is almost linearly proportional to the number of pseudo-words. The performance of our system is close to 90%, even with a very small learning dataset whose samples consist mainly of complex administrative documents.
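To make the RLSA step mentioned in the abstract more concrete, here is a minimal sketch of horizontal run-length smoothing on a binary image, followed by extraction of the resulting ink runs as candidate pseudo-words. It is an illustration under stated assumptions, not the authors' implementation: the function names (`rlsa_row`, `rlsa_horizontal`, `ink_runs`), the convention 1 = ink and 0 = background, and the choice to fill only gaps that lie between two ink pixels are all assumptions made for this example. The smoothing threshold is a free parameter here; the abstract does not give a value.

```python
def rlsa_row(row, threshold):
    """Horizontal RLSA on one row (assumed convention: 1 = ink, 0 = background).

    A background gap lying between two ink pixels is filled with ink
    when its length is at most `threshold`.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            if last_ink is not None:
                gap = i - last_ink - 1
                if 0 < gap <= threshold:
                    # Short gap: merge the two ink runs.
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply row-wise smoothing to a binary image given as a list of rows."""
    return [rlsa_row(row, threshold) for row in image]


def ink_runs(row):
    """Return inclusive (start, end) spans of consecutive ink pixels in a row."""
    spans = []
    start = None
    for i, px in enumerate(row):
        if px == 1 and start is None:
            start = i
        elif px != 1 and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(row) - 1))
    return spans


if __name__ == "__main__":
    row = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    # A small threshold merges only nearby strokes; a larger one merges more.
    print(ink_runs(rlsa_row(row, 2)))
    print(ink_runs(rlsa_row(row, 4)))
```

In this sketch a small threshold groups neighbouring strokes into word-like spans, while a larger threshold merges those spans into line-like ones, which is one plausible way to obtain both pseudo-words and pseudo-lines from the same operation.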

Domains

Document and text processing; Image processing [eess.IV]; Computer vision and pattern recognition [cs.CV]

Main file

kc_vincent_belaidMVA.pdf (939.81 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-00799331

Submitted on: Tuesday, 19 March 2013, 14:24:39

Last modified on: Monday, 11 September 2023, 17:41:19

Long-term archiving on: Sunday, 2 April 2017, 14:44:13

Dates and versions

hal-00799331 , version 1 (12-03-2013)

hal-00799331 , version 2 (19-03-2013)

Identifiers

HAL Id : hal-00799331 , version 2
ARXIV : 1303.4614

Cite

Abdel Belaïd, Santosh K.C., Vincent Poulain d'Andecy. Handwritten and Printed Text Separation in Real Document. The Thirteenth IAPR International Conference on Machine Vision Applications - 2013, May 2013, Kyoto, Japan. ⟨hal-00799331v2⟩


Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

