Conference papers

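Séparation manuscrit et imprimé dans des documents administratifs complexes par utilisation de SVM et regroupement (Separating handwritten and printed text in complex administrative documents using SVM and grouping)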

Didier Grzejszczak 1 Yves Rangoni 1 Abdel Belaïd 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This paper proposes a methodology for segmenting printed and handwritten zones in document images. The documents are mainly administrative and are processed in an unconstrained industrial setting, where a large number must be handled each day. They come from different clients, so their content, layout and digitization quality vary widely. The goal is to isolate handwritten notes from the rest of the page so that dedicated processing can then be applied separately to the printed and handwritten layers. To achieve this, we propose a four-step procedure: preprocessing, geometrical layout analysis at the pseudo-word level, classification using an SVM, and post-correction that integrates context to improve quality. Classification rates are around 90% for segmenting printed, handwritten and noisy zones.
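The abstract does not detail how the post-correction step integrates context. As a rough illustration only, the sketch below assumes one plausible form: each pseudo-word already carries a label from the classifier, and a label is overwritten when a strong majority of its spatial neighbours disagree with it. The function name `smooth_labels`, the `neighbors` mapping and the `min_agreement` threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter


def smooth_labels(labels, neighbors, min_agreement=0.75):
    """Relabel a pseudo-word when most of its neighbours disagree with it.

    labels        -- list of class names, one per pseudo-word (classifier output)
    neighbors     -- dict mapping a pseudo-word index to the indices near it
    min_agreement -- fraction of neighbours that must share the winning label
                     (hypothetical threshold, chosen for illustration)
    """
    corrected = list(labels)
    for i, label in enumerate(labels):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            continue  # isolated pseudo-word: no context to use
        # Vote with the original labels so the result does not depend on order.
        votes = Counter(labels[j] for j in nbrs)
        top_label, top_count = votes.most_common(1)[0]
        if top_label != label and top_count / len(nbrs) >= min_agreement:
            corrected[i] = top_label
    return corrected


# A printed text line in which one pseudo-word was labelled as handwritten.
labels = ["printed", "printed", "handwritten", "printed", "printed"]
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
print(smooth_labels(labels, neighbors))
```

Voting on the original labels rather than the partially corrected ones keeps a single pass deterministic; a real system might iterate, weight neighbours by distance, or use the classifier's confidence instead.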

https://hal.inria.fr/hal-00779237
Contributor : Abdel Belaid
Submitted on : Wednesday, January 23, 2013 - 6:04:18 PM
Last modification on : Friday, January 15, 2021 - 5:42:02 PM
Long-term archiving on : Wednesday, April 24, 2013 - 3:54:31 AM

File

cifed_version_publiee_didier.p...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00779237, version 1

Citation

Didier Grzejszczak, Yves Rangoni, Abdel Belaïd. Séparation manuscrit et imprimé dans des documents administratifs complexes par utilisation de SVM et regroupement. CIFED-CORIA, Mar 2012, Bordeaux, France. ⟨hal-00779237⟩
