Text/graphic separation using a sparse representation with multi-learned dictionaries

Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades

Conference paper, Year: 2012


Thanh Ha Do

Role: Corresponding author
PersonId : 934484

Querying Graphics through Analysis and Recognition

Salvatore Tabbone

Role: Author
PersonId : 740104
IdHAL : salvatore-tabbone
ORCID : 0000-0002-0024-1280

Querying Graphics through Analysis and Recognition

Oriol Ramos Terrades

Role: Author
PersonId : 934485

Computer Vision Center (Centre de visio per computador)

Abstract

In this paper, we propose a new approach to extract text regions from graphical documents. We first empirically construct two sequences of learned dictionaries, one for the text parts and one for the graphical parts. We then compute the sparse representations of non-overlapping document patches of several different sizes in these learned dictionaries. Based on these representations, each patch is classified as text or graphic by comparing its reconstruction errors. Same-sized patches in one category are then merged to define the corresponding text or graphic layers, which are combined to create the final text/graphic layer. Finally, in a post-processing step, text regions are further filtered using learned thresholds.
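
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain matching pursuit as the sparse coder, a single patch size, and two tiny hand-made dictionaries (TEXT_ATOMS and GRAPHIC_ATOMS, both hypothetical) in place of the learned ones. It only shows how comparing reconstruction errors assigns a patch to a category.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def reconstruction_error(patch, atoms, n_nonzero=2):
    """Sparse-code `patch` with matching pursuit over unit-norm `atoms`
    and return the squared norm of the remaining residual."""
    residual = list(patch)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        best = max(atoms, key=lambda a: abs(dot(residual, a)))
        coef = dot(residual, best)
        # Remove that atom's contribution from the residual.
        residual = [r - coef * a for r, a in zip(residual, best)]
    return dot(residual, residual)


def classify_patch(patch, text_atoms, graphic_atoms, n_nonzero=2):
    """Label a patch by the dictionary that reconstructs it better."""
    err_text = reconstruction_error(patch, text_atoms, n_nonzero)
    err_graphic = reconstruction_error(patch, graphic_atoms, n_nonzero)
    return "text" if err_text < err_graphic else "graphic"


# Hypothetical toy dictionaries for flattened 2x2 patches (assumption:
# "text" atoms are high-frequency, "graphic" atoms are smooth or edge-like).
TEXT_ATOMS = [normalize([1, -1, -1, 1]), normalize([1, -1, 1, -1])]
GRAPHIC_ATOMS = [normalize([1, 1, 1, 1]), normalize([1, 1, -1, -1])]

if __name__ == "__main__":
    print(classify_patch([1.0, 1.0, 1.0, 1.0], TEXT_ATOMS, GRAPHIC_ATOMS))
    print(classify_patch([0.9, -1.1, -1.0, 1.0], TEXT_ATOMS, GRAPHIC_ATOMS))
```

In the approach summarized above, this comparison would be repeated for each patch size with the dictionary learned for that size, before the per-size layers are merged and the text regions are filtered with learned thresholds; those steps are not covered by this sketch.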

Keywords

Graphics Recognition; Layout Analysis; Document Understanding

Domains

Image processing [eess.IV]

Main file

DO_extractText_SparseRepresentation.pdf (357.98 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00759554

Submitted on: Tuesday, December 4, 2012, 09:15:57

Last modified on: Wednesday, March 6, 2024, 09:16:32

Long-term archiving on: Tuesday, March 5, 2013, 03:49:25

Dates and versions

hal-00759554, version 1 (04-12-2012)

Identifiers

HAL Id: hal-00759554, version 1

Cite

Thanh Ha Do, Salvatore Tabbone, Oriol Ramos Terrades. Text/graphic separation using a sparse representation with multi-learned dictionaries. 21st International Conference on Pattern Recognition - ICPR 2012, Nov 2012, Tsukuba, Japan. ⟨hal-00759554⟩

Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

203 views

278 downloads
