Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues

Anand Mishra; Karteek Alahari; C.V. Jawahar

doi:10.1016/j.cviu.2016.01.002

Journal article, Computer Vision and Image Understanding, Year: 2016

Anand Mishra

Role: Author

Center for Visual Information Technology [Hyderabad]

Karteek Alahari

Role: Author
PersonId : 19670
IdHAL : karteek
ORCID : 0000-0002-1838-5936
IdRef : 196283892

Apprentissage de modèles à partir de données massives

C.V. Jawahar

Role: Author

Center for Visual Information Technology [Hyderabad]

Abstract

Recognizing scene text is a challenging problem, even more so than the recognition of scanned documents. This problem has gained significant attention from the computer vision community in recent years, and several methods based on energy minimization frameworks and deep learning approaches have been proposed. In this work, we focus on the energy minimization framework and propose a model that exploits both bottom-up and top-down cues for recognizing cropped words extracted from street images. The bottom-up cues are derived from individual character detections from an image. We build a conditional random field model on these detections to jointly model the strength of the detections and the interactions between them. These interactions are top-down cues obtained from a lexicon-based prior, i.e., language statistics. The optimal word represented by the text image is obtained by minimizing the energy function corresponding to the random field model. We evaluate our proposed algorithm extensively on a number of cropped scene text benchmark datasets, namely Street View Text, ICDAR 2003, 2011 and 2013 datasets, and IIIT 5K-word, and show better performance than comparable methods. We perform a rigorous analysis of all the steps in our approach and analyze the results. We also show that state-of-the-art convolutional neural network features can be integrated in our framework to further improve the recognition performance.
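
The energy formulation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the graph is simplified to a chain with one node per character detection, unary costs stand in for detection strength, pairwise costs are negative log bigram probabilities from a toy lexicon, and the energy is minimized exactly by dynamic programming. The names `bigram_costs`, `minimize_chain_energy`, `smoothing`, and `weight`, as well as all numeric values, are hypothetical.

```python
import math
from collections import Counter


def bigram_costs(lexicon, alphabet, smoothing=1.0):
    """Pairwise costs: negative log of smoothed bigram frequencies in the lexicon."""
    counts = Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    total = sum(counts.values()) + smoothing * len(alphabet) ** 2
    return {
        (a, b): -math.log((counts[(a, b)] + smoothing) / total)
        for a in alphabet
        for b in alphabet
    }


def minimize_chain_energy(unary, pairwise, weight=1.0):
    """Minimize sum of unary costs plus weighted pairwise costs over a chain.

    unary: list of dicts mapping each label to its cost at that node.
    Returns the minimum-energy label string and its energy.
    """
    best = dict(unary[0])  # best energy of any labeling ending in each label
    back = []              # back-pointers, one dict per node after the first
    for node in unary[1:]:
        new_best, pointers = {}, {}
        for cur in node:
            prev = min(best, key=lambda p: best[p] + weight * pairwise[(p, cur)])
            new_best[cur] = best[prev] + weight * pairwise[(prev, cur)] + node[cur]
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    last = min(best, key=best.get)
    energy = best[last]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), energy


# Toy setup (hypothetical values): three detections over a four-letter alphabet.
alphabet = "acot"
pairwise = bigram_costs(["cat", "cot", "coat"], alphabet)
unary = [
    {"c": 0.2, "o": 0.3, "a": 2.0, "t": 3.0},
    {"c": 0.4, "o": 0.6, "a": 1.5, "t": 3.0},  # detector slightly prefers "c"
    {"c": 2.0, "o": 2.0, "a": 2.0, "t": 0.3},
]
word, energy = minimize_chain_energy(unary, pairwise)
print(word, round(energy, 3))
```

In this toy case the unary costs alone would pick "cct", while the lexicon-derived pairwise costs shift the minimum to "cot". A chain admits exact dynamic programming; a graph with richer connectivity would generally need a different inference method.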

Keywords

Random field models; Scene text understanding; Text recognition; Lexicon priors; Character recognition

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

mishraDraftRevised2.pdf (581.64 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01263322

Submitted on: Wednesday, January 27, 2016, 16:24:05

Last modified on: Thursday, April 4, 2024, 20:52:23

Long-term archiving on: Thursday, April 28, 2016, 11:22:09

Dates and versions

hal-01263322 , version 1 (27-01-2016)

Identifiers

HAL Id : hal-01263322 , version 1
ARXIV : 1601.03128
DOI : 10.1016/j.cviu.2016.01.002

Cite

Anand Mishra, Karteek Alahari, C.V. Jawahar. Enhancing Energy Minimization Framework for Scene Text Recognition with Top-Down Cues. Computer Vision and Image Understanding, 2016, 145, pp.30-42. ⟨10.1016/j.cviu.2016.01.002⟩. ⟨hal-01263322⟩

Collections

UGA CNRS INRIA LJK LJK_GI INRIA2 LJK-GI-THOTH

264 views

146 downloads
