Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge - Inria - National Institute for Research in Digital Science and Technology
Conference paper. Year: 2016

Abstract

This paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of handwritten chemistry documents. The proposed method consists of two main modules. First, a structural analysis based on the dimensions and positions of connected components (CCs) identifies some special symbols and clusters the remaining CCs into three groups: character fragments, isolated characters, and connected characters. Then, each group of CCs receives specific processing. Character fragments are merged with the nearest character or string using rules based on geometric relationships. Isolated characters are sent to a recognition module that identifies the numeral components. For connected characters, the final decision on the nature of the string (numeric or non-numeric) is based on a global score computed over the full string from the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at the table-row level is carried out to correct any remaining errors. Experiments are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The results show the effectiveness of the proposed system in extracting amount fields.
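The abstract does not give the formula for the global score, so the following is only a minimal sketch of how a decision on a connected string could combine height regularity with recognition probabilities. The `Fragment` type, the coefficient-of-variation measure of regularity, the weight `alpha`, and the `threshold` are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Fragment:
    """One segmented piece of a connected string (hypothetical type)."""
    height: float      # bounding-box height in pixels
    digit_prob: float  # recognizer's best digit-class probability, in [0, 1]


def height_regularity(heights: List[float]) -> float:
    """Return 1.0 for identical heights, lower values as heights vary.

    Assumption: regularity is 1 minus the coefficient of variation,
    clipped at 0. The paper may define this property differently.
    """
    avg = mean(heights)
    if avg == 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(heights) / avg)


def string_score(fragments: List[Fragment], alpha: float = 0.5) -> float:
    """Weighted mix of height regularity and mean digit probability.

    `alpha` is an assumed weight; the abstract does not specify one.
    """
    if not fragments:
        raise ValueError("a string needs at least one fragment")
    regularity = height_regularity([f.height for f in fragments])
    probability = mean([f.digit_prob for f in fragments])
    return alpha * regularity + (1.0 - alpha) * probability


def is_numeric(fragments: List[Fragment], threshold: float = 0.7) -> bool:
    """Label the string numeric when its score reaches an assumed threshold."""
    return string_score(fragments) >= threshold
```

For instance, three fragments of equal height with high digit probabilities score close to 1 and are labeled numeric, while fragments with uneven heights and low probabilities fall below the threshold. In practice the weight and the threshold would have to be tuned on labeled data.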
Main file
GHANMI_DAS-1.pdf (987.68 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01321269, version 1 (25-05-2016)

Identifiers

  • HAL Id: hal-01321269, version 1

Cite

Nabil Ghanmi, Abdel Belaid. Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge. 11th IAPR International Workshop on Document Analysis Systems, Apr 2016, Santorini, Greece. ⟨hal-01321269⟩
