Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge

Nabil Ghanmi 1 Abdel Belaid 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : This paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of chemistry documents. The proposed method comprises two main modules. First, a structural analysis based on the dimensions and positions of connected components (CCs) identifies some special symbols and clusters the remaining CCs into three groups: fragments of characters, isolated characters, and connected characters. Then, each group is processed separately. Fragments of characters are merged with the nearest character or string using rules based on geometric relationships. Isolated characters are sent to a recognition module that identifies the numeral components. For connected characters, the final decision on the nature of the string (numeric or non-numeric) is based on a global score computed over the full string from the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at the table-row level corrects possible errors. The experimental tests are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The results show the effectiveness of the proposed system in extracting amount fields.
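The abstract does not give the grouping rules or the formula for the global score, so the following is only a minimal sketch of how such a pipeline could be organized, not the authors' implementation. Every name and number in it is an assumption made for illustration: the size ratios in `classify_cc`, the definition of height regularity as one minus the coefficient of variation, the equal weighting of the two terms, and the decision threshold of 0.7.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Fragment:
    """One segmented piece of a connected string (hypothetical structure)."""
    height: float      # bounding-box height in pixels
    digit_prob: float  # best digit probability from a recognizer, in [0, 1]


def classify_cc(width: float, height: float, ref_height: float,
                small_ratio: float = 0.5, wide_ratio: float = 1.5) -> str:
    """Assign a connected component to one of three groups by its size.

    The ratios are illustrative assumptions, not values from the paper.
    """
    if height < small_ratio * ref_height and width < small_ratio * ref_height:
        return "fragment"
    if width > wide_ratio * ref_height:
        return "connected"
    return "isolated"


def height_regularity(heights: List[float]) -> float:
    """Return 1.0 for identical heights, decreasing toward 0 as they vary.

    Assumed definition: 1 minus the coefficient of variation, floored at 0.
    """
    m = mean(heights)
    if m == 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(heights) / m)


def global_score(fragments: List[Fragment], weight: float = 0.5) -> float:
    """Blend height regularity with the mean recognition probability.

    Expects a non-empty list; the linear blend is an assumption.
    """
    regularity = height_regularity([f.height for f in fragments])
    recognition = mean(f.digit_prob for f in fragments)
    return weight * regularity + (1.0 - weight) * recognition


def is_numeric(fragments: List[Fragment], threshold: float = 0.7) -> bool:
    """Decide numeric vs. non-numeric from the global score (assumed threshold)."""
    return global_score(fragments) >= threshold
```

Under these assumptions, a string whose fragments share the same height and are recognized as digits with high probability scores close to 1 and is accepted as numeric, while a string with irregular heights and weak digit probabilities falls below the threshold.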
Document type :
Conference papers
Cited literature [13 references]

https://hal.inria.fr/hal-01321269
Contributor : Nabil Ghanmi
Submitted on : Wednesday, May 25, 2016 - 12:14:01 PM
Last modification on : Wednesday, November 3, 2021 - 7:57:27 AM
Long-term archiving on : Friday, August 26, 2016 - 10:45:14 AM

File

GHANMI_DAS-1.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01321269, version 1

Citation

Nabil Ghanmi, Abdel Belaid. Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge. 11th IAPR International workshop on Document Analysis Systems, Apr 2016, Santorini, Greece. ⟨hal-01321269⟩

Metrics

Record views : 130
File downloads : 167