
Line and Word Segmentation of Arabic handwritten documents using Neural Networks

Ahlem Belabiod 1 Abdel Belaïd 1
1 READ - Recognition of writing and analysis of documents
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Segmenting documents into lines and words is a critical step before recognition. It is especially difficult for ancient and calligraphic writing, as is often the case in Arabic manuscripts. In this work, we propose a deep-learning approach to segmenting documents into lines and words. For line segmentation, we use an RU-net, which performs pixel-wise classification and thus separates line pixels from background pixels. For segmenting lines into words, no word-level ground truth is available at the image level, so we use the line transcription to locate the words. A BLSTM-CTC maps the line image directly to its transcription, without any prior segmentation. A CNN precedes the BLSTM to extract features and feed it the essential content of the line image. Tested on the KHATT Arabic database, the system achieves about 96.7% correct lines and 80.1% correct words.
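
The abstract does not spell out how word positions are recovered from the BLSTM-CTC output. One plausible reading, offered here only as a minimal sketch and not as the authors' method, is to take the best-path label for each frame (image column), collapse it into the transcription, and cut the line wherever a space label is emitted. The blank symbol, the toy Latin labels, and the function names `ctc_collapse` and `word_spans` are assumptions made for illustration; for Arabic, frames may need to be read right to left.

```python
from itertools import groupby

BLANK = "-"  # hypothetical CTC blank symbol (assumption, not from the report)
SPACE = " "  # word separator in the line transcription


def ctc_collapse(frame_labels):
    """Standard CTC best-path decoding: merge repeated labels, then drop blanks."""
    return "".join(label for label, _ in groupby(frame_labels) if label != BLANK)


def word_spans(frame_labels):
    """Return (start, end) frame indices per word, end exclusive.

    A word runs from its first character frame to its last character
    frame before a space label (or the end of the line).
    """
    spans = []
    start = None
    last = None
    for i, label in enumerate(frame_labels):
        if label == SPACE:
            # A space closes the word currently open, if any.
            if start is not None:
                spans.append((start, last + 1))
                start = None
        elif label != BLANK:
            # A character frame opens a word or extends the open one.
            if start is None:
                start = i
            last = i
    if start is not None:
        spans.append((start, last + 1))
    return spans


# Toy per-frame labels for a line containing two words.
frames = ["-", "a", "a", "-", "b", " ", "-", "c", "c", "d", "-"]
print(ctc_collapse(frames))  # ab cd
print(word_spans(frames))    # [(1, 5), (7, 10)]
```

Frame indices would still have to be scaled by the network's horizontal downsampling factor to obtain pixel columns in the line image; that factor is not given in the abstract.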
Contributor : Abdel Belaid
Submitted on : Monday, November 5, 2018 - 3:39:00 PM
Last modification on : Thursday, June 17, 2021 - 3:02:33 AM
Long-term archiving on : Wednesday, February 6, 2019 - 2:59:19 PM




  • HAL Id : hal-01910559, version 1



Ahlem Belabiod, Abdel Belaïd. Line and Word Segmentation of Arabic handwritten documents using Neural Networks. [Research Report] LORIA - Université de Lorraine; READ. 2018. ⟨hal-01910559⟩


