Reports (Research Report) Year : 2018

Line and Word Segmentation of Arabic handwritten documents using Neural Networks

Segmentation en lignes et en mots de documents arabes manuscrits utilisant des modèles neuronnaux


Abstract

Segmenting documents into lines and words is a critical step before recognition. This step is even more difficult for ancient and calligraphic writing, as is often the case in Arabic manuscript documents. In this work, we propose a deep-learning approach to segmenting documents into lines and words. For line segmentation, we use an RU-net, which performs pixel-wise classification and thus separates line pixels from background pixels. For segmenting lines into words, because no image-level ground truth is available for word segmentation, we use the line transcription to locate the words. A BLSTM-CTC maps the transcription directly onto the line image, without any prior segmentation. A CNN placed before this sequence model extracts features and feeds the BLSTM the essential content of the line image. Tested on the KHATT Arabic database, the system achieves about 96.7% correct lines and 80.1% correct words.
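The abstract does not detail how word positions are read off the BLSTM-CTC output. As a minimal sketch of one plausible way, assuming the network emits one best label per time step and that the transcription alphabet includes a space symbol, word extents can be recovered by grouping character frames between space frames. The names `BLANK`, `SPACE`, and `word_spans` are hypothetical and are not taken from the report.

```python
from typing import List, Tuple

BLANK = 0  # hypothetical index of the CTC blank symbol
SPACE = 1  # hypothetical index of the inter-word space symbol


def word_spans(frame_labels: List[int]) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive, one per word.

    frame_labels holds the best label for each time step (per-frame argmax
    of the network output). A word runs from its first character frame to
    its last character frame; blank frames inside a word are kept, and a
    space frame closes the current word.
    """
    spans = []
    start = None  # first character frame of the current word
    last = None   # most recent character frame of the current word
    for t, label in enumerate(frame_labels):
        if label == SPACE:
            if start is not None:
                spans.append((start, last + 1))
                start = last = None
        elif label != BLANK:
            if start is None:
                start = t
            last = t
    # Close a word that runs to the end of the line.
    if start is not None:
        spans.append((start, last + 1))
    return spans
```

Frame indices follow the direction in which the line image is fed to the network. Converting a span to pixel columns would require the horizontal downsampling factor of the CNN, which this sketch leaves out because it depends on the architecture.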
Origin : Files produced by the author(s)

Dates and versions

hal-01910559 , version 1 (05-11-2018)

Identifiers

  • HAL Id : hal-01910559 , version 1

Cite

Ahlem Belabiod, Abdel Belaïd. Line and Word Segmentation of Arabic handwritten documents using Neural Networks. [Research Report] LORIA - Université de Lorraine; READ. 2018. ⟨hal-01910559⟩