Report (Research Report) Year: 2018

Line and Word Segmentation of Arabic handwritten documents using Neural Networks

Segmentation en lignes et en mots de documents arabes manuscrits utilisant des modèles neuronnaux

Abstract

Segmenting documents into lines and words is a critical step before recognition. It is especially difficult for ancient and calligraphic writing, which is common in Arabic manuscripts. In this work, we propose a deep learning approach to segmenting documents into lines and words. For line segmentation, we use an RU-net, which performs pixel-wise classification and thus separates text-line pixels from background pixels. For word segmentation, no image-level ground truth is available, so we use the line transcription to locate the words. A BLSTM-CTC maps the line image directly to its transcription, without any prior segmentation. A CNN precedes the BLSTM to extract features from the line image and feed them to the BLSTM. Tested on the KHATT Arabic database, the system achieves about 96.7% correct lines and 80.1% correct words.
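To make the word-segmentation idea more concrete, the following is a minimal sketch of how frame-level labels from a CTC-trained network might be turned into word positions along a line image. It is an illustration under assumptions, not the report's implementation: the symbols `BLANK` and `SPACE`, the function names `word_spans` and `ctc_collapse`, and the premise that the network emits one best label per horizontal frame are all hypothetical.

```python
from typing import List, Tuple

BLANK = "_"  # hypothetical CTC blank symbol
SPACE = " "  # hypothetical label for the gap between words


def ctc_collapse(frame_labels: List[str]) -> str:
    """Standard CTC collapse: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)


def word_spans(frame_labels: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each word.

    A word starts at its first character frame and ends after its last
    character frame; a SPACE frame closes the current word.
    """
    spans = []
    start = None  # first character frame of the current word
    last = None   # last character frame seen in the current word
    for i, lab in enumerate(frame_labels):
        if lab == SPACE:
            if start is not None:
                spans.append((start, last + 1))
                start = None
        elif lab != BLANK:
            if start is None:
                start = i
            last = i
    if start is not None:
        spans.append((start, last + 1))
    return spans


# Toy per-frame output for a line containing two words.
frames = list("__aa_b__  _c_cc_")
print(ctc_collapse(frames))
print(word_spans(frames))
```

If each frame is assumed to cover a fixed number of pixel columns, multiplying the span indices by that width gives approximate word boundaries in the line image.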
Main file: ArabicSegmentation.pdf (1.13 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01910559, version 1 (05-11-2018)

Identifiers

  • HAL Id: hal-01910559, version 1

Cite

Ahlem Belabiod, Abdel Belaïd. Line and Word Segmentation of Arabic handwritten documents using Neural Networks. [Research Report] LORIA - Université de Lorraine; READ. 2018. ⟨hal-01910559⟩