Segmentation of ancient Arabic documents

Abdel Belaïd, Nazih Ouwayed
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This chapter addresses the problem of segmenting ancient Arabic documents. Because ancient documents have neither a real physical structure nor a logical one, segmentation is limited to extracting text areas or the lines within those areas. Although this type of segmentation appears quite simple, its implementation remains a challenging task. This is due to the state of old documents: the image is of low quality, and the lines are not straight but sinuous and often connected. Given the failure of traditional methods, we propose a method for line extraction in multi-oriented documents. The method is based on meshing the image, which allows orientations to be detected locally and reliably. These orientations are then extended to larger areas. The orientation estimation uses an energy distribution of Cohen's class, which is more accurate than the projection method. The method then exploits the projection peaks to follow the connected components forming text lines. The approach ends with a final separation of connected lines, based on the morphology of terminal letters.
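For orientation, the meshing idea in the abstract can be illustrated with the simpler projection method that the authors compare against. The sketch below is not the chapter's Cohen's class estimator. It assumes a binary image (ink pixels nonzero), a fixed square mesh, and a brute-force search over candidate angles, and all function names and parameters (`projection_score`, `estimate_tile_orientation`, `mesh_orientations`, `tile_size`) are hypothetical.

```python
import numpy as np

def projection_score(ys, xs, angle_deg, half_range):
    """Sharpness of the projection profile for one candidate text direction."""
    theta = np.deg2rad(angle_deg)
    # Pixels on a line y = y0 + x * tan(theta) all get the same value of t,
    # so the profile is most peaked when theta matches the text direction.
    t = ys * np.cos(theta) - xs * np.sin(theta)
    hist, _ = np.histogram(t, bins=2 * half_range, range=(-half_range, half_range))
    # Sum of squared bin counts: large when ink concentrates in few bins.
    return float(np.sum(hist.astype(np.int64) ** 2))

def estimate_tile_orientation(tile, angles=None):
    """Return the candidate angle (degrees) with the sharpest profile, or None.

    Positive angles mean lines that descend to the right in image
    coordinates (row index grows downward).
    """
    if angles is None:
        angles = np.arange(-45, 46)  # assumed search range, 1 degree steps
    ys, xs = np.nonzero(tile)
    if ys.size == 0:
        return None  # empty tile: no orientation to report
    half_range = int(np.ceil(np.hypot(*tile.shape)))  # bounds |t| for every angle
    scores = [projection_score(ys, xs, a, half_range) for a in angles]
    return float(angles[int(np.argmax(scores))])

def mesh_orientations(image, tile_size=64):
    """Estimate one orientation per mesh cell; empty cells are NaN."""
    rows = -(-image.shape[0] // tile_size)  # ceiling division
    cols = -(-image.shape[1] // tile_size)
    grid = np.full((rows, cols), np.nan)
    for r in range(rows):
        for c in range(cols):
            tile = image[r * tile_size:(r + 1) * tile_size,
                         c * tile_size:(c + 1) * tile_size]
            angle = estimate_tile_orientation(tile)
            if angle is not None:
                grid[r, c] = angle
    return grid
```

Calling `mesh_orientations(page)` on a binarized page returns a grid of local angles. Extending these orientations to larger areas, following connected components along projection peaks, and separating connected lines are further steps of the method that this sketch does not attempt.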
Document type:
Book chapter
Volker Märgner and Haikal El Abed. Guide to OCR for Arabic Scripts, Springer, 2011

Cited literature: 48 references
Contributor: Abdel Belaid
Submitted on: Friday, 25 March 2011 - 10:22:25
Last modified on: Tuesday, 24 April 2018 - 13:36:09
Document(s) archived on: Sunday, 26 June 2011 - 02:38:57




  • HAL Id: inria-00579840, version 1



Abdel Belaïd, Nazih Ouwayed. Segmentation of ancient Arabic documents. Volker Märgner and Haikal El Abed. Guide to OCR for Arabic Scripts, Springer, 2011. 〈inria-00579840〉


