Book sections

Segmentation of ancient Arabic documents

Abdel Belaïd 1 Nazih Ouwayed 1 
1 READ - READ
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This chapter addresses the problem of segmenting ancient Arabic documents. Because ancient documents have neither a real physical structure nor a logical one, segmentation is limited to extracting text areas or the text lines within those areas. Although this type of segmentation appears quite simple, its implementation remains a challenging task. This is due to the state of old documents: the image is of low quality, and the lines are not straight but sinuous and connected to one another. Given the failure of traditional methods, we propose a method for line extraction in multi-oriented documents. The method is based on an image meshing that allows orientations to be detected locally and reliably. These orientations are then extended to larger areas. The orientation estimation uses the energy distribution of Cohen's class, which is more accurate than the projection method. The method then exploits the projection peaks to follow the connected components forming text lines. The approach ends with a final separation of connected lines, based on the morphology of terminal letters.
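
To make the idea of meshing and local orientation estimation concrete, here is a minimal sketch in Python. It is not the authors' implementation: it uses the simpler projection-profile criterion that the abstract names as the less accurate baseline, not the Cohen's class energy distribution. The function names, the one-degree angle grid and the scoring rule (sum of squared bin counts) are assumptions made for illustration only.

```python
import numpy as np

def projection_score(ys, xs, angle_deg):
    """Score how well foreground pixels align along a candidate angle.

    Hypothetical helper. Pixels are projected onto the normal of a line with
    the given slope angle (image coordinates, y pointing down) and counted in
    bins one pixel wide. A peaked profile gives a large sum of squares.
    """
    theta = np.deg2rad(angle_deg)
    offsets = ys * np.cos(theta) - xs * np.sin(theta)
    bins = np.floor(offsets - offsets.min()).astype(int)
    hist = np.bincount(bins).astype(float)
    return float(np.sum(hist ** 2))

def estimate_cell_orientation(cell, angles=range(-45, 46)):
    """Return the candidate angle (degrees) with the most peaked profile,
    or None when the cell contains no foreground pixel."""
    ys, xs = np.nonzero(cell)
    if ys.size == 0:
        return None
    candidates = list(angles)
    scores = [projection_score(ys, xs, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

def mesh_orientations(image, cell_size):
    """Split a binary image into square cells and estimate one angle per cell.

    Returns a dict mapping the (top, left) corner of each cell to its angle.
    """
    height, width = image.shape
    result = {}
    for top in range(0, height, cell_size):
        for left in range(0, width, cell_size):
            cell = image[top:top + cell_size, left:left + cell_size]
            result[(top, left)] = estimate_cell_orientation(cell)
    return result
```

In this sketch a small cell gives only a coarse angle estimate, because a short stroke fits into a single one-pixel bin for several neighbouring angles. Extending local orientations to larger areas, following connected components from the projection peaks, and separating connected lines are not covered here.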
Cited literature [48 references]

https://hal.inria.fr/inria-00579840
Contributor : Abdel Belaid
Submitted on : Friday, March 25, 2011 - 10:22:25 AM
Last modification on : Friday, February 26, 2021 - 3:28:07 PM
Long-term archiving on : Sunday, June 26, 2011 - 2:38:57 AM

File

author.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00579840, version 1

Citation

Abdel Belaïd, Nazih Ouwayed. Segmentation of ancient Arabic documents. Volker Märgner and Haikal El Abed. Guide to OCR for Arabic Scripts, Springer, 2011. ⟨inria-00579840⟩
