A Texture-based Method for Document Segmentation and Classification

Abstract : In this paper we present a hybrid approach for segmenting and classifying the contents of document images. A document image is segmented into three types of regions: graphics, text and space. The image is subdivided into blocks, and five GLCM (Grey Level Co-occurrence Matrix) features are extracted from each block. Based on these features, the blocks are clustered into three groups using the K-Means algorithm, and connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the MediaTeam Document Database.
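The block-wise pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the five GLCM features, the block size, the pixel offset or the number of grey levels, so the feature set below (contrast, energy, homogeneity, entropy, correlation), the horizontal offset and the helper names `split_blocks`, `glcm`, `glcm_features` and `kmeans` are all assumptions chosen for illustration. Merging of connected blocks and the heuristic classification rules are left out.

```python
import math
import random


def split_blocks(image, size):
    """Cut a 2-D list of grey levels into non-overlapping size x size blocks."""
    rows, cols = len(image), len(image[0])
    return {
        (r // size, c // size): [row[c:c + size] for row in image[r:r + size]]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    }


def glcm(block, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(block), len(block[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = block[r][c], block[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, counts)) or 1.0
    return [[v / total for v in row] for row in counts]


def glcm_features(block, levels):
    """Five common GLCM descriptors (an assumed set, not taken from the paper)."""
    p = glcm(block, levels)
    idx = range(levels)
    mean = sum(i * p[i][j] for i in idx for j in idx)
    var = sum((i - mean) ** 2 * p[i][j] for i in idx for j in idx)
    contrast = sum((i - j) ** 2 * p[i][j] for i in idx for j in idx)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    cov = sum((i - mean) * (j - mean) * p[i][j] for i in idx for j in idx)
    # A constant block has zero variance; its correlation is set to 1 by convention.
    correlation = cov / var if var > 0 else 1.0
    return [contrast, energy, homogeneity, entropy, correlation]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initial centres are k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Under these assumptions, a quantised page would be processed with `blocks = split_blocks(image, size)`, then `kmeans([glcm_features(b, levels) for b in blocks.values()], 3)` to obtain the three groups mentioned in the abstract. In practice the features would usually be rescaled before clustering, since entropy and contrast can have very different ranges.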
Document type :
Journal articles
Cited literature [24 references]

https://hal.inria.fr/hal-01262352
Contributor : Coordination Episciences Iam
Submitted on : Tuesday, January 26, 2016 - 4:05:16 PM
Last modification on : Wednesday, October 30, 2019 - 4:34:07 PM
Long-term archiving on : Wednesday, April 27, 2016 - 1:21:31 PM

File

arima00604.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01262352, version 1

Citation

Ming-Wei Lin, Jules-Raymond Tapamo, Baird Ndovie. A Texture-based Method for Document Segmentation and Classification. Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, INRIA, 2007, 6, pp.49-56. ⟨hal-01262352⟩
