Automated Error Detection in Digitized Cultural Heritage Documents

Kata Gábor (1), Benoît Sagot (1)
(1) ALPAGE - Analyse Linguistique Profonde à Grande Echelle; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: The work reported in this paper aims to optimize performance in the digitization of documents from the cultural heritage domain. A hybrid method is proposed that combines statistical classification algorithms with linguistic knowledge to automate post-OCR error detection and correction. This paper deals with the integration of the linguistic modules and their impact on error detection.

https://hal.inria.fr/hal-01022402
Contributor : Kata Gábor
Submitted on : Thursday, July 10, 2014 - 1:07:53 PM
Last modification on : Thursday, August 29, 2019 - 2:24:09 PM
Long-term archiving on : Friday, October 10, 2014 - 11:41:16 AM

File

W14-0608.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01022402, version 1

Citation

Kata Gábor, Benoît Sagot. Automated Error Detection in Digitized Cultural Heritage Documents. EACL 2014 Workshop on Language Technology for Cultural Heritage, Apr 2014, Göteborg, Sweden. ⟨hal-01022402⟩
