Abstract : The work reported in this paper aims to optimize performance in the digitization of documents pertaining to the cultural heritage domain. A hybrid method is proposed, combining statistical classification algorithms and linguistic knowledge to automate post-OCR error detection and correction. The current paper deals with the integration of linguistic modules and their impact on error detection.
https://hal.inria.fr/hal-01022402
Contributor : Kata Gábor
Submitted on : Thursday, July 10, 2014 - 1:07:53 PM
Last modification on : Saturday, March 28, 2020 - 2:19:40 AM
Long-term archiving on : Friday, October 10, 2014 - 11:41:16 AM
Kata Gábor, Benoît Sagot. Automated Error Detection in Digitized Cultural Heritage Documents. EACL 2014 Workshop on Language Technology for Cultural Heritage, Apr 2014, Göteborg, Sweden. ⟨hal-01022402⟩