
inria-00104895, version 1

Context Processing to Read Text on Damaged Wooden Tablets

Akihito Kitadai 1, Kazu Nishijima 1, Kei Saito 1, Masaki Nakagawa 1, Hajime Baba 2, Akihiro Watanabe 2

Tenth International Workshop on Frontiers in Handwriting Recognition (2006)

Abstract: This paper describes context processing that proposes candidate readings for damaged scripts on wooden tablets (mokkans). Because mokkans excavated from old strata are damaged, even archeologists can hardly read the scripts on them. Very often the ink in several areas has faded or is completely lost, and some characters on which the reading of other characters depends may be misrecognized. The context processing extends the Aho-Corasick method to allow self-transition and presents candidates even for scripts with lost ink and misrecognized characters. For evaluation, we employed 4,041 place names in 8th-century Japan as the vocabulary. Each place name consists of 9 to 11 characters. Test keywords were prepared with 1 to 6 characters lost and 0 to 2 characters replaced by others from the vocabulary. Even for keywords with 5 characters lost and one character replaced, the method nominates the correct name among the top 10 candidates with 71.7% correctness.
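The idea of nominating vocabulary entries for a keyword with lost and misrecognized characters can be illustrated with a much simpler stand-in than the method of the paper. The sketch below is not the authors' extended Aho-Corasick automaton and does not reproduce its self-transition; it is a plain trie search that assumes the positions of lost characters are known and marked with a hypothetical `LOST` symbol, and that tolerates a bounded number of replaced characters. The names `build_trie`, `candidates`, `max_replaced`, and `top_n`, and the toy ASCII vocabulary, are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

LOST = "?"  # hypothetical marker for a character whose ink is lost


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a vocabulary entry ends here


def build_trie(vocabulary: List[str]) -> TrieNode:
    """Insert every vocabulary entry into a character trie."""
    root = TrieNode()
    for word in vocabulary:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def candidates(root: TrieNode, query: str,
               max_replaced: int = 1, top_n: int = 10) -> List[str]:
    """Return entries with the same length as `query` that match it when
    LOST positions accept any character and at most `max_replaced`
    readable characters are treated as misrecognized."""
    results: List[Tuple[int, str]] = []

    def walk(node: TrieNode, pos: int, replaced: int) -> None:
        if pos == len(query):
            if node.word is not None:
                results.append((replaced, node.word))
            return
        ch = query[pos]
        for edge, child in node.children.items():
            if ch == LOST or edge == ch:
                walk(child, pos + 1, replaced)        # lost or matching character
            elif replaced < max_replaced:
                walk(child, pos + 1, replaced + 1)    # assume a misrecognition

    walk(root, 0, 0)
    results.sort()  # fewest assumed misrecognitions first, then alphabetical
    return [word for _, word in results[:top_n]]


# Toy ASCII vocabulary (not real data from the paper).
vocabulary = ["yamato", "yamada", "yamaki", "kawachi"]
trie = build_trie(vocabulary)
ranked = candidates(trie, "ya?a?o", max_replaced=1)
exact = candidates(trie, "ya?a?o", max_replaced=0)
```

Candidates are ranked by how many readable characters had to be assumed misrecognized, so entries that agree with every surviving character come first. Handling an unknown number of lost characters would need a different mechanism, which this sketch does not attempt.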

  • 1:  Tokyo University of Agriculture and Technology
  • 2:  National Research Institute for Cultural Properties
  • Domain : Computer Science/Document and Text Processing
    Computer Science/Computer Vision and Pattern Recognition
  • Keywords : Context analysis – Historical document – Information retrieval method – Aho-Corasick method
  • Comment : http://www.suvisoft.com
 
  • oai:hal.inria.fr:inria-00104895
  • Submitted on: Monday, 9 October 2006 16:23:01
  • Updated on: Monday, 9 October 2006 16:28:16