Conference papers

Modernizing historical Slovene words with character-based SMT

Abstract : We propose a language-independent word normalization method, exemplified by modernizing historical Slovene words. Our method relies on character-based statistical machine translation and uses only shallow knowledge. We present the relevant lexicons and two experiments. In one, we use a lexicon of pairs of historical and contemporary words together with a list of contemporary words; in the other, we use only a list of historical words and a list of contemporary ones. We show that both methods produce significantly better results than the baseline.
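
The abstract does not spell out how words are fed to the translation system. A common convention in character-based SMT, assumed here rather than taken from the paper, is to write each word as a space-separated sequence of characters so that a standard phrase-based system treats characters as its tokens. The sketch below shows that preprocessing step and its inverse. The function names, the `_` boundary symbol, and the example pair are illustrative only and do not come from the paper's lexicons.

```python
# Minimal sketch of character-level preprocessing for SMT.
# Assumption: characters are used as translation tokens, separated by spaces.
# The boundary symbol "_" for word-internal spaces is a hypothetical choice.

def to_char_tokens(word: str, boundary: str = "_") -> str:
    """Write a word as space-separated characters, e.g. 'abc' -> 'a b c'."""
    return " ".join(word.replace(" ", boundary))


def from_char_tokens(tokens: str, boundary: str = "_") -> str:
    """Invert to_char_tokens: drop separators, restore real spaces."""
    return tokens.replace(" ", "").replace(boundary, " ")


# Illustrative pair only (older spelling "zh" for modern "č");
# not an entry from the lexicons described in the abstract.
historical, modern = "zhlovek", "človek"

source_line = to_char_tokens(historical)  # one line of the source side
target_line = to_char_tokens(modern)      # the aligned line of the target side
print(source_line)
print(target_line)
```

With a lexicon of word pairs, each pair would contribute one such source line and one target line to the parallel training data, and a list of contemporary words processed the same way could serve as training data for a character-level language model.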

Contributor : Yves Scherrer
Submitted on : Wednesday, June 26, 2013 - 9:23:23 AM
Last modification on : Tuesday, October 25, 2022 - 6:46:25 PM
Long-term archiving on : Wednesday, April 5, 2017 - 4:38:49 AM


Files produced by the author(s)


  • HAL Id : hal-00838575, version 1


Yves Scherrer, Tomaž Erjavec. Modernizing historical Slovene words with character-based SMT. BSNLP 2013 - 4th Biennial Workshop on Balto-Slavic Natural Language Processing, Aug 2013, Sofia, Bulgaria. ⟨hal-00838575⟩


