Modernizing historical Slovene words with character-based SMT

Abstract : We propose a language-independent word normalization method exemplified on modernizing historical Slovene words. Our method relies on character-based statistical machine translation and uses only shallow knowledge. We present the relevant lexicons and two experiments. In one, we use a lexicon of historical word–contemporary word pairs and a list of contemporary words; in the other, we use only a list of historical words and one of contemporary ones. We show that both methods produce significantly better results than the baseline.
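
Character-based SMT is commonly set up by treating each character as a token, so that a standard phrase-based system learns character-sequence correspondences instead of word correspondences. The following is a minimal sketch of that preprocessing step only. It is not the authors' pipeline, which this record does not describe. The function names, the `_` boundary marker, and the toy word pairs are assumptions made for illustration and are not taken from the paper's lexicons.

```python
def to_char_tokens(word, boundary="_"):
    """Turn a word into a space-separated character sequence.

    Boundary markers are added so a translation model can tell
    word-initial and word-final contexts apart (an assumed convention).
    """
    return " ".join([boundary] + list(word) + [boundary])


def from_char_tokens(line, boundary="_"):
    """Invert to_char_tokens: drop boundary markers and rejoin characters.

    Assumes the boundary symbol never occurs inside a word.
    """
    return "".join(tok for tok in line.split() if tok != boundary)


def build_parallel_corpus(pairs):
    """Map (historical, contemporary) word pairs to two aligned lists of
    character-tokenized lines, one line per word, as an SMT toolkit
    would expect for its source and target training files."""
    source = [to_char_tokens(hist) for hist, _ in pairs]
    target = [to_char_tokens(modern) for _, modern in pairs]
    return source, target


if __name__ == "__main__":
    # Hypothetical toy pairs, for illustration only.
    toy_pairs = [("zhaſ", "čas"), ("abc", "abd")]
    src, tgt = build_parallel_corpus(toy_pairs)
    for s, t in zip(src, tgt):
        print(s, "|||", t)
```

In a setup like this, the contemporary word list mentioned in the abstract would typically be tokenized the same way to train a character-level language model, and system output would be mapped back to words with `from_char_tokens`.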

https://hal.inria.fr/hal-00838575
Contributor : Yves Scherrer
Submitted on : Wednesday, June 26, 2013 - 9:23:23 AM
Last modification on : Tuesday, July 9, 2019 - 1:16:21 AM
Long-term archiving on : Wednesday, April 5, 2017 - 4:38:49 AM

File

13-scherrer-modernize.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00838575, version 1

Citation

Yves Scherrer, Tomaž Erjavec. Modernizing historical Slovene words with character-based SMT. BSNLP 2013 - 4th Biennial Workshop on Balto-Slavic Natural Language Processing, Aug 2013, Sofia, Bulgaria. ⟨hal-00838575⟩
