Conference paper, Year: 2013

Modernizing historical Slovene words with character-based SMT

Abstract

We propose a language-independent word normalization method exemplified on modernizing historical Slovene words. Our method relies on character-based statistical machine translation and uses only shallow knowledge. We present the relevant lexicons and two experiments. In one, we use a lexicon of historical word--contemporary word pairs and a list of contemporary words; in the other, we only use a list of historical words and one of contemporary ones. We show that both methods produce significantly better results than the baseline.
Main file
13-scherrer-modernize.pdf (127.16 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00838575, version 1 (26-06-2013)

Identifiers

  • HAL Id: hal-00838575, version 1

Cite

Yves Scherrer, Tomaž Erjavec. Modernizing historical Slovene words with character-based SMT. BSNLP 2013 - 4th Biennial Workshop on Balto-Slavic Natural Language Processing, Aug 2013, Sofia, Bulgaria. ⟨hal-00838575⟩
