An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts

Abstract : This paper presents an empirical study of the maximum entropy approach to part-of-speech tagging of Vietnamese text. Vietnamese has special characteristics that largely distinguish it from occidental languages. Our best tagger incorporates knowledge sources useful for tagging Vietnamese text and achieves an overall accuracy of 93.40% and an unknown-word accuracy of 80.69% on a test set of the Vietnamese treebank. It significantly outperforms the tagger currently used to build the Vietnamese treebank and, as far as we are aware, this is the best tagging result published for the Vietnamese language to date.
Document type :
Conference papers

https://hal.inria.fr/inria-00526139
Contributor : Phuong Le-Hong
Submitted on : Wednesday, October 13, 2010 - 6:29:40 PM
Last modification on : Saturday, November 17, 2018 - 12:12:02 PM
Long-term archiving on : Friday, January 14, 2011 - 3:12:50 AM

File

vnTagger.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00526139, version 1

Citation

Phuong Le-Hong, Azim Roussanaly, Thi Minh Huyen Nguyen, Mathias Rossignol. An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts. Traitement Automatique des Langues Naturelles - TALN 2010, ATALA (Association pour le Traitement Automatique des Langues), Jul 2010, Montréal, Canada. pp.12. ⟨inria-00526139⟩
