
A Maximum Entropy Approach to Sentence Boundary Detection of Vietnamese Texts

Abstract : We present, for the first time, a system for detecting sentence boundaries in Vietnamese texts. The system is based on a maximum entropy model. The training procedure requires no hand-crafted rules, lexicon, or domain-specific information. Given a corpus annotated with sentence boundaries, the model learns to classify each occurrence of a potential end-of-sentence punctuation mark as either a valid or an invalid sentence boundary. On a Vietnamese corpus, the system achieves a recall of about 95%. The approach has been implemented as a software tool named vnSentDetector, a plug-in for the open source framework vnToolkit, which is intended to be a general framework integrating useful tools for processing Vietnamese texts.
Document type : Conference papers

Cited literature [24 references]

https://hal.inria.fr/inria-00334762
Contributor : Phuong Le-Hong
Submitted on : Monday, October 27, 2008 - 5:47:57 PM
Last modification on : Friday, February 26, 2021 - 3:28:08 PM
Long-term archiving on : Monday, June 7, 2010 - 7:30:46 PM

File

rivf2008.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : inria-00334762, version 1

Citation

Hong Phuong Le, Tuong Vinh Ho. A Maximum Entropy Approach to Sentence Boundary Detection of Vietnamese Texts. IEEE International Conference on Research, Innovation and Vision for the Future - RIVF 2008, Jul 2008, Ho Chi Minh City, Vietnam. ⟨inria-00334762⟩
