
Can MDL Improve Unsupervised Chinese Word Segmentation?

Pierre Magistry 1 Benoît Sagot 1 
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract : It is often assumed that Minimum Description Length (MDL) is a good criterion for unsupervised word segmentation. In this paper, we introduce a new approach to unsupervised word segmentation of Mandarin Chinese that leads to segmentations whose Description Length is lower than what can be obtained using other algorithms previously proposed in the literature. Surprisingly, we show that this lower Description Length does not necessarily correspond to better segmentation results. Finally, we show that we can use very basic linguistic knowledge to coerce the MDL towards a linguistically plausible hypothesis and obtain better results than any previously proposed method for unsupervised Chinese word segmentation with minimal human effort.
Document type :
Conference papers

Cited literature [13 references]
Contributor : Pierre Magistry
Submitted on : Thursday, October 24, 2013 - 1:58:13 PM
Last modification on : Tuesday, October 25, 2022 - 6:57:35 PM
Long-term archiving on : Friday, April 7, 2017 - 6:15:46 PM


Files produced by the author(s)


  • HAL Id : hal-00876389, version 1


Pierre Magistry, Benoît Sagot. Can MDL Improve Unsupervised Chinese Word Segmentation? Sixth International Joint Conference on Natural Language Processing: Sighan workshop, Oct 2013, Nagoya, Japan. pp.2. ⟨hal-00876389⟩


