Conference papers

Probabilistic lexical generalization for French dependency parsing

Abstract : This paper investigates the impact on French dependency parsing of lexical generalization methods beyond lemmatization and morphological analysis. A distributional thesaurus is created from a large text corpus and used for distributional clustering and WordNet automatic sense ranking. The standard approach for lexical generalization in parsing is to map a word to a single generalized class, either replacing the word with the class or adding a new feature for the class. We use a richer framework that allows for probabilistic generalization, with a word represented as a probability distribution over a space of generalized classes: lemmas, clusters, or synsets. Probabilistic lexical information is introduced into parser feature vectors by modifying the weights of lexical features. We obtain improvements in parsing accuracy with some lexical generalization configurations in experiments run on the French Treebank and two out-of-domain treebanks, with slightly better performance for the probabilistic lexical generalization approach compared to the standard single-mapping approach.
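To illustrate the contrast drawn in the abstract between single-mapping and probabilistic generalization, here is a minimal sketch, not the authors' implementation. It assumes a word's generalization is available as a dictionary from class labels (lemmas, clusters, or synsets) to probabilities. The function names, the `head_class` feature prefix, and the toy cluster labels are all hypothetical. The single-mapping variant emits one indicator feature for the most probable class, while the probabilistic variant emits one lexical feature per class, weighted by that class's probability.

```python
from typing import Dict


def single_mapping_features(dist: Dict[str, float],
                            prefix: str = "head_class") -> Dict[str, float]:
    """Standard approach: map the word to its single most probable class
    and emit one indicator feature with weight 1.0."""
    if not dist:
        return {}
    best = max(dist, key=dist.get)
    return {f"{prefix}={best}": 1.0}


def probabilistic_features(dist: Dict[str, float],
                           prefix: str = "head_class") -> Dict[str, float]:
    """Probabilistic approach (sketch): emit one feature per class, with the
    feature weight set to the normalized probability of that class."""
    total = sum(dist.values())
    if total == 0:
        return {}
    return {f"{prefix}={cls}": p / total for cls, p in dist.items() if p > 0}


# Hypothetical distribution of one word over three distributional clusters.
toy_distribution = {"cluster_12": 0.6, "cluster_47": 0.3, "cluster_03": 0.1}

single = single_mapping_features(toy_distribution)
probabilistic = probabilistic_features(toy_distribution)
```

In this sketch the single-mapping output is the special case in which all probability mass sits on one class, so both variants can feed the same sparse feature vector format.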
Document type : Conference papers

Cited literature [21 references]
Contributor : Enrique Henestroza Anguiano
Submitted on : Tuesday, July 2, 2013 - 5:15:27 PM
Last modification on : Wednesday, October 26, 2022 - 5:20:56 PM
Long-term archiving on : Thursday, October 3, 2013 - 2:20:09 AM


Files produced by the author(s)


  • HAL Id : hal-00699675, version 1


Enrique Henestroza Anguiano, Marie Candito. Probabilistic lexical generalization for French dependency parsing. SP-Sem-MRL 2012 - Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jul 2012, Jeju Island, South Korea. ⟨hal-00699675⟩


