Probabilistic lexical generalization for French dependency parsing

Abstract : This paper investigates the impact on French dependency parsing of lexical generalization methods beyond lemmatization and morphological analysis. A distributional thesaurus is created from a large text corpus and used for distributional clustering and WordNet automatic sense ranking. The standard approach for lexical generalization in parsing is to map a word to a single generalized class, either replacing the word with the class or adding a new feature for the class. We use a richer framework that allows for probabilistic generalization, with a word represented as a probability distribution over a space of generalized classes: lemmas, clusters, or synsets. Probabilistic lexical information is introduced into parser feature vectors by modifying the weights of lexical features. We obtain improvements in parsing accuracy with some lexical generalization configurations in experiments run on the French Treebank and two out-of-domain treebanks, with slightly better performance for the probabilistic lexical generalization approach compared to the standard single-mapping approach.
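The contrast drawn in the abstract between single-mapping and probabilistic generalization can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `head` feature template, the cluster identifiers, the probabilities, and the back-off to the word itself for unknown words are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical P(class | word) table; the cluster names and probabilities
# are invented for illustration and do not come from the paper.
GENERALIZATION: Dict[str, Dict[str, float]] = {
    "avocat": {"cluster_12": 0.75, "cluster_40": 0.25},
}


def class_distribution(word: str) -> Dict[str, float]:
    """Return the word's distribution over generalized classes.

    Assumption: a word missing from the table backs off to itself
    with probability 1.
    """
    return GENERALIZATION.get(word, {word: 1.0})


def probabilistic_features(word: str, template: str = "head") -> Dict[str, float]:
    """Emit one lexical feature per class, weighted by P(class | word)."""
    return {f"{template}={cls}": p for cls, p in class_distribution(word).items()}


def single_mapping_features(word: str, template: str = "head") -> Dict[str, float]:
    """Single-mapping variant: keep only the most probable class, with weight 1."""
    dist = class_distribution(word)
    best = max(dist, key=dist.get)
    return {f"{template}={best}": 1.0}
```

In this sketch, `probabilistic_features("avocat")` spreads the feature mass over both hypothetical clusters, whereas `single_mapping_features("avocat")` commits to the most probable one.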
Document type :
Conference papers

https://hal.inria.fr/hal-00699675
Contributor : Enrique Henestroza Anguiano
Submitted on : Tuesday, July 2, 2013 - 5:15:27 PM
Last modification on : Friday, January 4, 2019 - 5:33:24 PM
Long-term archiving on : Thursday, October 3, 2013 - 2:20:09 AM

File

henestroza2012probabilistic.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00699675, version 1

Citation

Enrique Henestroza Anguiano, Marie Candito. Probabilistic lexical generalization for French dependency parsing. SP-Sem-MRL 2012 - Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jul 2012, Jeju Island, South Korea. ⟨hal-00699675⟩
