Probabilistic lexical generalization for French dependency parsing - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2012

Probabilistic lexical generalization for French dependency parsing

Abstract

This paper investigates the impact on French dependency parsing of lexical generalization methods beyond lemmatization and morphological analysis. A distributional thesaurus is created from a large text corpus and used for distributional clustering and WordNet automatic sense ranking. The standard approach for lexical generalization in parsing is to map a word to a single generalized class, either replacing the word with the class or adding a new feature for the class. We use a richer framework that allows for probabilistic generalization, with a word represented as a probability distribution over a space of generalized classes: lemmas, clusters, or synsets. Probabilistic lexical information is introduced into parser feature vectors by modifying the weights of lexical features. We obtain improvements in parsing accuracy with some lexical generalization configurations in experiments run on the French Treebank and two out-of-domain treebanks, with slightly better performance for the probabilistic lexical generalization approach compared to the standard single-mapping approach.
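
The contrast between single-mapping and probabilistic generalization can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature template name, the cluster labels, and the probability values are all hypothetical, chosen only to show how a lexical feature weight might be set to the probability of a class given a word instead of a fixed value of 1.

```python
from typing import Dict

# Hypothetical toy distributions P(class | word). The cluster names and the
# probabilities are invented for illustration and do not come from the paper.
WORD_TO_CLASSES: Dict[str, Dict[str, float]] = {
    "avocat": {"cluster_law": 0.75, "cluster_food": 0.25},
    "juge": {"cluster_law": 1.0},
}


def class_distribution(word: str) -> Dict[str, float]:
    """Return the class distribution for a word; unknown words map to themselves."""
    return WORD_TO_CLASSES.get(word, {word: 1.0})


def single_mapping_features(word: str, template: str) -> Dict[str, float]:
    """Single-mapping generalization: keep only the most probable class, weight 1."""
    dist = class_distribution(word)
    best = max(dist, key=dist.get)
    return {f"{template}={best}": 1.0}


def probabilistic_features(word: str, template: str) -> Dict[str, float]:
    """Probabilistic generalization: one feature per class, weighted by P(class | word)."""
    dist = class_distribution(word)
    return {f"{template}={cls}": p for cls, p in dist.items()}


if __name__ == "__main__":
    print(single_mapping_features("avocat", "head_class"))
    print(probabilistic_features("avocat", "head_class"))
```

In this sketch the single-mapping variant discards the minority class entirely, while the probabilistic variant keeps both classes and lets the weights carry the uncertainty. How the weights are actually combined with the other parser features is described in the paper itself, not here.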
Main file
henestroza2012probabilistic.pdf (133.08 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00699675, version 1 (02-07-2013)

Identifiers

  • HAL Id: hal-00699675, version 1

Cite

Enrique Henestroza Anguiano, Marie Candito. Probabilistic lexical generalization for French dependency parsing. SP-Sem-MRL 2012 - Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jul 2012, Jeju Island, South Korea. ⟨hal-00699675⟩
