Probabilistic lexical generalization for French dependency parsing

Abstract: This paper investigates the impact on French dependency parsing of lexical generalization methods beyond lemmatization and morphological analysis. A distributional thesaurus is created from a large text corpus and used for distributional clustering and automatic WordNet sense ranking. The standard approach to lexical generalization in parsing is to map a word to a single generalized class, either replacing the word with the class or adding a new feature for the class. We use a richer framework that allows for probabilistic generalization, in which a word is represented as a probability distribution over a space of generalized classes: lemmas, clusters, or synsets. Probabilistic lexical information is introduced into parser feature vectors by modifying the weights of lexical features. In experiments on the French Treebank and two out-of-domain treebanks, we obtain improvements in parsing accuracy with some lexical generalization configurations, with slightly better performance for the probabilistic lexical generalization approach than for the standard single-mapping approach.
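The contrast between single-mapping and probabilistic generalization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the feature template `head_class`, the cluster labels, and the toy distribution for "avocat" are all hypothetical, chosen only to show how a lexical feature's weight might carry P(class | word) instead of a fixed value of 1.

```python
# Illustrative sketch only; every name and number below is an assumption,
# not taken from the paper.

def single_mapping_features(word, class_dist, template="head_class"):
    """Standard approach: map the word to its single most probable class."""
    dist = class_dist.get(word, {word: 1.0})  # unknown words map to themselves
    best = max(dist, key=dist.get)
    return {f"{template}={best}": 1.0}

def probabilistic_features(word, class_dist, template="head_class"):
    """Probabilistic approach: one feature per class, weighted by P(class | word)."""
    dist = class_dist.get(word, {word: 1.0})
    return {f"{template}={cls}": prob for cls, prob in dist.items()}

def score(features, weights):
    """Linear model score: dot product of feature values and learned weights."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical distribution over clusters for an ambiguous French noun
# ("avocat" can mean "lawyer" or "avocado").
CLASS_DIST = {"avocat": {"cluster_legal": 0.7, "cluster_food": 0.3}}

# Hypothetical model weights for the two class features.
WEIGHTS = {"head_class=cluster_legal": 2.0, "head_class=cluster_food": -1.0}

single = single_mapping_features("avocat", CLASS_DIST)
prob = probabilistic_features("avocat", CLASS_DIST)
single_score = score(single, WEIGHTS)
prob_score = score(prob, WEIGHTS)
```

In this sketch the single-mapping variant discards the less likely class entirely, while the probabilistic variant lets both classes contribute to the score in proportion to their probability.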
Document type:
Conference paper
SP-Sem-MRL 2012 - Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jul 2012, Jeju Island, South Korea. 2012

https://hal.inria.fr/hal-00699675
Contributor: Enrique Henestroza Anguiano
Submitted on: Tuesday, 2 July 2013 - 17:15:27
Last modified on: Friday, 25 May 2018 - 12:02:05
Document(s) archived on: Thursday, 3 October 2013 - 02:20:09

File

henestroza2012probabilistic.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00699675, version 1

Citation

Enrique Henestroza Anguiano, Marie Candito. Probabilistic lexical generalization for French dependency parsing. SP-Sem-MRL 2012 - Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jul 2012, Jeju Island, South Korea. 2012. 〈hal-00699675〉
