Boosting for Model Selection in Syntactic Parsing

Abstract: We present an approach to model selection for statistical parsing based on boosting. The method targets the inefficiency of current feature selection methods: feature selection time stays constant at each iteration, rather than growing as it does with standard forward wrapper methods. With the aim of performing feature selection on very high-dimensional data, in particular for parsing morphologically rich languages, we test the approach, which uses the multiclass AdaBoost algorithm SAMME (Zhu et al., 2006), on data from the French Treebank with a multilingual discriminative constituency parser (Crabbé, 2014). Current results show that the method is far more efficient than a naïve method, and the models it produces are promising, with F-scores comparable to those of carefully selected manual models. We outline some perspectives for improving on this performance in future work.
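As a rough illustration of the idea in the abstract, the sketch below runs SAMME over a fixed pool of single-feature weak learners and treats the features picked at each round as the selected ones. It is not the implementation used in the thesis: the choice of weak learner (a lookup table from one feature's value to its weighted-majority class), the toy data, and the names `fit_stump`, `weighted_error`, `samme_select` and `predict` are all assumptions made for this example. Each round scans the same pool of candidates once, which is where the constant per-iteration selection time would come from under these assumptions.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w, j):
    """Map each value of feature j to its weighted-majority class."""
    votes = defaultdict(lambda: defaultdict(float))
    for xi, yi, wi in zip(X, y, w):
        votes[xi[j]][yi] += wi
    return {value: max(counts, key=counts.get) for value, counts in votes.items()}


def weighted_error(X, y, w, j, stump):
    """Total weight of the examples the stump on feature j misclassifies."""
    return sum(wi for xi, yi, wi in zip(X, y, w) if stump[xi[j]] != yi)


def samme_select(X, y, n_rounds):
    """Run SAMME; return a list of (feature index, alpha, stump) triples."""
    n_classes = len(set(y))  # assumed to be at least 2
    n = len(X)
    w = [1.0 / n] * n  # uniform initial weights, summing to 1
    model = []
    for _ in range(n_rounds):
        # One pass over the fixed candidate pool: same cost at every round.
        candidates = []
        for j in range(len(X[0])):
            stump = fit_stump(X, y, w, j)
            candidates.append((weighted_error(X, y, w, j, stump), j, stump))
        err, j, stump = min(candidates, key=lambda c: c[0])
        # SAMME needs a weak learner better than random guessing.
        if err >= 1.0 - 1.0 / n_classes:
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1.0)
        model.append((j, alpha, stump))
        # Up-weight misclassified examples, then renormalise.
        w = [wi * math.exp(alpha) if stump[xi[j]] != yi else wi
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Alpha-weighted vote over the selected stumps; None if nothing votes."""
    scores = defaultdict(float)
    for j, alpha, stump in model:
        if x[j] in stump:
            scores[stump[x[j]]] += alpha
    return max(scores, key=scores.get) if scores else None


# Toy data (hypothetical): feature 0 determines the class, feature 1 is noise.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("c", "y")]
y = ["N", "N", "V", "V", "D", "D"]
model = samme_select(X, y, n_rounds=2)
selected_features = [j for j, _, _ in model]
```

On this toy data only the informative feature is ever selected. A forward wrapper method would instead retrain and re-evaluate a growing model for every candidate at every step, which is the increasing cost the abstract contrasts against.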
Document type:
Student thesis -- Hal-inria+
Machine Learning [cs.LG]. 2015

Cited literature [59 references]

https://hal.inria.fr/hal-01258945
Contributor: Rachel Bawden
Submitted on: Tuesday, 19 January 2016 - 16:40:44
Last modified on: Tuesday, 9 January 2018 - 09:50:02
Document(s) archived on: Friday, 11 November 2016 - 13:13:04

Identifiers

  • HAL Id: hal-01258945, version 1

Citation

Rachel Bawden. Boosting for Model Selection in Syntactic Parsing. Machine Learning [cs.LG]. 2015. 〈hal-01258945〉

Metrics

Record views: 88

File downloads: 31