Structured learning with latent trees: a joint approach to coreference resolution

Emmanuel Lassalle 1
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: This thesis explores ways to define automated coreference resolution systems by using structured machine learning techniques. We design supervised models that learn to build coreference clusters from raw text: our main objective is to obtain models able to process documents globally, in a structured fashion, to ensure coherent outputs. Our models are trained and evaluated on the English part of the CoNLL-2012 Shared Task annotated corpus with standard metrics. We carry out detailed comparisons of different settings so as to refine our models and design a complete end-to-end coreference resolver. Specifically, we first carry out preliminary work on improving the way features are employed by linear models for classification: we extend existing work on separating different types of mention pairs to define more accurate classifiers of coreference links. We then define various structured models based on latent trees to learn to build clusters globally, and not only from the predictions of a mention pair classifier. We study different latent representations (various shapes and sparsity) and show empirically that the best-suited structure is a restricted class of trees related to the best-first rule for selecting coreference links. We further improve this latent representation by integrating anaphoricity modelling jointly with coreference, designing a global (structured at the document level) and joint model that outperforms existing models in gold-mention evaluation. We finally design a complete end-to-end resolver and evaluate the improvement obtained by our new models on detected mentions, a more realistic setting for coreference resolution.
Document type: Thesis
Computation and Language [cs.CL]. Université Paris Diderot Paris 7, 2015. English

Cited literature [167 references]

https://hal.inria.fr/tel-01331425
Contributor: Emmanuel Lassalle
Submitted on: Monday, 13 June 2016 - 19:18:21
Last modified on: Friday, 25 May 2018 - 12:02:05

Identifiers

  • HAL Id: tel-01331425, version 1

Citation

Emmanuel Lassalle. Structured learning with latent trees: a joint approach to coreference resolution. Computation and Language [cs.CL]. Université Paris Diderot Paris 7, 2015. English. 〈tel-01331425〉

Metrics

Record views: 402

File downloads: 234