
Structured learning with latent trees: a joint approach to coreference resolution

Emmanuel Lassalle 1
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: This thesis explores ways to define automated coreference resolution systems using structured machine learning techniques. We design supervised models that learn to build coreference clusters from raw text: our main objective is to obtain models able to process documents globally, in a structured fashion, to ensure coherent outputs. Our models are trained and evaluated on the English part of the CoNLL-2012 Shared Task annotated corpus with standard metrics. We carry out detailed comparisons of different settings in order to refine our models and design a complete end-to-end coreference resolver. Specifically, we first carry out preliminary work on improving the way features are used by linear classification models: we extend existing work on separating different types of mention pairs to define more accurate classifiers of coreference links. We then define various structured models based on latent trees that learn to build clusters globally, and not only from the predictions of a mention pair classifier. We study different latent representations (of various shapes and sparsity) and show empirically that the best-suited structure is a restricted class of trees related to the best-first rule for selecting coreference links. We further improve this latent representation by modelling anaphoricity jointly with coreference, designing a global (structured at the document level) and joint model that outperforms existing models in the gold-mention evaluation. We finally design a complete end-to-end resolver and evaluate the improvement obtained by our new models on detected mentions, a more realistic setting for coreference resolution.
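
As an illustration of the best-first rule mentioned in the abstract, here is a minimal Python sketch. It is not taken from the thesis: the function name `best_first_clusters`, the pairwise `score` callable, and the `threshold` parameter are all hypothetical. Under these assumptions, each mention is linked to its single highest-scoring preceding mention if that score exceeds the threshold, so the selected links form a forest, and the clusters are the connected components of that forest.

```python
from typing import Callable, List, Optional, Tuple


def best_first_clusters(
    n_mentions: int,
    score: Callable[[int, int], float],  # hypothetical pairwise scorer: score(antecedent, mention)
    threshold: float = 0.0,              # assumed cut-off below which no link is created
) -> Tuple[List[Optional[int]], List[List[int]]]:
    """Link each mention to its best-scoring antecedent, then group linked mentions."""
    parent = list(range(n_mentions))  # union-find forest over mention indices

    def find(i: int) -> int:
        # Find the representative of i, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    antecedent: List[Optional[int]] = [None] * n_mentions
    for j in range(1, n_mentions):
        # Best-first: consider all preceding mentions and keep only the top-scoring one.
        best = max(range(j), key=lambda i: score(i, j))
        if score(best, j) > threshold:
            antecedent[j] = best
            parent[find(j)] = find(best)

    # Clusters are the connected components induced by the selected links.
    clusters = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return antecedent, sorted(clusters.values())


# Toy usage with made-up scores for four mentions (indices follow document order).
toy_scores = {
    (0, 1): 0.9,
    (0, 2): -0.5, (1, 2): -0.2,
    (0, 3): 0.1, (1, 3): 0.3, (2, 3): 0.8,
}
links, clusters = best_first_clusters(4, lambda i, j: toy_scores[(i, j)])
```

Because every mention receives at most one antecedent, the links have a tree-like shape, which is the kind of structure the abstract relates to the best-first rule. In a learned system the scores would come from a trained model rather than a fixed table.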

Cited literature: 167 references
Contributor: Emmanuel Lassalle
Submitted on : Monday, June 13, 2016 - 7:18:21 PM
Last modification on : Wednesday, January 12, 2022 - 3:46:25 AM


  • HAL Id : tel-01331425, version 1



Emmanuel Lassalle. Structured learning with latent trees: a joint approach to coreference resolution. Computation and Language [cs.CL]. Université Paris Diderot Paris 7, 2015. English. ⟨tel-01331425⟩


