Identification automatique des relations discursives "implicites" à partir de données annotées et de corpus bruts

Chloé Braud 1, Pascal Denis 2
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
2 MAGNET - Machine Learning in Information Networks
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract: This paper presents a system for identifying "implicit" discourse relations (that is, relations that are not marked by a discourse connective). Given the small amount of annotated data available for this task, our system also relies on additional automatically labeled data in which unambiguous connectives have been removed and used as relation labels, a method introduced by [Marcu & Echihabi 2002]. As shown by [Sporleder & Lascarides 2008] for English, this approach does not generalize well to implicit relations as annotated by humans. We show that the same conclusion applies to French, owing to substantial differences in distribution between the two types of data. Consequently, we propose several simple methods, all inspired by work on domain adaptation, that aim to better combine annotated and artificial data. We evaluate these methods in a series of experiments on the ANNODIS corpus: our best system reaches a labeling accuracy of 45.6%, a significant gain of 5.9% over a system trained solely on manually labeled data.
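As an illustration of the automatic labeling idea mentioned in the abstract (removing an unambiguous connective and using it as the relation label), here is a minimal sketch. It is not the authors' implementation: the lexicon `CONNECTIVE_TO_RELATION`, the relation names, the English example sentences, and the function `make_artificial_example` are all hypothetical, and the regular-expression segmentation is a deliberate simplification.

```python
import re

# Hypothetical connective-to-relation lexicon (illustrative only; this is
# not the lexicon or the relation inventory used in the paper).
CONNECTIVE_TO_RELATION = {
    "because": "explanation",
    "whereas": "contrast",
}


def make_artificial_example(sentence):
    """Build an artificial 'implicit' example from an explicitly marked one.

    Returns (arg1, arg2, relation) with the connective removed, or None if
    no connective from the lexicon occurs between two text spans.
    """
    text = sentence.strip()
    for connective, relation in CONNECTIVE_TO_RELATION.items():
        # Split at the first mid-sentence occurrence of the connective,
        # optionally preceded by a comma.
        pattern = r"^(.+?),?\s+" + re.escape(connective) + r"\s+(.+)$"
        match = re.match(pattern, text, flags=re.IGNORECASE)
        if match:
            arg1 = match.group(1).strip()
            arg2 = match.group(2).strip().rstrip(".")
            # The connective itself is dropped; only its label is kept.
            return arg1, arg2, relation
    return None


if __name__ == "__main__":
    for s in ["She stayed home because it was raining.",
              "He left early, whereas she stayed.",
              "It rained."]:
        print(make_artificial_example(s))
```

Examples produced this way can then be added to the manually annotated data when training a classifier; how the two sources are best combined is the question the paper addresses.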

https://hal.inria.fr/hal-00830983
Contributor: Chloé Braud
Submitted on: Thursday, June 6, 2013 - 11:09:40 AM
Last modification on: Thursday, February 21, 2019 - 10:52:55 AM
Long-term archiving on: Tuesday, April 4, 2017 - 5:46:43 PM

File

identificationAuto-Braud-Denis...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00830983, version 1

Citation

Chloé Braud, Pascal Denis. Identification automatique des relations discursives "implicites" à partir de données annotées et de corpus bruts. TALN - 20ème conférence du Traitement Automatique du Langage Naturel 2013, Jun 2013, Sables d'Olonne, France. pp.104-117. ⟨hal-00830983⟩
