Identification automatique des relations discursives implicites à partir de corpus annotés et de données brutes (Automatic identification of implicit discourse relations from annotated corpora and raw data)

Chloé Braud 1
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: Building discourse parsers is currently a major challenge in Natural Language Processing. The main difficulty is identifying the relations (such as Explanation, Contrast, ...) that link spans of text in a document. In particular, identifying so-called implicit relations, that is, relations that lack a discourse connective (such as but, because, ...), is known to be a hard task, since it requires taking various factors into account and raises specific difficulties for a classification system. In this thesis, we use raw data to improve the automatic identification of implicit relations. First, we propose using discourse markers to automatically annotate new data. We use domain adaptation methods to deal with the distributional differences between automatically and manually annotated data: we report improvements for systems built on the French corpus ANNODIS and on the English corpus Penn Discourse Treebank. Then, we propose using word representations built from raw data, which may be automatically annotated with discourse markers, to enrich a representation of the data based on the words found in the spans of text to be linked. We report improvements on the English corpus Penn Discourse Treebank and, in particular, show that this method alleviates the need for rich resources, which are available for only a few languages.
Document type: Thesis

Cited literature [216 references]

https://hal.inria.fr/tel-01256884
Contributor: Chloé Braud
Submitted on: Friday, January 15, 2016 - 2:39:37 PM
Last modification on: Friday, January 4, 2019 - 5:33:24 PM
Document(s) archived on: Friday, November 11, 2016 - 7:48:26 AM

Identifiers

  • HAL Id: tel-01256884, version 1

Citation

Chloé Braud. Identification automatique des relations discursives implicites à partir de corpus annotés et de données brutes. Linguistics. Université Paris Diderot - Paris VII, 2015. French. ⟨tel-01256884⟩
