Conference Papers Year : 2021

Active Learning for Interactive Relation Extraction in a French Newspaper's Articles

Abstract

Relation extraction is a subtask of natural language processing that has seen many improvements in recent years, with the advent of complex pre-trained architectures. Many of these state-of-the-art approaches are tested against benchmarks with labelled sentences containing tagged entities, and require substantial pretraining and fine-tuning on task-specific data. However, in a real use-case scenario such as a newspaper company mostly dedicated to local information, relations are of varied, highly specific types, with virtually no annotated data for such relations, and many entities co-occur in a sentence without being related. We question the use of supervised state-of-the-art models in such a context, where resources such as time, computing power and human annotators are limited. To adapt to these constraints, we experiment with an active-learning-based relation extraction pipeline, consisting of a lightweight binary LSTM-based model for detecting the relations that do exist, and a state-of-the-art model for relation classification. We compare several choices of classification model in this scenario, from basic word embedding averaging to graph neural networks and BERT-based models, as well as several active learning acquisition strategies, in order to find the most cost-efficient yet accurate approach for the use case of the largest French daily newspaper company.
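The abstract mentions active learning acquisition strategies without naming them. As an illustration only, the sketch below shows one common strategy, least-confidence uncertainty sampling, applied to a binary "is there a relation?" detector. The function names, the `toy_predict_proba` stand-in and its probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def least_confidence_scores(probs: Sequence[float]) -> List[float]:
    """Uncertainty of a binary classifier: 1 - max(p, 1 - p).

    The score is highest when p is close to 0.5, i.e. when the model
    is least sure whether a relation exists.
    """
    return [1.0 - max(p, 1.0 - p) for p in probs]


def select_batch(
    pool: Sequence[str],
    predict_proba: Callable[[Sequence[str]], Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return indices of the `batch_size` most uncertain pool items.

    These are the items that would be sent to a human annotator
    in the next active learning round.
    """
    scores = least_confidence_scores(predict_proba(pool))
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def toy_predict_proba(sentences: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a trained detector (made-up values)."""
    made_up = [0.95, 0.52, 0.10, 0.45]
    return made_up[: len(sentences)]


# Placeholder unlabelled pool; real inputs would be sentences with entity pairs.
pool = ["sentence A", "sentence B", "sentence C", "sentence D"]
to_annotate = select_batch(pool, toy_predict_proba, batch_size=2)
```

In a full loop, the selected items would be labelled, added to the training set, and the model retrained before the next selection round. Other strategies (for example margin-based or entropy-based sampling) would only change the scoring function.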
Main file
ranlp2021.pdf (691.28 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-03371917 , version 1 (09-10-2021)

Identifiers

  • HAL Id : hal-03371917 , version 1

Cite

Cyrielle Mallart, Michel Le Nouy, Guillaume Gravier, Pascale Sébillot. Active Learning for Interactive Relation Extraction in a French Newspaper's Articles. RANLP 2021 - Recent Advances in Natural Language Processing, Sep 2021, Online, Bulgaria. pp.886-894. ⟨hal-03371917⟩