Information Extraction with Active Learning: A Case Study in Legal Text

Cristian Cardellino; Serena Villata; Laura Alonso Alemany; Elena Cabrio

doi:10.1007/978-3-319-18117-2_36

Conference paper, 2015

Cristian Cardellino

Function : Corresponding author
PersonId : 1323423
IdHAL : ccardellino
ORCID : 0009-0000-1129-8330

Universidad Nacional de Córdoba [Argentina]

Serena Villata

Function : Author
PersonId : 9409
IdHAL : serena-villata
ORCID : 0000-0003-3495-493X
IdRef : 200242911

Web-Instrumented Man-Machine Interactions, Communities and Semantics

Laura Alonso Alemany

Function : Author

Universidad Nacional de Córdoba [Argentina]

Elena Cabrio

Function : Author

Web-Instrumented Man-Machine Interactions, Communities and Semantics

Abstract

Active learning has been successfully applied to a number of NLP tasks. In this paper, we present a study on Information Extraction for natural language licenses that need to be translated to RDF. The final purpose of our work is to automatically extract, from a natural language document specifying a license, a machine-readable description of the terms of use and reuse identified in that license. This task has some peculiarities that make it especially interesting to study: highly repetitive text, few annotated or unannotated examples available, and a need for very high precision. We compare different active learning settings for this particular application. We show that the most straightforward approach to instance selection, uncertainty sampling, does not perform well in this setting and is even worse than passive learning. Density-based methods are the usual alternative to uncertainty sampling in contexts with very few labelled instances. We show that an effect similar to that of density-based methods can be obtained with uncertainty sampling by simply reversing the ranking criterion, choosing the most certain instead of the most uncertain instances.
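The difference between the two selection criteria described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which classifier or uncertainty measure was used, so the least-confidence measure (probability of the most likely class), the function names `confidence` and `select_instances`, and the toy probabilities below are all assumptions made for illustration.

```python
from typing import List, Sequence


def confidence(probs: Sequence[float]) -> float:
    """Confidence of a prediction, taken here (an assumption) as the
    probability of the most likely class."""
    return max(probs)


def select_instances(
    pool_probs: Sequence[Sequence[float]],
    k: int,
    most_certain: bool = False,
) -> List[int]:
    """Return the indices of the k unlabelled instances to annotate next.

    most_certain=False: classic uncertainty sampling (lowest confidence first).
    most_certain=True:  reversed criterion (highest confidence first).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: confidence(pool_probs[i]),
        reverse=most_certain,
    )
    return ranked[:k]


# Hypothetical class probabilities predicted for four unlabelled instances.
pool = [
    [0.55, 0.45],
    [0.98, 0.02],
    [0.70, 0.30],
    [0.51, 0.49],
]

uncertain_first = select_instances(pool, k=2)                   # [3, 0]
certain_first = select_instances(pool, k=2, most_certain=True)  # [1, 2]
```

In this sketch the only change between the two settings is the sort direction, which matches the abstract's description of "just reversing the ranking criterion"; everything else in the active learning loop would stay the same.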

Keywords

active learning; information extraction; RDF; licences

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]; Document and Text Processing

Main file

paper_237.pdf (899.01 Ko)

Origin : Files produced by the author(s)

Contributor : Elena Cabrio

https://inria.hal.science/hal-01171856

Submitted on : Wednesday, July 15, 2015, 11:02:15 AM

Last modification on : Monday, February 26, 2024, 11:22:08 AM

Long-term archiving on : Wednesday, April 26, 2017, 12:00:43 AM

Dates and versions

hal-01171856 , version 1 (15-07-2015)

Identifiers

HAL Id : hal-01171856 , version 1
DOI : 10.1007/978-3-319-18117-2_36

Cite

Cristian Cardellino, Serena Villata, Laura Alonso Alemany, Elena Cabrio. Information Extraction with Active Learning: A Case Study in Legal Text. CICLing 2015 - Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics, Apr 2015, Cairo, Egypt. ⟨10.1007/978-3-319-18117-2_36⟩. ⟨hal-01171856⟩

Collections

CNRS; INRIA; I3S; WIMMICS; INRIA2; UNIV-COTEDAZUR