Improvements in Information Extraction in Legal Text by Active Learning

Abstract: Managing licensing information and data rights is becoming a crucial issue in the Linked (Open) Data scenario. An open problem in this scenario is how to associate machine-readable license specifications with the data, so that automated approaches can process this information and help prevent data misuse. This requires a way to automatically extract, from a natural-language document specifying a license, a machine-readable description of the terms of use and reuse that the license defines. Ontology-based Information Extraction is crucial for translating natural-language documents into Linked Data, and this connection helps consumers navigate documents and semantically related data. However, the performance of automated information extraction systems is far from perfect, and such systems rely heavily on human intervention, whether to create heuristics, to annotate examples for inferring models, or to interpret or validate patterns emerging from the data. In this paper, we apply different Active Learning strategies to Information Extraction (IE) from English-language licenses, a setting with highly repetitive text, few annotated or unannotated examples, and a need for very fine precision. We show that the most popular approach to active learning, i.e., uncertainty sampling for instance selection, does not perform well in this setting. We also show that an effect similar to that of density-based methods can be obtained with uncertainty sampling by simply reversing the ranking criterion, that is, by choosing the most certain instead of the most uncertain instances.
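The abstract contrasts two ways of ranking unlabeled instances for annotation: classic uncertainty sampling and its reversal. The sketch below is not the authors' implementation. It assumes the classifier returns a probability distribution over classes for each pool instance and uses Shannon entropy as the uncertainty measure (an assumption; the abstract does not name the measure). The helper names `entropy`, `rank_pool` and `select_batch` are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_pool(pool_probs: List[Sequence[float]], most_certain: bool = False) -> List[int]:
    """Order pool indices for querying (hypothetical helper).

    most_certain=False: classic uncertainty sampling, highest entropy first.
    most_certain=True: reversed criterion, lowest entropy first.
    """
    scores = [entropy(p) for p in pool_probs]
    # Sort by entropy: descending for uncertainty sampling, ascending for the reversal.
    return sorted(range(len(pool_probs)), key=lambda i: scores[i], reverse=not most_certain)


def select_batch(pool_probs: List[Sequence[float]], k: int, most_certain: bool = False) -> List[int]:
    """Return the indices of the k instances to send to the annotator next."""
    return rank_pool(pool_probs, most_certain)[:k]


if __name__ == "__main__":
    # Toy predicted distributions for four unlabeled instances (made-up numbers).
    pool = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01], [0.6, 0.4]]
    print(select_batch(pool, 2))                     # most uncertain first
    print(select_batch(pool, 2, most_certain=True))  # most certain first
```

With these toy distributions, uncertainty sampling picks instances 0 and 3 (the near-50/50 predictions), while the reversed criterion picks instances 2 and 1. The only change between the two strategies is the sort direction, which is the point the abstract makes about reversing the ranking criterion.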
Document type:
Conference paper
Proceedings of the 28th Annual Conference on Legal Knowledge and Information Systems, Dec 2015, Braga, Portugal. Frontiers in Artificial Intelligence and Applications 279, pp.21-30, Legal Knowledge and Information Systems - JURIX 2015: The Twenty-Eighth Annual Conference

https://hal.inria.fr/hal-01236697
Contributor: Serena Villata
Submitted on: Wednesday, 2 December 2015, 10:17:38
Last modified on: Friday, 11 December 2015, 01:06:14
Archived on: Thursday, 3 March 2016, 11:30:52

File

activeNLL2RDF.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01236697, version 1

Citation

Cristian Cardellino, Laura Alonso Alemany, Serena Villata, Elena Cabrio. Improvements in Information Extraction in Legal Text by Active Learning. Proceedings of the 28th Annual Conference on Legal Knowledge and Information Systems, Dec 2015, Braga, Portugal. Frontiers in Artificial Intelligence and Applications 279, pp.21-30, Legal Knowledge and Information Systems - JURIX 2015: The Twenty-Eighth Annual Conference. <hal-01236697>
