Information Extraction with Active Learning: A Case Study in Legal Text

Cristian Cardellino 1, * Serena Villata 2, 3 Laura Alonso Alemany 1 Elena Cabrio 2, 3
* Corresponding author
2 WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
CRISAM - Inria Sophia Antipolis - Méditerranée, SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract: Active learning has been successfully applied to a number of NLP tasks. In this paper, we present a study on Information Extraction from natural language licenses that need to be translated into RDF. The final purpose of our work is to automatically extract, from a natural language document specifying a license, a machine-readable description of the terms of use and reuse identified in that license. This task has some peculiarities that make it especially interesting to study: the text is highly repetitive, few annotated or unannotated examples are available, and very high precision is required. We compare different active learning settings for this particular application. We show that the most straightforward approach to instance selection, uncertainty sampling, performs poorly in this setting, even worse than passive learning. Density-based methods are the usual alternative to uncertainty sampling in contexts with very few labelled instances. We show that we can obtain an effect similar to that of density-based methods with uncertainty sampling simply by reversing the ranking criterion, that is, by choosing the most certain instead of the most uncertain instances.
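The reversed ranking criterion described in the abstract can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: it assumes a probabilistic classifier that returns one label distribution per unlabelled instance, and it uses least confidence (1 minus the highest label probability) as the uncertainty measure, since the abstract does not say which measure was used. The names `least_confidence` and `select_instances` are illustrative only.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty of one instance: 1 minus the probability of its most likely label.

    Assumption: least confidence is used here only as an example measure.
    """
    return 1.0 - max(probs)


def select_instances(pool_probs: List[Sequence[float]], k: int,
                     most_certain: bool = False) -> List[int]:
    """Return the indices of k unlabelled instances to send to the annotator.

    most_certain=False: classic uncertainty sampling (most uncertain first).
    most_certain=True:  reversed ranking (most certain first).
    """
    scores = [least_confidence(p) for p in pool_probs]
    # Same scores in both cases; only the sort direction changes.
    # Descending uncertainty for uncertainty sampling, ascending for the reversed criterion.
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not most_certain)
    return order[:k]


# Hypothetical class probabilities for four unlabelled instances (binary task).
pool = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]]
print(select_instances(pool, 2))                     # → [0, 2]
print(select_instances(pool, 2, most_certain=True))  # → [3, 1]
```

Flipping `most_certain` is the only difference between the two strategies: the classifier's scores are computed the same way, and only the ranking direction used to pick the next instances to label is reversed.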
Document type:
Conference paper
Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015), Apr 2015, Cairo, Egypt. 2015, DOI: 10.1007/978-3-319-18117-2_36

https://hal.inria.fr/hal-01171856
Contributor: Elena Cabrio
Submitted on: Wednesday, 15 July 2015, 11:02:15
Last modified on: Tuesday, 13 December 2016, 15:42:45
Document(s) archived on: Wednesday, 26 April 2017, 00:00:43

File

paper_237.pdf
File(s) produced by the author(s)

Citation

Cristian Cardellino, Serena Villata, Laura Alonso Alemany, Elena Cabrio. Information Extraction with Active Learning: A Case Study in Legal Text. Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015), Apr 2015, Cairo, Egypt. 2015, DOI: 10.1007/978-3-319-18117-2_36. <hal-01171856>

Metrics

Record views: 203
Document downloads: 214