Developing a large semantically annotated corpus

Abstract: What would be a good method to provide a large collection of semantically annotated texts with formal, deep semantics rather than shallow? We argue that a bootstrapping approach comprising state-of-the-art NLP tools for parsing and semantic interpretation, in combination with a wiki-like interface for collaborative annotation by experts, and a game with a purpose for crowdsourcing, are the starting ingredients for fulfilling this enterprise. The result is a semantic resource that anyone can edit and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles, rhetorical relations and presuppositions, into a single semantic formalism: Discourse Representation Theory. Taking texts rather than sentences as the units of annotation results in deep semantic representations that incorporate discourse structure and dependencies. To manage the various (possibly conflicting) annotations provided by experts and non-experts, we introduce a method that stores "Bits of Wisdom" in a database as stand-off annotations.
Document type:
Conference paper
LREC 2012, Eighth International Conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey

Cited literature [19 references]

https://hal.inria.fr/hal-01389432
Contributor: Valerio Basile
Submitted on: Friday, 28 October 2016 - 14:01:36
Last modified on: Monday, 9 October 2017 - 13:18:03

File

534_Paper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01389432, version 1

Citation

Valerio Basile, Johan Bos, Kilian Evang, Noortje Venhuizen. Developing a large semantically annotated corpus. LREC 2012, Eighth International Conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. 〈hal-01389432〉
