Semantic annotation of French corpora: Animacy and verb semantic classes

Abstract: This paper presents a first corpus of French annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three different corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). It is a set of parsed sentences, each containing a verbal head that subcategorizes two complements, with annotations on the verb and on both complements, in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy was annotated following the categories of Zaenen et al. (2004). Inter-annotator agreement is good: Multi-pi = 0.82 and Multi-kappa = 0.86 (k = 3, N = 2360). For verb semantic classes, we used three of the five levels of classification of an existing dictionary, "Les Verbes du Français" (Dubois and Dubois-Charlier, 1997). For the highest level (generic classes), agreement is Multi-pi = 0.84 and Multi-kappa = 0.87 (k = 3, N = 1346). These agreement scores show that the annotated data are reliable for both animacy and verb semantic classes.
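As a rough illustration of the agreement figures quoted in the abstract, the sketch below computes multi-pi (Fleiss's multi-annotator generalization of Scott's pi) and multi-kappa (the Davies and Fleiss generalization of Cohen's kappa) for k annotators. It follows the standard textbook definitions and is not necessarily the authors' exact procedure; the function names and the toy labels are hypothetical and are not taken from the resource.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(items):
    """Mean proportion of agreeing annotator pairs per item.

    `items` is a list of tuples, one tuple per annotated unit,
    holding one label per annotator (same annotator order throughout).
    """
    k = len(items[0])  # number of annotators
    total = 0.0
    for labels in items:
        counts = Counter(labels)
        total += sum(n * (n - 1) for n in counts.values()) / (k * (k - 1))
    return total / len(items)


def multi_pi(items):
    """Chance agreement from a single label distribution pooled over annotators."""
    k = len(items[0])
    pooled = Counter(label for labels in items for label in labels)
    n_assignments = len(items) * k
    a_e = sum((n / n_assignments) ** 2 for n in pooled.values())
    a_o = observed_agreement(items)
    return (a_o - a_e) / (1 - a_e)


def multi_kappa(items):
    """Chance agreement from each annotator's own label distribution."""
    k = len(items[0])
    n_items = len(items)
    per_annotator = [Counter(labels[j] for labels in items) for j in range(k)]
    categories = set().union(*per_annotator)
    pairs = list(combinations(range(k), 2))
    a_e = sum(
        (per_annotator[a][c] / n_items) * (per_annotator[b][c] / n_items)
        for c in categories
        for a, b in pairs
    ) / len(pairs)
    a_o = observed_agreement(items)
    return (a_o - a_e) / (1 - a_e)


# Hypothetical toy data: 4 units, 3 annotators, 2 made-up labels.
toy = [
    ("animate", "animate", "animate"),
    ("inanimate", "inanimate", "inanimate"),
    ("animate", "animate", "inanimate"),
    ("inanimate", "inanimate", "animate"),
]
print(round(multi_pi(toy), 3), round(multi_kappa(toy), 3))
```

The two coefficients differ only in the expected-agreement term: multi-pi assumes that all annotators share one label distribution, whereas multi-kappa lets each annotator have an individual bias. This is why the two values can diverge, as they do in the abstract.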
Document type:
Conference paper
Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis. LREC 2012 - The eighth international conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. European Language Resources Association (ELRA), 2012, Proceedings of the Eighth International Conference on Language Resources and Evaluation - LREC 2012

https://hal.inria.fr/hal-00698907
Contributor: Juliette Thuilier
Submitted on: Friday, 18 May 2012 - 10:47:15
Last modified on: Thursday, 15 November 2018 - 20:27:26
Document(s) archived on: Sunday, 19 August 2012 - 02:24:40

File

lrecVF.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00698907, version 1

Citation

Juliette Thuilier, Laurence Danlos. Semantic annotation of French corpora: Animacy and verb semantic classes. Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis. LREC 2012 - The eighth international conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. European Language Resources Association (ELRA), 2012, Proceedings of the Eighth International Conference on Language Resources and Evaluation - LREC 2012. 〈hal-00698907〉

Metrics

Record views

357

File downloads

243