Semantic annotation of French corpora: Animacy and verb semantic classes

Abstract: This paper presents a first French corpus annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). Each sentence is parsed and contains a verbal head that subcategorizes for two complements; the verb and both complements carry annotations, encoded in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy was annotated following the categories of Zaenen et al. (2004). Inter-annotator agreement is good: Multi-pi = 0.82 and Multi-kappa = 0.86 (k = 3, N = 2360). For verb semantic classes, we used three of the five levels of classification of an existing dictionary, "Les Verbes du Français" (Dubois and Dubois-Charlier, 1997). At the highest level (generic classes), agreement is Multi-pi = 0.84 and Multi-kappa = 0.87 (k = 3, N = 1346). These agreement scores indicate that the annotated data are reliable for both animacy and verb semantic classes.
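
For readers unfamiliar with the two agreement coefficients reported in the abstract, the following is a minimal sketch of how multi-pi and multi-kappa are commonly computed for more than two annotators. It assumes the usual textbook definitions: both coefficients share the same observed pairwise agreement, multi-pi derives expected agreement from a single label distribution pooled over all annotators, and multi-kappa derives it from each annotator's own label distribution. The function names, the toy data and the labels `HUM` and `ORG` are illustrative assumptions and are not taken from the paper, whose exact computation may differ.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(items):
    """Mean pairwise agreement; each item is a tuple with one label per annotator."""
    c = len(items[0])  # number of annotators
    agreeing = 0
    for labels in items:
        counts = Counter(labels)
        # n * (n - 1) counts the ordered annotator pairs that chose the same label
        agreeing += sum(n * (n - 1) for n in counts.values())
    return agreeing / (len(items) * c * (c - 1))


def multi_pi(items):
    """Chance-corrected agreement with one pooled label distribution."""
    c = len(items[0])
    a_o = observed_agreement(items)
    pooled = Counter(label for labels in items for label in labels)
    total = len(items) * c
    a_e = sum((n / total) ** 2 for n in pooled.values())
    return (a_o - a_e) / (1 - a_e)


def multi_kappa(items):
    """Chance-corrected agreement with one label distribution per annotator."""
    c = len(items[0])
    n_items = len(items)
    a_o = observed_agreement(items)
    per_annotator = [Counter(labels[a] for labels in items) for a in range(c)]
    categories = set().union(*per_annotator)
    pairs = list(combinations(range(c), 2))
    # Expected agreement: average, over annotator pairs, of the probability
    # that both annotators pick the same category by chance
    a_e = sum(
        (per_annotator[m][k] / n_items) * (per_annotator[n][k] / n_items)
        for k in categories
        for m, n in pairs
    ) / len(pairs)
    return (a_o - a_e) / (1 - a_e)


# Hypothetical toy data: 4 items, 3 annotators, 2 illustrative labels
toy = [
    ("HUM", "HUM", "HUM"),
    ("HUM", "HUM", "ORG"),
    ("ORG", "ORG", "ORG"),
    ("ORG", "HUM", "ORG"),
]

if __name__ == "__main__":
    print(f"multi-pi    = {multi_pi(toy):.3f}")
    print(f"multi-kappa = {multi_kappa(toy):.3f}")
```

Under these definitions the pooled expected agreement is never lower than the per-annotator one, so multi-kappa is at least as high as multi-pi on the same data, which matches the ordering of the two values reported in the abstract.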
Document type: Conference paper

https://hal.inria.fr/hal-00698907
Contributor: Juliette Thuilier
Submitted on: Friday, May 18, 2012 - 10:47:15 AM
Last modification on: Friday, January 4, 2019 - 5:33:24 PM
Archived on: Sunday, August 19, 2012 - 2:24:40 AM

File

lrecVF.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00698907, version 1

Citation

Juliette Thuilier, Laurence Danlos. Semantic annotation of French corpora: Animacy and verb semantic classes. LREC 2012 - The eighth international conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. ⟨hal-00698907⟩
