Conference papers

Semantic annotation of French corpora: Animacy and verb semantic classes

Juliette Thuilier 1 Laurence Danlos 1 
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract : This paper presents a first French corpus annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). Each item is a parsed sentence containing a verbal head that subcategorizes for two complements, with annotations on the verb and on both complements, encoded in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy was annotated following the categories of Zaenen et al. (2004). Inter-annotator agreement is good (Multi-pi = 0.82 and Multi-kappa = 0.86; k = 3, N = 2360). For verb semantic classes, we used three of the five levels of classification of an existing dictionary, "Les Verbes du Français" (Dubois and Dubois-Charlier, 1997). For the highest level (generic classes), agreement is Multi-pi = 0.84 and Multi-kappa = 0.87 (k = 3, N = 1346). These agreement scores indicate that the annotated data are reliable for both animacy and verb semantic classes.

Contributor : Juliette Thuilier
Submitted on : Friday, May 18, 2012 - 10:47:15 AM
Last modification on : Wednesday, November 2, 2022 - 10:46:12 AM
Long-term archiving on : Sunday, August 19, 2012 - 2:24:40 AM


Files produced by the author(s)


  • HAL Id : hal-00698907, version 1


Juliette Thuilier, Laurence Danlos. Semantic annotation of French corpora: Animacy and verb semantic classes. LREC 2012 - The eighth international conference on Language Resources and Evaluation, May 2012, Istanbul, Turkey. ⟨hal-00698907⟩


