C-structures and f-structures for the British National Corpus

Abstract: We describe how the British National Corpus (BNC), a 100-million-word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures using a treebank-based parsing architecture. The architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm to automatically annotate these trees with LFG f-structures. We describe the pre-processing steps taken to accommodate the differences between the Penn Treebank and the BNC, and discuss some of the issues encountered in applying the parsing architecture at such a large scale. We also describe the process of annotating a gold standard set of 1,000 parse trees. We evaluate the c-structures produced by the statistical parser against this c-structure gold standard, and the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
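The f-scores quoted in the abstract combine precision and recall into a single figure. As a rough illustration only, the sketch below computes a PARSEVAL-style labelled bracketing f-score for one c-structure tree against a gold tree. The tree encoding (nested `(label, children)` tuples), the function names `spans` and `f_score`, and the choice to ignore preterminal nodes are all assumptions made for this example; the abstract does not say which evaluation tool or settings the authors used.

```python
from collections import Counter


def spans(tree, start=0, out=None):
    """Collect (label, start, end) for every phrasal node.

    A tree is a (label, children) tuple; a child is either another
    tree or a plain string (a word). Returns (list_of_spans, end).
    Hypothetical encoding chosen for this sketch.
    """
    if out is None:
        out = []
    label, children = tree
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a word occupies one position
        else:
            _, pos = spans(child, pos, out)
    # Assumption: preterminals (a POS tag over a single word) are not scored.
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if not is_preterminal:
        out.append((label, start, pos))
    return out, pos


def f_score(gold_tree, test_tree):
    """Return (precision, recall, f-score) over labelled brackets."""
    gold = Counter(spans(gold_tree)[0])
    test = Counter(spans(test_tree)[0])
    matched = sum((gold & test).values())  # multiset intersection
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: the parser output mislabels the object NP as ADJP.
gold = ("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]),
              ("VP", [("VBD", ["saw"]),
                      ("NP", [("DT", ["a"]), ("NN", ["cat"])])])])
test = ("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]),
              ("VP", [("VBD", ["saw"]),
                      ("ADJP", [("DT", ["a"]), ("NN", ["cat"])])])])

precision, recall, f1 = f_score(gold, test)
print(precision, recall, f1)
```

In the toy example three of the four phrasal brackets match, so precision, recall and f-score all come out at 0.75. A corpus-level score would sum matched, gold and test bracket counts over all sentences before computing the ratios; f-structures are typically scored the same way over sets of dependency triples rather than brackets.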
Document type:
Conference paper
Butt, M. and King, T.H. Proceedings of the Twelfth International Lexical Functional Grammar Conference, 2007, Stanford, CA, United States. CSLI Publications, 2007

https://hal.inria.fr/inria-00545440
Contributor: Brigitte Briot
Submitted on: Friday, 10 December 2010 - 11:19:56
Last modified on: Friday, 12 January 2018 - 15:34:02
Document(s) archived on: Friday, 11 March 2011 - 03:20:54

File

lfg07wagneretal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00545440, version 1

Citation

Joachim Wagner, Djamé Seddah, Jennifer Foster, Josef Van Genabith. C-structures and f-structures for the British National Corpus. Butt, M. and King, T.H. Proceedings of the Twelfth International Lexical Functional Grammar Conference, 2007, Stanford, CA, United States. CSLI Publications, 2007. 〈inria-00545440〉
