C-structures and f-structures for the British National Corpus

Abstract : We describe how the British National Corpus (BNC), a 100-million-word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures using a treebank-based parsing architecture. The architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm that automatically annotates these trees to yield LFG f-structures. We describe the pre-processing steps taken to accommodate the differences between the Penn Treebank and the BNC, and discuss some of the issues encountered in applying the parsing architecture on such a large scale. We also describe the process of annotating a gold standard set of 1,000 parse trees. We evaluate the c-structures produced by the statistical parser against this c-structure gold standard, and the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
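The f-scores quoted in the abstract are harmonic means of precision and recall over matched units. The following is a minimal sketch of that calculation, assuming the units are labelled brackets of the form (label, start, end) for c-structures; the same function would apply to dependency-style triples for f-structures. The function name `f_score` and the toy brackets are hypothetical and are not taken from the paper, whose actual evaluation tooling is not described in the abstract.

```python
from collections import Counter


def f_score(gold, predicted):
    """Return (precision, recall, f1) for two multisets of hashable items.

    Items might be labelled brackets (label, start, end) for c-structures
    or (relation, head, dependent) triples for f-structures. This is an
    illustrative assumption, not the paper's evaluation software.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: an item is matched at most as often as it
    # occurs in both the gold standard and the prediction.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy example: one of four predicted brackets has the wrong label.
gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)]
precision, recall, f1 = f_score(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Counting matches as a multiset intersection keeps a repeated bracket from being credited more often than it appears in the gold standard.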
Document type :
Conference papers
Cited literature [38 references]

https://hal.inria.fr/inria-00545440
Contributor : Brigitte Briot
Submitted on : Friday, December 10, 2010 - 11:19:56 AM
Last modification on : Sunday, June 26, 2022 - 10:02:21 AM
Long-term archiving on : Friday, March 11, 2011 - 3:20:54 AM

File

lfg07wagneretal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00545440, version 1

Citation

Joachim Wagner, Djamé Seddah, Jennifer Foster, Josef van Genabith. C-structures and f-structures for the British National Corpus. Proceedings of the Twelfth International Lexical Functional Grammar Conference, 2007, Stanford, CA, United States. ⟨inria-00545440⟩

Metrics

Record views : 90
File downloads : 231