
Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training

Abstract: We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson's reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.
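The in-domain self-training procedure summarised in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `self_train`, `train` and `parse` are hypothetical stand-ins for the training and parsing steps of the Charniak and Johnson parser, and the sketch does not model how the first-stage parser and the reranker are each retrained.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Tree = str  # hypothetical: a bracketed parse tree, e.g. "(S (NP ...) (VP ...))"


def self_train(
    train: Callable[[Sequence[Tree]], Model],  # hypothetical trainer
    parse: Callable[[Model, str], Tree],       # hypothetical parser call
    gold_trees: Sequence[Tree],                # e.g. the WSJ training trees
    raw_sentences: Sequence[str],              # e.g. unannotated BNC sentences
) -> Tuple[Model, List[Tree]]:
    """One round of self-training (illustrative sketch only)."""
    # 1. Train a baseline model on the gold-standard (out-of-domain) trees.
    baseline = train(gold_trees)
    # 2. Parse the in-domain raw sentences with the baseline model.
    auto_trees = [parse(baseline, s) for s in raw_sentences]
    # 3. Retrain on the union of gold trees and automatically produced trees.
    retrained = train(list(gold_trees) + auto_trees)
    return retrained, auto_trees


if __name__ == "__main__":
    # Toy stand-ins: the "model" just memorises its training trees,
    # and the "parser" wraps each sentence in a flat S node.
    def toy_train(trees: Sequence[Tree]) -> List[Tree]:
        return list(trees)

    def toy_parse(model: List[Tree], sentence: str) -> Tree:
        return "(S " + sentence + ")"

    model, auto = self_train(toy_train, toy_parse, ["(S (NN gold))"], ["a b", "c"])
    print(len(model), auto)
```

The toy functions only exercise the data flow; with a real parser, `train` would be replaced by the parser's own training pipeline and the automatically parsed trees would number in the hundreds of thousands or more.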
Document type: Conference papers

Cited literature: 12 references

https://hal.inria.fr/inria-00545429
Contributor: Brigitte Briot
Submitted on: Friday, December 10, 2010 - 10:58:15 AM
Last modification on: Monday, December 14, 2020 - 9:48:04 AM
Long-term archiving on: Friday, March 11, 2011 - 3:16:18 AM

File

jfoster_et_al_07.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00545429, version 1

Citation

Jennifer Foster, Joachim Wagner, Djamé Seddah, Josef van Genabith. Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training. Proceedings of the 10th International Conference on Parsing Technologies (IWPT '07), Association for Computational Linguistics, 2007, Prague, Czech Republic, pp. 33–35. ⟨inria-00545429⟩


Metrics

Record views: 163
Files downloads: 550