Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training
Résumé
We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson's reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.
Domaines
Traitement du texte et du document
Origine : Fichiers produits par l'(les) auteur(s)
Loading...