Identifying Sources of Weakness in Syntactic Lexicon Extraction

Claire Gardent (1), Alejandra Lorenzo (1)
(1) TALARIS - Natural Language Processing: representation, inference and semantics
Inria Nancy - Grand Est, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Previous work has shown that large-scale subcategorisation lexicons can be extracted from parsed corpora with reasonably high precision. In this paper, we apply a standard extraction procedure to a 100-million-word parsed corpus of French and obtain rather poor results. We investigate several factors likely to improve performance, in particular the specific extraction procedure and the parser used, the size of the input corpus, and the type of frames learned. We try out different ways of interleaving the output of several parsers with the lexicon extraction process and show that none of them improves the results. By contrast, we show that increasing the size of the input corpus and modifying the extraction procedure to better differentiate prepositional arguments from prepositional modifiers both improve performance. In conclusion, we suggest that a more sophisticated approach to parser combination and better probabilistic models of the various types of prepositional objects in French are likely ways to obtain better results.
Document type:
Conference paper
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.). The Seventh International Conference on Language Resources and Evaluation (LREC'10), May 2010, Malta. European Language Resources Association (ELRA), 2010. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC).

https://hal.inria.fr/inria-00537150
Contributor: Claire Gardent
Submitted on: Wednesday, 17 November 2010, 18:00:24
Last modified on: Thursday, 11 January 2018, 06:21:35

Identifiers

  • HAL Id: inria-00537150, version 1

Citation

Claire Gardent, Alejandra Lorenzo. Identifying Sources of Weakness in Syntactic Lexicon Extraction. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.), The Seventh International Conference on Language Resources and Evaluation (LREC'10), May 2010, Malta. European Language Resources Association (ELRA), 2010. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC). ⟨inria-00537150⟩
