Identifying Sources of Weakness in Syntactic Lexicon Extraction

Claire Gardent; Alejandra Lorenzo

Conference paper. Year: 2010

Claire Gardent

Role: Author
PersonId: 3949
IdHAL: claire-gardent
ORCID: 0000-0002-3805-6662
IdRef: 034104593

Natural Language Processing: representation, inference and semantics

Alejandra Lorenzo

Role: Author
PersonId: 879084

Natural Language Processing: representation, inference and semantics

Abstract

Previous work has shown that large-scale subcategorisation lexicons can be extracted from parsed corpora with reasonably high precision. In this paper, we apply a standard extraction procedure to a 100-million-word parsed corpus of French and obtain rather poor results. We investigate several factors likely to improve performance, in particular the specific extraction procedure and the parser used, the size of the input corpus, and the type of frames learned. We try out different ways of interleaving the output of several parsers with the lexicon extraction process and show that none of them improves the results. Conversely, we show that increasing the size of the input corpus and modifying the extraction procedure to better differentiate prepositional arguments from prepositional modifiers both improve performance. In conclusion, we suggest that a more sophisticated approach to parser combination and better probabilistic models of the various types of prepositional objects in French are likely ways to obtain better results.
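The abstract mentions a "standard extraction procedure" and precision without spelling either out. The sketch below is purely illustrative and is not the authors' procedure: it shows one common baseline for subcategorisation acquisition, which counts the frames a parser assigns to each verb, keeps the frames whose relative frequency reaches a cutoff, and scores the result against a gold lexicon. The input format, frame labels, function names (`extract_lexicon`, `precision_recall`), thresholds, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_lexicon(observations, threshold=0.05):
    """Build a toy subcategorisation lexicon from parser output.

    observations: iterable of (verb, frame) pairs, one per verb occurrence
    in the parsed corpus (hypothetical input format).
    threshold: hypothetical relative-frequency cutoff; a frame is kept for a
    verb only if it accounts for at least this share of that verb's occurrences.
    """
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1

    lexicon = set()
    for verb, frames in counts.items():
        total = sum(frames.values())
        for frame, n in frames.items():
            if n / total >= threshold:
                lexicon.add((verb, frame))
    return lexicon


def precision_recall(extracted, gold):
    """Score extracted (verb, frame) pairs against a gold-standard lexicon."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy parser output with hypothetical frame labels. The single
# "subj_pp-dans" occurrence stands for a locative modifier that the
# parser attached as if it were a prepositional argument.
observations = (
    [("manger", "subj_obj")] * 8
    + [("manger", "subj")] * 2
    + [("parler", "subj_pp-a")] * 9
    + [("parler", "subj_pp-dans")] * 1
)
gold = {
    ("manger", "subj_obj"),
    ("manger", "subj"),
    ("parler", "subj_pp-a"),
    ("parler", "subj"),
}

strict = extract_lexicon(observations, threshold=0.15)
loose = extract_lexicon(observations, threshold=0.05)
print(precision_recall(strict, gold))  # (1.0, 0.75)
print(precision_recall(loose, gold))   # (0.75, 0.75)
```

With the lower cutoff, the misattached modifier frame enters the lexicon and precision drops. Raising the cutoff removes it in this toy case, but a frequency cutoff alone can also discard genuine but rare frames, which is why a finer treatment of prepositional arguments versus modifiers is attractive.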

Keywords

Subcategorisation; lexicon acquisition; French

Domains

Computation and Language [cs.CL]

https://inria.hal.science/inria-00537150

Submitted on: Wednesday, 17 November 2010, 18:00:24

Last modified on: Friday, 24 March 2023, 14:52:53

Dates and versions

inria-00537150, version 1 (17-11-2010)

Identifiers

HAL Id: inria-00537150, version 1

Cite

Claire Gardent, Alejandra Lorenzo. Identifying Sources of Weakness in Syntactic Lexicon Extraction. The seventh international conference on Language Resources and Evaluation - LREC'10, May 2010, Malta, Malta. ⟨inria-00537150⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA ANR