Conference paper. Year: 2009

Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish

Abstract

This paper introduces a new set of tools and resources for Polish that cover all the steps required to transform raw, unrestricted text into reasonable input for a parser. These include (1) a large-coverage morphological lexicon, developed using the IPI PAN corpus together with a lexical acquisition technique, and (2) multiple tools for spelling correction, segmentation, tokenization and named entity recognition. The processing chain can also handle the XCES format as both input and output, which makes it possible to improve XCES corpora such as the IPI PAN corpus itself. This allows us to give a brief qualitative evaluation of the lexicon and of the processing chain.
Main file: LNAI09pl.pdf (296.86 KB)
Origin: files produced by the author(s)

Dates and versions

inria-00614709 , version 1 (15-08-2011)

Identifiers

Cite

Benoît Sagot. Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish. Language and Technology Conference, 2007, Poznań, Poland. ⟨10.1007/978-3-642-04235-5_8⟩. ⟨inria-00614709⟩