Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish

Abstract: This paper introduces a new set of tools and resources for Polish that cover all the steps required to transform raw, unrestricted text into reasonable input for a parser. These include (1) a large-coverage morphological lexicon, developed using the IPI PAN corpus together with a lexical acquisition technique, and (2) multiple tools for spelling correction, segmentation, tokenization and named entity recognition. The processing chain can also read and write the XCES format, which makes it possible to improve XCES corpora such as the IPI PAN corpus itself. This in turn allows us to give a brief qualitative evaluation of the lexicon and of the processing chain.
Document type: Conference paper
Zygmunt Vetulani and Hans Huszkoreit. Language and Technology Conference, 2007, Poznań, Poland. Springer, 5603, 2009, Lecture Notes in Computer Science. 〈10.1007/978-3-642-04235-5_8〉

https://hal.inria.fr/inria-00614709
Contributor: Benoît Sagot
Submitted on: Monday, 15 August 2011 - 11:51:34
Last modified on: Saturday, 9 June 2018 - 10:30:06
Document(s) archived on: Monday, 12 November 2012 - 15:25:50

File

LNAI09pl.pdf
Files produced by the author(s)

Citation

Benoît Sagot. Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish. Zygmunt Vetulani and Hans Huszkoreit. Language and Technology Conference, 2007, Poznań, Poland. Springer, 5603, 2009, Lecture Notes in Computer Science. 〈10.1007/978-3-642-04235-5_8〉. 〈inria-00614709〉
