Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish

Abstract : This paper introduces a new set of tools and resources for Polish that cover all the steps required to transform raw, unrestricted text into reasonable input for a parser. These include (1) a large-coverage morphological lexicon, developed using the IPI PAN corpus together with a lexical acquisition technique, and (2) multiple tools for spelling correction, segmentation, tokenization and named entity recognition. The processing chain can also read and write the XCES format, which makes it possible to improve XCES corpora such as the IPI PAN corpus itself. This in turn allows us to give a brief qualitative evaluation of the lexicon and of the processing chain.
Document type :
Conference papers

https://hal.inria.fr/inria-00614709
Contributor : Benoît Sagot
Submitted on : Monday, August 15, 2011 - 11:51:34 AM
Last modification on : Friday, January 4, 2019 - 5:33:24 PM
Long-term archiving on : Monday, November 12, 2012 - 3:25:50 PM

File

LNAI09pl.pdf
Files produced by the author(s)

Citation

Benoît Sagot. Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish. Language and Technology Conference, 2007, Poznań, Poland. ⟨10.1007/978-3-642-04235-5_8⟩. ⟨inria-00614709⟩
