French text preprocessing with TTL

Amalia Todirascu (1), Radu Ion, Mirabela Navlea (1), Laurence Longo (1)
(1) LiLPa - Linguistique, Langues et Parole
Abstract: This paper presents experiments in building French resources for the TTL POS tagger (Ion, 2007). TTL is a collection of interconnected text preprocessing modules (sentence splitter, tokenizer, tagger, lemmatizer and chunker) that has resources for Romanian and English but none for French. We describe how we developed the required POS tagging training corpus and show that the average POS tagging accuracy for French exceeds 97% when TTL is trained on this corpus.
Document type:
Journal article
Proceedings of Romanian Academy - Series A (Mathematics, Physics, Technical Sciences, Information Science), The Publishing House of the Romanian Academy, 2011, 12 (2), pp. 151-158

https://hal.inria.fr/hal-00867452
Contributor: Amalia Todirascu
Submitted on: Sunday, 29 September 2013 - 23:47:55
Last modified on: Thursday, 15 March 2018 - 01:25:50

Identifiers

  • HAL Id: hal-00867452, version 1

Citation

Amalia Todirascu, Radu Ion, Mirabela Navlea, Laurence Longo. French text preprocessing with TTL. Proceedings of Romanian Academy - Series A (Mathematics, Physics, Technical Sciences, Information Science), The Publishing House of the Romanian Academy, 2011, 12 (2), pp. 151-158. 〈hal-00867452〉
