
Normalisation de textes par analogie: le cas des mots inconnus

Abstract: Analogy-based text normalization: the case of unknown words. In this paper, we describe and evaluate a system for improving the quality of noisy texts containing non-word errors. It is meant to be integrated into a full information extraction architecture, where it aims to improve extraction results. For each word that is unknown to a reference lexicon and is neither a named entity nor a neologism, our system suggests one or several normalization candidates (any known word that has the same lemma as the spell-corrected form is a valid candidate). For this purpose, we use an analogy-based approach to acquire normalization rules, which we then apply in the same way as lexical spelling correction rules.
Document type: Conference papers

Cited literature: 24 references
Contributor: Marion Baranes
Submitted on: Monday, July 7, 2014 - 3:45:44 PM
Last modification on: Wednesday, January 12, 2022 - 3:46:24 AM
Long-term archiving on: Monday, October 12, 2015 - 11:35:45 AM


Publisher files allowed on an open archive


  • HAL Id: hal-01019998, version 1



Marion Baranes, Benoît Sagot. Normalisation de textes par analogie: le cas des mots inconnus. TALN - Traitement Automatique du Langage Naturel, Jul 2014, Marseille, France. pp.137-148. ⟨hal-01019998⟩


