Normalisation de textes par analogie: le cas des mots inconnus

Abstract : Analogy-based Text Normalization: the case of unknown words. In this paper, we describe and evaluate a system for improving the quality of noisy texts containing non-word errors. It is meant to be integrated into a full information extraction architecture, and aims at improving its results. For each word that is unknown to a reference lexicon and is neither a named entity nor a neologism, our system suggests one or several normalization candidates (any known word which has the same lemma as the spell-corrected form is a valid candidate). For this purpose, we use an analogy-based approach to acquire normalization rules and apply them in the same way as lexical spelling correction rules.
Document type :
Conference papers

https://hal.inria.fr/hal-01019998
Contributor : Marion Baranes
Submitted on : Monday, July 7, 2014 - 3:45:44 PM
Last modification on : Friday, January 4, 2019 - 5:33:24 PM
Document(s) archived on : Monday, October 12, 2015 - 11:35:45 AM

File

Paper_O-E.3.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01019998, version 1

Citation

Marion Baranes, Benoît Sagot. Normalisation de textes par analogie: le cas des mots inconnus. TALN - Traitement Automatique du Langage Naturel, Jul 2014, Marseille, France. pp.137-148. ⟨hal-01019998⟩

Metrics

Record views

559

Files downloads

289