Normalisation de textes par analogie: le cas des mots inconnus (Analogy-based Text Normalization: the Case of Unknown Words)

Abstract: Analogy-based Text Normalization: the Case of Unknown Words. In this paper, we describe and evaluate a system for improving the quality of noisy texts containing non-word errors. It is meant to be integrated into a full information extraction architecture and aims to improve its results. For each word that is unknown to a reference lexicon and is neither a named entity nor a neologism, our system suggests one or several normalization candidates (any known word that has the same lemma as the spell-corrected form is a valid candidate). To this end, we use an analogy-based approach to acquire normalization rules, which we then apply in the same way as lexical spelling correction rules.
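The idea of acquiring rewrite rules by analogy and reusing them like spelling-correction rules can be illustrated with a minimal sketch. This is not the authors' system; it is a deliberately simplified stand-in that only handles analogies in which a single contiguous substring changes. From a known pair such as koi : quoi it extracts the rule k → qu, and applying that rule solves the analogy koi : quoi :: kelle : x with x = quelle. The functions `extract_rule`, `acquire_rules` and `candidates`, the training pairs and the toy lexicon are all hypothetical. Candidates are filtered only by lexicon membership; the lemma-based validity criterion from the abstract is not modelled here.

```python
from collections import Counter


def extract_rule(noisy: str, correct: str) -> tuple[str, str]:
    """Strip the longest common prefix and suffix of an aligned pair.

    What remains on each side is a substring rewrite rule (lhs -> rhs),
    e.g. ("koi", "quoi") gives ("k", "qu").
    """
    shortest = min(len(noisy), len(correct))
    p = 0
    while p < shortest and noisy[p] == correct[p]:
        p += 1
    # Common suffix, bounded so that it never overlaps the common prefix.
    s = 0
    while s < shortest - p and noisy[-1 - s] == correct[-1 - s]:
        s += 1
    return noisy[p:len(noisy) - s], correct[p:len(correct) - s]


def acquire_rules(pairs: list[tuple[str, str]]) -> Counter:
    """Count the rewrite rules observed in (noisy, correct) training pairs."""
    counts: Counter = Counter()
    for noisy, correct in pairs:
        lhs, rhs = extract_rule(noisy, correct)
        if lhs:  # this sketch ignores pure insertions (empty left-hand side)
            counts[(lhs, rhs)] += 1
    return counts


def candidates(word: str, rules: Counter, lexicon: set[str]) -> list[str]:
    """Apply rules (most frequent first) at every matching position of an
    unknown word and keep the rewritten forms found in the lexicon."""
    found: list[str] = []
    for (lhs, rhs), _count in rules.most_common():
        start = word.find(lhs)
        while start != -1:
            cand = word[:start] + rhs + word[start + len(lhs):]
            if cand in lexicon and cand not in found:
                found.append(cand)
            start = word.find(lhs, start + 1)
    return found


# Hypothetical training pairs and toy lexicon.
pairs = [("koi", "quoi"), ("ke", "que"), ("bcp", "beaucoup")]
lexicon = {"quoi", "que", "quelle", "beaucoup"}
rules = acquire_rules(pairs)
print(candidates("kelle", rules, lexicon))  # ['quelle']
```

Ranking rules by frequency is just one simple way to order candidates; a real system would also need to score competing candidates and decide which unknown words to leave untouched (named entities, neologisms).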
Document type: Conference paper

Cited literature: 24 references

https://hal.inria.fr/hal-01019998
Contributor: Marion Baranes
Submitted on: Monday, July 7, 2014 - 3:45:44 PM
Last modification on: Thursday, August 29, 2019 - 2:24:09 PM
Long-term archiving on: Monday, October 12, 2015 - 11:35:45 AM

File

Paper_O-E.3.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01019998, version 1


Citation

Marion Baranes, Benoît Sagot. Normalisation de textes par analogie: le cas des mots inconnus. TALN - Traitement Automatique du Langage Naturel, Jul 2014, Marseille, France. pp.137-148. ⟨hal-01019998⟩


Metrics

Record views: 579
File downloads: 313