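Vers la correction automatique de textes bruités: Architecture générale et détermination de la langue d'un mot inconnu (Towards Automatic Spell-Checking of Noisy Texts: General Architecture and Language Identification for Unknown Words)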

Marion Baranes 1, 2, *
* Corresponding author
1 ALPAGE - Analyse Linguistique Profonde à Grande Echelle (Large-scale deep linguistic processing)
Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7
Abstract: Towards Automatic Spell-Checking of Noisy Texts: General Architecture and Language Identification for Unknown Words. This paper addresses spell checking on degraded-quality corpora such as blogs, review sites and social networks. We propose a first correction architecture designed to reduce overcorrection, and we describe its implementation. We also report and discuss the results obtained with the module that detects whether an unknown word occurring in a sentence written in a known language belongs to that language or not.
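The abstract does not say how the unknown-word module makes its decision. As a purely hypothetical sketch (not the paper's method), one common baseline compares character-trigram models: the unknown word is kept as part of the sentence's language unless another language's model explains its letter sequences better. The names `char_ngrams`, `CharNgramModel` and `belongs_to_sentence_language`, the trigram order, the add-one smoothing and the toy word lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Character n-grams of a lowercased word padded with '<' and '>' boundary markers."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Hypothetical per-language model: add-one-smoothed frequencies of character n-grams."""

    def __init__(self, words: list[str], n: int = 3) -> None:
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        # One extra slot reserves probability mass for n-grams never seen in training.
        self.vocab = len(self.counts) + 1

    def avg_log_prob(self, word: str) -> float:
        """Average log-probability per n-gram, so short and long words are comparable."""
        grams = char_ngrams(word, self.n)
        if not grams:
            raise ValueError("word too short for the chosen n-gram order")
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


def belongs_to_sentence_language(word, sentence_model, other_models) -> bool:
    """True if no competing language explains `word` better than the sentence's language."""
    own = sentence_model.avg_log_prob(word)
    return all(own >= m.avg_log_prob(word) for m in other_models)


if __name__ == "__main__":
    # Toy training lists, invented for illustration only.
    fr = CharNgramModel(["maison", "raison", "saison", "poisson"])
    en = CharNgramModel(["the", "that", "this", "with"])
    print(belongs_to_sentence_language("liaison", fr, [en]))  # True
    print(belongs_to_sentence_language("thin", fr, [en]))     # False: looks English
```

Averaging the log-probability per n-gram keeps the score independent of word length; a realistic system would train on full lexicons and would likely add a decision threshold rather than a plain comparison.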
Document type: Conference paper

Cited literature: 22 references

https://hal.inria.fr/hal-00701400
Contributor: Marion Baranes
Submitted on: Friday, May 25, 2012 - 12:50:45 PM
Last modification on: Friday, January 4, 2019 - 5:33:24 PM
Long-term archiving on: Friday, November 30, 2012 - 12:35:36 PM

File

recital12marion.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00701400, version 1


Citation

Marion Baranes. Vers la correction automatique de textes bruités: Architecture générale et détermination de la langue d'un mot inconnu. RECITAL'2012 - Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Jun 2012, Grenoble, France. pp. 95-108. ⟨hal-00701400⟩


Metrics

Record views: 412
File downloads: 1048