Trouver et confondre les coupables : un processus sophistiqué de correction de lexique

Abstract : The coverage of a parser depends mostly on the quality of the underlying grammar and lexicon. Developing a lexicon that is both complete and accurate is an intricate and demanding task, especially once a certain level of quality and coverage has been reached. We introduce an automatic process able to detect missing or incomplete entries in a lexicon, and to suggest correction hypotheses for these entries. The detection of dubious lexical entries is tackled by two techniques relying either on a specific statistical model, or on the information provided by a part-of-speech tagger. The generation of correction hypotheses for the detected entries is achieved by studying which modifications could improve the parse rate of the sentences in which the entries occur. This process brings together various techniques based on different tools such as taggers, parsers and entropy classifiers. Applying it to the Lefff, a large-coverage morphological and syntactic French lexicon, has already allowed us to make noticeable improvements.
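
The detection step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the statistical model used by the authors, which this record does not specify: it only ranks word forms by the share of sentences containing them that failed to parse. The function name `suspicion_scores`, the `min_occurrences` threshold and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def suspicion_scores(sentences, min_occurrences=2):
    """Rank word forms by the failure rate of the sentences they occur in.

    `sentences` is an iterable of (tokens, parsed_ok) pairs, where
    `parsed_ok` tells whether a parser (assumed to exist elsewhere)
    produced a full parse. This is a hypothetical simplification,
    not the model from the paper.
    """
    total = Counter()   # number of sentences containing each form
    failed = Counter()  # number of unparsed sentences containing each form
    for tokens, parsed_ok in sentences:
        for form in set(tokens):  # count each form once per sentence
            total[form] += 1
            if not parsed_ok:
                failed[form] += 1
    scores = {
        form: failed[form] / count
        for form, count in total.items()
        if count >= min_occurrences  # ignore forms seen too rarely
    }
    # Highest failure rate first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus (invented data): "blorp" stands for a form missing from the lexicon.
corpus = [
    (["le", "chat", "dort"], True),
    (["le", "chien", "blorp"], False),
    (["un", "blorp", "dort"], False),
    (["le", "chien", "dort"], True),
]
ranked = suspicion_scores(corpus)
```

In this toy run, the form that only ever appears in unparsed sentences comes out first, while forms that merely co-occur with it receive lower scores. A realistic detector would also need to share the blame among the forms of a failed sentence, which this sketch does not attempt.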
Document type :
Conference papers

Cited literature : 14 references

https://hal.inria.fr/inria-00553257
Contributor : Eric Villemonte de La Clergerie
Submitted on : Thursday, January 6, 2011 - 9:58:39 PM
Last modification on : Thursday, February 7, 2019 - 2:36:24 PM
Document(s) archived on : Thursday, April 7, 2011 - 2:33:28 AM

File

lexfix-taln09.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00553257, version 1

Citation

Lionel Nicolas, Benoît Sagot, Miguel Molinero, Jacques Farré, Éric Villemonte de La Clergerie. Trouver et confondre les coupables : un processus sophistiqué de correction de lexique. 16ème conférence sur le Traitement Automatique des Langues Naturelles : TALN'09, ATALA ; LIPN, Jun 2009, Senlis, France. ⟨inria-00553257⟩
