Dynamic Extension of Morphological Lexicons for French from a Text Stream

Abstract: Lexical incompleteness is a recurring problem when dealing with natural language and its variability. It now seems necessary to regularly validate and extend the lexica used by tools that process large amounts of textual data, and even more so when processing real-time text streams. In this context, our paper introduces techniques for handling words unknown to a lexicon. We first study neology (from both a theoretical and a corpus-based point of view) and then describe the modules we have developed for detecting neologisms and inferring information about them (lemma, category, inflectional class). We show that, using various modules for analyzing derived and compound neologisms, we are able to generate candidate lexical entries in real time and with good precision.
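The abstract describes detecting words missing from a lexicon in incoming text and proposing candidate entries for derived and compound neologisms. The sketch below only illustrates that general idea under stated assumptions; it is not the authors' system. The toy lexicon `LEXICON`, the prefix list `PREFIXES`, the "known noun + `-er` gives a verb" rule, the "first part is the head" compound rule and the helpers `unknown_words` and `analyse` are all hypothetical, and the sketch infers only a lemma and a category, not an inflectional class.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    form: str      # surface form found in the text
    lemma: str     # proposed lemma
    category: str  # proposed part of speech (toy tagset)
    source: str    # hypothetical rule that produced the candidate


# Hypothetical toy lexicon: form -> (lemma, category). Not the authors' resource.
LEXICON = {
    "il": ("il", "pro"), "le": ("le", "det"), "et": ("et", "coo"),
    "faut": ("falloir", "v"), "tweet": ("tweet", "nc"),
    "écrire": ("écrire", "v"), "mot": ("mot", "nc"), "clé": ("clé", "nc"),
}

# Hypothetical prefixes assumed to preserve the category of their base.
PREFIXES = ("re", "ré", "dé", "anti", "sur")

# Unicode letter sequences, optionally joined by hyphens (e.g. "mot-clé").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def unknown_words(text, lexicon):
    """Return lowercased tokens missing from the lexicon, in first-seen order."""
    unknown = []
    for token in TOKEN_RE.findall(text.lower()):
        if token not in lexicon and token not in unknown:
            unknown.append(token)
    return unknown


def analyse(word, lexicon):
    """Propose candidate entries for an unknown word using toy rules."""
    candidates = []
    # Compound rule (assumption): all hyphen-separated parts are known,
    # and the first part is the head that gives the category.
    if "-" in word:
        parts = word.split("-")
        if all(p in lexicon for p in parts):
            candidates.append(Candidate(word, word, lexicon[parts[0]][1], "compound"))
    # Prefixation rule (assumption): prefix + known base keeps the base category.
    for prefix in PREFIXES:
        base = word[len(prefix):]
        if word.startswith(prefix) and len(base) >= 3 and base in lexicon:
            base_lemma, category = lexicon[base]
            candidates.append(Candidate(word, prefix + base_lemma, category, f"prefix:{prefix}"))
    # Suffixation rule (assumption): known noun + "er" gives a verb infinitive.
    stem = word[:-2]
    if word.endswith("er") and stem in lexicon and lexicon[stem][1] == "nc":
        candidates.append(Candidate(word, word, "v", "suffix:-er"))
    return candidates


if __name__ == "__main__":
    text = "Il faut tweeter le tweet et réécrire le mot-clé."
    for word in unknown_words(text, LEXICON):
        for candidate in analyse(word, LEXICON):
            print(candidate)
```

In a streaming setting, one would presumably call these functions on each incoming text and accumulate the candidates for later validation; ranking or filtering them, which is what the abstract's precision claim concerns, is not modelled here.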
Document type: Conference paper

Cited literature: 21 references

https://hal.inria.fr/hal-00832078
Contributor: Marion Baranes
Submitted on: Monday, June 10, 2013 - 1:12:22 PM
Last modification on: Friday, January 4, 2019 - 5:33:24 PM
Long-term archiving on: Tuesday, April 4, 2017 - 6:52:20 PM

File

taln13edylex.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00832078, version 1


Citation

Benoît Sagot, Damien Nouvel, Virginie Mouilleron, Marion Baranes. Extension dynamique de lexiques morphologiques pour le français à partir d'un flux textuel. TALN - Traitement Automatique du Langage Naturel, Jun 2013, Les Sables-d'Olonne, France. pp. 407-420. ⟨hal-00832078⟩


Metrics: 344 record views, 357 file downloads