From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers

Ingrid Falk 1, *, Delphine Bernhard 1, Christophe Gérard 1
* Corresponding author
1 LiLPa
EA 1339 - UMB - Linguistique, Langues et Parole (LILPA)
Abstract: In this paper we present a statistical machine learning approach to neologism detection that goes some way beyond the use of exclusion lists. We explore the impact of three groups of features: form-related, morpho-lexical and thematic. Thematic features have not previously been used in this kind of application and provide a way to access the semantic context of new words. The results suggest that form-related features help with the overall classification task, while morpho-lexical and thematic features better single out true neologisms.
Document type:
Conference paper
LREC - The 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. 2014, Proceedings of the International Conference on Language Resources and Evaluation

Cited literature [24 references]

https://hal.inria.fr/hal-00959079
Contributor: Ingrid Falk
Submitted on: Monday, 2 June 2014 - 17:59:15
Last modified on: Tuesday, 28 October 2014 - 18:08:39
Document(s) archived on: Tuesday, 2 September 2014 - 10:35:55

Identifiers

  • HAL Id: hal-00959079, version 1

Citation

Ingrid Falk, Delphine Bernhard, Christophe Gérard. From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers. LREC - The 9th edition of the Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. 2014, Proceedings of the International Conference on Language Resources and Evaluation. 〈hal-00959079〉
