From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers
Résumé
In this paper we present a statistical machine learning approach to neologism detection going some way beyond the use of exclusion lists. We explore the impact of three groups of features: form related, morpho-lexical and thematic features. The latter type of features has not yet been used in this kind of application and represents a way to access the semantic context of new words. The results suggest that form related features are helpful at the overall classification task, while morpho-lexical and thematic features better single out true neologisms.
Fichier principal
logo.pdf (93.95 Ko)
Télécharger le fichier
main.pdf (363.06 Ko)
Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Format : Autre
Loading...