Conference paper, Year: 2019

Enhancing BERT for Lexical Normalization

Abstract

Language model-based pre-trained representations have become ubiquitous in natural language processing. They have been shown to significantly improve the performance of neural models on a great variety of tasks. However, it remains unclear how useful those general models can be in handling non-canonical text. In this article, focusing on User Generated Content (UGC) in a resource-scarce scenario, we study the ability of BERT (Devlin et al., 2018) to perform lexical normalisation. Our contribution is simple: by framing lexical normalisation as a token prediction task, by enhancing its architecture and by carefully fine-tuning it, we show that BERT can be a competitive lexical normalisation model without the need of any UGC resources aside from 3,000 training sentences. To the best of our knowledge, this is the first work to adapt and analyse the ability of this model to handle noisy UGC data.
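The abstract frames lexical normalisation as a per-token prediction task on top of BERT. As a rough illustration only (not the authors' released code), the sketch below shows how such a framing might look with the Hugging Face transformers library: each noisy input token is trained, through the masked-LM head, to predict (the first sub-word of) its normalised counterpart. The sentence pair, model checkpoint and all other details are illustrative assumptions.

    # Illustrative sketch, not the authors' implementation: lexical
    # normalisation as token-level prediction with a BERT masked-LM head.
    # Assumes the Hugging Face `transformers` library; example data is made up.
    import torch
    from transformers import BertTokenizerFast, BertForMaskedLM

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
    model = BertForMaskedLM.from_pretrained("bert-base-cased")

    noisy = ["c", "u", "tmrw"]               # noisy UGC tokens
    gold = ["see", "you", "tomorrow"]        # normalised targets

    # Encode the noisy tokens; each word's first sub-word position is trained
    # to emit the first sub-word of the corresponding normalised token.
    enc = tokenizer(noisy, is_split_into_words=True, return_tensors="pt")
    labels = torch.full_like(enc["input_ids"], -100)  # -100 = ignored by the loss
    word_ids = enc.word_ids()
    for pos, w in enumerate(word_ids):
        if w is not None and word_ids[pos - 1] != w:  # first sub-word of each word
            labels[0, pos] = tokenizer.convert_tokens_to_ids(
                tokenizer.tokenize(gold[w])[0]
            )

    out = model(**enc, labels=labels)  # cross-entropy over the vocabulary
    out.loss.backward()                # a fine-tuning step would follow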
Main file
Enhancing_BERT_for_lexical_normalisation_WNUT2019_proceeding-5.pdf (207.86 Ko)
Origin: Files produced by the author(s)

Dates and versions

hal-02294316, version 1 (30-09-2019)

Identifiers

  • HAL Id: hal-02294316, version 1

Cite

Benjamin Muller, Benoît Sagot, Djamé Seddah. Enhancing BERT for Lexical Normalization. The 5th Workshop on Noisy User-generated Text (W-NUT), Nov 2019, Hong Kong, China. ⟨hal-02294316⟩
