
Enhancing BERT for Lexical Normalization

Abstract: Language model-based pre-trained representations have become ubiquitous in natural language processing. They have been shown to significantly improve the performance of neural models on a great variety of tasks. However, it remains unclear how useful those general models can be in handling non-canonical text. In this article, focusing on User Generated Content (UGC) in a resource-scarce scenario, we study the ability of BERT (Devlin et al., 2018) to perform lexical normalization. Our contribution is simple: by framing lexical normalization as a token prediction task, by enhancing its architecture and by carefully fine-tuning it, we show that BERT can be a competitive lexical normalization model without the need for any UGC resources aside from 3,000 training sentences. To the best of our knowledge, this is the first work to adapt this model to noisy UGC data and analyze its ability to handle it.
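The abstract only sketches the framing at a high level. As an illustration, the snippet below shows one way lexical normalization can be cast as a token prediction task with a masked language model, using the Hugging Face transformers library. This is a minimal sketch under assumed choices (bert-base-uncased, one-word-per-prediction), not the authors' implementation: the paper additionally enhances the architecture and fine-tunes on roughly 3,000 UGC sentences, which an off-the-shelf model as used here omits.

```python
# Illustrative sketch (not the authors' code): frame lexical normalization
# as token prediction by masking each possibly noisy word and letting the
# masked-language-model head propose a replacement from BERT's vocabulary.
# Assumption: bert-base-uncased without the paper's fine-tuning step.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def normalize(sentence: str) -> list[str]:
    """Predict a candidate normalized form for each whitespace token."""
    words = sentence.split()
    predictions = []
    for i in range(len(words)):
        masked = words.copy()
        masked[i] = tokenizer.mask_token  # hide the (possibly noisy) token
        inputs = tokenizer(" ".join(masked), return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # locate the [MASK] position in the subword sequence
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
        pred_id = int(logits[0, mask_pos].argmax())
        predictions.append(tokenizer.decode([pred_id]).strip())
    return predictions

# Without fine-tuning, the model returns contextually plausible tokens
# rather than true normalizations; fine-tuning supplies the mapping.
print(normalize("new pix comming tomoroe"))
```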
Document type: Conference papers

Cited literature: 28 references

https://hal.inria.fr/hal-02294316
Contributor: Benoît Sagot
Submitted on: Monday, September 30, 2019 - 6:33:53 PM
Last modification on: Tuesday, October 1, 2019 - 8:33:33 AM
Long-term archiving on: Monday, February 10, 2020 - 2:14:32 AM

File

Enhancing_BERT_for_lexical_nor...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02294316, version 1

Citation

Benjamin Muller, Benoît Sagot, Djamé Seddah. Enhancing BERT for Lexical Normalization. The 5th Workshop on Noisy User-generated Text (W-NUT), Nov 2019, Hong Kong, China. ⟨hal-02294316⟩

Metrics

Record views: 291
File downloads: 2115