Monolingual and cross-lingual intent detection without training data in target languages - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, Electronics, Year: 2021


Abstract

Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on the assumptions that (1) training data can be obtained by machine-translating it from another language, and (2) there are cross-lingual solutions that work without training data in the target language. Consequently, in this research, we use the English dataset and solve the intent detection problem for five target languages (German, French, Lithuanian, Latvian, and Portuguese). When seeking the most accurate solutions, we investigate BERT-based word and sentence transformers together with eager learning classifiers (CNN, BERT fine-tuning, FFNN) and a lazy learning approach (cosine similarity as the memory-based method). We offer and evaluate several strategies to overcome the data scarcity problem: machine translation, cross-lingual models, and a combination of the two. The experimental investigation revealed the robustness of sentence transformers under various cross-lingual conditions. The accuracy of ~0.842, achieved on the English dataset with completely monolingual models, is considered our top-line. However, cross-lingual approaches demonstrate similar accuracy levels, reaching ~0.831, ~0.829, ~0.853, ~0.831, and ~0.813 on German, French, Lithuanian, Latvian, and Portuguese, respectively.
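The abstract names cosine similarity as a lazy, memory-based alternative to trained classifiers. The sketch below illustrates that general idea under stated assumptions: every training sentence is assumed to have already been embedded by some multilingual sentence encoder (not shown), and a query is simply assigned the intent of its most similar stored example. The function names (`cosine`, `predict_intent`, `accuracy`), the intent labels, and the 3-dimensional vectors are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A memory entry pairs a (precomputed) sentence embedding with its intent label.
Entry = Tuple[Sequence[float], str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_intent(query: Sequence[float], memory: List[Entry]) -> Optional[str]:
    """Lazy (memory-based) classification: no training step, just nearest stored example."""
    best_label: Optional[str] = None
    best_score = -math.inf
    for embedding, label in memory:
        score = cosine(query, embedding)
        if score > best_score:
            best_score, best_label = score, label
    return best_label


def accuracy(queries: List[Sequence[float]], gold: List[str], memory: List[Entry]) -> float:
    """Fraction of queries whose predicted intent equals the gold label (gold must be non-empty)."""
    hits = sum(predict_intent(q, memory) == g for q, g in zip(queries, gold))
    return hits / len(gold)


# Hypothetical toy data: hand-made 3-d vectors standing in for the embeddings
# a multilingual sentence encoder might produce for English training sentences.
TOY_MEMORY: List[Entry] = [
    ([0.9, 0.1, 0.0], "BookFlight"),
    ([0.1, 0.9, 0.1], "CheckWeather"),
    ([0.0, 0.2, 0.95], "PlayMusic"),
]

if __name__ == "__main__":
    # Hypothetical stand-in for the embedding of a target-language query.
    query = [0.8, 0.2, 0.1]
    print(predict_intent(query, TOY_MEMORY))  # BookFlight
```

Because the memory can hold English examples while queries arrive in another language, this setup only works cross-lingually if the encoder maps a sentence and its translations to nearby vectors; the sketch assumes that property rather than checking it.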
Main file: kapociute-dzikiene_Electronics2021.pdf (911.84 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03351013, version 1 (21-09-2021)

Cite

Jurgita Kapočiūtė-Dzikienė, Askars Salimbajevs, Raivis Skadiņš. Monolingual and cross-lingual intent detection without training data in target languages. Electronics, 2021, 10, ⟨10.3390/electronics10121412⟩. ⟨hal-03351013⟩