STOP: A dataset for spoken task oriented semantic parsing

Paden Tomasello; Akshat Shrivastava; Daniel Lazar; Po-Chun Hsu; Duc Le; Adithya Sagar; Ali Elkahky; Jade Copet; Wei-Ning Hsu; Yossi Adi; Robin Algayres; Tu Ahn Nguyen; Emmanuel Dupoux; Luke Zettlemoyer; Abdelrahman Mohamed

Conference paper. Year: 2023

Paden Tomasello

Role: Author

Meta AI

Akshat Shrivastava

Role: Author

Meta AI

Daniel Lazar

Role: Author

Meta AI

Po-Chun Hsu

Role: Author

Meta AI

Duc Le

Role: Author

University of Tampere [Finland]

Meta AI

Adithya Sagar

Role: Author

Meta AI

Ali Elkahky

Role: Author

Meta AI

Jade Copet

Role: Author

Meta AI

Wei-Ning Hsu

Role: Author

Meta AI

Yossi Adi

Role: Author

Meta AI

Robin Algayres

Role: Author

Laboratoire de sciences cognitives et psycholinguistique

Apprentissage machine et développement cognitif

Tu Ahn Nguyen

Role: Author

Dalhousie University [Halifax]

Meta AI

Apprentissage machine et développement cognitif

Emmanuel Dupoux

Role: Author

Laboratoire de sciences cognitives et psycholinguistique

Apprentissage machine et développement cognitif

Inria de Paris

École des hautes études en sciences sociales

Meta AI

Luke Zettlemoyer

Role: Author

Department of Computer Science & Engineering

Meta AI

Abdelrahman Mohamed

Role: Author

Meta AI

Abstract

End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and by preventing cascading errors from automatic speech recognition (ASR). Furthermore, a single unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex SLU dataset publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Finally, in addition to the human-recorded audio, we release a TTS-generated version to benchmark the performance of end-to-end SLU systems in low-resource and domain-adaptation settings.

Keywords

Spoken language understanding; Assistant; Domain adaptation

Domains

Linguistics

Main file

2207.10643.pdf (791.11 KB)

Origin: Files produced by the author(s)

Contributor: Sabrina Zermani

https://inria.hal.science/hal-03989829

Submitted on: Wednesday, 15 February 2023, 09:25:04

Last modified on: Thursday, 25 April 2024, 03:13:03

Long-term archiving on: Tuesday, 16 May 2023, 18:16:34

Dates and versions

hal-03989829, version 1 (15-02-2023)

Identifiers

HAL Id: hal-03989829, version 1

Cite

Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, et al. STOP: A dataset for spoken task oriented semantic parsing. SLT-2022 - IEEE Spoken Language Technology Workshop, Jan 2023, Doha, Qatar. ⟨hal-03989829⟩

Collections

ENS-PARIS CNRS INRIA EHESS LSCP DEC INRIA2 PSL