Comparing stochastic approaches to spoken language understanding in multiple languages

One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts from a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated for the task of concept tagging. These include classical, well-known generative and discriminative methods such as Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), and Support Vector Machines (SVMs), as well as techniques more recently applied to natural language processing such as Conditional Random Fields (CRFs) and Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and of differing complexity. The French MEDIA corpus has already been used in an evaluation campaign, so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate (CER) of 12.6% is achieved. Here, in addition to attribute names, attribute values are extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the CER drops to 12.0%.
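The abstract reports results as a concept error rate but does not define it. As a minimal sketch, assuming the CER is computed like a word error rate over concept sequences (Levenshtein distance between the reference and hypothesis concept sequences, divided by the number of reference concepts), it could look as follows. The function name and the concept labels in the example are hypothetical and not taken from the paper.

```python
from typing import Sequence


def concept_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Edit distance between two concept sequences, divided by the reference length.

    Assumption: substitutions, deletions, and insertions all cost 1.
    """
    if not reference:
        raise ValueError("reference must contain at least one concept")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_concept in enumerate(reference, start=1):
        curr = [i]  # distance from reference[:i] to an empty hypothesis
        for j, hyp_concept in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_concept != hyp_concept)
            deletion = prev[j] + 1       # reference concept missing in hypothesis
            insertion = curr[j - 1] + 1  # extra concept in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical concept labels, for illustration only.
reference = ["command", "room-type", "date", "location"]
hypothesis = ["command", "date", "city"]
print(concept_error_rate(reference, hypothesis))  # one deletion + one substitution over 4 concepts
```

Because the distance is normalized by the reference length only, a hypothesis with many inserted concepts can yield a value above 1.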

Domains

Other [cs.OH]

Main file

plugin-05639034.pdf (837.07 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00746965

Submitted on: Tuesday, October 30, 2012, 11:07:08

Last modified on: Wednesday, March 20, 2024, 03:24:09

Long-term archiving on: Thursday, January 31, 2013, 03:45:49

Dates and versions

hal-00746965, version 1 (30-10-2012)

Identifiers

HAL Id: hal-00746965, version 1

Cite

Stefan Hahn, Marco Dinarelli, Christian Raymond, Fabrice Lefèvre, Patrick Lehnen, et al. Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Transactions on Audio, Speech and Language Processing, 2011, 19 (6), pp. 1569-1583. ⟨hal-00746965⟩

Collections

UNIV-AVIGNON EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA LIMSI IRISA-INSA-R IRISA-D6 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC LIA UNIV-RENNES INSA-GROUPE SORBONNE-UNIVERSITE UR1-MATH-NUM LISN GS-SPORT-HUMAN-MOVEMENT
