Zero-resource audio-only spoken term detection based on a combination of template matching techniques

Armando Muscariello 1, Guillaume Gravier 1, Frédéric Bimbot 1
1 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Spoken term detection is a well-known information retrieval task that seeks to extract meaningful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module is a cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison, the latter added to further improve robustness to speech variability. This solution differs notably from more traditional train-and-test methods that, while shown to be very accurate, rely on the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.
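The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the segmental DTW variant is replaced here by plain subsequence DTW, and the cosine distance, the fixed resampling size, the two thresholds, and all function names are assumptions chosen for illustration only. Feature frames (e.g. MFCCs or posteriorgrams) are assumed to be rows of a NumPy array.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between the rows of a (n, d) and b (m, d)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return 1.0 - a @ b.T

def subsequence_dtw(query, utterance):
    """Best alignment of the whole query against any stretch of the utterance.

    Returns (cost normalised by query length, start frame, end frame).
    Stand-in for the paper's segmental DTW variant (an assumption).
    """
    dist = cosine_distance_matrix(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)        # accumulated cost
    start = np.zeros((n, m), dtype=int)  # utterance frame where each path began
    acc[0] = dist[0]                     # a path may start at any utterance frame
    start[0] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            cands = [(acc[i - 1, j], start[i - 1, j])]
            if j > 0:
                cands.append((acc[i - 1, j - 1], start[i - 1, j - 1]))
                cands.append((acc[i, j - 1], start[i, j - 1]))
            best_cost, best_start = min(cands, key=lambda c: c[0])
            acc[i, j] = dist[i, j] + best_cost
            start[i, j] = best_start
    end = int(np.argmin(acc[-1]))        # a path may end at any utterance frame
    return float(acc[-1, end] / n), int(start[-1, end]), end

def self_similarity_matrix(x):
    """Cosine similarity between every pair of frames of one segment."""
    return 1.0 - cosine_distance_matrix(x, x)

def resample_frames(x, length):
    """Nearest-neighbour resampling of a segment to a fixed number of frames."""
    idx = np.linspace(0, len(x) - 1, length).round().astype(int)
    return x[idx]

def ssm_distance(query, segment, size=20):
    """Frobenius distance between the two size-normalised self-similarity matrices."""
    q = self_similarity_matrix(resample_frames(query, size))
    s = self_similarity_matrix(resample_frames(segment, size))
    return float(np.linalg.norm(q - s) / size)

def detect(query, utterance, dtw_threshold=0.3, ssm_threshold=0.3):
    """Cascade: DTW proposes a candidate segment, the SSM comparison confirms it.

    Both thresholds are hypothetical values, not taken from the paper.
    """
    cost, start, end = subsequence_dtw(query, utterance)
    if cost > dtw_threshold:
        return None
    if ssm_distance(query, utterance[start:end + 1]) > ssm_threshold:
        return None
    return start, end

# Synthetic usage: plant a copy of the query inside random "speech" frames.
rng = np.random.default_rng(0)
query = rng.normal(size=(10, 13))
utterance = np.vstack([rng.normal(size=(30, 13)), query, rng.normal(size=(20, 13))])
hit = detect(query, utterance)
print(hit)
```

The second stage only sees the segment proposed by the first, so it can reject DTW candidates whose internal temporal structure differs from the query's, which is the role the abstract assigns to the self-similarity comparison.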
Document type:
Conference paper
INTERSPEECH 2011: 12th Annual Conference of the International Speech Communication Association, Aug 2011, Florence, Italy. 2011

https://hal.inria.fr/inria-00597907
Contributor: Armando Muscariello
Submitted on: Monday, 8 August 2011 - 13:51:55
Last modified on: Friday, 16 November 2018 - 01:23:47
Document(s) archived on: Wednesday, 9 November 2011 - 02:21:23

File

IS2011_kspot_2ND.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00597907, version 1

Citation

Armando Muscariello, Guillaume Gravier, Frédéric Bimbot. Zero-resource audio-only spoken term detection based on a combination of template matching techniques. INTERSPEECH 2011: 12th Annual Conference of the International Speech Communication Association, Aug 2011, Florence, Italy. 2011. 〈inria-00597907〉
