Zero-resource audio-only spoken term detection based on a combination of template matching techniques

Armando Muscariello; Guillaume Gravier; Frédéric Bimbot

Conference paper, Year: 2011

Armando Muscariello

Role: Author
PersonId : 894465

Speech and sound data modeling and processing

Guillaume Gravier

Role: Author
PersonId : 1046
IdHAL : guig
ORCID : 0000-0002-2266-5682
IdRef : 110355415

Speech and sound data modeling and processing

Frédéric Bimbot

Role: Author
PersonId : 830967

Speech and sound data modeling and processing

Abstract

Spoken term detection is a well-known information retrieval task that seeks to extract contentful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module is a cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison, the latter added to further improve robustness to speech variability. This solution differs notably from more traditional train-and-test methods that, while shown to be very accurate, rely on the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.
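The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the paper's segmental DTW variant for plain subsequence DTW, and it assumes one plausible form of self-similarity matrix comparison (resample both segments to a fixed number of frames, then compare their self-similarity matrices). The function names, the `size` parameter, and the thresholds `dtw_max` and `ssm_max` are hypothetical. Inputs are assumed to be NumPy arrays of shape (frames, dimensions), for example MFCC vectors or posteriorgrams.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every frame of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def subsequence_dtw(query, utterance):
    """Align the whole query against the best-matching span of the utterance.

    Returns (start, end, cost): end is exclusive, cost is the accumulated
    distance divided by the query length.
    """
    dist = pairwise_dist(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    start = np.zeros((n, m), dtype=int)
    acc[0, :] = dist[0, :]        # the match may begin at any utterance frame
    start[0, :] = np.arange(m)
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            preds = ((i - 1, j), (i, j - 1), (i - 1, j - 1))
            costs = [acc[p] for p in preds]
            best = int(np.argmin(costs))
            acc[i, j] = costs[best] + dist[i, j]
            start[i, j] = start[preds[best]]  # remember where the path began
    end = int(np.argmin(acc[-1, :]))          # the match may end anywhere
    return int(start[-1, end]), end + 1, float(acc[-1, end] / n)


def self_similarity(frames, size=16):
    """Self-similarity matrix of a segment resampled to `size` frames."""
    idx = np.linspace(0, len(frames) - 1, size).round().astype(int)
    resampled = frames[idx]
    return pairwise_dist(resampled, resampled)


def ssm_distance(a, b, size=16):
    """Distance between the self-similarity matrices of two segments."""
    return float(np.linalg.norm(self_similarity(a, size) - self_similarity(b, size)) / size)


def detect(query, utterance, dtw_max=1.0, ssm_max=1.0):
    """Cascade: a DTW candidate is kept only if the SSM check also passes."""
    start, end, cost = subsequence_dtw(query, utterance)
    if cost > dtw_max:
        return None
    if ssm_distance(query, utterance[start:end]) > ssm_max:
        return None
    return start, end
```

In this sketch the second stage acts purely as a filter on the candidate proposed by the first. Comparing self-similarity matrices looks at the internal structure of each segment rather than at raw frame-to-frame distances, which is one way such a check could add robustness to speaker variability. In practice the thresholds would have to be tuned on held-out data.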

Domains

Signal and image processing [eess.SP]

Main file

IS2011_kspot_2ND.pdf (93.44 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00597907

Submitted on: Monday, 8 August 2011, 13:51:55

Last modified on: Friday, 24 March 2023, 14:52:54

Long-term archiving on: Wednesday, 9 November 2011, 02:21:23

Dates and versions

inria-00597907 , version 1 (08-08-2011)

Identifiers

HAL Id : inria-00597907 , version 1

Cite

Armando Muscariello, Guillaume Gravier, Frédéric Bimbot. Zero-resource audio-only spoken term detection based on a combination of template matching techniques. INTERSPEECH 2011: 12th Annual Conference of the International Speech Communication Association, Aug 2011, Florence, Italy. ⟨inria-00597907⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

462 views

905 downloads
