Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination

Armando Muscariello; Frédéric Bimbot; Guillaume Gravier

doi:10.1109/TASL.2012.2194283

Journal article, IEEE Transactions on Audio, Speech and Language Processing, Year: 2012

Armando Muscariello

Role: Author
PersonId : 894465

Speech and sound data modeling and processing

Frédéric Bimbot

Role: Author
PersonId : 830967

Speech and sound data modeling and processing

Guillaume Gravier

Role: Author
PersonId : 1046
IdHAL : guig
ORCID : 0000-0002-2266-5682
IdRef : 110355415

Multimedia content-based indexing

Abstract

This paper describes and evaluates a computational architecture that discovers and collects occurrences of speech repetitions, or motifs, in a fully unsupervised fashion, that is, without acoustic, lexical, or pronunciation models and without training material. In the last few years, this task has attracted increasing interest from the speech community because of a) its potential applicability to spoken document processing (as a preliminary step to summarization, topic clustering, etc.) and b) its novel methodology, which defines a new paradigm for speech processing that circumvents the issues common to all supervised, trained technologies. The contributions of the proposed system are twofold: 1) the design of a discovery strategy that detects repetitions by extending matches of motif fragments, called seeds; 2) the implementation of template matching techniques to detect acoustically close segments, based on dynamic time warping (DTW) and self-similarity matrix (SSM) comparison of speech templates, in contrast to the decoding procedures of model-based recognition systems. The architecture is thoroughly evaluated on several hours of French broadcast news shows under various parameter settings and acoustic features, namely mel-frequency cepstral coefficients (MFCCs) and different types of posteriorgrams: Gaussian mixture model (GMM)-based and phone-based posteriors, in both language-matched and mismatched conditions. The evaluation highlights a) the improved robustness of the system that jointly employs DTW and SSM and b) the notable impact of language-specific features on acoustic similarity detection based on template matching.
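The two template-matching ingredients named in the abstract, DTW and SSM comparison, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Euclidean frame distance, the length normalization, and the equal-length restriction in `ssm_difference` are all assumptions chosen for brevity, and the seeded discovery strategy itself is not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(x, y, dist=euclidean):
    """Length-normalized DTW cost between two sequences of feature vectors.

    Assumption: the classic step pattern (vertical, horizontal, diagonal)
    with no path constraints; the paper may use a different variant.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m] / (n + m)


def self_similarity_matrix(x, dist=euclidean):
    """Pairwise frame distances within a single segment."""
    return [[dist(a, b) for b in x] for a in x]


def ssm_difference(x, y, dist=euclidean):
    """Mean absolute difference between the SSMs of two segments.

    Assumption: both segments have the same number of frames; handling
    unequal lengths (e.g. by resampling) is left out of this sketch.
    """
    if len(x) != len(y):
        raise ValueError("segments must have the same number of frames")
    sx = self_similarity_matrix(x, dist)
    sy = self_similarity_matrix(y, dist)
    n = len(x)
    total = sum(abs(sx[i][j] - sy[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Toy one-dimensional "feature" sequences (hypothetical data).
a = [[0.0], [1.0], [2.0]]
b = [[0.0], [0.0], [1.0], [2.0]]     # same pattern, first frame held longer
c = [[10.0], [11.0], [12.0]]         # same pattern, shifted by a constant
print(dtw_distance(a, b))            # DTW absorbs the tempo difference
print(ssm_difference(a, c))          # the SSM ignores the constant offset
```

In this toy setting the two measures capture complementary notions of similarity: DTW tolerates local tempo variation, while the SSM depends only on a segment's internal structure, which gives one intuition for why combining them might be more robust than using either alone.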

Domains

Signal and Image Processing [eess.SP]

Contributor: Patrick Gros

https://inria.hal.science/hal-00740978

Submitted on: Thursday, October 11, 2012, 14:55:42

Last modified on: Friday, March 24, 2023, 14:52:56

Dates and versions

hal-00740978 , version 1 (11-10-2012)

Identifiers

HAL Id : hal-00740978 , version 1
DOI : 10.1109/TASL.2012.2194283

Cite

Armando Muscariello, Frédéric Bimbot, Guillaume Gravier. Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination. IEEE Transactions on Audio, Speech and Language Processing, 2012, 20 (7), pp. 2031-2044. ⟨10.1109/TASL.2012.2194283⟩. ⟨hal-00740978⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

196 views

0 downloads
