Phoneme segmentation and Voice activity detection

Abstract: This internship was intended as a continuation of my work last year with the same team, whose focus is non-linear methods for complex signal analysis based on concepts of scale invariance, and in particular the development of a new multiscale microcanonical formalism (MMF). The formalism has diverse fields of application, one of which is speech processing. My contribution was exploratory research into innovative methods for text-independent phoneme segmentation that conform to a "linear" model, the goal being to provide a performance comparison with the "non-linear" MMF-based methods under development by the other team members. This year I focused on two areas: continuing last year's work on phoneme segmentation, and implementing voice activity detection algorithms.

For the phoneme segmentation work, I performed more rigorous experiments in order to better understand the results I obtained last year. I re-examined the algorithms I had implemented, corrected discrepancies, and brought the implementations closer into line with standard practice. Some of this work is described in a section of Appendix A. I performed the requisite experiments to evaluate the performance of these methods on a standard database used for phoneme segmentation. I then went on to experiment with two other segmentation methods, in preparation for publication of a comprehensive journal paper. I made improvements to the functioning of some of these methods, and in some instances I was able to improve the performance of the algorithms.

In addition to phoneme segmentation, the team is interested in applying the MMF to the field of Voice Activity Detection (VAD). The team wanted me to implement several so-called "classical" VAD algorithms to serve as a basis for comparison for the new, non-linear algorithms that it will develop in the future. Accordingly, I implemented four VAD algorithms commonly used as references in the literature. I also implemented a framework for evaluating VAD algorithms: I devised methods for generating test databases, implemented them in code, and wrote programs for scoring the output of the algorithms. Finally, I adapted existing code for two standard VADs to function within this framework and evaluated these VADs under different conditions.
Cited literature: 27 references

https://hal.inria.fr/hal-00647986
Contributor: Khalid Daoudi
Submitted on: Monday, 5 December 2011 - 14:19:11
Last modified on: Saturday, 17 September 2016 - 01:36:45
Document(s) archived on: Friday, 16 November 2012 - 14:20:37

File

Winebarger-report-2011.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00647986, version 1

Citation

Joshua Winebarger. Phoneme segmentation and Voice activity detection. [Internship report] 2011. 〈hal-00647986〉
