Phoneme segmentation and Voice activity detection

Joshua Winebarger

Abstract

This internship continued my work from last year with the same team, whose focus is non-linear methods for complex signal analysis using concepts of scale invariance, and in particular the development of a new multiscale microcanonical formalism (MMF). The fields of application of this formalism are diverse; one of them is speech processing. My contribution was exploratory research into innovative methods for text-independent phoneme segmentation that conform to a "linear" model, the goal being to provide a performance comparison with the "non-linear" MMF-based methods under development by the other team members. This year I focused on two areas: a continuation of last year's work in phoneme segmentation, and the implementation of voice activity detection algorithms. For the first, I performed more rigorous experiments in order to better understand the results I obtained last year. I re-examined the algorithms I had implemented, corrected discrepancies, and brought the implementations closer into line with standard practice. Some of this work is described in Appendix A. I then ran the experiments needed to evaluate the performance of these methods on a standard database used for phoneme segmentation. Beyond this point, I carried out experiments on two other segmentation methods, in preparation for the publication of a comprehensive journal paper. I made improvements to the functioning of some of these methods, and in some instances I was able to improve the performance of the algorithms.

In addition to phoneme segmentation, the team is interested in applying the MMF to the field of Voice Activity Detection (VAD). I was asked to implement several so-called "classical" VAD algorithms to serve as a basis for comparison with the new, non-linear algorithms that the team will develop in the future. To that end, I implemented four VAD algorithms commonly used as references in the literature. I also implemented a framework for the evaluation of VAD algorithms. This consisted of devising methods for generating test databases for use in evaluating VAD performance, and implementing those methods in code. As part of the same effort, I wrote programs for scoring the output of these algorithms. Finally, I adapted existing code for two standard VADs to function within this framework and evaluated these VADs under different conditions.
