Master thesis

Online learning for audio clustering and segmentation

Alberto Bietti 1, 2 
1 MuTant - Synchronous Realtime Processing and Programming of Music Signals
IRCAM - Institut de Recherche et Coordination Acoustique/Musique, Inria Paris-Rocquencourt, UPMC - Université Pierre et Marie Curie - Paris 6, CNRS - Centre National de la Recherche Scientifique
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique - ENS Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract : Audio segmentation, the task of dividing an audio signal into homogeneous chunks, or segments, is an essential problem in many audio signal processing tasks. Most current approaches rely on a change-point detection phase that finds segment boundaries, followed by a similarity matching phase that identifies similar segments. In this thesis, we focus instead on joint segmentation and clustering algorithms, which solve both tasks simultaneously by applying unsupervised learning techniques to sequential models. Hidden Markov and semi-Markov models are a natural choice for this modeling task, and we present their use in the context of audio segmentation. We then explore online learning techniques for sequential models and their application to real-time audio segmentation. We present an existing online EM algorithm for hidden Markov models and extend it to hidden semi-Markov models by introducing a different parameterization of semi-Markov chains. Finally, we develop new online learning algorithms for sequential models based on incremental optimization of surrogate functions.

Cited literature [43 references]
Contributor : Alberto Bietti
Submitted on : Thursday, October 9, 2014 - 2:21:32 AM
Last modification on : Thursday, March 17, 2022 - 10:08:44 AM
Long-term archiving on : Saturday, January 10, 2015 - 10:10:36 AM


Files produced by the author(s)


  • HAL Id : hal-01064672, version 2



Alberto Bietti. Online learning for audio clustering and segmentation. Machine Learning [cs.LG]. 2014. ⟨hal-01064672v2⟩


