Sound source separation

Abstract: When processing a sound recording, sound engineers often need to apply specific digital audio effects to certain sounds only. For instance, remastering a music recording may require correcting the tuning of a mistuned instrument or relocating that instrument in space without affecting the sound of the other instruments. This operation is straightforward when these sounds are available as separate tracks but becomes quite difficult otherwise. Indeed, the digital audio effects reviewed in this book all apply to the recording as a whole. Source separation refers to the range of techniques that aim to extract the signals of individual sound sources from a given recording. The input recording is called the mixture signal. The estimated source signals can then be processed separately and added back together for remastering purposes. In this scenario, the mixture typically has one or two channels, more rarely up to five, while the number of sources ranges from two to ten or more. The need for source separation also arises in many other application scenarios, such as speech enhancement for hearing aids, high-quality upmixing of mono or stereo content to 3D sound formats, and automatic speech and speaker recognition in multi-talker environments. Source separation is a recent field of research compared with the other audio effects reviewed in this book, so most techniques are less mature and cannot yet address the above application scenarios. Nevertheless, some established techniques are gradually finding their way into industry and will soon be part of professional or general consumer software. This chapter provides an overview of these established techniques as well as more recent ones.
