Time-frequency processing - Spectral properties

Abstract : Many audio signal processing algorithms do not operate on raw time-domain audio signals, but rather on time-frequency representations. A raw audio signal encodes the amplitude of a sound as a function of time. Its Fourier spectrum represents it as a function of frequency, but does not represent variations over time. A time-frequency representation presents the amplitude of a sound as a function of both time and frequency, and is able to jointly account for its temporal and spectral characteristics (Gröchenig, 2001). Time-frequency representations are appropriate for three reasons in our context. First, separation and enhancement often require modeling the structure of sound sources. Natural sound sources have a prominent structure both in time and in frequency, which can be easily modeled in the time-frequency domain. Second, the sound sources are often mixed convolutively, and this convolutive mixing process can be approximated with simpler operations in the time-frequency domain. Third, natural sounds are more sparsely distributed and overlap less with each other in the time-frequency domain than in the time or frequency domain, which facilitates their separation. In this chapter, we introduce the most common time-frequency representations used for source separation and speech enhancement. Section 2.1 describes the procedure for calculating a time-frequency representation and converting it back to the time domain, using the short-time Fourier transform (STFT) as an example. It also presents other common time-frequency representations and their relevance for separation and enhancement. Section 2.2 discusses the properties of sound sources in the time-frequency domain, including sparsity, disjointness, and more complex structures such as harmonicity. Section 2.3 explains how to achieve separation by time-varying filtering in the time-frequency domain. We summarize the main concepts and provide links to other chapters and more advanced topics in Section 2.4.
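
The pipeline outlined in the abstract (STFT analysis, filtering in the time-frequency domain, conversion back to the time domain) can be illustrated with a minimal sketch. It is not taken from the chapter: the function names `stft` and `istft`, the Hann window, the frame length and hop size, the two test tones, and the 1500 Hz cutoff are all assumptions chosen for illustration. The mask used here is a simple binary mask that happens to be constant over time, which is the most basic case of the time-varying filtering mentioned above.

```python
import numpy as np

def stft(x, frame_len=512, hop=128):
    """Short-time Fourier transform with a Hann analysis window (illustrative)."""
    window = np.hanning(frame_len)
    # Zero-pad both ends so every original sample is covered by several frames.
    x = np.pad(x, (frame_len, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One row per time frame, one column per non-negative frequency bin.
    return np.fft.rfft(frames, axis=1)

def istft(X, frame_len=512, hop=128, length=None):
    """Inverse STFT by weighted overlap-add with the same Hann window."""
    window = np.hanning(frame_len)
    frames = np.fft.irfft(X, n=frame_len, axis=1) * window
    out = np.zeros(frame_len + (X.shape[0] - 1) * hop)
    norm = np.zeros_like(out)
    for i in range(X.shape[0]):
        out[i * hop : i * hop + frame_len] += frames[i]
        norm[i * hop : i * hop + frame_len] += window ** 2
    # Divide by the summed squared window to undo the analysis/synthesis weighting.
    out = out / np.maximum(norm, 1e-12)
    out = out[frame_len:]  # drop the leading padding added in stft()
    return out if length is None else out[:length]

# Toy mixture (assumed for illustration): two pure tones that do not overlap
# in frequency, so they are disjoint in the time-frequency domain.
fs = 16000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 440 * t)
high = 0.5 * np.sin(2 * np.pi * 3000 * t)
mix = low + high

X = stft(mix)
freqs = np.fft.rfftfreq(512, d=1 / fs)

# Binary mask keeping only the bins below 1500 Hz (an arbitrary cutoff).
mask = freqs < 1500
low_est = istft(X * mask, length=len(mix))
```

Away from the signal edges, `low_est` should closely match the 440 Hz tone, because the two tones occupy different frequency bins. Real sources overlap in time and frequency, so practical masks vary from frame to frame and are estimated from a model of the sources.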
Document type :
Book sections

Contributor : Emmanuel Vincent
Submitted on : Tuesday, September 25, 2018 - 9:29:43 PM
Last modification on : Wednesday, April 3, 2019 - 1:23:14 AM
Archived on : Wednesday, December 26, 2018 - 5:17:31 PM


Files produced by the author(s)


  • HAL Id : hal-01881426, version 1



Tuomas Virtanen, Emmanuel Vincent, Sharon Gannot. Time-frequency processing - Spectral properties. Emmanuel Vincent; Tuomas Virtanen; Sharon Gannot. Audio source separation and speech enhancement, Wiley, 2018, 978-1-119-27989-1. 〈hal-01881426〉


