
Time-frequency processing - Spectral properties

Tuomas Virtanen (1), Emmanuel Vincent (2), Sharon Gannot (3)
(2) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Many audio signal processing algorithms do not operate on raw time-domain audio signals, but rather on time-frequency representations. A raw audio signal encodes the amplitude of a sound as a function of time. Its Fourier spectrum represents it as a function of frequency, but does not represent variations over time. A time-frequency representation presents the amplitude of a sound as a function of both time and frequency, and is able to jointly account for its temporal and spectral characteristics (Gröchenig, 2001). Time-frequency representations are appropriate for three reasons in our context. First, separation and enhancement often require modeling the structure of sound sources. Natural sound sources have a prominent structure in both time and frequency, which can be easily modeled in the time-frequency domain. Second, sound sources are often mixed convolutively, and this convolutive mixing process can be approximated by simpler operations in the time-frequency domain. Third, natural sounds are more sparsely distributed and overlap less with each other in the time-frequency domain than in the time or frequency domain, which facilitates their separation. In this chapter we introduce the most common time-frequency representations used for source separation and speech enhancement. Section 2.1 describes the procedure for calculating a time-frequency representation and converting it back to the time domain, using the short-time Fourier transform (STFT) as an example. It also presents other common time-frequency representations and their relevance for separation and enhancement. Section 2.2 discusses the properties of sound sources in the time-frequency domain, including sparsity, disjointness, and more complex structures such as harmonicity. Section 2.3 explains how to achieve separation by time-varying filtering in the time-frequency domain. We summarize the main concepts and provide links to other chapters and more advanced topics in Section 2.4.
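The STFT analysis/resynthesis procedure and the time-varying filtering that the abstract refers to can be illustrated with a minimal NumPy sketch. It is not code from the chapter: the frame length (512 samples), the hop of half a frame, the square-root Hann window pair, the 16 kHz test signals and the helper names `sqrt_hann`, `stft` and `istft` are all illustrative assumptions. The window pair is chosen because, at 50% overlap, the product of analysis and synthesis windows (a periodic Hann window) overlap-adds to exactly one, so resynthesis reconstructs the input. The masking step uses an oracle ideal binary mask, which needs the true sources and is shown only to illustrate filtering by a time-frequency mask.

```python
import numpy as np


def sqrt_hann(frame_len):
    # Square root of a periodic Hann window. At hop = frame_len / 2 the
    # squared window (plain Hann) overlap-adds to exactly 1.
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))


def stft(x, frame_len=512):
    """Return a (frames x frequency bins) complex STFT with 50% overlap."""
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    # Zero-pad both ends so every input sample lies under two frames,
    # and pad the end so the padded length is a whole number of hops.
    extra = (-len(x)) % hop
    x_pad = np.concatenate([np.zeros(frame_len), x, np.zeros(frame_len + extra)])
    n_frames = (len(x_pad) - frame_len) // hop + 1
    frames = np.stack([win * x_pad[t * hop:t * hop + frame_len]
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # bins 0 .. frame_len / 2


def istft(X, length, frame_len=512):
    """Invert stft() by windowed overlap-add and strip the padding."""
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    frames = win * np.fft.irfft(X, n=frame_len, axis=1)
    out = np.zeros((X.shape[0] - 1) * hop + frame_len)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + frame_len] += frame  # overlap-add
    return out[frame_len:frame_len + length]


if __name__ == "__main__":
    fs = 16000                                # assumed sampling rate (Hz)
    t = np.arange(fs) / fs                    # one second of signal
    s1 = np.sin(2 * np.pi * 440 * t)          # source 1
    s2 = 0.5 * np.sin(2 * np.pi * 2000 * t)   # source 2
    x = s1 + s2                               # mixture

    X = stft(x)
    print("reconstruction error:", np.max(np.abs(istft(X, len(x)) - x)))

    # Oracle ideal binary mask: keep the bins where source 1 dominates.
    mask = np.abs(stft(s1)) > np.abs(stft(s2))
    s1_hat = istft(mask * X, len(x))  # time-varying filtering by masking
    print("relative error on s1:",
          np.linalg.norm(s1_hat - s1) / np.linalg.norm(s1))
```

Multiplying the STFT by a mask with values in [0, 1] acts as a time-varying filter, since each frame receives its own frequency response. In this toy case the two sinusoids occupy nearly disjoint bins, so a binary mask separates them almost perfectly; real speech mixtures overlap more and typically call for estimated, often soft, masks.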
Document type: Book sections

Cited literature: 17 references
Contributor: Emmanuel Vincent
Submitted on: Tuesday, September 25, 2018 - 9:29:43 PM
Last modification on: Tuesday, October 25, 2022 - 4:23:27 PM
Long-term archiving on: Wednesday, December 26, 2018 - 5:17:31 PM




HAL Id: hal-01881426, version 1


Tuomas Virtanen, Emmanuel Vincent, Sharon Gannot. Time-frequency processing - Spectral properties. Emmanuel Vincent; Tuomas Virtanen; Sharon Gannot. Audio source separation and speech enhancement, Wiley, 2018, 978-1-119-27989-1. ⟨hal-01881426⟩


