Conference papers

Regression versus classification for neural network based audio source localization

Abstract: We compare the performance of regression and classification neural networks for single-source direction-of-arrival estimation. Since the output space is continuous and structured, regression seems more appropriate. However, classification on a discrete spherical grid is widely believed to perform better and is predominantly used in the literature. For regression, we propose two ways to account for the spherical geometry of the output space based either on the angular distance between spherical coordinates or on the mean squared error between Cartesian coordinates. For classification, we propose two alternatives to the classical one-hot encoding framework: we derive a Gibbs distribution from the squared angular distance between grid points and use the corresponding probabilities either as soft targets or as cross-entropy weights that retain a clear probabilistic interpretation. We show that regression on Cartesian coordinates is generally more accurate, except when localized interference is present, in which case classification appears to be more robust.
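The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the azimuth/elevation convention, the coarse 24-point grid, and the `temperature` value are all assumptions made here for illustration, since the abstract does not specify them. The sketch shows the two regression errors (angular distance and mean squared error on Cartesian coordinates) and one plausible way to turn squared angular distances between grid points into Gibbs soft targets.

```python
import numpy as np

def to_cartesian(azimuth, elevation):
    """Unit vector(s) for directions given by azimuth and elevation in radians."""
    return np.stack([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ], axis=-1)

def angular_distance(u, v):
    """Great-circle angle in radians between unit vectors u and v."""
    # Clip guards against dot products slightly outside [-1, 1] due to rounding.
    return np.arccos(np.clip(np.sum(u * v, axis=-1), -1.0, 1.0))

def cartesian_mse(u, v):
    """Mean squared error between Cartesian coordinates."""
    return np.mean((u - v) ** 2, axis=-1)

def gibbs_soft_targets(true_dir, grid_dirs, temperature=0.1):
    """Soft targets over the grid from squared angular distances.

    The true direction is first snapped to its nearest grid point; the
    probabilities are proportional to exp(-d^2 / temperature), where d is the
    angular distance from that grid point to every grid point.
    `temperature` is a hypothetical parameter, not a value from the paper.
    """
    nearest = np.argmin(angular_distance(grid_dirs, true_dir))
    d2 = angular_distance(grid_dirs, grid_dirs[nearest]) ** 2
    logits = -d2 / temperature
    logits -= logits.max()  # numerical stability before exponentiation
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical coarse grid: 8 azimuths at each of 3 elevations.
az, el = np.meshgrid(np.deg2rad(np.arange(0, 360, 45)), np.deg2rad([0, 30, 60]))
grid = to_cartesian(az.ravel(), el.ravel())

# Example true and estimated directions (arbitrary values).
target = to_cartesian(np.deg2rad(50.0), np.deg2rad(10.0))
estimate = to_cartesian(np.deg2rad(55.0), np.deg2rad(12.0))

angle_error = angular_distance(estimate, target)   # regression error, in radians
mse_error = cartesian_mse(estimate, target)        # regression error, Cartesian
soft = gibbs_soft_targets(target, grid)            # classification soft targets
```

With a one-hot target, every wrong grid point is penalized equally; with the soft targets above, probability mass decays with angular distance from the correct grid point, so near misses cost less than distant ones. A smaller `temperature` brings the distribution closer to one-hot.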

Contributor : Lauréline Perotin
Submitted on : Wednesday, July 17, 2019 - 4:15:10 PM
Last modification on : Friday, January 21, 2022 - 3:09:35 AM




  • HAL Id : hal-02125985, version 2


Lauréline Perotin, Alexandre Défossez, Emmanuel Vincent, Romain Serizel, Alexandre Guérin. Regression versus classification for neural network based audio source localization. WASPAA 2019 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, Oct 2019, New Paltz, United States. ⟨hal-02125985v2⟩


