Regression versus classification for neural network based audio source localization

Abstract : We compare the performance of regression and classification neural networks for single-source direction-of-arrival estimation. Since the output space is continuous and structured, regression seems more appropriate. However, classification on a discrete spherical grid is widely believed to perform better and is predominantly used in the literature. For regression, we propose two ways to account for the spherical geometry of the output space based either on the angular distance between spherical coordinates or on the mean squared error between Cartesian coordinates. For classification, we propose two alternatives to the classical one-hot encoding framework: we derive a Gibbs distribution from the squared angular distance between grid points and use the corresponding probabilities either as soft targets or as cross-entropy weights that retain a clear probabilistic interpretation. We show that regression on Cartesian coordinates is generally more accurate, except when localized interference is present, in which case classification appears to be more robust.
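
The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the azimuth/elevation convention, the function names, the `temperature` parameter of the Gibbs distribution and the four-point grid in the usage note are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def sph_to_cart(azimuth, elevation):
    """Unit vector(s) for azimuth/elevation in radians (assumed convention)."""
    return np.stack([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ], axis=-1)

def angular_distance(az1, el1, az2, el2):
    """Great-circle angle in radians between two directions on the sphere."""
    cos_angle = (np.sin(el1) * np.sin(el2)
                 + np.cos(el1) * np.cos(el2) * np.cos(az1 - az2))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def cartesian_mse(pred_xyz, true_xyz):
    """Squared Euclidean distance between Cartesian vectors, averaged over the batch."""
    return np.mean(np.sum((pred_xyz - true_xyz) ** 2, axis=-1))

def gibbs_soft_targets(true_index, grid_az, grid_el, temperature=0.05):
    """Soft class targets proportional to exp(-d^2 / temperature), where d is the
    angular distance from the true grid point to every grid point.
    `temperature` is a hypothetical hyperparameter, not a value from the paper."""
    d = angular_distance(grid_az[true_index], grid_el[true_index], grid_az, grid_el)
    logits = -d ** 2 / temperature
    logits = logits - logits.max()  # numerical stability before exponentiation
    p = np.exp(logits)
    return p / p.sum()
```

As a usage example, with a hypothetical grid of four equatorial points, `gibbs_soft_targets(1, np.array([0.0, np.pi / 2, np.pi, -np.pi / 2]), np.zeros(4))` returns a probability vector that peaks at index 1 and gives equal mass to the two neighbours at 90 degrees. Such a vector could replace a one-hot target in a cross-entropy loss, so that near misses on the grid are penalized less than distant ones.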


https://hal.inria.fr/hal-02125985
Contributor : Lauréline Perotin
Submitted on : Wednesday, July 17, 2019 - 4:15:10 PM
Last modification on : Wednesday, August 28, 2019 - 2:13:40 PM

File

waspaa_perotin_camera_ready.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02125985, version 2

Citation

Lauréline Perotin, Alexandre Défossez, Emmanuel Vincent, Romain Serizel, Alexandre Guérin. Regression versus classification for neural network based audio source localization. WASPAA 2019 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, Oct 2019, New Paltz, United States. ⟨hal-02125985v2⟩
