Regression versus classification for neural network based audio source localization

Abstract : We compare the performance of regression and classification neural networks for single-source direction-of-arrival estimation. Since the output space is continuous and structured, regression seems more appropriate. However, classification on a discrete spherical grid is widely believed to perform better and is predominantly used in the literature. For regression, we propose two ways to account for the spherical geometry of the output space, based either on the angular distance between spherical coordinates or on the mean squared error between Cartesian coordinates. For classification, we propose two alternatives to the classical one-hot encoding framework: we derive a Gibbs distribution from the squared angular distance between grid points and use the corresponding probabilities either as soft targets or as cross-entropy weights that retain a clear probabilistic interpretation. We show that regression on Cartesian coordinates is generally more accurate, except when localized interference is present, in which case classification appears to be more robust.
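
The quantities named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the azimuth/elevation convention, the toy azimuth-only grid, and the `temperature` parameter of the Gibbs distribution are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def sph_to_cart(azimuth, elevation):
    """Unit vector for a direction given in radians (assumed convention)."""
    return np.stack([np.cos(elevation) * np.cos(azimuth),
                     np.cos(elevation) * np.sin(azimuth),
                     np.sin(elevation)], axis=-1)

def angular_distance(u, v):
    """Great-circle angle in radians between unit vectors u and v."""
    dot = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)  # guard against rounding
    return np.arccos(dot)

def cartesian_mse(u, v):
    """Mean squared error between Cartesian coordinates."""
    return np.mean((u - v) ** 2, axis=-1)

def gibbs_soft_targets(true_dir, grid_dirs, temperature=0.1):
    """Soft classification targets over grid points.

    Probabilities are proportional to exp(-d^2 / temperature), where d is
    the angular distance to the true direction. `temperature` is a
    hypothetical parameter, not a value taken from the paper.
    """
    d = angular_distance(grid_dirs, true_dir)
    logits = -(d ** 2) / temperature
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy grid (assumption): 8 azimuths on the horizontal plane.
grid_az = np.arange(8) * (np.pi / 4)
grid_dirs = sph_to_cart(grid_az, np.zeros(8))

true_dir = sph_to_cart(0.1, 0.0)
soft_targets = gibbs_soft_targets(true_dir, grid_dirs)

x_axis = sph_to_cart(0.0, 0.0)
y_axis = sph_to_cart(np.pi / 2, 0.0)
angle_xy = angular_distance(x_axis, y_axis)   # two regression losses
mse_xy = cartesian_mse(x_axis, y_axis)        # on the same pair
```

With one-hot encoding, all probability mass would sit on the nearest grid point; the soft targets instead spread mass over neighbouring grid points according to their angular distance from the true direction.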
Cited literature : 33 references

https://hal.inria.fr/hal-02125985
Contributor : Lauréline Perotin
Submitted on : Friday, May 10, 2019 - 5:39:00 PM
Last modification on : Thursday, July 18, 2019 - 1:30:36 AM

File

waspaa_regression_vs_classif_l...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02125985, version 1

Citation

Lauréline Perotin, Alexandre Défossez, Emmanuel Vincent, Romain Serizel, Alexandre Guérin. Regression versus classification for neural network based audio source localization. 2019. ⟨hal-02125985v1⟩
