
LibriMix: An open-source dataset for generalizable speech separation

Joris Cosentino 1 Manuel Pariente 1 Samuele Cornell 2 Antoine Deleforge 1 Emmanuel Vincent 1 
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown significant performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. To evaluate fairly across datasets, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean and noisy conditions. Aiming towards evaluation in more realistic, conversation-like scenarios, we also release a sparsely overlapping version of LibriMix's test set.
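The abstract describes each LibriMix mixture as two or three LibriSpeech utterances combined, optionally, with a WHAM! noise sample. The sketch below illustrates that general recipe only: rescale each speaker to a random level, sum the speakers, then add rescaled noise. Everything concrete here is an assumption for illustration, not the authors' generation scripts: the function names (`rms`, `scale_to_db`, `make_mixture`), the RMS-based dBFS level measure, the default level ranges, truncation to the shortest signal, and the anti-clipping rescale.

```python
import math
import random


def rms(x):
    """Root-mean-square amplitude of a signal stored as a list of floats."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def scale_to_db(x, target_db):
    """Return a copy of x rescaled so that its RMS level is target_db dBFS."""
    gain = 10 ** (target_db / 20) / rms(x)
    return [s * gain for s in x]


def make_mixture(speakers, noise=None, speech_db=(-33.0, -25.0),
                 noise_db=(-38.0, -30.0), seed=None):
    """Hypothetical sketch: mix 2 or 3 utterances, optionally adding noise.

    The level ranges are illustrative defaults (assumed, RMS-based), not
    parameters confirmed by the source. Returns (mixture, sources), where
    the sources are the rescaled separation targets.
    """
    rng = random.Random(seed)
    signals = list(speakers) + ([noise] if noise is not None else [])
    length = min(len(s) for s in signals)  # assumed: truncate to shortest signal

    # Rescale each speaker independently to a random level, then sum them.
    sources = [scale_to_db(s[:length], rng.uniform(*speech_db)) for s in speakers]
    mixture = [sum(samples) for samples in zip(*sources)]

    if noise is not None:
        scaled_noise = scale_to_db(noise[:length], rng.uniform(*noise_db))
        mixture = [m + n for m, n in zip(mixture, scaled_noise)]

    # If the mixture would clip, divide mixture and targets by the same factor
    # so that the targets stay consistent with the mixture.
    peak = max(abs(m) for m in mixture)
    if peak > 1.0:
        mixture = [m / peak for m in mixture]
        sources = [[s / peak for s in src] for src in sources]
    return mixture, sources
```

As a usage example, `make_mixture([spk1, spk2], noise, seed=42)` on two synthetic utterances of 16000 and 12000 samples returns a 12000-sample mixture and two 12000-sample targets. If `noise` is omitted, the mixture is just the sum of the targets, which corresponds to the clean (non-noisy) setting described above. Returning the rescaled sources alongside the mixture keeps the separation targets exactly consistent with what a model will see as input.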
Document type: Preprints, Working Papers, ...

https://hal.inria.fr/hal-03354695
Contributor: Emmanuel Vincent
Submitted on: Saturday, September 25, 2021 - 7:28:20 PM
Last modification on: Thursday, May 5, 2022 - 3:11:28 AM

File

cosentino2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-03354695, version 1
  • arXiv: 2005.11262

Citation

Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent. LibriMix: An open-source dataset for generalizable speech separation. 2020. ⟨hal-03354695⟩

Metrics

Record views: 75
File downloads: 379