Detecting and counting overlapping speakers in distant speech scenarios

Abstract : We consider the problem of detecting the activity and counting overlapping speakers in distant-microphone recordings. We treat supervised Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD+OSD, and speaker counting as instances of a general Overlapped Speech Detection and Counting (OSDC) task, and we design a Temporal Convolutional Network (TCN) based method to address it. We show that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets. In particular, our best architecture obtains, for OSD, 29.1% and 25.5% absolute improvement in Average Precision over previous techniques on, respectively, the AMI and CHiME-6 datasets. Furthermore, we find that generalization for joint VAD+OSD improves by using a speaker counting objective rather than a VAD+OSD objective. We also study the effectiveness of forced alignment based labeling and data augmentation, and show that both can improve OSD performance.
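
The sketch below illustrates how VAD and OSD can both be read off a frame-level speaker count, which is one way to see why they fit under a single OSDC task. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `labels_from_counts` and `scores_from_count_posteriors` are hypothetical, and so is the choice to treat VAD as "count >= 1" and OSD as "count >= 2" when converting count posteriors to detection scores.

```python
from typing import List, Sequence, Tuple


def labels_from_counts(counts: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Derive binary VAD and OSD labels from per-frame speaker counts.

    Assumption (not taken from the paper): a frame is speech if at least
    one speaker is active, and overlapped if at least two are active.
    """
    vad = [int(c >= 1) for c in counts]
    osd = [int(c >= 2) for c in counts]
    return vad, osd


def scores_from_count_posteriors(
    posteriors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Turn per-frame count posteriors into VAD and OSD scores.

    posteriors[t][k] is the (hypothetical) model probability that k
    speakers are active at frame t.
    """
    # P(speech) = 1 - P(zero speakers)
    vad_scores = [1.0 - p[0] for p in posteriors]
    # P(overlap) = P(two or more speakers)
    osd_scores = [sum(p[2:]) for p in posteriors]
    return vad_scores, osd_scores


if __name__ == "__main__":
    counts = [0, 1, 2, 3, 1, 0]
    vad, osd = labels_from_counts(counts)
    print(vad)  # [0, 1, 1, 1, 1, 0]
    print(osd)  # [0, 0, 1, 1, 0, 0]
```

Under these assumptions, a model trained only on the counting objective can still be scored on VAD or OSD by thresholding the derived scores.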
Document type :
Conference papers

Cited literature : 36 references

https://hal.inria.fr/hal-02908241
Contributor : Emmanuel Vincent
Submitted on : Tuesday, July 28, 2020 - 3:52:09 PM
Last modification on : Wednesday, May 19, 2021 - 4:52:03 PM
Long-term archiving on : Tuesday, December 1, 2020 - 9:00:34 AM

File

cornell_IS20 (1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02908241, version 1

Citation

Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent. Detecting and counting overlapping speakers in distant speech scenarios. INTERSPEECH 2020, Oct 2020, Shanghai, China. ⟨hal-02908241⟩

Metrics

Record views : 208
File downloads : 676