
Detecting and counting overlapping speakers in distant speech scenarios

Abstract : We consider the problem of detecting the activity and counting overlapping speakers in distant-microphone recordings. We treat supervised Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD+OSD, and speaker counting as instances of a general Overlapped Speech Detection and Counting (OSDC) task, and we design a Temporal Convolutional Network (TCN) based method to address it. We show that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets. In particular, our best architecture obtains, for OSD, 29.1% and 25.5% absolute improvement in Average Precision over previous techniques on the AMI and CHiME-6 datasets, respectively. Furthermore, we find that generalization for joint VAD+OSD improves by using a speaker counting objective rather than a VAD+OSD objective. We also study the effectiveness of forced-alignment-based labeling and data augmentation, and show that both can improve OSD performance.
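As an illustration of why VAD, OSD, joint VAD+OSD, and speaker counting can be viewed as instances of one task, the sketch below derives the frame-level targets for each of them from a single per-frame count of active speakers. It is a minimal sketch and not the authors' code: the function names, the 0.1 s frame shift, and the `max_speakers` cap are assumptions introduced here for illustration only.

```python
from typing import List, Tuple


def count_speakers(segments: List[Tuple[float, float]],
                   num_frames: int,
                   frame_shift: float = 0.1) -> List[int]:
    """Per-frame number of active speakers.

    `segments` holds one (start, end) pair in seconds per speaker turn.
    The frame shift is a hypothetical value chosen for this example.
    """
    counts = [0] * num_frames
    for start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        for t in range(first, last):
            counts[t] += 1
    return counts


def vad_labels(counts: List[int]) -> List[int]:
    # VAD: is at least one speaker active?
    return [int(c >= 1) for c in counts]


def osd_labels(counts: List[int]) -> List[int]:
    # OSD: are at least two speakers active at once?
    return [int(c >= 2) for c in counts]


def vad_osd_labels(counts: List[int]) -> List[int]:
    # Joint VAD+OSD: 0 = silence, 1 = single speaker, 2 = overlap.
    return [min(c, 2) for c in counts]


def counting_labels(counts: List[int], max_speakers: int = 4) -> List[int]:
    # Speaker counting: the count itself, capped at an assumed maximum class.
    return [min(c, max_speakers) for c in counts]


# Two turns that overlap between 0.2 s and 0.3 s.
turns = [(0.0, 0.3), (0.2, 0.5)]
counts = count_speakers(turns, num_frames=6)
```

Under this view, every coarser target is a deterministic function of the speaker count, so a model trained on the counting objective can be evaluated on VAD, OSD, or joint VAD+OSD by merging its output classes.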
Document type :
Conference papers
Contributor : Emmanuel Vincent
Submitted on : Wednesday, October 13, 2021 - 8:11:58 AM
Last modification on : Saturday, June 25, 2022 - 7:40:39 PM




  • HAL Id : hal-02908241, version 2


Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent. Detecting and counting overlapping speakers in distant speech scenarios. INTERSPEECH 2020, Oct 2020, Shanghai, China. ⟨hal-02908241v2⟩


