Preprints, Working Papers, ...

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

Abstract: We tackle the problem of audiovisual scene analysis for weakly labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and to integrate an audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization (NMF) for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments on a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.
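As background for readers unfamiliar with NMF, the sketch below shows the standard factorization the abstract builds on. A non-negative matrix V (for audio, typically a magnitude spectrogram of frequencies by frames) is approximated as the product W H, where W holds non-negative spectral templates and H their non-negative activations over time. The factors are fitted with the classic Lee and Seung multiplicative updates for the Euclidean cost. This is a generic textbook NMF in pure Python, not the authors' novel formulation, which this record does not describe; the function names `nmf`, `matmul` and `frobenius_error`, and the toy matrix, are hypothetical illustrations.

```python
import random


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Multiply an (m x k) matrix A by a (k x n) matrix B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative V (F x N) as W (F x rank) @ H (rank x N).

    Uses the Lee and Seung multiplicative updates for the squared
    Euclidean loss; entries stay non-negative because every update
    multiplies by a ratio of non-negative quantities.
    """
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + eps for _ in range(N)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[k][n] * num[k][n] / (den[k][n] + eps) for n in range(N)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(F)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0]))) ** 0.5


# Toy "spectrogram" (4 frequency bins x 6 frames) built from two
# hypothetical sources with known non-negative templates and activations.
W_true = [[1, 0], [2, 0], [0, 1], [0, 3]]
H_true = [[1, 0, 2, 0, 1, 3], [0, 1, 0, 2, 1, 0]]
V = matmul(W_true, H_true)

W, H = nmf(V, rank=2, n_iter=300)
print(round(frobenius_error(V, W, H), 4))
```

In a source-enhancement setting, each column of W can be read as the spectral signature of one sound source and the matching row of H as when it is active; how the paper ties such components to visual objects under weak supervision is not specified in this record.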

Cited literature: 31 references

https://hal.inria.fr/hal-01914532
Contributor: Alexey Ozerov
Submitted on: Wednesday, November 7, 2018, 12:29:24 AM
Last modification on: Tuesday, February 2, 2021, 12:26:06 AM
Long-term archiving on: Friday, February 8, 2019, 12:28:05 PM

Files

weak_nmf_prop.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01914532, version 1
  • arXiv: 1811.04000


Citation

Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, et al. Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision. 2018. ⟨hal-01914532⟩


Metrics

Record views: 133
File downloads: 263