Conference Papers, 2020

Joint Attention for Automated Video Editing

Hui-Yin Wu, Trevor Santarra, Michael Leece, Rolando Vargas, Arnav Jhala

Abstract

Joint attention refers to the shared focal points of attention of the occupants of a space. In this work, we introduce a computational definition of joint attention for the automated editing of meeting recordings from the AMI corpus, captured in multi-camera environments. Using extracted head pose and individual headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based method that selects cameras at a fixed pacing using pose data, and (3) an editing algorithm that uses joint attention learned by a long short-term memory (LSTM) network from both pose and audio data, trained on expert edits. The methods are evaluated qualitatively against the human edit and quantitatively in a user study with 22 participants. Results indicate that LSTM-trained joint attention produces edits comparable to the expert edit, offering a wider range of camera views than the audio-based method while being more generalizable than rule-based methods.
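As an illustration of method (1) only, the following is a minimal sketch of a loudest-speaker edit, not the authors' implementation. It assumes one amplitude track per participant headset, a fixed decision window measured in samples, and a hypothetical mapping from each participant to a camera; the names `loudest_speaker_edit`, `amplitudes`, `camera_for`, and `window` are all invented for this example, and the abstract does not specify how the paper windows or smooths the audio.

```python
from typing import Dict, List, Sequence


def loudest_speaker_edit(
    amplitudes: Dict[str, Sequence[float]],
    camera_for: Dict[str, str],
    window: int,
) -> List[str]:
    """Return one camera label per window: the camera mapped to the loudest headset.

    All parameters are assumptions made for this sketch:
    amplitudes  participant id -> headset amplitude samples
    camera_for  participant id -> camera label (hypothetical mapping)
    window      number of samples per editing decision
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not amplitudes:
        return []

    # Only consider the span covered by every headset track.
    n_samples = min(len(track) for track in amplitudes.values())
    shots: List[str] = []
    for start in range(0, n_samples, window):
        end = min(start + window, n_samples)
        # Mean absolute amplitude per participant over this window.
        energy = {
            who: sum(abs(x) for x in track[start:end]) / (end - start)
            for who, track in amplitudes.items()
        }
        # Ties resolve to the first participant in dictionary order.
        loudest = max(energy, key=energy.get)
        shots.append(camera_for[loudest])
    return shots


if __name__ == "__main__":
    demo_amplitudes = {"A": [0.9, 0.8, 0.1, 0.0], "B": [0.1, 0.2, 0.7, 0.6]}
    demo_cameras = {"A": "closeup_A", "B": "closeup_B"}
    print(loudest_speaker_edit(demo_amplitudes, demo_cameras, window=2))
```

Because such a method only ever cuts to whoever is speaking, it cannot select views of listeners or shared objects, which is consistent with the abstract's observation that the learned approach offers a wider range of camera views than audio alone.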
Main file
imx-2020-final-sigchi.pdf (4.63 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02960390, version 1 (09-10-2020)

Cite

Hui-Yin Wu, Trevor Santarra, Michael Leece, Rolando Vargas, Arnav Jhala. Joint Attention for Automated Video Editing. IMX 2020 - ACM International Conference on Interactive Media Experiences, Jun 2020, Barcelona, Spain. pp. 55-64, ⟨10.1145/3391614.3393656⟩. ⟨hal-02960390⟩