Abstract : This paper presents a content-aware attention network (CatNet) for action recognition, which uses an attention mechanism to aggregate frame-level features into a compact video-level representation. Unlike most previous methods, which treat every video frame equally, CatNet contains an attention module that adaptively emphasizes the representative frames and thereby benefits action recognition. Moreover, CatNet can take an action video of arbitrary length and produce a compact, fixed-length video representation. The attention module consists of two cascaded blocks: an adaptive attention weighting block and a content-aware weighting block. Experiments are carried out on two challenging video action datasets, UCF-101 and HMDB-51. Our method achieves significant improvements on both datasets compared with existing methods. The results show that the proposed CatNet is able to focus on the representative frames corresponding to a specific action category while significantly improving recognition performance.
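The aggregation idea described in the abstract, turning a variable number of frame-level features into one fixed-length vector by weighting frames unequally, can be illustrated with a minimal sketch. This is not the paper's attention module: the abstract does not specify how the adaptive attention weighting block or the content-aware weighting block compute their weights, so the sketch assumes a single softmax over dot-product scores. The function name `attention_pool` and the scoring vector `query` are hypothetical, and the random features stand in for the output of a frame-level feature extractor.

```python
import numpy as np


def attention_pool(frame_features, query):
    """Aggregate (T, D) frame features into a single (D,) video vector.

    `query` is a hypothetical (D,) scoring vector; in a trained model it
    would be learned. This is an assumed, generic soft-attention pooling,
    not the two-block module described in the paper.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    scores = frame_features @ query            # one scalar score per frame, shape (T,)
    scores = scores - scores.max()             # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum()                   # weights are positive and sum to 1
    video_feature = weights @ frame_features   # weighted sum over frames, shape (D,)
    return video_feature, weights


# Stand-in data: two clips of different lengths with 8-dimensional frame features.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
short_clip = rng.normal(size=(5, 8))
long_clip = rng.normal(size=(40, 8))

short_vec, short_w = attention_pool(short_clip, query)
long_vec, long_w = attention_pool(long_clip, query)
print(short_vec.shape, long_vec.shape)
```

Both clips map to a vector of the same length regardless of frame count. With an all-zero `query`, every frame receives the same weight and the result reduces to plain average pooling, which is the equal-treatment baseline the abstract contrasts against.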
https://hal.inria.fr/hal-01821028
Contributor : Hal Ifip
Submitted on : Friday, June 22, 2018 - 11:43:57 AM
Last modification on : Wednesday, June 10, 2020 - 10:00:04 AM
Long-term archiving on : Tuesday, September 25, 2018 - 11:44:52 AM
Ziyi Liu, Le Wang, Nanning Zheng. Content-Aware Attention Network for Action Recognition. 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2018, Rhodes, Greece. pp.109-120, ⟨10.1007/978-3-319-92007-8_10⟩. ⟨hal-01821028⟩