Spatial-Temporal Neural Networks for Action Recognition

Abstract: Action recognition is an important yet challenging problem in many applications. Recently, neural network and deep learning approaches have been widely applied to action recognition and have yielded impressive results. In this paper, we present a spatial-temporal neural network model to recognize human actions in videos. This network is composed of two connected structures. A two-stream-based network extracts appearance and optical flow features from video frames, characterizing the spatial information of human actions in videos. A group of LSTM structures following the spatial network describes the temporal information of human actions. We test our model on two public datasets, and the experimental results show that our method improves action recognition accuracy compared to the baseline methods.
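The architecture described in the abstract (a two-stream spatial network feeding a recurrent temporal model) can be sketched roughly as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: the class name, layer sizes, and feature dimensions are all assumptions, and the tiny CNN streams stand in for the paper's full appearance and optical-flow networks.

```python
import torch
import torch.nn as nn

class SpatialTemporalNet(nn.Module):
    """Hypothetical sketch: two-stream spatial network followed by an LSTM."""
    def __init__(self, feat_dim=128, hidden=64, num_classes=10):
        super().__init__()
        def stream(in_ch):
            # Tiny CNN standing in for the per-frame feature extractor.
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(16 * 4 * 4, feat_dim))
        self.rgb_stream = stream(3)    # appearance stream (RGB frames)
        self.flow_stream = stream(2)   # optical-flow stream (x/y components)
        self.lstm = nn.LSTM(2 * feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, rgb, flow):
        # rgb: (B, T, 3, H, W); flow: (B, T, 2, H, W)
        B, T = rgb.shape[:2]
        f_rgb = self.rgb_stream(rgb.flatten(0, 1)).view(B, T, -1)
        f_flow = self.flow_stream(flow.flatten(0, 1)).view(B, T, -1)
        seq = torch.cat([f_rgb, f_flow], dim=-1)  # fuse the two streams
        out, _ = self.lstm(seq)                   # temporal modeling over frames
        return self.classifier(out[:, -1])        # classify from the last step

net = SpatialTemporalNet()
logits = net(torch.randn(2, 8, 3, 32, 32), torch.randn(2, 8, 2, 32, 32))
print(tuple(logits.shape))  # (2, 10): one score vector per video
```

The key design point the abstract highlights is the composition: per-frame spatial features (appearance plus motion) are extracted first, then a recurrent layer aggregates them across time before classification.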
Document type: Conference papers
Submitted on: Friday, June 22, 2018 - 11:45:34 AM
Last modification on: Wednesday, June 10, 2020 - 10:00:04 AM
Long-term archiving on: Tuesday, September 25, 2018 - 3:59:33 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Chao Jing, Ping Wei, Hongbin Sun, Nanning Zheng. Spatial-Temporal Neural Networks for Action Recognition. 14th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2018, Rhodes, Greece. pp.619-627, ⟨10.1007/978-3-319-92007-8_52⟩. ⟨hal-01821062⟩


