A. Bobick and J. Davis, The recognition of human movement using temporal templates, 2001.

M. Breitenstein, F. Reichlin, and L. Van Gool, Robust tracking-by-detection using a detector confidence particle filter, ICCV, 2009.

T. Brox and J. Malik, Large displacement optical flow, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2011.
DOI : 10.1109/CVPR.2009.5206697

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, CVPR, 2005.

P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, Behavior recognition via sparse spatio-temporal features, VS-PETS, 2005.
DOI : 10.1109/vspets.2005.1570899

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.5712

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, ICCV, 2009.

A. Efros, A. Berg, G. Mori, and J. Malik, Recognizing action at a distance, ICCV, 2003.
DOI : 10.1109/iccv.2003.1238420

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.589.7214

A. Fathi and G. Mori, Action recognition by learning mid-level motion features, CVPR.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester et al., Object detection with discriminatively trained part based models, PAMI.

Caltech object category datasets.

Progressive search space reduction for human pose estimation, CVPR.

R. Filipovych and E. Ribeiro, Recognizing primitive interactions by exploring actor-object states, CVPR.

Robust sequence alignment for actor-object interaction recognition: Discovering actor-object states, CVIU.

A. Gaidon, Z. Harchaoui, and C. Schmid, Actom sequence models for efficient action detection, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995646

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.648.3429

L. Gorelick, M. Blank et al., Actions as space-time shapes, PAMI.

On-line boosting and vision, CVPR.

Semi-supervised on-line boosting for robust tracking, ECCV.

Observing human-object interactions: Using spatial and functional compatibility for recognition, PAMI.

Searching for complex human activities with no visual examples, IJCV.

N. Ikizler-Cinbis, R. G. Cinbis, and S. Sclaroff, Learning actions from the web, ICCV.

Z. Kalal, J. Matas, and K. Mikolajczyk, P-N learning: Bootstrapping binary classifiers by structural constraints, CVPR.

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human focused action localization in video, International Workshop on Sign, Gesture, and Activity (SGA) in conjunction with ECCV.

I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, CVPR, 2008.

I. Laptev and P. Perez, Retrieving actions in movies, ICCV.

Recognizing realistic actions from videos in the wild, CVPR.

Classification using intersection kernel support vector machines is efficient, CVPR.

P. Matikainen, M. Hebert, and R. Sukthankar, Representing pairwise spatial and temporal relations for action recognition, ECCV.

R. Messing, C. Pal, and H. Kautz, Activity recognition using the velocity histories of tracked keypoints, ICCV.

Action recognition with motion-appearance vocabulary forest, CVPR.

J. C. Niebles, C.-W. Chen, and L. Fei-Fei, Modeling temporal structure of decomposable motion segments for activity classification, ECCV.

A. Prest, C. Schmid, and V. Ferrari, Weakly supervised learning of interactions between humans and objects, PAMI.

D. Ramanan et al., Tracking people by learning their appearance, PAMI.

Action MACH: a spatio-temporal maximum average correlation height filter for action recognition, CVPR.

S. Satkin and M. Hebert, Modeling the temporal extent of actions, ECCV.

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: A local SVM approach, ICPR.

J. Sivic et al., Person spotting: video shot retrieval for face sets, CIVR, 2005.

"Who are you?" - learning person specific classifiers from video, CVPR.

Dense point trajectories by GPU-accelerated large displacement optical flow, ECCV.

Exemplar-based action recognition in video, BMVC.

Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes, CVPR.

Tracking with dynamic hidden-state shape models, ECCV.

Efficient mean-shift tracking via a new similarity measure, CVPR, 2005.

Grouplet: A structured image representation for recognizing human and object interactions, CVPR.

Modeling mutual context of object and human pose in human-object interaction activities, CVPR.

Actions sketch: a novel action representation, CVPR, 2005.

L. Zelnik-Manor and M. Irani, Event-based analysis of video, 2001.