J. K. Aggarwal and S. Park, Human motion: modeling and recognition of actions and interactions, Proceedings. 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004., pp.640-647, 2004.
DOI : 10.1109/TDPVT.2004.1335299

J. Allen and G. Ferguson, Actions and Events in Interval Temporal Logic, Journal of Logic and Computation, vol.4, issue.5, p.531, 1994.
DOI : 10.1093/logcom/4.5.531

M. Arens, R. Gerber, and H. H. Nagel, Conceptual representations between video signals and natural language descriptions, Image and Vision Computing, vol.26, issue.1, pp.53-66, 2008.
DOI : 10.1016/j.imavis.2005.07.026

M. Breitenstein, F. Reichlin, B. Leibe, E. Koller, and L. Van Gool, Online multiperson tracking-by-detection from a single, 2010.

H. Buxton, Learning and understanding dynamic scene activity: a review, Image and Vision Computing, vol.21, issue.1, pp.125-136, 2003.
DOI : 10.1016/S0262-8856(02)00127-0

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

A. Ellis, A. Shahrokni, and J. Ferryman, PETS2009 and Winter-PETS 2009 results: A combined evaluation, 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, pp.1-8, 2009.
DOI : 10.1109/PETS-WINTER.2009.5399728

M. Enzweiler and D. M. Gavrila, Monocular Pedestrian Detection: Survey and Experiments, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, pp.2179-2195, 2009.
DOI : 10.1109/TPAMI.2008.260

S. Gammeter, A. Ess, T. Jäggli, K. Schindler, B. Leibe et al., Articulated Multi-body Tracking under Egomotion, Proc. ECCV, pp.816-830, 2008.
DOI : 10.1007/978-3-540-88688-4_60

R. Gerber and H. H. Nagel, Representation of occurrences for road vehicle traffic, Artificial Intelligence, vol.172, issue.4-5, pp.351-391, 2008.
DOI : 10.1016/j.artint.2007.07.001

J. Gonzàlez, D. Rowe, J. Varona, and F. X. Roca, Understanding dynamic scenes based on human sequence evaluation, Image and Vision Computing, Special Section: Computer Vision Methods for Ambient Intelligence, vol.27, issue.10, pp.1433-1444, 2009.

G. Guerra-Filho and Y. Aloimonos, A Language for Human Action, Computer, vol.40, issue.5, pp.42-51, 2007.
DOI : 10.1109/MC.2007.154

I. Haritaoglu, D. Harwood, and L. Davis, W4S: A real-time system for detecting and tracking people in 2.5D, Proc. ECCV, pp.877-886, 1998.

Y. Ivanov and A. Bobick, Recognition of visual activities and interactions by stochastic parsing, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.8, pp.852-872, 2000.
DOI : 10.1109/34.868686

K. Jüngling and M. Arens, Local Feature Based Person Reidentification in Infrared Image Sequences, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, pp.448-454, 2010.
DOI : 10.1109/AVSS.2010.75

K. Jüngling and M. Arens, Pedestrian tracking in infrared from moving vehicles, 2010 IEEE Intelligent Vehicles Symposium, pp.470-477, 2010.
DOI : 10.1109/IVS.2010.5548132

G. Lavee, E. Rivlin, and M. Rudzsky, Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.39, issue.5, pp.489-504, 2009.
DOI : 10.1109/TSMCC.2009.2023380

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, vol.77, issue.1-3, pp.259-289, 2008.
DOI : 10.1007/s11263-007-0095-3

B. Leibe, K. Schindler, and L. Van Gool, Coupled Detection and Trajectory Estimation for Multi-Object Tracking, 2007 IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.
DOI : 10.1109/ICCV.2007.4408936

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

H. H. Nagel, Steps toward a cognitive vision system, AI Magazine, vol.25, issue.2, pp.31-50, 2004.

S. Park and J. Aggarwal, A hierarchical Bayesian network for event recognition of human actions and interactions, Multimedia Systems, vol.10, issue.2, pp.164-179, 2004.
DOI : 10.1007/s00530-004-0148-1

M. Ryoo and J. Aggarwal, Semantic Representation and Recognition of Continued and Recursive Human Activities, International Journal of Computer Vision, vol.82, issue.1, pp.1-24, 2009.
DOI : 10.1007/s11263-008-0181-1

L. Sigal, A. O. Balan, and M. J. Black, HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion, International Journal of Computer Vision, vol.87, issue.1-2, pp.4-27, 2010.
DOI : 10.1007/s11263-009-0273-6

C. Stauffer and W. Grimson, Adaptive background mixture models for real-time tracking, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), pp.246-252, 1999.
DOI : 10.1109/CVPR.1999.784637

S. Tran and L. Davis, Event Modeling and Recognition Using Markov Logic Networks, Lecture Notes in Computer Science, vol.5303, pp.610-623, 2008.
DOI : 10.1007/978-3-540-88688-4_45

P. Turaga, R. Chellappa, V. Subrahmanian, and O. Udrea, Machine Recognition of Human Activities: A Survey, IEEE Transactions on Circuits and Systems for Video Technology, vol.18, issue.11, pp.1473-1488, 2008.
DOI : 10.1109/TCSVT.2008.2005594

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, pp.511-518, 2001.
DOI : 10.1109/CVPR.2001.990517

V. T. Vu, F. Bremond, and M. Thonnat, Automatic video interpretation: a novel algorithm for temporal scenario recognition, Proc. IJCAI, pp.1295-1300, 2003.

B. Wu and R. Nevatia, Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors, International Journal of Computer Vision, vol.75, issue.4, pp.247-266, 2007.
DOI : 10.1007/s11263-006-0027-7