Arora, Unsupervised Segmentation of Objects using Efficient Learning, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383011

Y. Aytar and A. Zisserman, Tabula rasa: Model transfer for object category detection, ICCV, 2011.

S. Bagon, O. Brostovski, M. Galun, and M. Irani, Detecting and sketching the common, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540233

Bay, SURF: Speeded up robust features, CVIU, vol.110, issue.3, pp.346-359, 2008.
DOI : 10.1007/11744023_32

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.679.3046

M. Blaschko, A. Vedaldi, and A. Zisserman, Simultaneous object detection and ranking with weak supervision

A. Bobick and J. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.3, pp.257-267, 2001.
DOI : 10.1109/34.910878

E. Borenstein and S. Ullman, Learning to Segment, ECCV, 2004.
DOI : 10.1007/978-3-540-24672-5_25

Bosch, Representing shape with a spatial pyramid kernel, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, 2007.
DOI : 10.1145/1282280.1282340

M. Breitenstein, F. Reichlin, and L. V., Robust tracking-by-detection using a detector confidence particle filter, ICCV, 2009.

T. Brox and J. Malik, Object Segmentation by Long Term Analysis of Point Trajectories, 2010.
DOI : 10.1007/978-3-642-15555-0_21

T. Brox and J. Malik, Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.3, 2011.
DOI : 10.1109/TPAMI.2010.143

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative models for static human-object interactions, Workshop on Structured Models in Computer Vision (SMiCV) in conjunction with CVPR, 2010.

L. Cao and L. Fei-Fei, Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408965

Y. Chen, L. Zhu, and A. Yuille, Active Mask Hierarchies for Object Detection, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_4

O. Chum and A. Zisserman, An exemplar model for learning object classes, CVPR, 2007.

Comaniciu, The variable bandwidth mean shift and data-driven scale selection, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, pp.438-445, 2001.
DOI : 10.1109/ICCV.2001.937550

D. Crandall and D. Huttenlocher, Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition, ECCV, 2006.
DOI : 10.1007/11744023_2

Crandall, Discrete-continuous optimization for large-scale structure from motion, CVPR, 2011.

Deng, ImageNet: A large-scale hierarchical image database, CVPR, 2009.

C. Desai, D. Ramanan, and C. Fowlkes, Discriminative models for multi-class object layout, ICCV, 2009.

T. Deselaers, B. Alexe, and V. Ferrari, Localizing objects while learning their appearance, ECCV, 2010.

Dollar, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005.
DOI : 10.1109/VSPETS.2005.1570899

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.5712

Duchenne, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

A. A. Efros, A. C. Berg, G. Mori, and J. Malik, Recognizing action at a distance, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238420

M. Eichner and V. Ferrari, Better appearance models for pictorial structures, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.8906

Everingham, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2007.
DOI : 10.1007/s11263-009-0275-4

Everingham, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2008.
DOI : 10.1007/s11263-009-0275-4

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.
DOI : 10.1109/TPAMI.2009.167

Felzenszwalb, Cascade object detection with deformable part models, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539906

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.164.8688

R. Fergus and P. Perona, Caltech object category datasets, 2003.

Ferrari, Progressive search space reduction for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587468

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.321.2867

R. Filipovych and E. Ribeiro, Recognizing primitive interactions by exploring actor-object states, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587726

R. Filipovych and E. Ribeiro, Robust sequence alignment for actor-object interaction recognition: Discovering actor-object states, Computer Vision and Image Understanding, vol.115, issue.2, 2010.
DOI : 10.1016/j.cviu.2010.11.012

Gaidon, Actom sequence models for efficient action detection, CVPR 2011.
DOI : 10.1109/CVPR.2011.5995646

URL : https://hal.archives-ouvertes.fr/inria-00575217

Galleguillos, Weakly Supervised Object Localization with Stable Segmentations, ECCV, 2008.
DOI : 10.1007/978-3-540-88682-2_16

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.210.4047

P. Gehler and S. Nowozin, On feature combination for multiclass object classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459169

Gorelick, Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, 2007.
DOI : 10.1109/TPAMI.2007.70711

H. Grabner and H. Bischof, On-line Boosting and Vision, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.215

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.8804

Grabner, Semi-supervised On-Line Boosting for Robust Tracking, ECCV, 2008.
DOI : 10.1007/978-3-540-88682-2_19

Grubinger, The IAPR benchmark: A new evaluation resource for visual information systems, International Conference on Language Resources and Evaluation, 2001.

Gupta, Context and observation driven latent variable model for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587511

Gupta, Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, PAMI, 2009.
DOI : 10.1109/TPAMI.2009.83

Harzallah, Combining efficient object localization and image classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459257

URL : https://hal.archives-ouvertes.fr/inria-00439516

Heusch, Local Binary Patterns as an Image Preprocessing for Face Authentication, 7th International Conference on Automatic Face and Gesture Recognition (FGR06), 2006.
DOI : 10.1109/FGR.2006.72

N. Ikizler and D. A. Forsyth, Searching for Complex Human Activities with No Visual Examples, International Journal of Computer Vision, vol.26, issue.9, 2008.
DOI : 10.1007/s11263-008-0142-8

Ikizler, Recognizing actions from still images, 2008 19th International Conference on Pattern Recognition, 2008.
DOI : 10.1109/ICPR.2008.4761663

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.159.265

N. Ikizler-Cinbis, R. G. Cinbis, and S. Sclaroff, Learning actions from the web, ICCV, 2009.

R. Johansson and P. Nugues, Dependency-based syntactic-semantic analysis with PropBank and NomBank, Proceedings of the Twelfth Conference on Computational Natural Language Learning, CoNLL '08, pp.183-187, 2008.
DOI : 10.3115/1596324.1596355

Z. Kalal, J. Matas, and K. Mikolajczyk, P-N learning: Bootstrapping binary classifiers from unlabeled data by structural constraints, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5540231

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.4328

Kim and J. Kim, An adaptive shot change detection algorithm using an average of absolute difference histogram within extension sliding window, ISCE, 2009.

G. Kim and A. Torralba, Unsupervised detection of regions of interest using iterative link analysis, NIPS, 2009.

A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, Human Focused Action Localization in Video, International Workshop on Sign, Gesture, and Activity (SGA) in conjunction with ECCV, 2010.
DOI : 10.1007/978-3-642-35749-7_17

V. Kolmogorov, Convergent Tree-Reweighted Message Passing for Energy Minimization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.10, pp.1568-1583, 2006.
DOI : 10.1109/TPAMI.2006.200

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.2409

Kulis, What you saw is not what you get: Domain adaptation using asymmetric kernel transforms, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995702

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.2663

L. Ladicky and P. H. S. Torr, Locally linear support vector machines, ICML, 2011.

I. Laptev and P. Perez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

Laptev, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

Lazebnik, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

Y. Lee and K. Grauman, Shape discovery from unlabeled image collections, CVPR, 2009.

Y. J. Lee and K. Grauman, Learning the easy things first: Self-paced visual category discovery, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995523

Leistner, Improving classifiers with unlabeled weakly-related videos, CVPR 2011.
DOI : 10.1109/CVPR.2011.5995475

L.-J. Li and L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408872

Liu, Recognizing realistic actions from videos in the wild, 2009.

D. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.1150-1157, 1999.
DOI : 10.1109/ICCV.1999.790410

Maji, Classification using intersection kernel support vector machines is efficient, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587630

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.3974

Matikainen, Representing Pairwise Spatial and Temporal Relations for Action Recognition
DOI : 10.1007/978-3-642-15549-9_37

Messing, Activity recognition using the velocity histories of tracked keypoints, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459154

K. Mikolajczyk and H. Uemura, Action recognition with motion-appearance vocabulary forest, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587628

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.323.8584

M. H. Nguyen, L. Torresani, F. de la Torre, and C. Rother, Weakly supervised discriminative localization and classification: a joint learning process, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459426

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.2127

Niebles, Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification
DOI : 10.1007/978-3-642-15552-9_29

P. Ochs and T. Brox, Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126418

P. Ochs and T. Brox, Higher order motion models and spectral clustering, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247728

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.348.6238

A. Oliva and A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

B. Ommer, T. Mader, and J. M. Buhmann, Seeing the objects behind the dots: Recognition in videos from a moving camera, IJCV, vol.83, issue.1, pp.57-71, 2009.

S. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, 2010.
DOI : 10.1109/TKDE.2009.191

M. Pandey and S. Lazebnik, Scene recognition and weakly-supervised object localization with deformable part-based models, ICCV, 2011.
DOI : 10.1109/iccv.2011.6126383

Prest, Weakly supervised learning of interactions between humans and objects, TPAMI (accepted for publication).
URL : https://hal.archives-ouvertes.fr/inria-00516477

Prest, Explicit modeling of human-object interactions in realistic videos, TPAMI (accepted for publication), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00720847

Prest, Learning object class detectors from weakly annotated video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248065

URL : https://hal.archives-ouvertes.fr/hal-00695940

Ramanan, Building models of animals from video, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.8, 2006.
DOI : 10.1109/TPAMI.2006.155

Ramanan, Tracking People by Learning Their Appearance, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.1, pp.65-81, 2007.
DOI : 10.1109/TPAMI.2007.250600

Rodriguez, Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587727

Y. Rodriguez, Face Detection and Verification using Local Binary Patterns, 2006.

S. Satkin and M. Hebert, Modeling the Temporal Extent of Actions, 2010.
DOI : 10.1007/978-3-642-15549-9_39

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: A local SVM approach, International Conference on Pattern Recognition, vol.3, pp.32-36, 2004.

Sharma, Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248093

URL : https://hal.archives-ouvertes.fr/hal-00714311

P. Siva and T. Xiang, Weakly supervised object detector learning with model drift detection, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126261

Sivic, Person Spotting: Video Shot Retrieval for Face Sets, CIVR, 2005.
DOI : 10.1007/11526346_26

Sivic, "Who are you?": Learning person specific classifiers from video, CVPR, 2009.
DOI : 10.1109/cvpr.2009.5206513

J. Sullivan and S. Carlsson, Recognizing and Tracking Human Action, ECCV '02: Proceedings of the 7th European Conference on Computer Vision-Part I, pp.629-644, 2002.
DOI : 10.1007/3-540-47969-4_42

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.4755

Sundaram, Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow
DOI : 10.1007/978-3-642-15549-9_32

S. Todorovic and N. Ahuja, Extracting Subimages of an Unknown Category from a Set of Images, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.116

Tommasi, Safety in numbers: Learning categories from few examples with multi model knowledge transfer, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540064

Tsochantaridis, Large margin methods for structured and interdependent output variables, JMLR, vol.6, pp.1453-1484, 2005.

Vedaldi, Multiple kernels for object detection, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459183

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.5316

S. Vijayanarasimhan and K. Grauman, Large-scale live active learning: Training object detectors with crawled data and crowds, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995430

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.799

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, CVPR, pp.511-518, 2001.

Wang, Unsupervised Discovery of Action Classes, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.321

G. Willems, J. H. Becker, T. Tuytelaars, and L. Van Gool, Exemplar-based Action Recognition in Video, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.90

J. Winn and N. Jojic, LOCUS: learning object classes with unsupervised segmentation, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.148

Winn, Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.171

Wu, Tracking with Dynamic Hidden-State Shape Models, ECCV, 2008.
DOI : 10.1007/978-3-540-88682-2_49

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.3412

Wu, Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539882

Y. Yang and D. Ramanan, Articulated pose estimation with flexible mixtures-of-parts, CVPR, 2011.

W. Yang, Y. Wang, and G. Mori, Recognizing human actions from still images with latent poses, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539879

B. Yao and L. Fei-Fei, Grouplet: A structured image representation for recognizing human and object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540234

B. Yao and L. Fei-Fei, Modeling mutual context of object and human pose in human-object interaction activities, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540235

Yao, Human action recognition by learning bases of action attributes and parts, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126386

Yao, Combining randomization and discrimination for fine-grained image categorization, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995368

A. Yilmaz and M. Shah, Actions Sketch: A Novel Action Representation, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.58

Zanetti, L. Zelnik-Manor, and P. Perona, A walk through the web's video clips, CVPRW, 2005.

L. Zelnik-Manor and M. Irani, Event-based analysis of video, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990935

Zhang, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, International Journal of Computer Vision, vol.36, issue.1, 2007.
DOI : 10.1007/s11263-006-9794-4

URL : https://hal.archives-ouvertes.fr/inria-00548574

Zhang, Boosted local structured HOG-LBP for object localization, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995678

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.362.6498
