P. Agrawal, J. Carreira, and J. Malik, Learning to See by Moving, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.13

S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, NIPS, 2003.

T. Baltrušaitis, C. Ahuja, and L. Morency, Multimodal Machine Learning: A Survey and Taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence, issue.3, 2018.
DOI : 10.1109/TPAMI.2018.2798607

J. Carreira and A. Zisserman, Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.502

S. Chopra, R. Hadsell, and Y. LeCun, Learning a Similarity Metric Discriminatively, with Application to Face Verification, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.202
URL : http://yann.lecun.com/exdb/publis/psgz/chopra-05.ps.gz

C. Doersch, A. Gupta, and A. A. Efros, Unsupervised Visual Representation Learning by Context Prediction, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.167
URL : http://arxiv.org/pdf/1505.05192

F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, ActivityNet: A large-scale video benchmark for human activity understanding, CVPR, 2015.

C. Fan, J. Lee, M. Xu, K. K. Singh, Y. J. Lee et al., Identifying First-Person Camera Wearers in Third-Person Videos, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.503
URL : http://arxiv.org/pdf/1704.06340

A. Fathi, A. Farhadi, and J. M. Rehg, Understanding egocentric activities, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126269
URL : http://www.cc.gatech.edu/%7Eafathi3/publication/ICCV11.pdf

A. Fathi, X. Ren, and J. M. Rehg, Learning to recognize objects in egocentric activities, CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995444

Y. Gong, Y. Jia, T. K. Leung, A. Toshev, and S. Ioffe, Deep convolutional ranking for multilabel image annotation, ICLR, 2014.

R. Hadsell, S. Chopra, and Y. LeCun, Dimensionality Reduction by Learning an Invariant Mapping, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.100
URL : http://www.cs.nyu.edu/~raia/docs/cvpr06.pdf

E. Hoffer, I. Hubara, and N. Ailon, Spatial contrasting for deep unsupervised learning, 2016.

D. Jayaraman and K. Grauman, Learning Image Representations Tied to Ego-Motion, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.166
URL : http://arxiv.org/pdf/1505.02206

H. Joo, H. Liu, L. Tan, L. Gui, B. Nabbe et al., Panoptic Studio: A Massively Multiview System for Social Motion Capture, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.381

T. Kanade and M. Hebert, First-Person Vision, Proc. IEEE, 2012.
DOI : 10.1109/JPROC.2012.2200554

A. Klaser, M. Marszalek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99
URL : https://hal.archives-ouvertes.fr/inria-00514853

I. Laptev, On space-time interest points, IJCV, issue.2, 2005.
DOI : 10.1007/s11263-005-1838-7
URL : http://kth.diva-portal.org/smash/get/diva2:442088/FULLTEXT01

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

Y. J. Lee, J. Ghosh, and K. Grauman, Discovering important people and objects for egocentric video summarization, CVPR, 2012.

Y. Li, M. Paluri, J. M. Rehg, and P. Dollár, Unsupervised Learning of Edges, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.179

Y. Li, Z. Ye, and J. M. Rehg, Delving into egocentric actions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298625
URL : http://europepmc.org/articles/pmc4784702?pdf=render

M. Ma, H. Fan, and K. M. Kitani, Going deeper into first-person activity recognition, CVPR, 2016.
DOI : 10.1109/CVPR.2016.209
URL : http://arxiv.org/pdf/1605.03688

M. Mathieu, C. Couprie, and Y. LeCun, Deep multi-scale video prediction beyond mean square error, ICLR, 2016.

P. X. Nguyen, G. Rogez, C. Fowlkes, and D. Ramanan, The open world of micro-videos, arXiv, 2016.

D. Pathak, R. Girshick, P. Dollár, T. Darrell, and B. Hariharan, Learning Features by Watching Objects Move, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.638
URL : http://arxiv.org/pdf/1612.06370

H. Pirsiavash and D. Ramanan, Detecting activities of daily living in first-person camera views, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248010

Y. Poleg, C. Arora, and S. Peleg, Head Motion Signatures from Egocentric Videos, ACCV, 2014.
DOI : 10.1007/978-3-319-16811-1_21

R. Poppe, A survey on vision-based human action recognition, Image and Vision Computing, vol.28, issue.6, 2010.
DOI : 10.1016/j.imavis.2009.11.014

D. Premack and G. Woodruff, Does the chimpanzee have a theory of mind?, Behavioral and Brain Sciences, vol.1, issue.4, 1978.
DOI : 10.1017/S0140525X00076512

N. Rhinehart and K. M. Kitani, Learning Action Maps of Large Environments via First-Person Vision, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.69

N. Rhinehart and K. M. Kitani, First-Person Activity Forecasting with Online Inverse Reinforcement Learning, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.399

G. Rizzolatti and L. Craighero, The mirror-neuron system, Annu. Rev. Neurosci., vol.27, issue.1, 2004.
DOI : 10.1146/annurev.neuro.27.070203.144230

M. S. Ryoo and L. Matthies, First-Person Activity Recognition: What Are They Doing to Me?, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.352
URL : http://cvrc.ece.utexas.edu/mryoo/papers/cvpr2013_ryoo.pdf

G. A. Sigurdsson, J. Choi, A. Farhadi, and A. Gupta, Charades challenge, 2017.

G. A. Sigurdsson, S. Divvala, A. Farhadi, and A. Gupta, Asynchronous Temporal Fields for Action Recognition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.599
URL : http://arxiv.org/pdf/1612.06371

G. A. Sigurdsson, G. Varol, X. Wang, A. Farhadi, I. Laptev et al., Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, ECCV, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01418216

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS, 2014.

J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang et al., Learning Fine-Grained Image Similarity with Deep Ranking, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.180
URL : http://arxiv.org/pdf/1404.4661

W. Wang, R. Arora, K. Livescu, and J. Bilmes, On deep multi-view representation learning, ICML, 2015.

X. Wang and A. Gupta, Unsupervised Learning of Visual Representations Using Videos, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.320

D. Weinland, R. Ronfard, and E. Boyer, A survey of vision-based methods for action representation, segmentation and recognition, CVIU, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00640088

R. Yonetani, K. M. Kitani, and Y. Sato, Ego-surfing first person videos, CVPR, 2015.
DOI : 10.1109/TPAMI.2017.2771767
URL : http://arxiv.org/pdf/1606.04637

S. Zagoruyko and N. Komodakis, Learning to compare image patches via convolutional neural networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299064
URL : https://hal.archives-ouvertes.fr/hal-01246261