SLIC Superpixels Compared to State-of-the-art Superpixel Methods, PAMI, 2012.
Unsupervised learning from Narrated Instruction Videos, 2016.
Learning from narrated instruction videos, PAMI, 2017.
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs, 2016.
Joint Discovery of Object States and Manipulation Actions, ICCV, 2017.
Pictorial structures revisited: People detection and articulated pose estimation, CVPR, 2009.
DIFFRAC: A discriminative and flexible framework for clustering, NIPS, 2007.
The Cyclic Block Conditional Gradient Method for Convex Optimization Problems, SIOPT, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01491541
Linearly convergent away-step conditional gradient for non-strongly convex functions, 2015.
Pattern recognition and machine learning, 2006.
Finding Actors and Actions in Movies, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00904991
Weakly Supervised Action Labeling in Videos Under Ordering Constraints, ECCV, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01053967
Weakly-supervised alignment of video with text, ICCV, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01154523
An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision, PAMI, 2004.
Detecting changes in real-world objects: The relationship between visual long-term memory and change blindness, Communicative and Integrative Biology, 2006.
Efficient Large-Scale Structured Learning, CVPR, 2013.
Lazifying Conditional Gradient Algorithms, ICML, 2017.
Efficient use of limited-memory accelerators for linear learning on heterogeneous systems, NIPS, 2017.
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, CVPR, 2017.
Unsupervised Learning of Narrative Event Chains, ACL, 2008.
On Pairwise Costs for Network Flow Multi-Object Tracking, CVPR, 2015.
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations, 2014.
A Flexible Model for Training Action Localization with Varying Levels of Supervision, 2018.
Deep Filter Banks for Texture Recognition and Segmentation, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263622
Learning from ambiguously labeled images, CVPR, 2009.
DOI : 10.1109/cvprw.2009.5206667
URL : http://www.cis.upenn.edu/~taskar/pubs/cvpr09.pdf
Stochastic Dual Coordinate Ascent with Adaptive Probabilities, ICML, 2015.
Faster coordinate descent via adaptive importance sampling, AISTATS, 2017.
Histograms of Oriented Gradients for Human Detection, CVPR, 2005.
DOI : 10.1109/cvpr.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset, 2018.
You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video, 2014.
An optimal affine invariant smooth minimization algorithm, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00907547
Scene semantics from long-term observation of people, 2012.
Learning person-object interactions for action recognition in still images, NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648156
ImageNet: A Large-Scale Hierarchical Image Database, CVPR, 2009.
Discriminative models for static human-object interactions, CVPR Workshops, 2010.
What Makes Paris Look like Paris?, SIGGRAPH, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01248528
Discovering localized attributes for fine-grained recognition, CVPR, 2012.
Automatic annotation of human actions in video, ICCV, 2009.
The Pascal Visual Object Classes (VOC) Challenge, IJCV, 2010.
"Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video, BMVC, 2006.
Describing Objects by their Attributes, CVPR, 2009.
Modeling Actions through State Changes, CVPR, 2013.
WordNet: An Electronic Lexical Database, 1998.
Object Detection with Discriminatively Trained Part-Based Models, PAMI, 2010.
Efficient matching of pictorial structures, CVPR, 2000.
Pictorial structures for object recognition, IJCV, 2005.
Modeling Video Evolution For Action Recognition, CVPR, 2015.
The Representation and Matching of Pictorial Structures, IEEE Transactions on Computers, 1973.
A fast operator for detection and precise location of distinct points, corners and centres of circular features, ISPRS, 1987.
From Lifestyle Vlogs to Everyday Interactions, arXiv, 2017.
People Watching: Human Actions as a Cue for Single View Geometry, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01066257
FASOLE: Fast Algorithm for Structured Output LEarning, ECML PKDD, 2014.
An algorithm for quadratic programming, Naval Research Logistics Quarterly, 1956.
A hierarchical Bayesian model for unsupervised induction of script knowledge, 2014.
Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes, NIPS, 2016.
Fast R-CNN, ICCV, 2015.
Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.
Deformable part models, 2013.
Generative Adversarial Networks, NIPS, 2014.
The something something video database for learning and evaluating visual common sense, 2017.
ImageNet Auto-Annotation with Segmentation Propagation, IJCV, 2014.
Observing human-object interactions: Using spatial and functional compatibility for recognition, 2009.
Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos, CVPR, 2009.
A combined corner and edge detector, BMVC, 1988.
The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2009.
Mask R-CNN, ICCV, 2017.
Deep Residual Learning for Image Recognition, CVPR, 2016.
Clustal: A package for performing multiple sequence alignment on a microcomputer, Gene, 1988.
On Approximate Solutions of Systems of Linear Inequalities, Journal of Research of the National Bureau of Standards, 1952.
An extension of the Frank and Wolfe method of feasible directions, Mathematical Programming, 1974.
Random Design Analysis of Ridge Regression, Foundations of Computational Mathematics, 2014.
Connectionist Temporal Modeling for Weakly Supervised Action Labeling, ECCV, 2016.
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos, CVPR, 2017.
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video, CVPR, 2018.
Discovering States and Transformations in Image Collections, CVPR, 2015.
Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization, ICML, 2013.
Representing videos using mid-level discriminative patches, CVPR, 2013.
Reflection methods for user-friendly submodular optimization, NIPS, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00905258
Cutting-plane training of structural SVMs, Machine Learning, 2009.
Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation, BMVC, 2010.
Panoptic Studio: A Massively Multiview System for Social Motion Capture, ICCV, 2015.
Efficient image and video co-localization with Frank-Wolfe algorithm, ECCV, 2014.
DOI : 10.1007/978-3-319-10599-4_17
URL : http://ai.stanford.edu/%7Ekdtang/papers/eccv14-vidcoloc.pdf
Discriminative Clustering for Image Co-segmentation, 2010.
DOI : 10.1109/cvpr.2010.5539868
URL : http://www.di.ens.fr/%7Efbach/cosegmentation_cvpr2010.pdf
Multi-class cosegmentation, CVPR, 2012.
DOI : 10.1109/cvpr.2012.6247719
URL : https://hal.archives-ouvertes.fr/hal-00717448
Frank-Wolfe with Subsampling Oracle, ICML, 2018.
Visual object-action recognition: Inferring object affordances from human demonstration, CVIU, 2011.
Closed-form approximate CRF training for scalable image segmentation, 2014.
DOI : 10.1007/978-3-319-10578-9_36
URL : http://groups.inf.ed.ac.uk/calvin/Publications/kolesnikov14eccv.pdf
Geodesic Object Proposals, ECCV, 2014.
HMDB: a large video database for human motion recognition, 2011.
DOI : 10.1109/iccv.2011.6126543
URL : http://dspace.mit.edu/bitstream/1721.1/69981/1/Poggio-HMDB.pdf
On the Global Linear Convergence of Frank-Wolfe Optimization Variants, NIPS, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248675
Block-Coordinate Frank-Wolfe Optimization for Structural SVMs, ICML, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720158
Convergence Rate of Frank-Wolfe for Non-Convex Objectives, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01415335
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML, 2001.
Learning realistic human actions from movies, CVPR, 2008.
DOI : 10.1109/cvpr.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659
On space-time interest points, IJCV, 2005.
DOI : 10.1007/s11263-005-1838-7
URL : http://kth.diva-portal.org/smash/get/diva2:442088/FULLTEXT01
SEARNN: Training RNNs with Global-Local Losses, International Conference on Learning Representations (ICLR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01665263
Multiple sequence alignment using partial order graphs, Bioinformatics, 2002.
DOI : 10.1093/bioinformatics/18.3.452
URL : https://academic.oup.com/bioinformatics/article-pdf/18/3/452/648375/180452.pdf
A Pylon Model for Semantic Segmentation, NIPS, 2011.
Clustering of time series data, a survey, Pattern Recognition, 2014.
Microsoft COCO: Common objects in context, 2014.
Least squares quantization in PCM, IEEE Transactions on Information Theory, 1982.
A Survey for the Quadratic Assignment Problem, EJOR, 2007.
Distinctive Image Features from Scale-Invariant Keypoints, IJCV, 2004.
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision, 2015.
Building a large annotated corpus of English: The Penn Treebank, Computational Linguistics, 1993.
Generating typed dependency parses from phrase structure parses, LREC, 2006.
Actions in context, CVPR, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00548645
Learning from Video and Text via Large-Scale Discriminative Clustering, ICCV, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01569540
Distributed Representations of Words and Phrases and their Compositionality, NIPS, 2013.
WordNet: A Lexical Database for English, Communications of the ACM, 1995.
Finding the Point of a Polyhedron Closest to the Origin, SIAM Journal on Control, 1974.
Hand detection using multiple proposals, BMVC, 2011.
Machine learning: a probabilistic perspective, 2012.
Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments, 2015.
A novel Frank-Wolfe algorithm. Analysis and applications to large-scale SVM training, Information Sciences, 2014.
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, NIPS, 2014.
Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, SIAM Journal on Optimization, 2012.
Introductory Lectures on Convex Programming, Volume I: Basic Course, 1998.
Human detection from images and videos: A survey, Pattern Recognition, 2016.
Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, ECCV, 2010.
Unsupervised learning of human action categories using spatial-temporal words, IJCV, 2008.
CRFsuite: a fast implementation of Conditional Random Fields (CRFs), 2007.
Perceptually Inspired Layout-aware Losses for Image Segmentation, ECCV, 2014.
Towards Accurate Multi-person Pose Estimation in the Wild, CVPR, 2017.
Relative Attributes, ICCV, 2011.
Deep Face Recognition, BMVC, 2015.
Partial order alignment code for Multiple Sequence Alignment.
The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding, IJCV, 2014.
Weakly-supervised learning of visual relations, ICCV, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01576035
Detecting activities of daily living in first-person camera views, CVPR, 2012.
Fast Training of Support Vector Machines using Sequential Minimal Optimization, Advances in Kernel Methods - Support Vector Learning, 1999.
Category-specific video summarization, ECCV, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01022967
Weakly Supervised Learning of Interactions between Humans and Objects, PAMI, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00516477
Adaptive Stochastic Dual Coordinate Ascent for Conditional Random Fields, UAI, 2018.
Linking people with "their" names using coreference resolution, ECCV, 2014.
Poselet Key-framing: A Model for Human Activity Recognition, CVPR, 2013.
(Online) Subgradient Methods for Structured Prediction, AISTATS, 2007.
Learning Script Knowledge with Web Experiments, ACL, 2010.
Grounding Action Descriptions in Videos, TACL, 2013.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015.
Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling, CVPR, 2017.
A dataset for movie description, CVPR, 2015.
Script Data for Attribute-Based Recognition of Composite Activities, ECCV, 2012.
Introduction to the CoNLL-2000 shared task: Chunking, 2000.
Unsupervised Learning and Segmentation of Complex Activities from Video, CVPR, 2018.
Unsupervised Semantic Parsing of Video Collections, ICCV, 2015.
Shallow Parsing with Conditional Random Fields, NAACL, 2003.
A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs with a Costly max-Oracle, CVPR, 2015.
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, ECCV, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01418216
Mastering the game of Go without human knowledge, 2017.
Hand Keypoint Detection in Single Images using Multiview Bootstrapping, CVPR, 2017.
Very Deep Convolutional Networks for Large-Scale Image Recognition, 2015.
Two-stream convolutional networks for action recognition in videos, NIPS, 2014.
Unsupervised Discovery of Mid-level Discriminative Patches, ECCV, 2012.
"Who are you?" - Learning person specific classifiers from video, CVPR, 2009.
UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.
Ranking domain-specific highlights by analyzing edited videos, ECCV, 2014.
Introduction to Reinforcement Learning, 1st ed., 1998.
MovieQA: Understanding Stories in Movies through Question-Answering, CVPR, 2016.
Learning structured prediction models: A large margin approach, Doctoral dissertation, Stanford University, 2004.
Max-Margin Markov Networks, 2003.
Efficient and Precise Interactive Hand Tracking through Joint, Continuous Optimization of Pose and Correspondences, 2016.
DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR, 2014.
Learning spatiotemporal features with 3D convolutional networks, 2015.
Large margin methods for structured and interdependent output variables, 2005.
Sequence to sequence - video to text, ICCV, 2015.
Translating Videos to Natural Language Using Deep Recurrent Neural Networks, NAACL, 2015.
Show and tell: A neural image caption generator, 2014.
Rapid Object Detection using a Boosted Cascade of Simple Features, CVPR, 2001.
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, no.2, pp.260-269, 1967.
Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, 2008.
Action Recognition with Improved Trajectories, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873267
Action recognition by dense trajectories, CVPR, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00583818
On the complexity of multiple sequence alignment, Journal of Computational Biology, 1994.
Actions ~ Transformations, CVPR, 2016.
Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms, 2014.
Convolutional Pose Machines, CVPR, 2016.
DOI : 10.1109/cvpr.2016.511
URL : http://arxiv.org/pdf/1602.00134
Convergence Theory in Nonlinear Programming, Integer and Nonlinear Programming, 1970.
Maximum Margin Clustering, NIPS, 2004.
Discriminative tag learning on YouTube videos with latent sub-tags, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995402
URL : http://www.sfu.ca/%7Ewya16/cvpr2011_sub_tag_draft.pdf
Articulated pose estimation with flexible mixtures-of-parts, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995741
Grouplet: A structured image representation for recognizing human and object interactions, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5540234
Human action recognition by learning bases of action attributes and parts, 2011.
DOI : 10.1109/iccv.2011.6126386
URL : http://people.csail.mit.edu/khosla/papers/iccv2011_yao.pdf
Describing videos by exploiting temporal structure, ICCV, 2015.
DOI : 10.1109/iccv.2015.512
URL : http://arxiv.org/pdf/1502.08029
A Survey of Recent Advances in Face Detection, 2010.
Stochastic Optimization with Importance Sampling for Regularized Loss Minimization, ICML, 2015.
Towards Automatic Learning of Procedures from Web Instructional Videos, AAAI, 2018.