Y. Amit, M. Fink, N. Srebro, and S. Ullman, Uncovering shared structures in multiclass classification, Proceedings of the 24th International Conference on Machine Learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273499
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.5871

A. Argyriou, T. Evgeniou, and M. Pontil, Convex multi-task feature learning, Machine Learning, 2008.
DOI : 10.1007/s10994-007-5040-8
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.2025

P. L. Bartlett, M. Jordan, and J. D. McAuliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.
DOI : 10.1198/016214505000000907

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, NIPS, 2007.

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, 2011.
DOI : 10.5244/C.25.76

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Review, vol.43, issue.1, pp.129-159, 2001.
DOI : 10.1137/s003614450037906x
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.7694

K. Crammer and Y. Singer, Ultraconservative Online Algorithms for Multiclass Problems, J. Mach. Learn. Research, vol.3, 2003.
DOI : 10.1007/3-540-44581-1_7
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.2693

J. Deng, A. C. Berg, K. Li, and L. Fei-Fei, What does classifying more than 10,000 image categories tell us?, ECCV, 2010.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, CVPR, 2009.

J. Duchi and Y. Singer, Boosting with structural sparsity, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553412
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.7959

M. Dudik, Z. Harchaoui, and J. Malick, Lifted coordinate descent for learning with trace-norm regularization, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756802

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Research, vol.9, pp.1871-1874, 2008.

M. Fazel, Matrix rank minimization with applications, 2002.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2008.

H. Jégou, M. Douze, and C. Schmid, Product Quantization for Nearest Neighbor Search, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.1, 2011.
DOI : 10.1109/TPAMI.2010.57

J. Langford, L. Li, and T. Zhang, Sparse online learning via truncated gradient, J. Mach. Learn. Research, vol.10, 2009.

Y. LeCun, L. Bottou, G. Orr, and K. Müller, Efficient BackProp, Neural Networks: Tricks of the Trade, 1998.

Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour et al., Large-scale image classification: Fast feature extraction and SVM training, CVPR, 2011.
DOI : 10.1109/cvpr.2011.5995477
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.225.3736

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.

S. Mallat, A Wavelet Tour of Signal Processing: the sparse way, 2009.

J. Nocedal and S. Wright, Numerical Optimization, 1999.
DOI : 10.1007/b98874

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.7388

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, pp.143-156, 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630

R. Rifkin and A. Klautau, In defense of one-vs-all classification, J. Mach. Learn. Research, vol.5, pp.101-141, 2004.

J. Sánchez and F. Perronnin, High-dimensional signature compression for large-scale image classification, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995504

S. Shalev-Shwartz, Y. Wexler, and A. Shashua, ShareBoost: Efficient multiclass learning with feature sharing, 2011.

A. Tewari and P. Bartlett, On the Consistency of Multiclass Classification Methods, J. Mach. Learn. Research, vol.8, pp.1007-1025, 2007.
DOI : 10.1007/11503415_10

P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Mathematical Programming, vol.117, issue.1-2, 2009.
DOI : 10.1007/s10107-007-0170-0

A. Vedaldi and A. Zisserman, Efficient additive kernels via explicit feature maps, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5539949
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.7024

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang et al., Locality-constrained linear coding for image classification, CVPR, 2010.
DOI : 10.1109/cvpr.2010.5540018
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.175.2312

Z. Zhou, K. Yu, T. Zhang, and T. Huang, Image Classification Using Super-Vector Coding of Local Image Descriptors, ECCV, 2010.
DOI : 10.1007/978-3-642-15555-0_11

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., vol.67, 2005.