J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: An efficient algorithm for bandit linear optimization, Proceedings of the International Conference on Learning Theory (COLT), pp.263-274, 2008.

A. Agarwal, P. L. Bartlett, P. Ravikumar, and M. J. Wainwright, Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization, IEEE Transactions on Information Theory, vol.58, issue.5, pp.3235-3249, 2012.
DOI : 10.1109/TIT.2011.2182178

F. Bach, Duality Between Subgradient and Conditional Gradient Methods, SIAM Journal on Optimization, vol.25, issue.1, pp.115-129, 2015.
DOI : 10.1137/130941961

URL : https://hal.archives-ouvertes.fr/hal-00757696

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

H. H. Bauschke and J. M. Borwein, Legendre functions and the method of random Bregman projections, J. Convex Anal., vol.4, issue.1, pp.27-67, 1997.

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, CMS Books in Mathematics, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643354

H. H. Bauschke, J. Bolte, and M. Teboulle, A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications, Mathematics of Operations Research, 2016.
DOI : 10.1287/moor.2016.0817

A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003.
DOI : 10.1016/S0167-6377(02)00231-6

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.3271

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, vol.7, issue.3, pp.620-631, 1967.
DOI : 10.1016/0041-5553(67)90040-7

N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, 2006.
DOI : 10.1017/CBO9780511546921

I. Colin, A. Bellet, J. Salmon, and S. Clémençon, Gossip dual averaging for decentralized optimization of pairwise functions, Proceedings of the International Conference on Machine Learning (ICML), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01329315

P. L. Combettes and J. Pesquet, Proximal Splitting Methods in Signal Processing, Springer Optim. Appl, vol.49, pp.185-212, 2011.
DOI : 10.1007/978-1-4419-9569-8_10

URL : https://hal.archives-ouvertes.fr/hal-00643807

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, Optimal distributed online prediction using mini-batches, J. Mach. Learn. Res., vol.13, pp.165-202, 2012.

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods with inexact oracle: the strongly convex case, CORE Discussion Papers, 2013.

L. Devroye, L. Györfi, and G. Lugosi, A probabilistic theory of pattern recognition, Applications of Mathematics, vol.31, 1996.
DOI : 10.1007/978-1-4612-0711-5

A. Dieuleveut, N. Flammarion, and F. Bach, Harder, better, faster, stronger convergence rates for least-squares regression. arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01275431

J. Duchi and F. Ruan, Local asymptotics for some stochastic optimization problems: Optimality, constraint identification, and dual averaging. arXiv preprint, 2016.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari, Composite objective mirror descent, Proceedings of the International Conference on Learning Theory (COLT), pp.14-26, 2010.

J. Duchi, A. Agarwal, and M. Wainwright, Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling, IEEE Transactions on Automatic Control, vol.57, issue.3, pp.592-606, 2012.
DOI : 10.1109/TAC.2011.2161027

N. Flammarion and F. Bach, From averaging to acceleration, there is only a step-size, Proceedings of the International Conference on Learning Theory (COLT), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01136945

C. Gentile and N. Littlestone, The robustness of the p-norm algorithms, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (COLT '99), pp.1-11, 1999.
DOI : 10.1145/307400.307405

O. Hanner, On the uniform convexity of Lp and lp, Arkiv för Matematik, vol.3, issue.3, pp.239-244, 1956.
DOI : 10.1007/BF02589410

J. Hiriart-Urruty and C. Lemaréchal, Fundamentals of convex analysis. Grundlehren Text Editions, 2001.

P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford, Parallelizing stochastic approximation through mini-batching and tail-averaging. arXiv preprint, 2016.

A. Juditsky and A. S. Nemirovski, Functional aggregation for nonparametric regression, Ann. Statist., vol.28, issue.3, pp.681-712, 2000.

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, issue.3, pp.291-307, 2005.
DOI : 10.1016/j.jcss.2004.10.016

J. Kivinen and M. K. Warmuth, Exponentiated Gradient versus Gradient Descent for Linear Predictors, Information and Computation, vol.132, issue.1, pp.1-63, 1997.
DOI : 10.1006/inco.1996.2612

W. Krichene, A. Bayen, and P. L. Bartlett, Accelerated mirror descent in continuous and discrete time, Advances in Neural Information Processing Systems (NIPS), pp.2845-2853, 2015.

H. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2003.
DOI : 10.1007/978-1-4899-2696-8

G. Lecué, Optimal Oracle Inequality for Aggregation of Classifiers Under Low Noise Condition, Learning theory, pp.364-378, 2006.
DOI : 10.1007/11776420_28

G. Lecué, Optimal rates of aggregation in classification under low noise assumption, Bernoulli, vol.13, issue.4, pp.1000-1022, 2007.
DOI : 10.3150/07-BEJ6044

S. Lee and S. J. Wright, Manifold identification in dual averaging for regularized stochastic online learning, J. Mach. Learn. Res., vol.13, pp.1705-1744, 2012.

H. Lu, R. Freund, and Y. Nesterov, Relatively-smooth convex optimization by first-order methods, and applications. arXiv preprint, 2016.

O. Macchi, Adaptive processing: The least mean squares approach with applications in transmission, 1995.

B. Martinet, Brève communication. Régularisation d'inéquations variationnelles par approximations successives, Revue française d'informatique et de recherche opérationnelle, Série rouge, vol.4, issue.R3, pp.154-158, 1970.
DOI : 10.1051/m2an/197004R301541

H. B. McMahan, Follow-the-regularized-leader and mirror descent: Equivalence theorems and l1 regularization, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pp.525-533, 2011.

J. Moreau, Fonctions convexes duales et points proximaux dans un espace Hilbertien, C. R. Acad. Sci. Paris, vol.255, pp.2897-2899, 1962.

A. S. Nemirovski and D. B. Yudin, Effective methods for the solution of convex programming problems of large dimensions, Ekonom. i Mat. Metody, vol.15, issue.1, pp.135-152, 1979.

A. S. Nemirovsky and D. B. Yudin, Problem complexity and method efficiency in optimization, Wiley-Interscience Series in Discrete Mathematics, 1983. Translated from the Russian and with a preface by E. R. Dawson.

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, vol.87 of Applied Optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, vol.120, issue.1, pp.221-259, 2009.
DOI : 10.1007/s10107-007-0149-x

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.7055

Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, vol.140, issue.1, pp.125-161, 2013.
DOI : 10.1007/s10107-012-0629-5

B. T. Polyak and A. B. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

M. Raginsky and A. Rakhlin, Information-Based Complexity, Feedback and Dynamics in Convex Programming, IEEE Transactions on Information Theory, vol.57, issue.10, pp.7036-7056, 2011.
DOI : 10.1109/TIT.2011.2154375

URL : http://arxiv.org/abs/1010.2285

I. Rish and G. Grabarnik, Sparse modeling: theory, algorithms, and applications, 2014.

R. T. Rockafellar, Convex analysis. Princeton Mathematical Series, 1970.

S. Shalev-Shwartz and S. M. Kakade, Mind the duality gap: Logarithmic regret algorithms for online optimization, Advances in Neural Information Processing Systems (NIPS), pp.1457-1464, 2009.

S. Shalev-Shwartz and Y. Singer, Online Learning Meets Optimization in the Dual, Learning theory, pp.423-437, 2006.
DOI : 10.1007/11776420_32

T. Suzuki, Dual averaging and proximal gradient descent for online alternating direction multiplier method, Proceedings of the International Conference on Machine Learning (ICML), pp.392-400, 2013.

A. B. Tsybakov, Optimal Rates of Aggregation, Proceedings of the Annual Conference on Computational Learning Theory, 2003.
DOI : 10.1007/978-3-540-45167-9_23

URL : https://hal.archives-ouvertes.fr/hal-00104867

A. B. Tsybakov, Introduction to Nonparametric Estimation, 2008.
DOI : 10.1007/b13794

J. Vial, Strong and Weak Convexity of Sets and Functions, Mathematics of Operations Research, vol.8, issue.2, pp.231-259, 1983.
DOI : 10.1287/moor.8.2.231

A. Wibisono, A. C. Wilson, and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proceedings of the National Academy of Sciences, vol.113, issue.47, pp.E7351-E7358, 2016.
DOI : 10.1073/pnas.1614734113

A. C. Wilson, B. Recht, and M. I. Jordan, A Lyapunov analysis of momentum methods in optimization. arXiv preprint, 2016.

S. J. Wright, R. D. Nowak, and M. A. Figueiredo, Sparse Reconstruction by Separable Approximation, IEEE Transactions on Signal Processing, vol.57, issue.7, pp.2479-2493, 2009.
DOI : 10.1109/TSP.2009.2016892

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.9334

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res., vol.11, pp.2543-2596, 2010.

M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the International Conference on Machine Learning (ICML), 2003.