F. Bach, Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory, pp.185-209, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00723365

F. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

A. Bordes, S. Ertekin, J. Weston, and L. Bottou, Fast kernel classifiers with online and active learning, Journal of Machine Learning Research, vol.6, pp.1579-1619, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00752361

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, 2016.

A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.

A. Dieuleveut and F. Bach, Nonparametric stochastic approximation with large step-sizes, Ann. Statist., vol.44, issue.4, pp.1363-1399, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01053831

A. Dieuleveut, A. Durmus, and F. Bach, Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01565514

W. R. Gilks, S. Richardson, and D. Spiegelhalter, Markov chain Monte Carlo in practice, 1995.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

J. M. Hilbe, Negative binomial regression, 2011.

D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, Adaptive Computation and Machine Learning series, 2009.

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proc. ICML, 2001.

M. Lichman, UCI machine learning repository, 2013.

P. McCullagh, Generalized linear models, European Journal of Operational Research, vol.16, issue.3, pp.285-292, 1984.

S. P. Meyn and R. L. Tweedie, Markov chains and stochastic stability, 1993.

K. P. Murphy, Machine Learning: A Probabilistic Perspective, 2012.

B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.

C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, 2006.

A. Rudi, L. Carratino, and L. Rosasco, FALKON: An optimal large scale kernel method, Advances in Neural Information Processing Systems, pp.3891-3901, 2017.

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2001.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf, Injective Hilbert space embeddings of probability measures, Proc. COLT, 2008.

C. K. I. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems, pp.682-688, 2001.

D. P. Woodruff, Sketching as a tool for numerical linear algebra, Foundations and Trends in Theoretical Computer Science, vol.10, issue.1-2, pp.1-157, 2014.