P. Alquier, Transductive and Inductive Adaptive Inference for Regression and Density Estimation, 2006.

P. Alquier and G. Biau, Sparse single-index model, Journal of Machine Learning Research, vol.14, pp.243-280, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00839239

P. Alquier and B. Guedj, An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization, Mathematical Methods of Statistics, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01251878

P. Alquier and K. Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electronic Journal of Statistics, vol.5, pp.127-145, 2011.
DOI : 10.1214/11-EJS601
URL : https://hal.archives-ouvertes.fr/hal-00465801

J. Audibert, Une approche PAC-bayésienne de la théorie statistique de l'apprentissage, 2004.

J. Audibert, Fast learning rates in statistical inference through aggregation, The Annals of Statistics, pp.1591-1646, 2009.
DOI : 10.1214/08-aos623
URL : https://hal.archives-ouvertes.fr/hal-00139030

K. S. Azoury and M. K. Warmuth, Relative loss bounds for on-line density estimation with the exponential family of distributions, Machine Learning, pp.211-246, 2001.

W. Barbakh and C. Fyfe, Online Clustering Algorithms, International Journal of Neural Systems, vol.4, issue.03, pp.185-194, 2008.
DOI : 10.1016/S0031-3203(02)00060-2

P. L. Bartlett, T. Linder, and G. Lugosi, The minimax distortion redundancy in empirical quantizer design, IEEE Transactions on Information Theory, vol.44, issue.5, pp.1802-1813, 1998.
DOI : 10.1109/18.705560

J. Baudry, C. Maugis, and B. Michel, Slope heuristics: overview and implementation, Statistics and Computing, vol.22, issue.2, pp.455-470, 2012.
DOI : 10.1007/s11222-011-9236-1
URL : https://hal.archives-ouvertes.fr/hal-00461639

T. Caliński and J. Harabasz, A dendrite method for cluster analysis, Communications in Statistics, vol.3, pp.1-27, 1974.

O. Catoni, Statistical Learning Theory and Stochastic Optimization, École d'Été de Probabilités de Saint-Flour, 2001.
DOI : 10.1007/b99352
URL : https://hal.archives-ouvertes.fr/hal-00104952

O. Catoni, PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 of Lecture Notes-Monograph Series, 2007.

N. Cesa-Bianchi, Analysis of two gradient-based algorithms for on-line regression, Proceedings of the tenth annual conference on Computational learning theory, COLT '97, pp.392-411, 1999.
DOI : 10.1145/267460.267492

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning and Games, 2006.
DOI : 10.1017/CBO9780511546921

N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth, Worst-case quadratic loss bounds for prediction using linear functions and gradient descent, IEEE Transactions on Neural Networks, vol.7, issue.3, pp.604-619, 1996.
DOI : 10.1109/72.501719
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.8418

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth, How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997.
DOI : 10.1145/258128.258179

A. Choromanska and C. Monteleoni, Online clustering with experts, Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.227-235, 2012.

I. Csiszár, I-Divergence Geometry of Probability Distributions and Minimization Problems, The Annals of Probability, vol.3, issue.1, pp.146-158, 1975.
DOI : 10.1214/aop/1176996454

A. S. Dalalyan and A. B. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning, pp.39-61, 2008.
DOI : 10.1007/s10994-008-5051-0
URL : https://hal.archives-ouvertes.fr/hal-00291504

A. S. Dalalyan and A. B. Tsybakov, Sparse regression learning by aggregation and Langevin Monte-Carlo, Journal of Computer and System Sciences, vol.78, issue.5, pp.1423-1443, 2012.
DOI : 10.1016/j.jcss.2011.12.023
URL : https://hal.archives-ouvertes.fr/hal-00773553

A. S. Dalalyan and A. B. Tsybakov, Aggregation by exponential weighting and sharp oracle inequalities, Learning Theory (COLT 2007), pp.97-111, 2007.

P. Dellaportas, J. J. Forster, and I. Ntzoufras, On Bayesian model and variable selection using MCMC, Statistics and Computing, vol.12, issue.1, pp.27-36, 2002.
DOI : 10.1023/A:1013164120801

A. Fischer, On the number of groups in clustering, Statistics & Probability Letters, vol.81, issue.12, pp.1771-1781, 2011.
DOI : 10.1016/j.spl.2011.07.005

S. Gerchinovitz, Prédiction de suites individuelles et cadre statistique classique : étude de quelques liens autour de la régression parcimonieuse et des techniques d'agrégation, 2011.

A. D. Gordon, Classification, volume 82 of Monographs on Statistics and Applied Probability, 1999.

P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, vol.82, issue.4, pp.711-732, 1995.
DOI : 10.1093/biomet/82.4.711
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.407.8942

B. Guedj and P. Alquier, PAC-Bayesian estimation and prediction in sparse additive models, Electronic Journal of Statistics, vol.7, pp.264-291, 2013.
DOI : 10.1214/13-EJS771
URL : https://hal.archives-ouvertes.fr/hal-00722969

B. Guedj and S. Robbiano, PAC-Bayesian High Dimensional Bipartite Ranking, arXiv preprint, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01226472

S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan, Clustering data streams: theory and practice, IEEE Transactions on Knowledge and Data Engineering, vol.15, issue.3, pp.511-528, 2003.
DOI : 10.1109/TKDE.2003.1198387
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.9085

J. A. Hartigan, Clustering Algorithms, Wiley Series in Probability and Mathematical Statistics, 1975.

L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Series in Probability and Mathematical Statistics, 1990.
DOI : 10.1002/9780470316801

J. Kivinen and M. K. Warmuth, Exponentiated Gradient versus Gradient Descent for Linear Predictors, Information and Computation, vol.132, issue.1, pp.1-63, 1997.
DOI : 10.1006/inco.1996.2612
URL : http://doi.org/10.1006/inco.1996.2612

J. Kivinen and M. K. Warmuth, Averaging Expert Predictions, Computational Learning Theory: 4th European Conference (EuroCOLT '99), pp.153-167, 1999.
DOI : 10.1007/3-540-49097-3_13
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.5415

A. N. Kolmogorov and V. M. Tikhomirov, ε-entropy and ε-capacity of sets in function spaces, pp.277-364, 1961.
URL : https://hal.archives-ouvertes.fr/hal-00773608

W. J. Krzanowski and Y. T. Lai, A Criterion for Determining the Number of Groups in a Data Set Using Sum-of-Squares Clustering, Biometrics, vol.44, issue.1, pp.23-34, 1988.
DOI : 10.2307/2531893

L. Li, PACBO: PAC-Bayesian Online Clustering, 2016.
URL : https://CRAN.R-project.org/package=PACBO

E. Liberty, R. Sriharsha, and M. Sviridenko, An Algorithm for Online K-Means Clustering, 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp.81-89, 2016.
DOI : 10.1137/1.9781611974317.7
URL : http://arxiv.org/abs/1412.5721

N. Littlestone and M. K. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol.108, issue.2, pp.212-261, 1994.
DOI : 10.1006/inco.1994.1009
URL : http://dx.doi.org/10.1006/inco.1994.1009

D. A. McAllester, Some PAC-Bayesian theorems, Proceedings of the eleventh annual conference on Computational learning theory, COLT '98, pp.355-363, 1999.
DOI : 10.1145/279943.279989

D. A. McAllester, PAC-Bayesian model averaging, Proceedings of the twelfth annual conference on Computational learning theory, COLT '99, pp.164-170, 1999.
DOI : 10.1145/307400.307435

G. W. Milligan and M. C. Cooper, An examination of procedures for determining the number of clusters in a data set, Psychometrika, vol.50, issue.2, pp.159-179, 1985.
DOI : 10.1007/BF02294245

A. Petralias and P. Dellaportas, An MCMC model search algorithm for regression problems, Journal of Statistical Computation and Simulation, vol.83, issue.9, pp.1722-1740, 2013.
DOI : 10.1080/01621459.1997.10474045
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.666.3083

C. P. Robert and G. Casella, Monte Carlo Statistical Methods, 2004.

G. O. Roberts and J. S. Rosenthal, Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains, The Annals of Applied Probability, vol.16, issue.4, pp.2123-2139, 2006.
DOI : 10.1214/105051606000000510

M. Seeger, PAC-Bayesian generalisation error bounds for Gaussian process classification, Journal of Machine Learning Research, vol.3, pp.233-269, 2002.
DOI : 10.1162/153244303765208377

M. Seeger, Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations, 2003.
DOI : 10.1162/153244303765208386
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.137.979

J. Shawe-Taylor and R. C. Williamson, A PAC analysis of a Bayes estimator, Proceedings of the 10th annual conference on Computational Learning Theory, pp.2-9, 1997.

R. Tibshirani, G. Walther, and T. Hastie, Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.63, issue.2, pp.411-423, 2001.
DOI : 10.1111/1467-9868.00293

V. Vovk, Competitive On-line Statistics, International Statistical Review, vol.69, issue.2, pp.213-248, 2001.
DOI : 10.1093/comjnl/42.4.318
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.9745