O. Bousquet, S. Boucheron, and G. Lugosi, Introduction to Statistical Learning Theory, Advanced Lectures on Machine Learning, pp.169-207, 2004.
DOI : 10.1007/3-540-45435-7_5
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.63.2036

N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, 2006.
DOI : 10.1017/CBO9780511546921

O. Dekel and Y. Singer, Data-driven online to batch conversions, Advances in Neural Information Processing Systems 18 (NIPS), pp.267-274, 2006.

J. Abernethy, A. Agarwal, P. L. Bartlett, and A. Rakhlin, A stochastic view of optimal regret through minimax duality, Proceedings of the 22nd Conference on Learning Theory (COLT), 2009.

Y. Kalnishkan and M. V. Vyugin, The weak aggregating algorithm and weak mixability, Journal of Computer and System Sciences, vol.74, issue.8, pp.1228-1244, 2008.
DOI : 10.1016/j.jcss.2007.08.003
URL : http://dx.doi.org/10.1016/j.jcss.2007.08.003

V. Vovk, A game of prediction with expert advice, Proceedings of the 8th Conference on Learning Theory (COLT), pp.51-60, 1995.

E. Mammen and A. B. Tsybakov, Smooth discrimination analysis, The Annals of Statistics, pp.1808-1829, 1999.

A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, The Annals of Statistics, pp.135-166, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00102142

J. L. Doob, Application of the theory of martingales, Le Calcul des Probabilités et ses Applications. Colloques Internationaux du Centre National de la Recherche Scientifique, pp.23-27, 1949.

A. Barron and T. Cover, Minimum complexity density estimation, IEEE Transactions on Information Theory, vol.37, issue.4, pp.1034-1054, 1991.
DOI : 10.1109/18.86996

T. Zhang, From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation, The Annals of Statistics, vol.34, issue.5, pp.2180-2210, 2006.
DOI : 10.1214/009053606000000704

J. Li, Estimation of Mixture Models, 1999.

B. J. K. Kleijn and A. W. van der Vaart, Misspecification in infinite-dimensional Bayesian statistics, The Annals of Statistics, vol.34, issue.2, 2006.
DOI : 10.1214/009053606000000029
URL : http://arxiv.org/abs/math/0607023

P. Grünwald, Safe learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity, Proceedings of the 24th Conference on Learning Theory (COLT), 2011.

A. Chernov, Y. Kalnishkan, F. Zhdanov, and V. Vovk, Supermartingales in prediction with expert advice, Theoretical Computer Science, vol.411, issue.29-30, pp.2647-2669, 2010.
DOI : 10.1016/j.tcs.2010.04.003

E. Vernet, R. C. Williamson, and M. D. Reid, Composite multiclass losses, Advances in Neural Information Processing Systems 24 (NIPS), 2011.

P. Grünwald, The Minimum Description Length Principle, 2007.

T. van Erven, M. Reid, and R. Williamson, Mixability is Bayes risk curvature relative to log loss, Proceedings of the 24th Conference on Learning Theory (COLT), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00758204

S. Arlot and P. L. Bartlett, Margin-adaptive model selection in statistical learning, Bernoulli, vol.17, issue.2, pp.687-713, 2011.
DOI : 10.3150/10-BEJ288
URL : https://hal.archives-ouvertes.fr/hal-00274327

T. Zhang, Information-theoretic upper and lower bounds for statistical estimation, IEEE Transactions on Information Theory, vol.52, issue.4, pp.1307-1321, 2006.
DOI : 10.1109/TIT.2005.864439

V. Vapnik, Statistical Learning Theory, 1998.

J. Audibert, PAC-Bayesian statistical learning theory, 2004.

O. Catoni, PAC-Bayesian Supervised Classification, Lecture Notes-Monograph Series. IMS, 2007.
DOI : 10.1007/978-3-319-21852-6_20

URL : https://hal.archives-ouvertes.fr/hal-00206119

W. Lee, P. Bartlett, and R. Williamson, The importance of convexity in learning with squared loss, Proceedings of the ninth annual conference on Computational learning theory, COLT '96, pp.1974-1980, 1998.
DOI : 10.1145/238061.238082

J. Audibert, A better variance control for PAC-Bayesian classification, Preprint 905, 2004.