P. Alquier and B. Guedj, Simpler PAC-Bayesian Bounds for Hostile Data, Machine Learning, vol.107, pp.887-902, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01385064

S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi, A Theoretical Analysis of Contrastive Unsupervised Representation Learning, ICML, pp.5628-5637, 2019.

Y. Bengio, A. Courville, and P. Vincent, Representation Learning: A Review and New Perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1798-1828, 2013.

M. Caron, P. Bojanowski, A. Joulin, and M. Douze, Deep Clustering for Unsupervised Learning of Visual Features, ECCV, 2018.

O. Catoni, A PAC-Bayesian Approach to Adaptive Classification, Preprint no.840, 2003.

O. Catoni, Statistical Learning Theory and Stochastic Optimization: École d'Été de Probabilités de Saint-Flour XXXI-2001, 2004.

O. Catoni, PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, IMS Lecture Notes Monograph Series, vol.56, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00206119

I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial, Foundations and Trends® in Communications and Information Theory, vol.1, issue.4, pp.417-528, 2004.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL-HLT, pp.4171-4186, 2019.

G. K. Dziugaite and D. M. Roy, Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, UAI, 2017.

B. Guedj, 2019.

M. Higgs and J. Shawe-Taylor, A PAC-Bayes Bound for Tailored Density Estimation, ALT, 2010.

M. W. Kadous, Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series, 2002.

D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, ICLR, 2015.

A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009.

G. Letarte, P. Germain, B. Guedj, and F. Laviolette, Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks, NeurIPS, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02139432

L. Logeswaran and H. Lee, An Efficient Framework for Learning Sentence Representations, ICLR, 2018.

D. A. McAllester, Some PAC-Bayesian Theorems, COLT, pp.230-234, 1998.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed Representations of Words and Phrases and their Compositionality, NeurIPS, 2013.

M. Noroozi and P. Favaro, Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, ECCV, 2016.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic Differentiation in PyTorch, NeurIPS Workshop, 2017.

J. Shawe-Taylor and R. C. Williamson, A PAC Analysis of a Bayesian Estimator, COLT, pp.2-9, 1997.

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.

T. Tieleman and G. Hinton, Lecture 6.5-RmsProp: Divide the Gradient by a Running Average of its Recent Magnitude, COURSERA: Neural Networks for Machine Learning, 2012.

L. G. Valiant, A Theory of the Learnable, STOC, pp.436-445, 1984.

R. Zhang, P. Isola, and A. A. Efros, Colorful Image Colorization, ECCV, 2016.