SGD with Variance Reduction beyond Empirical Risk Minimization, 2015.
Katyusha: The first direct acceleration of stochastic gradient methods, 2016.
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters, Advances in Neural Information Processing Systems (NIPS), 2016.
Optimization Methods for Large-Scale Machine Learning, 2016.
Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.
DOI : 10.1109/TPAMI.2012.230
URL : http://arxiv.org/abs/1203.1513
An Analysis of Single-Layer Networks in Unsupervised Feature Learning, International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843
Finito: A faster, permutable incremental gradient method for big data problems, International Conference on Machine Learning (ICML), 2014.
Privacy Aware Learning, Advances in Neural Information Processing Systems (NIPS), 2012.
DOI : 10.1145/2666468
Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009.
Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/abs/1512.03385
Convex Analysis and Minimization Algorithms I: Fundamentals, Springer Science & Business Media, 1993.
DOI : 10.1007/978-3-662-02796-7
Variance Reduced Stochastic Gradient Descent with Neighbors, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248672
Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, arXiv:1212.2002, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768187
An optimal randomized incremental gradient method, 2015.
A Universal Catalyst for First-Order Optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728
Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.
Learning word vectors for sentiment analysis, The 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp.142-150, 2011.
Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.
DOI : 10.1111/j.1467-9868.2010.00740.x
Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649
Introductory Lectures on Convex Optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9
Transformation Pursuit for Image Classification, 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
DOI : 10.1109/CVPR.2014.466
URL : https://hal.archives-ouvertes.fr/hal-00979464
Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1-2, pp.83-112, 2017.
DOI : 10.1007/s10107-016-1030-6
URL : https://hal.archives-ouvertes.fr/hal-00860051
SDCA without Duality, Regularization, and Individual Convexity, International Conference on Machine Learning (ICML), 2016.
Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research (JMLR), vol.14, pp.567-599, 2013.
Transformation Invariance in Pattern Recognition - Tangent Distance and Tangent Propagation, Neural Networks: Tricks of the Trade, number 1524 in Lecture Notes in Computer Science, pp.239-274, 1998.
DOI : 10.1007/3-540-49430-8_13
URL : https://hal.archives-ouvertes.fr/halshs-00009505
A Gene-Expression Signature as a Predictor of Survival in Breast Cancer, New England Journal of Medicine, vol.347, issue.25, pp.1999-2009, 2002.
DOI : 10.1056/NEJMoa021967
Learning with marginalized corrupted features, International Conference on Machine Learning (ICML), 2013.
Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.
Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010.
A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.
DOI : 10.1137/140961791
URL : http://arxiv.org/abs/1403.4699
Improving the Robustness of Deep Neural Networks via Stability Training, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.485