Conference Papers Year : 2020

On the almost sure convergence of stochastic gradient descent in non-convex problems

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

Abstract

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability 1 under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability 1 for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $O(1/n^p)$ if the method is employed with a $\Theta(1/n^p)$ step-size. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.
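To make the step-size policy in the abstract concrete, the following is a minimal sketch of SGD with a vanishing step-size of the form gamma / n^p. It is an illustration only, not the authors' code or experimental setup: the toy objective f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2 (strict saddle at the origin, minimizers at (±1, 0)), the Gaussian gradient noise, and all names and parameter values (`grad`, `sgd`, `gamma`, `p`, `sigma`) are assumptions chosen for this example.

```python
import random


def grad(x, y):
    """Exact gradient of the toy objective f(x, y) = (x**2 - 1)**2 / 4 + y**2 / 2.

    Assumption for illustration: this objective is not taken from the paper.
    It has a strict saddle at (0, 0) and minimizers at (+1, 0) and (-1, 0).
    """
    return x**3 - x, y


def sgd(x, y, steps=20000, gamma=0.5, p=0.75, sigma=0.1, seed=0):
    """Run SGD with step-size gamma / n**p and additive Gaussian gradient noise.

    All parameter values are hypothetical defaults chosen for this sketch.
    """
    rng = random.Random(seed)
    for n in range(1, steps + 1):
        step = gamma / n**p  # vanishing Theta(1/n^p) step-size
        gx, gy = grad(x, y)
        # Noisy gradient: exact gradient plus zero-mean Gaussian noise.
        x -= step * (gx + rng.gauss(0.0, sigma))
        y -= step * (gy + rng.gauss(0.0, sigma))
    return x, y


# Start exactly at the strict saddle point; the gradient noise moves the
# iterate off the saddle, after which it settles near one of the minimizers.
x_final, y_final = sgd(0.0, 0.0)
print(f"final iterate: ({x_final:.4f}, {y_final:.4f})")
```

In this sketch the iterate is initialized at the saddle on purpose: the exact gradient vanishes there, so any movement comes from the noise term, which is the mechanism behind the saddle-avoidance statement in the abstract. Varying `p` changes how quickly the step-size decays and hence how fast the iterate settles.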
Main file
NonConvexSGD-NIPS.pdf (2.64 MB)
Origin : Files produced by the author(s)

Dates and versions

hal-03043771 , version 1 (07-12-2020)

Identifiers

  • HAL Id : hal-03043771 , version 1

Cite

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher. On the almost sure convergence of stochastic gradient descent in non-convex problems. NeurIPS 2020 - 34th International Conference on Neural Information Processing Systems, Dec 2020, Vancouver, Canada. pp.1-32. ⟨hal-03043771⟩