Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

Mert Gürbüzbalaban; Yuanhan Hu; Umut Şimşekli; Lingjiong Zhu

Journal article, Transactions on Machine Learning Research Journal, Year: 2023


Mert Gürbüzbalaban

Role: Author

Princeton University

Yuanhan Hu

Role: Author

Rutgers Business School

Umut Şimşekli

Role: Author
PersonId: 6757
IdHAL: umut-simsekli
IdRef: 250884003

Statistical Machine Learning and Parsimony

Lingjiong Zhu

Role: Author

Florida State University [Tallahassee]

Abstract

Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard choices such as a constant stepsize in SGD. Despite their empirical success, little is currently known about when and why they can theoretically improve generalization performance. We consider a general class of Markovian stepsizes for learning, which contains i.i.d. random stepsizes, cyclic stepsizes, and the constant stepsize as special cases. Motivated by the literature showing that the heaviness of the tails of the SGD iterates (measured by the so-called "tail-index") is correlated with generalization, we study the tail-index and provide a number of theoretical results that demonstrate how it depends on the stepsize schedule. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes over a constant stepsize in terms of tail behavior. We illustrate our theory with linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even heavier tails and be a viable alternative to cyclic and i.i.d. randomized stepsize rules.
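To make the stepsize rules named in the abstract concrete, here is a minimal Python sketch. It is not the authors' code. The function names, the stepsize grid, the `p_stay` parameter, and the one-dimensional regression setup are all hypothetical choices made for illustration. The Hill estimator is a standard tail-index estimator; this record does not say which estimator the paper uses. In this toy chain, `p_stay = 1` gives a constant stepsize and `p_stay = 0` gives a cyclic one, while the i.i.d. rule corresponds to a different transition matrix whose rows are all identical.

```python
import math
import random


def constant_stepsizes(eta, n):
    """Constant schedule: the same stepsize at every iteration."""
    return [eta] * n


def cyclic_stepsizes(grid, n):
    """Cyclic schedule: sweep through `grid` in order and repeat."""
    return [grid[k % len(grid)] for k in range(n)]


def iid_stepsizes(grid, n, rng):
    """i.i.d. schedule: draw each stepsize uniformly from `grid`."""
    return [rng.choice(grid) for _ in range(n)]


def markov_stepsizes(grid, n, p_stay, rng):
    """Toy Markovian schedule (an assumption, not the paper's chain):
    keep the current stepsize with probability `p_stay`, otherwise
    move to the next grid point, wrapping around at the end."""
    state = rng.randrange(len(grid))
    out = []
    for _ in range(n):
        out.append(grid[state])
        if rng.random() >= p_stay:
            state = (state + 1) % len(grid)
    return out


def sgd_linear_regression(stepsizes, rng, x_star=1.0, noise=1.0):
    """Run SGD on a 1-D least-squares problem with Gaussian data
    and return the final iterate."""
    x = 0.0
    for eta in stepsizes:
        a = rng.gauss(0.0, 1.0)                 # random feature
        y = a * x_star + rng.gauss(0.0, noise)  # noisy label
        x -= eta * a * (a * x - y)              # gradient of 0.5 * (a*x - y)**2
    return x


def hill_tail_index(samples, k):
    """Hill estimator of the tail index from the k largest absolute values."""
    xs = sorted((abs(s) for s in samples), reverse=True)
    logs = [math.log(v) for v in xs[:k + 1]]
    return k / sum(logs[i] - logs[k] for i in range(k))


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [0.05, 0.30, 0.55]            # hypothetical stepsize grid
    mean_eta = sum(grid) / len(grid)     # constant rule uses the grid mean
    n_iter, n_runs, k = 300, 1000, 50
    schedules = {
        "constant": lambda: constant_stepsizes(mean_eta, n_iter),
        "cyclic": lambda: cyclic_stepsizes(grid, n_iter),
        "iid": lambda: iid_stepsizes(grid, n_iter, rng),
        "markov": lambda: markov_stepsizes(grid, n_iter, 0.5, rng),
    }
    for name, make in schedules.items():
        finals = [sgd_linear_regression(make(), rng) for _ in range(n_runs)]
        print(name, round(hill_tail_index(finals, k), 2))
```

A smaller estimated tail index indicates a heavier tail. The estimates from such a small simulation are noisy and depend on the chosen grid and on `k`, so they should be read as an illustration of the setup rather than a reproduction of the paper's results.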

Domains

Computer Science [cs]

Contributor: Umut Şimşekli

https://inria.hal.science/hal-04478948

Submitted on: Tuesday, February 27, 2024, 02:05:20

Last modified on: Friday, April 19, 2024, 16:18:58

Dates and versions

hal-04478948, version 1 (27-02-2024)

License

Attribution

Identifiers

HAL Id: hal-04478948, version 1
arXiv: 2302.05516

Cite

Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu. Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize. Transactions on Machine Learning Research Journal, 2023. ⟨hal-04478948⟩


Collections

ENS-PARIS CNRS INRIA INRIA2 PSL ANR PRAIRIE-IA
