The Role of Network Topology for Distributed Machine Learning

Abstract

Many learning problems are formulated as the minimization of some loss function on a training set of examples. Distributed gradient methods on a cluster are often used for this purpose. In this paper, we study how the variability of task execution times at cluster nodes affects the system throughput. In particular, a simple but accurate model allows us to quantify how the time to solve the minimization problem depends on the network of information exchanges among the nodes. Interestingly, we show that, even when communication overhead may be neglected, the clique is not necessarily the most effective topology, contrary to what previous works commonly assume.
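
The throughput effect described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model: each node is assumed to start an iteration only once it and all its neighbors have finished the previous one, task execution times are assumed to be i.i.d. exponential with mean 1, and communication takes no time. The function names (`throughput`, `ring`, `clique`) are hypothetical and chosen for illustration only.

```python
import random


def throughput(neighbors, iterations=2000, seed=0):
    """Iterations completed per unit time in a toy neighbor-synchronized model.

    Assumption (not from the paper): execution times are i.i.d. Exp(1)
    and communication overhead is zero.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    finish = [0.0] * n  # time at which each node finished its latest iteration
    for _ in range(iterations):
        new_finish = []
        for i in range(n):
            # Node i waits for itself and for every neighbor it exchanges data with.
            start = max(finish[j] for j in neighbors[i] + [i])
            new_finish.append(start + rng.expovariate(1.0))
        finish = new_finish
    return iterations / max(finish)


def ring(n):
    """Each node exchanges information with its two adjacent nodes."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def clique(n):
    """Each node exchanges information with every other node."""
    return [[j for j in range(n) if j != i] for i in range(n)]


n = 16
ring_rate = throughput(ring(n))
clique_rate = throughput(clique(n))
print(f"ring:   {ring_rate:.3f} iterations per unit time")
print(f"clique: {clique_rate:.3f} iterations per unit time")
```

In this sketch the clique forces every node to wait for the slowest node in each iteration, so it completes fewer iterations per unit time than the ring. The sketch measures throughput only; it says nothing about how many iterations each topology needs to converge, which also enters the time to solve the minimization problem.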
Origin : Files produced by the author(s)

Dates and versions

hal-02411164 , version 1 (16-12-2019)

Cite

Giovanni Neglia, Gianmarco Calbi, Don Towsley, Gayane Vardoyan. The Role of Network Topology for Distributed Machine Learning. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Apr 2019, Paris, France. pp.2350-2358, ⟨10.1109/INFOCOM.2019.8737602⟩. ⟨hal-02411164⟩