The Role of Network Topology for Distributed Machine Learning

Abstract: Many learning problems are formulated as the minimization of a loss function over a training set of examples. Distributed gradient methods running on a cluster are often used for this purpose. In this paper, we study how the variability of task execution times at cluster nodes affects system throughput. In particular, a simple but accurate model allows us to quantify how the time to solve the minimization problem depends on the network of information exchanges among the nodes. Interestingly, we show that, even when communication overhead can be neglected, the clique is not necessarily the most effective topology, contrary to what is commonly assumed in previous work.

https://hal.inria.fr/hal-02411164
Contributor: Giovanni Neglia
Submitted on: Monday, December 16, 2019 - 5:17:36 PM
Last modification on: Tuesday, January 21, 2020 - 3:22:15 PM

File: neglia19infocom(5).pdf (produced by the author(s))

Citation

Giovanni Neglia, Gianmarco Calbi, Don Towsley, Gayane Vardoyan. The Role of Network Topology for Distributed Machine Learning. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Apr 2019, Paris, France. pp.2350-2358, ⟨10.1109/INFOCOM.2019.8737602⟩. ⟨hal-02411164⟩
