Conference papers

The Role of Network Topology for Distributed Machine Learning

Abstract: Many learning problems are formulated as the minimization of a loss function over a training set of examples. Distributed gradient methods running on a cluster are often used for this purpose. In this paper, we study how the variability of task execution times at cluster nodes affects system throughput. In particular, a simple but accurate model allows us to quantify how the time to solve the minimization problem depends on the network of information exchanges among the nodes. Interestingly, we show that, even when communication overhead may be neglected, the clique is not necessarily the most effective topology, as commonly assumed in previous works.
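The effect of execution-time variability on throughput can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes a synchronous scheme in which a node starts a round only after it and all its neighbors have finished the previous one, and it assumes i.i.d. exponential compute times with mean 1. The function names (`finish_time`, `clique`, `ring`) and all parameter values are hypothetical. It measures rounds completed per unit time only; it says nothing about how many rounds each topology needs to converge, which is the other half of the time-to-solve question.

```python
import random

def finish_time(neighbors, iterations, seed=0):
    """Time at which the slowest node completes `iterations` rounds."""
    rng = random.Random(seed)
    n = len(neighbors)
    done = [0.0] * n  # done[i]: time at which node i finished its latest round
    for _ in range(iterations):
        # Assumed compute-time model: i.i.d. exponential with mean 1.
        compute = [rng.expovariate(1.0) for _ in range(n)]
        # Node i starts once it and all its neighbors finished the previous round.
        done = [max(done[j] for j in neighbors[i] + [i]) + compute[i]
                for i in range(n)]
    return max(done)

def clique(n):
    """Every node exchanges information with every other node."""
    return [[j for j in range(n) if j != i] for i in range(n)]

def ring(n):
    """Every node exchanges information with its two ring neighbors."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

n, iterations = 16, 1000
t_clique = finish_time(clique(n), iterations)
t_ring = finish_time(ring(n), iterations)
print(f"clique: {iterations / t_clique:.3f} rounds per unit time")
print(f"ring:   {iterations / t_ring:.3f} rounds per unit time")
```

Because both runs use the same seed, they draw identical compute times, so the ring can never finish later than the clique in this sketch: each node waits on a subset of the nodes it would wait on in the clique.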
Contributor: Giovanni Neglia
Submitted on: Monday, December 16, 2019 - 5:17:36 PM
Last modification on: Tuesday, November 17, 2020 - 12:10:13 PM
Long-term archiving on: Tuesday, March 17, 2020 - 10:10:34 PM






Giovanni Neglia, Gianmarco Calbi, Don Towsley, Gayane Vardoyan. The Role of Network Topology for Distributed Machine Learning. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Apr 2019, Paris, France. pp.2350-2358, ⟨10.1109/INFOCOM.2019.8737602⟩. ⟨hal-02411164⟩


