The Role of Network Topology for Distributed Machine Learning

Giovanni Neglia; Gianmarco Calbi; Don Towsley; Gayane Vardoyan

doi:10.1109/INFOCOM.2019.8737602

Conference paper, Year: 2019

Giovanni Neglia

Role: Author
PersonId: 1683
IdHAL: giovanni-neglia
ORCID: 0000-0001-8779-0620
IdRef: 18310966X

Network Engineering and Operations

Gianmarco Calbi

Role: Author

Network Engineering and Operations

Don Towsley

Role: Author

Department of Computer Science [Amherst]

Gayane Vardoyan

Role: Author
PersonId: 1042556

University of Massachusetts [Amherst]

Abstract

Many learning problems are formulated as the minimization of a loss function over a training set of examples. Distributed gradient methods on a cluster are often used for this purpose. In this paper, we study how the variability of task execution times at cluster nodes affects the system throughput. In particular, a simple but accurate model allows us to quantify how the time to solve the minimization problem depends on the network of information exchanges among the nodes. Interestingly, we show that, even when communication overhead may be neglected, the clique is not necessarily the most effective topology, contrary to what previous works commonly assume.
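The effect of execution-time variability on throughput can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the model used in the paper: each node is assumed to start an iteration only once it and all of its neighbors have finished the previous one, computation times are assumed to be i.i.d. exponential with mean 1, and communication time is taken to be zero. The function names `throughput`, `ring`, and `clique` are hypothetical and chosen for illustration only.

```python
import random


def throughput(neighbors, iterations=2000, seed=0):
    """Return iterations completed per unit time under local synchronization.

    neighbors[i] lists the nodes that node i waits for (assumed model).
    """
    rng = random.Random(seed)
    n = len(neighbors)
    finish = [0.0] * n  # time at which each node finished its latest iteration
    for _ in range(iterations):
        new_finish = []
        for i in range(n):
            # Node i starts once it and all its neighbors are done with the
            # previous iteration (communication delay assumed negligible).
            start = max(finish[j] for j in neighbors[i] + [i])
            # Assumed exponential computation time with mean 1.
            new_finish.append(start + rng.expovariate(1.0))
        finish = new_finish
    return iterations / max(finish)


def ring(n):
    """Each node waits only for its two ring neighbors."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def clique(n):
    """Each node waits for every other node."""
    return [[j for j in range(n) if j != i] for i in range(n)]


n = 16
ring_rate = throughput(ring(n))
clique_rate = throughput(clique(n))
print(f"ring:   {ring_rate:.3f} iterations per unit time")
print(f"clique: {clique_rate:.3f} iterations per unit time")
```

In this sketch the ring completes iterations faster than the clique, because each node waits for fewer stragglers. The sketch measures only iteration throughput; it says nothing about how many iterations each topology needs to converge, which is the other half of the trade-off the abstract refers to.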

Keywords

Throughput; Task analysis; Convergence; Computational modeling; Synchronization; Network topology; Servers

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Machine Learning [cs.LG]; Networking and Internet Architecture [cs.NI]

Main file

neglia19infocom(5).pdf (688.31 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02411164

Submitted on: Monday, December 16, 2019, 17:17:36

Last modified on: Wednesday, March 15, 2023, 08:58:09

Long-term archiving on: Tuesday, March 17, 2020, 22:10:34

Dates and versions

hal-02411164 , version 1 (16-12-2019)

Identifiers

HAL Id: hal-02411164, version 1
DOI: 10.1109/INFOCOM.2019.8737602

Cite

Giovanni Neglia, Gianmarco Calbi, Don Towsley, Gayane Vardoyan. The Role of Network Topology for Distributed Machine Learning. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Apr 2019, Paris, France. pp.2350-2358, ⟨10.1109/INFOCOM.2019.8737602⟩. ⟨hal-02411164⟩

Collections

INRIA INRIA2 UNIV-COTEDAZUR
