Preprints, Working Papers, ...

Throughput-Optimal Topology Design for Cross-Silo Federated Learning

Othmane Marfoq (1), Chuan Xu (1), Giovanni Neglia (1), Richard Vidal (2)
(1) NEO - Network Engineering and Operations
CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Federated learning usually employs a client-server architecture in which an orchestrator iteratively aggregates model updates from remote clients and pushes a refined model back to them. This approach may be inefficient in cross-silo settings: close-by data silos with high-speed access links may exchange information with each other faster than with the orchestrator, and the orchestrator may become a communication bottleneck. In this paper we define the problem of topology design for cross-silo federated learning, using the theory of max-plus linear systems to compute the system throughput (the number of communication rounds per time unit). We also propose practical algorithms that, given knowledge of measurable network characteristics, find a topology with the largest throughput or with provable throughput guarantees. In realistic Internet networks with 10 Gbps access links for silos, our algorithms speed up training by factors of 9 and 1.5 in comparison to the master-slave architecture and to state-of-the-art MATCHA, respectively. Speedups are even larger with slower access links.
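As a rough illustration of the max-plus idea mentioned in the abstract: for a strongly connected max-plus linear system, the asymptotic time per round (the cycle time) equals the maximum cycle mean of the delay graph, and the throughput is its inverse. The sketch below computes this with Karp's algorithm. It is not the authors' algorithm; the function names, the edge-delay dictionary, and the example delays are hypothetical, and the paper's actual delay model is not given on this page.

```python
import math

def max_cycle_mean(n, delays):
    """Maximum cycle mean of a directed graph via Karp's algorithm.

    n      -- number of nodes, labelled 0..n-1
    delays -- dict {(i, j): d}: edge i -> j with delay d (hypothetical values)
    Assumes the graph is strongly connected.
    """
    neg_inf = -math.inf
    # best[k][v] = largest total delay of a walk with exactly k edges from node 0 to v
    best = [[neg_inf] * n for _ in range(n + 1)]
    best[0][0] = 0.0
    for k in range(1, n + 1):
        for (i, j), d in delays.items():
            if best[k - 1][i] > neg_inf:
                best[k][j] = max(best[k][j], best[k - 1][i] + d)

    result = neg_inf
    for v in range(n):
        if best[n][v] == neg_inf:
            continue
        # Karp's formula (maximum version): min over k of the mean gain per edge
        candidate = min(
            (best[n][v] - best[k][v]) / (n - k)
            for k in range(n)
            if best[k][v] > neg_inf
        )
        result = max(result, candidate)
    return result

def throughput(n, delays):
    """Rounds per time unit: inverse of the cycle time (maximum cycle mean)."""
    return 1.0 / max_cycle_mean(n, delays)

# Hypothetical 3-silo examples: a directed ring and a star centred on node 0.
ring = {(0, 1): 1.0, (1, 2): 2.0, (2, 0): 3.0}
star = {(0, 1): 1.0, (1, 0): 1.0, (0, 2): 4.0, (2, 0): 4.0}
print(throughput(3, ring), throughput(3, star))
```

In this toy setting the ring is limited by the average delay around its single cycle, while the star is limited by its slowest hub-to-silo round trip, which hints at why the choice of overlay can change the number of rounds completed per time unit.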

Cited literature: 111 references

https://hal.inria.fr/hal-03007834
Contributor: Othmane Marfoq
Submitted on: Tuesday, November 17, 2020 - 11:48:48 AM
Last modification on: Tuesday, October 5, 2021 - 8:52:01 PM
Long-term archiving on: Thursday, February 18, 2021 - 7:15:05 PM

File

2010.12229.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-03007834, version 1
  • arXiv: 2010.12229

Citation

Othmane Marfoq, Chuan Xu, Giovanni Neglia, Richard Vidal. Throughput-Optimal Topology Design for Cross-Silo Federated Learning. 2020. ⟨hal-03007834v1⟩
