Conference paper, Year: 2021

Communication-efficient Federated Learning through Clustering optimization

Abstract

We study the problem of model personalization in Federated Learning (FL) with non-IID data (i.e., data that is not Independent and Identically Distributed) collected at nodes in a network, under network communication cost constraints. Classical FL collaboratively trains a single global model. When data is statistically heterogeneous (non-IID), personalized models for groups of nodes with similar statistics have been shown to outperform classical FL [1].

We propose a Clustered Federated Learning approach that trades off model personalization against communication cost. Our method identifies clusters of nodes with similar data statistics, which improves local model accuracy. Specifically, it seeks the cluster structure, the cluster heads, and a set of model weights (one per cluster) that minimize an objective function composed of two terms: a classical multi-task optimization term and a communication cost regularization. Local model updates serve as proxies for the local data distributions (statistically similar training sets produce similar updates), and similar updates are aggregated together [2,3,4].

Our algorithm has two phases: initialization and cluster optimization. During initialization, the nodes collaboratively train a global initial model; cluster head nodes are identified and nodes are clustered based solely on communication cost minimization [5]. The cluster optimization phase then applies Hierarchical Agglomerative Clustering on a distance metric composed of two terms: the cosine dissimilarity between the locally computed model updates of two nodes, and the communication cost of grouping the two nodes in the same cluster. In parallel, the respective cluster heads are optimized. The clusters are organized in a tree hierarchy. At each round, the cluster heads check, based on the model update values, whether a new cluster optimization is needed; if so, the same method is applied to create further sub-clusters.

We evaluate our method on several non-IID settings generated from the MNIST dataset, while simulating the communication cost at each round. We show that our algorithm increases the fraction of nodes reaching 99% accuracy from 48% to 72% and can reduce the overall communication cost by 35%. Finally, it adapts the cluster structure to new conditions (new network nodes or time-evolution of the local data distributions) through a tree-structure search.
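As a rough formalization of the two-term objective described above (a sketch in our own notation; the symbols F_i, Comm, and the trade-off weight λ are assumptions, not taken from the paper):

```latex
\min_{\mathcal{C},\, h,\, \{w_c\}} \;
  \underbrace{\sum_{c \in \mathcal{C}} \sum_{i \in c} F_i(w_c)}_{\text{multi-task optimization term}}
  \;+\;
  \lambda \,
  \underbrace{\mathrm{Comm}(\mathcal{C}, h)}_{\text{communication cost regularization}}
```

where \mathcal{C} is the cluster structure, h the cluster heads, w_c the per-cluster model weights, F_i the local empirical loss at node i, and \lambda a weight balancing the two terms.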
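A minimal sketch of the cluster-optimization step, assuming flattened per-node update vectors and a precomputed pairwise communication cost matrix; the weighting `alpha`, the synthetic inputs, and the function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def pairwise_distance(updates, comm_cost, alpha=0.5):
    """Combined metric from the abstract: cosine dissimilarity between
    two nodes' local model updates, plus the (weighted) communication
    cost of grouping them in the same cluster. `alpha` is a hypothetical
    balancing weight."""
    n = len(updates)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            u, v = updates[i], updates[j]
            cos_dissim = 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            D[i, j] = D[j, i] = cos_dissim + alpha * comm_cost[i, j]
    return D

# Toy example: 10 nodes, one flattened model update per node, and a
# symmetric communication cost matrix (e.g. derived from network topology).
rng = np.random.default_rng(0)
updates = rng.normal(size=(10, 100))
comm_cost = np.abs(rng.normal(size=(10, 10)))
comm_cost = (comm_cost + comm_cost.T) / 2.0
np.fill_diagonal(comm_cost, 0.0)

D = pairwise_distance(updates, comm_cost)
Z = linkage(squareform(D), method="average")      # hierarchical agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
```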
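At each round, the cluster heads decide whether to re-run this clustering on their members. The drift criterion below (mean pairwise cosine dissimilarity of member updates against a fixed `threshold`) is a hypothetical stand-in, since the abstract does not specify the exact test:

```python
import numpy as np

def needs_recluster(member_updates, threshold=0.3):
    """Per-round check run by a cluster head (sketch): returns True when
    the members' model updates have drifted apart, in which case the same
    HAC procedure is reapplied to split the cluster into sub-clusters,
    extending the tree hierarchy. `threshold` is a hypothetical criterion."""
    n = len(member_updates)
    if n < 2:
        return False
    dissims = []
    for i in range(n):
        for j in range(i + 1, n):
            u, v = member_updates[i], member_updates[j]
            dissims.append(
                1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            )
    return float(np.mean(dissims)) > threshold
```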
Main file
Abstract_vfinale.pdf (175.98 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03479640, version 1 (17-12-2021)

Identifiers

  • HAL Id: hal-03479640, version 1

Cite

Hugo Miralles, Tamara Tosic, Michel Riveill. Communication-efficient Federated Learning through Clustering optimization. SophI.A. Summit, Nov 2021, Biot, France. ⟨hal-03479640⟩
188 Views
67 Downloads
