Data Stream Clustering with Affinity Propagation

Xiangliang Zhang 1, 2 Cyril Furtlehner 1 Cecile Germain-Renaud 3 Michèle Sebag 3
1 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: Data stream clustering provides insights into the underlying patterns of data flows. This paper focuses on selecting the best representatives from clusters of streaming data. There are two main challenges: how to cluster with the best representatives and how to handle the evolving patterns that are important characteristics of streaming data with dynamic distributions. We employ the Affinity Propagation (AP) algorithm presented in 2007 by Frey and Dueck for the first challenge, as it offers good guarantees of clustering optimality for selecting exemplars. The second challenging problem is solved by change detection. The presented STRAP algorithm combines AP with a statistical change point detection test; the clustering model is rebuilt whenever the test detects a change in the underlying data distribution. Besides the validation on two benchmark data sets, the presented algorithm is validated on a real-world application, monitoring the data flow of jobs submitted to the EGEE grid.
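The rebuild-on-change loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from this record: the abstract does not name the test on this page, so a Page-Hinkley test is used as one common change point detector; the points are 1-D; the names `PageHinkley`, `monitor`, `toy_rebuild`, `radius`, `delta` and `threshold` are hypothetical; and `toy_rebuild` is only a stand-in for the step where the actual algorithm would rerun Affinity Propagation.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a signal.
    Assumed here as the change detector; parameters are illustrative."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level (lambda)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_min = 0.0  # running minimum M_t

    def update(self, x):
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def toy_rebuild(exemplars, reservoir):
    """Hypothetical stand-in for re-clustering: adds the reservoir mean
    as a new exemplar. The real algorithm would run AP at this point."""
    return exemplars + [sum(reservoir) / len(reservoir)]


def monitor(stream, exemplars, radius, rebuild=toy_rebuild):
    """Assign each point to its nearest exemplar, park poorly fitting
    points in a reservoir, and rebuild the model when the detector
    signals a rise in the outlier rate."""
    detector = PageHinkley()
    reservoir, rebuild_times = [], []
    for t, x in enumerate(stream):
        nearest = min(exemplars, key=lambda e: abs(x - e))
        is_outlier = abs(x - nearest) > radius
        if is_outlier:
            reservoir.append(x)
        if detector.update(1.0 if is_outlier else 0.0):
            exemplars = rebuild(exemplars, reservoir)
            reservoir = []
            detector.reset()
            rebuild_times.append(t)
    return exemplars, rebuild_times


if __name__ == "__main__":
    # Toy stream whose distribution shifts from 0.0 to 10.0 halfway through.
    stream = [0.0] * 50 + [10.0] * 50
    print(monitor(stream, exemplars=[0.0], radius=1.0))
```

In this sketch the detector watches the rate of outliers rather than the raw data, so a rebuild is triggered only when the current exemplars stop fitting the incoming points; which signal the published algorithm monitors should be checked against the paper itself.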
Document type:
Journal article
IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, 2014, 26 (7), pp.1

Cited literature: 40 references

https://hal.inria.fr/hal-00862941
Contributor: Cecile Germain
Submitted on: Tuesday, 17 September 2013 - 21:24:40
Last modified on: Thursday, 11 January 2018 - 06:22:14
Document(s) archived on: Thursday, 6 April 2017 - 21:44:19

File

strap_final_revision.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00862941, version 1

Citation

Xiangliang Zhang, Cyril Furtlehner, Cecile Germain-Renaud, Michèle Sebag. Data Stream Clustering with Affinity Propagation. IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, 2014, 26 (7), pp.1. 〈hal-00862941〉
