Detection and analysis of resource usage anomalies in large distributed systems through multi-scale visualization

Lucas Mello Schnorr (1), Arnaud Legrand (1), Jean-Marc Vincent (1)
(1) MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Understanding the behavior of large-scale distributed systems is generally extremely difficult because it requires observing a very large number of components over very long periods of time. Most analysis tools for distributed systems gather basic information such as individual processor or network utilization. Although these tools are scalable because of the data reduction techniques applied before the analysis, they are often insufficient to detect or fully understand anomalies in the dynamic behavior of resource utilization and their influence on application performance. In this paper, we propose a methodology for detecting resource usage anomalies in large-scale distributed systems. The methodology relies on four functionalities: characterized trace collection, multi-scale data aggregation, specifically tailored user interaction techniques, and visualization techniques. We show the efficiency of this approach through the analysis of simulations of the Berkeley Open Infrastructure for Network Computing (BOINC) volunteer computing architecture. Three scenarios are analyzed in this paper: analysis of the resource sharing mechanism, resource usage considering response time instead of throughput, and evaluation of the effect of input file size on the BOINC architecture. The results show that our methodology makes it easy to identify resource usage anomalies such as unfair resource sharing, contention, moving network bottlenecks, and harmful short-term resource sharing.
Document type:
Journal article
Concurrency and Computation: Practice and Experience, Wiley, 2011, 24, pp.1792-1816. 〈10.1002/cpe.1885〉

https://hal.inria.fr/hal-00788767
Contributor: Arnaud Legrand
Submitted on: Friday, February 15, 2013 - 11:09:53
Last modified on: Wednesday, December 14, 2016 - 01:08:42

Citation

Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc Vincent. Detection and analysis of resource usage anomalies in large distributed systems through multi-scale visualization. Concurrency and Computation: Practice and Experience, Wiley, 2011, 24, pp.1792-1816. 〈10.1002/cpe.1885〉. 〈hal-00788767〉
