Multi-scale analysis of large distributed computing systems

Lucas Mello Schnorr (1), Arnaud Legrand (1, 2), Jean-Marc Vincent (1)
(1) MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Large-scale distributed systems are composed of many thousands of computing units. Grid, volunteer and cloud computing platforms are current examples of such systems. They are generally analyzed with monitoring tools that gather resource information, such as processor or network utilization, and provide high-level statistics and basic resource usage traces. These approaches are recognized as rather scalable but are often insufficient to detect or fully understand unexpected behavior. In this paper, we investigate the use in distributed systems of more detailed tracing techniques, which are commonly used in parallel computing. Finely analyzing the behavior of systems comprising thousands of resources over several months may seem infeasible. Yet we show that the resulting trace can be analyzed with tools that make it easy to zoom in and out on selected areas of space and time. We use the BOINC volunteer computing system as the basis of this study. Since detailed activity traces of BOINC clients are not yet available, we instead rely on traces obtained from a BOINC simulator developed with the SimGrid toolkit, which takes as input real availability trace files from the Seti@Home BOINC project. We show that the analysis of such detailed resource utilization traces provides several non-trivial insights into the whole system and enables the discovery of unexpected behavior.
Document type:
Conference paper
Proceedings of the third international workshop on Large-scale system and application performance, Jun 2011, San Jose, CA, United States. ACM, pp. 27-34, 2011. http://doi.acm.org/10.1145/1996029.1996037, DOI: 10.1145/1996029.1996037

https://hal.inria.fr/inria-00627754
Contributor: Lucas Mello Schnorr
Submitted on: Friday, September 30, 2011, 10:45:55
Last modified on: Monday, October 5, 2015, 17:00:39
Document(s) archived on: Tuesday, November 13, 2012, 14:55:40

Collections: INRIA | LIG | UGA

Citation

Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc Vincent. Multi-scale analysis of large distributed computing systems. Proceedings of the third international workshop on Large-scale system and application performance, Jun 2011, San Jose, CA, United States. ACM, pp. 27-34, 2011. http://doi.acm.org/10.1145/1996029.1996037, DOI: 10.1145/1996029.1996037, HAL: inria-00627754
