Visualization and Detection of Resource Usage Anomalies in Large Scale Distributed Systems

Lucas Mello Schnorr 1, *, Arnaud Legrand 1, Jean-Marc Vincent 1
* Corresponding author
1 MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : Understanding the behavior of large scale distributed systems such as clouds, computing grids or volunteer computing systems is generally extremely difficult and tedious, as it requires observing a very large number of components over a very long period of time. The analysis of distributed systems generally begins with gathering resource utilization monitoring data through observation tools. This information can then be explored with different analysis techniques to understand the reasons behind anomalies that may be present in the system. This paper follows the same two-phase approach but proposes methods that prove particularly well suited to the study of very large scale distributed systems. More specifically, in the first phase, we record resource utilization categorized according to application components. The second phase proposes various ad hoc visualization techniques that enable easy navigation through space and time. We demonstrate the efficiency of this approach through the analysis of simulations of the well-known BOINC volunteer computing architecture. These simulations rely on the SimGrid framework, into which our analysis techniques have been incorporated. Three scenarios are analyzed in this paper: the resource sharing mechanism, the resource usage of projects that aim at optimizing response time instead of throughput, and the impact of input file size on such an architecture. The results show that our approach allows easy identification of different types of resource usage anomalies, unfair resource sharing, contention, moving network bottlenecks, and suboptimal resource usage.

https://hal.inria.fr/inria-00529569
Contributor : Lucas Mello Schnorr
Submitted on : Tuesday, October 26, 2010 - 1:50:48 AM
Last modification on : Thursday, October 11, 2018 - 8:48:02 AM
Long-term archiving on : Friday, October 26, 2012 - 12:21:16 PM

File

RR-7438.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00529569, version 1

Citation

Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc Vincent. Visualization and Detection of Resource Usage Anomalies in Large Scale Distributed Systems. [Research Report] RR-7438, INRIA. 2010. ⟨inria-00529569⟩
