Journal article, Concurrency and Computation: Practice and Experience, 2011

Detection and analysis of resource usage anomalies in large distributed systems through multi-scale visualization

Lucas Mello Schnorr
Arnaud Legrand
Jean-Marc Vincent

Abstract

Understanding the behavior of large-scale distributed systems is generally extremely difficult because it requires observing a very large number of components over very long periods of time. Most analysis tools for distributed systems gather basic information such as individual processor or network utilization. Although scalable because of the data reduction techniques applied before the analysis, these tools are often insufficient to detect or fully understand anomalies in the dynamic behavior of resource utilization and their influence on application performance. In this paper, we propose a methodology for detecting resource usage anomalies in large-scale distributed systems. The methodology relies on four functionalities: characterized trace collection, multi-scale data aggregation, specifically tailored user interaction techniques, and visualization techniques. We show the efficiency of this approach through the analysis of simulations of the volunteer computing Berkeley Open Infrastructure for Network Computing (BOINC) architecture. Three scenarios are analyzed in this paper: analysis of the resource sharing mechanism, resource usage considering response time instead of throughput, and the evaluation of the impact of input file size on the BOINC architecture. The results show that our methodology makes it easy to identify resource usage anomalies, such as unfair resource sharing, contention, moving network bottlenecks, and harmful short-term resource sharing.

Dates and versions

hal-00788767 , version 1 (15-02-2013)

Cite

Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc Vincent. Detection and analysis of resource usage anomalies in large distributed systems through multi-scale visualization. Concurrency and Computation: Practice and Experience, 2011, 24, pp.1792-1816. ⟨10.1002/cpe.1885⟩. ⟨hal-00788767⟩