Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems

Conference paper, Year: 2010

Abstract

It has long been recognized that failure events are correlated, not independent. Previous research has shown that correlation analysis of system logs is helpful for resource allocation, job scheduling, and proactive management. However, previous log analysis methods analyze historical logs offline, so they fail to capture dynamic changes in system errors and failures. In this paper, we propose an online log analysis approach to mine event correlations in the system logs of large-scale cluster systems. Our contributions are three-fold: first, we analyze the event correlations in the system logs of a 260-node production Hadoop cluster system, and the results show that the correlation rules of the logs change dramatically across different periods; second, we present an online log analysis algorithm, Apriori-SO; third, based on the online event correlation mining, we present an online event prediction method that can predict a diverse range of failure events in great detail. Experimental results on the 260-node production Hadoop cluster system show that our online log analysis algorithm can analyze the log streams to obtain event correlation rules in soft real time, and that our online event prediction method achieves higher precision and recall than the offline log analysis approach.
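To make the idea of mining correlation rules from a log stream concrete, the following is a minimal Python sketch. It is not the paper's Apriori-SO algorithm, whose details the abstract does not give. It assumes that log events have already been grouped into transactions (sets of event types seen together in a time slice), keeps pair counts over a sliding window of the most recent transactions, and reports rules that meet support and confidence thresholds. The class name, parameters, and event names are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowRuleMiner:
    """Illustrative pairwise rule miner over a sliding window (hypothetical, not Apriori-SO)."""

    def __init__(self, max_transactions, min_support, min_confidence):
        self.max_transactions = max_transactions  # window size, in transactions
        self.min_support = min_support            # minimum fraction of window containing both events
        self.min_confidence = min_confidence      # minimum P(consequent | antecedent)
        self.window = deque()
        self.item_counts = Counter()
        self.pair_counts = Counter()

    def _count(self, transaction, delta):
        # Add (delta=+1) or retract (delta=-1) one transaction's contribution.
        for event in transaction:
            self.item_counts[event] += delta
        for pair in combinations(sorted(transaction), 2):
            self.pair_counts[pair] += delta

    def add(self, events):
        # Ingest one transaction and evict the oldest one if the window is full.
        transaction = frozenset(events)
        self.window.append(transaction)
        self._count(transaction, +1)
        if len(self.window) > self.max_transactions:
            self._count(self.window.popleft(), -1)

    def rules(self):
        # Return (antecedent, consequent, confidence) tuples valid for the current window.
        n = len(self.window)
        found = []
        for (a, b), both in self.pair_counts.items():
            if both == 0 or both / n < self.min_support:
                continue
            for antecedent, consequent in ((a, b), (b, a)):
                confidence = both / self.item_counts[antecedent]
                if confidence >= self.min_confidence:
                    found.append((antecedent, consequent, confidence))
        return sorted(found)


# Hypothetical event stream: the dominant correlation shifts over time.
miner = SlidingWindowRuleMiner(max_transactions=3, min_support=0.5, min_confidence=0.9)
for events in [{"disk_error", "io_timeout"}, {"disk_error", "io_timeout"}, {"net_down"}]:
    miner.add(events)
early_rules = miner.rules()

for events in [{"net_down", "job_fail"}, {"net_down", "job_fail"}]:
    miner.add(events)
late_rules = miner.rules()
```

In this toy stream, the rules held in `early_rules` relate `disk_error` and `io_timeout`, while `late_rules` contains only a rule from `job_fail` to `net_down`, because the older transactions have left the window. This mirrors, in miniature, the abstract's observation that correlation rules change across periods, which is the motivation for updating them online rather than mining a fixed historical log.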
Main file
Online_Event_Correlations_Analysis_in_System_logs_of_Large-scale_Cluster_Systems.pdf (166 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01054978, version 1 (11-08-2014)

License

Attribution


Cite

Wei Zhou, Jianfeng Zhan, Dan Meng, Zhihong Zhang. Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. pp.262-276, ⟨10.1007/978-3-642-15672-4_23⟩. ⟨hal-01054978⟩