Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems

Abstract: It has long been recognized that failure events are correlated rather than independent. Previous research has shown that correlation analysis of system logs helps with resource allocation, job scheduling, and proactive management. However, previous log analysis methods analyze historical logs offline, so they fail to capture dynamic changes in system errors and failures. In this paper, we propose an online log analysis approach to mine event correlations in the system logs of large-scale cluster systems. Our contributions are three-fold: first, we analyze the event correlations in the system logs of a 260-node production Hadoop cluster system, and the results show that the correlation rules of the logs change dramatically across different periods; second, we present an online log analysis algorithm, Apriori-SO; third, based on the online event correlation mining, we present an online event prediction method that can predict a diversity of failure events in great detail. Experimental results on a 260-node production Hadoop cluster system show that our online log analysis algorithm can analyze log streams to obtain event correlation rules in soft real time, and that our online event prediction method achieves higher precision and recall than the offline log analysis approach.
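To make the idea of mining correlation rules from a log stream concrete, here is a minimal Python sketch. It is not the Apriori-SO algorithm named in the abstract, whose details are not given on this page; it is a simplified illustration that only tracks pairwise rules (A -> B) over a sliding window of recent transactions, using the standard support and confidence measures from association rule mining. The class name `SlidingWindowRuleMiner`, its parameters, the event names, and the assumption that log events have already been grouped into transactions (for example, by time interval) are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowRuleMiner:
    """Toy online miner of pairwise event rules over the most recent transactions.

    Hypothetical illustration only; not the paper's Apriori-SO algorithm.
    """

    def __init__(self, max_transactions, min_support, min_confidence):
        self.max_transactions = max_transactions
        self.min_support = min_support
        self.min_confidence = min_confidence
        self.window = deque()          # transactions currently in the window
        self.item_counts = Counter()   # event -> number of transactions containing it
        self.pair_counts = Counter()   # (event_a, event_b) -> co-occurrence count

    def _update(self, events, delta):
        # Add (delta=+1) or remove (delta=-1) one transaction from the counts.
        for event in events:
            self.item_counts[event] += delta
        for pair in combinations(sorted(events), 2):
            self.pair_counts[pair] += delta

    def add(self, events):
        """Append a transaction (set of event types) and evict the oldest if full."""
        events = frozenset(events)
        self.window.append(events)
        self._update(events, +1)
        if len(self.window) > self.max_transactions:
            self._update(self.window.popleft(), -1)

    def rules(self):
        """Return sorted (antecedent, consequent, support, confidence) tuples."""
        n = len(self.window)
        found = []
        for (a, b), count in self.pair_counts.items():
            if count == 0 or count / n < self.min_support:
                continue
            for x, y in ((a, b), (b, a)):
                confidence = count / self.item_counts[x]
                if confidence >= self.min_confidence:
                    found.append((x, y, count / n, confidence))
        return sorted(found)


if __name__ == "__main__":
    miner = SlidingWindowRuleMiner(max_transactions=3, min_support=0.5, min_confidence=0.8)
    stream = [
        {"disk_error", "node_down"},
        {"disk_error", "node_down"},
        {"net_timeout"},
        {"net_timeout", "job_failed"},
        {"net_timeout", "job_failed"},
    ]
    for transaction in stream:
        miner.add(transaction)
        print([(x, y) for x, y, _, _ in miner.rules()])
```

Because old transactions are evicted as new ones arrive, the set of rules returned by `rules()` shifts as the stream evolves, which mirrors the abstract's observation that correlation rules change across periods. A rule found this way could then serve as a simple predictor: when the antecedent event is seen, the consequent is flagged as likely.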
Document type:
Conference paper
Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.262-276, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_23〉

Cited literature: 22 references

https://hal.inria.fr/hal-01054978
Contributor: Hal Ifip
Submitted on: Monday, August 11, 2014 - 09:11:33
Last modified on: Friday, August 11, 2017 - 17:44:28
Document(s) archived on: Wednesday, November 26, 2014 - 21:37:03

File

Online_Event_Correlations_Anal...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Wei Zhou, Jianfeng Zhan, Dan Meng, Zhihong Zhang. Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems. Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.262-276, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_23〉. 〈hal-01054978〉

Metrics

Record views

128

File downloads

811