
Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems

Abstract : It has long been recognized that failure events are correlated rather than independent. Previous research has shown that correlation analysis of system logs is helpful for resource allocation, job scheduling, and proactive management. However, previous log analysis methods analyze historical logs offline and therefore fail to capture the dynamic changes of system errors and failures. In this paper, we propose an online log analysis approach to mine event correlations in the system logs of large-scale cluster systems. Our contributions are threefold. First, we analyze the event correlations in the system logs of a 260-node production Hadoop cluster, and the results show that the correlation rules change dramatically across different periods. Second, we present an online log analysis algorithm, Apriori-SO. Third, based on online event correlation mining, we present an online event prediction method that can predict diverse failure events in great detail. Experimental results on a 260-node production Hadoop cluster show that our online log analysis algorithm can analyze log streams to obtain event correlation rules in soft real time, and that our online event prediction method achieves higher precision and recall than the offline log analysis approach.
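The abstract does not describe how Apriori-SO works internally, so the following is only a minimal Python sketch of the general idea of mining event correlation rules online, under stated assumptions: log events are grouped into fixed-length time windows that act as transactions, only pairwise rules (A -> B) are mined, and support and confidence are maintained incrementally over a sliding window of the most recent transactions so that stale rules age out. All names (`to_transactions`, `SlidingWindowRuleMiner`, `precision_recall`), event types, and thresholds are hypothetical. This is a simplified stand-in for Apriori's 2-itemset counting step, not the authors' algorithm.

```python
from collections import Counter, deque
from itertools import combinations


def to_transactions(events, window_seconds):
    """Group (timestamp, event_type) pairs into per-window sets of event types."""
    buckets = {}
    for ts, name in events:
        buckets.setdefault(int(ts // window_seconds), set()).add(name)
    return [buckets[k] for k in sorted(buckets)]


class SlidingWindowRuleMiner:
    """Hypothetical online miner of pairwise rules A -> B over recent transactions."""

    def __init__(self, max_transactions, min_support, min_confidence):
        self.window = deque()
        self.max_transactions = max_transactions
        self.min_support = min_support          # fraction of transactions in the window
        self.min_confidence = min_confidence    # estimate of P(B | A) within the window
        self.item_counts = Counter()
        self.pair_counts = Counter()

    @staticmethod
    def _bump(counts, key, delta):
        counts[key] += delta
        if counts[key] == 0:
            del counts[key]  # drop zero entries so evicted pairs disappear

    def _update(self, transaction, delta):
        # Incrementally add (delta=+1) or evict (delta=-1) one transaction.
        for item in transaction:
            self._bump(self.item_counts, item, delta)
        for pair in combinations(sorted(transaction), 2):
            self._bump(self.pair_counts, pair, delta)

    def add(self, events):
        """Append one transaction; evict the oldest once the window is full."""
        transaction = frozenset(events)
        self.window.append(transaction)
        self._update(transaction, +1)
        if len(self.window) > self.max_transactions:
            self._update(self.window.popleft(), -1)

    def rules(self):
        """Return {(A, B): confidence} for rules meeting both thresholds."""
        n = len(self.window)
        found = {}
        for (a, b), count in self.pair_counts.items():
            if count / n < self.min_support:
                continue
            for x, y in ((a, b), (b, a)):
                confidence = count / self.item_counts[x]
                if confidence >= self.min_confidence:
                    found[(x, y)] = confidence
        return found

    def predict(self, observed):
        """Predict event types implied by the currently observed ones."""
        observed = set(observed)
        return {b for (a, b) in self.rules() if a in observed and b not in observed}


def precision_recall(predicted, actual):
    """Standard precision and recall of a predicted event set."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    log = [(0.5, "disk_err"), (1.2, "node_down"), (11.0, "disk_err"),
           (14.0, "node_down"), (25.0, "net_timeout")]
    miner = SlidingWindowRuleMiner(max_transactions=3, min_support=0.5, min_confidence=0.8)
    for tx in to_transactions(log, window_seconds=10):
        miner.add(tx)
    print(miner.predict({"disk_err"}))  # {'node_down'}
```

With `max_transactions=3`, a rule such as `disk_err -> node_down` that is supported by two of the last three windows drops out once newer windows push those transactions out of the deque. This is meant to mimic, in a very simplified way, the abstract's observation that correlation rules change across periods, which an offline analysis over the full history would smooth over.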
Document type :
Conference papers

Cited literature [22 references]

https://hal.inria.fr/hal-01054978
Contributor : Hal Ifip
Submitted on : Monday, August 11, 2014 - 9:11:33 AM
Last modification on : Friday, August 11, 2017 - 5:44:28 PM
Long-term archiving on : Wednesday, November 26, 2014 - 9:37:03 PM

File

Online_Event_Correlations_Anal...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

HAL Id : hal-01054978
DOI : 10.1007/978-3-642-15672-4_23

Citation

Wei Zhou, Jianfeng Zhan, Dan Meng, Zhihong Zhang. Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. pp. 262-276, ⟨10.1007/978-3-642-15672-4_23⟩. ⟨hal-01054978⟩


Metrics

Record views : 218

File downloads : 1332