Journal articles

Fault monitoring with sequential matrix factorization

Dawei Feng 1, 2 Cecile Germain 3 
3 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : For real-world distributed systems, the knowledge component at the core of the MAPE-K loop has to be inferred, as it cannot realistically be assumed to be defined a priori. Accordingly, this paper treats fault monitoring as a latent factor discovery problem. In the context of end-to-end probing, the goal is to devise an efficient sampling policy that makes the best use of a constrained sampling budget. Previous work addresses fault monitoring in a Collaborative Prediction framework, where the information is a snapshot of the probe outcomes. Here, we take into account the fact that the system evolves dynamically at various time scales. We propose and evaluate Sequential Matrix Factorization (SMF), which exploits both recent advances in matrix factorization for the instantaneous information and a new sampling heuristic based on historical information. The effectiveness of the SMF approach is demonstrated on datasets of increasing difficulty and compared with state-of-the-art history-based or snapshot-based methods. In all cases, strong adaptivity under the specific flavor of active learning is required to unleash the full potential of coupling the most confident and the most uncertain sampling heuristics, which is the cornerstone of SMF.
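To make the idea of coupling the two sampling heuristics concrete, the following is a minimal sketch and not the authors' SMF implementation. It assumes a model (for example a matrix factorization) has already produced a predicted fault probability for each unobserved probe. The function name `select_probes`, the `uncertain_share` parameter, and the reading of "most confident" as "highest predicted fault probability" are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical probe identifier: a (source, destination) pair.
Probe = Tuple[int, int]


def select_probes(
    scores: Dict[Probe, float],
    budget: int,
    uncertain_share: float = 0.5,
) -> List[Probe]:
    """Pick up to `budget` unobserved probes by mixing two heuristics.

    `scores` maps each unobserved probe to a predicted fault
    probability in [0, 1]. A share of the budget goes to the most
    uncertain predictions; the rest goes to the most confident ones
    (assumed here to mean the highest predicted fault probability).
    """
    if budget <= 0 or not scores:
        return []
    budget = min(budget, len(scores))
    n_uncertain = int(budget * uncertain_share)

    # Most uncertain: predictions closest to 0.5 (ties broken by probe id).
    by_uncertainty = sorted(scores, key=lambda p: (abs(scores[p] - 0.5), p))
    chosen = by_uncertainty[:n_uncertain]

    # Most confident: highest predicted fault probability among the rest.
    already = set(chosen)
    remaining = [p for p in scores if p not in already]
    by_confidence = sorted(remaining, key=lambda p: (-scores[p], p))
    chosen += by_confidence[: budget - n_uncertain]
    return chosen


# Toy predictions for five unobserved probes (made-up values).
example_scores = {
    (0, 1): 0.95,
    (0, 2): 0.52,
    (1, 2): 0.10,
    (1, 3): 0.45,
    (2, 3): 0.80,
}
print(select_probes(example_scores, budget=4))
```

In a sequential setting, the selected probes would be launched, their outcomes added to the observed matrix, and the predictions refreshed before the next round; how the budget is split between the two heuristics is a tunable choice in this sketch.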

Contributor : Cecile Germain
Submitted on : Friday, October 2, 2015 - 4:54:17 PM
Last modification on : Saturday, June 25, 2022 - 10:16:51 PM
Long-term archiving on : Sunday, January 3, 2016 - 10:11:34 AM


Files produced by the author(s)



Dawei Feng, Cecile Germain. Fault monitoring with sequential matrix factorization. ACM Transactions on Autonomous and Adaptive Systems, Association for Computing Machinery (ACM), 2015, 10 (3), pp. 20:1-20:25. ⟨10.1145/2797141⟩. ⟨hal-01176013⟩


