Conference papers

MicroRAS: Automatic Recovery in the Absence of Historical Failure Data for Microservice Systems

Abstract : Microservices are a popular paradigm for constructing large-scale applications in many domains thanks to benefits such as scalability, flexibility, and agility. However, managing and operating a microservice system is difficult because of its high dynamics and complexity. In particular, frequent updates of microservices lead to an absence of historical failure data, a situation in which current automatic recovery methods fall short. In this paper, we propose an automatic recovery method named MicroRAS, which requires no historical failure data, to mitigate performance issues in microservice systems. MicroRAS is a model-driven method that selects an appropriate recovery action by trading off the effectiveness of actions against their recovery time. It estimates the effectiveness of an action in terms of how well it recovers the pinpointed faulty service and how much it interferes with other services. The estimation of action effects is based on a system-state model, represented as an attributed graph, that tracks how effects propagate. For the experimental evaluation, several types of anomalies are injected into a Kubernetes-based microservice system that also serves a real-world workload. The corresponding benchmarks show that the actions selected by MicroRAS can recover the faulty services by 94.7% and reduce the interference to other services by at least 44.3% compared to baseline methods.
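The abstract describes action selection only at a high level and does not give MicroRAS's actual models or formulas. The Python sketch below is a hypothetical illustration of the general idea, not the authors' implementation. A small service dependency graph with made-up edge weights (a simplified stand-in for the attributed graph) propagates an action's side effects to dependent services. Each candidate action is then scored as its recovery gain on the faulty service, minus the propagated interference on other services, minus a recovery-time penalty. The names `EDGES`, `Action`, `propagate`, `score`, and `select_action`, the edge weights, and the linear `time_weight` penalty are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical dependency graph: (caller, callee) -> propagation weight in [0, 1].
# A weight says how strongly a change in the callee's state reaches the caller.
# All values are illustrative, not taken from MicroRAS.
EDGES = {
    ("frontend", "orders"): 0.8,
    ("frontend", "catalog"): 0.5,
    ("orders", "db"): 0.9,
    ("catalog", "db"): 0.6,
}


@dataclass
class Action:
    name: str
    target: str           # service the action is applied to
    recovery_gain: float  # assumed performance gain on the target if it is the faulty one
    side_effect: float    # assumed disturbance the action injects at the target
    recovery_time: float  # assumed seconds until the action takes effect


def propagate(source, magnitude, edges, min_effect=1e-3):
    """Spread an effect from `source` to the services that call it, directly or
    transitively, scaling by edge weights and keeping the strongest effect per service."""
    effects = {source: magnitude}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for (caller, callee), weight in edges.items():
            if callee != node:
                continue
            effect = effects[node] * weight
            if effect >= min_effect and effect > effects.get(caller, 0.0):
                effects[caller] = effect
                frontier.append(caller)
    return effects


def score(action, faulty, edges, time_weight=0.01):
    """Hypothetical utility: recovery of the faulty service, minus interference
    on every other service, minus time_weight * recovery_time."""
    recovery = action.recovery_gain if action.target == faulty else 0.0
    effects = propagate(action.target, action.side_effect, edges)
    interference = sum(e for svc, e in effects.items() if svc != faulty)
    return recovery - interference - time_weight * action.recovery_time


def select_action(actions, faulty, edges, time_weight=0.01):
    """Pick the candidate action with the highest hypothetical score."""
    return max(actions, key=lambda a: score(a, faulty, edges, time_weight))


if __name__ == "__main__":
    candidates = [
        Action("scale out orders", "orders", 0.7, 0.05, 30.0),
        Action("restart orders", "orders", 0.9, 0.4, 10.0),
        Action("restart db", "db", 0.0, 0.5, 20.0),
    ]
    for a in candidates:
        print(f"{a.name}: {score(a, 'orders', EDGES):.2f}")
    print("selected:", select_action(candidates, "orders", EDGES).name)
```

With these illustrative numbers and `orders` as the faulty service, restarting `orders` scores about 0.48 and is selected over scaling it out (about 0.36). The restart pushes more interference into `frontend`, but it takes effect much sooner. Setting `time_weight=0.0` removes the time penalty, and the scale-out then wins. This is the kind of effectiveness versus recovery-time trade-off the abstract describes.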

Cited literature: 47 references
Contributor : Guillaume Pierre
Submitted on : Friday, October 16, 2020 - 8:32:39 AM
Last modification on : Monday, December 28, 2020 - 10:22:04 AM


Files produced by the author(s)


  • HAL Id : hal-02968710, version 1



Li Wu, Johan Tordsson, Alexander Acker, Odej Kao. MicroRAS: Automatic Recovery in the Absence of Historical Failure Data for Microservice Systems. UCC 2020 - 13th IEEE/ACM International Conference on Utility and Cloud Computing, Dec 2020, Leicester, United Kingdom. ⟨hal-02968710⟩


