
Adaptive Fault Tolerance in Real Time Cloud Computing

Sheheryar Malik (1, *), Fabrice Huet (1)
* Corresponding author
1 OASIS - Active objects, semantics, Internet and security
CRISAM - Inria Sophia Antipolis - Méditerranée, Laboratoire I3S - COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués
Abstract: With the growing demand for, and benefits of, cloud computing infrastructure, real-time computing can now be performed on the cloud. A real-time system can take advantage of the intensive computing capabilities and the scalable virtualized environment of cloud computing to execute real-time tasks. In most real-time cloud applications, processing is done on remote cloud computing nodes, so errors are more likely because of unpredictable latency and loose control over the computing nodes. At the same time, most real-time systems are safety critical and must be highly reliable. Real-time computing on cloud infrastructure therefore has an increased need for fault tolerance in order to achieve reliability. In this paper, a fault tolerance model for real-time cloud computing is proposed. In the proposed model, the system tolerates faults and makes decisions on the basis of the reliability of the processing nodes, i.e. the virtual machines. The reliability of each virtual machine is adaptive and changes after every computing cycle: if a virtual machine produces a correct result within the time limit, its reliability increases; if it fails to produce a correct result in time, its reliability decreases. A metric model is given for the reliability assessment, in which a decrease in reliability is larger than an increase. If a node continues to fail, it is removed and a new node is added. There is also a minimum reliability level: if the processing nodes do not reach that level, the system performs backward recovery or safety measures. The proposed technique is based on executing design-diverse variants on multiple virtual machines and assigning reliability to the results produced by the variants. The virtual machine instances can be of the same type or of different types. The system provides both forward and backward recovery mechanisms, but the main focus is on forward recovery.
The essence of the proposed technique is the adaptive behavior of the reliability weights assigned to each processing node, together with the addition and removal of nodes on the basis of reliability.
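The abstract does not spell out the metric model, so the following is only a minimal Python sketch of the adaptive mechanism it describes, under stated assumptions. The multiplicative update rule, all constants (`INITIAL_RELIABILITY`, `RELIABILITY_FACTOR`, `FAILURE_MULTIPLIER`, `REMOVAL_LEVEL`, `MIN_RELIABILITY_LEVEL`), the range-check acceptance test, and the names `Node`, `update_reliability`, and `run_cycle` are hypothetical and are not taken from the paper. The sketch shows reliability rising after a correct, on-time result, falling by a larger step after a failure, a persistently failing node being replaced by a fresh one, and the result of the most reliable accepted variant being used for forward recovery unless it is below the minimum level, in which case the caller falls back to backward recovery.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# All constants are hypothetical; the abstract does not state the paper's values.
INITIAL_RELIABILITY = 0.8
MAX_RELIABILITY = 1.0
RELIABILITY_FACTOR = 0.05      # relative gain for a correct, on-time result
FAILURE_MULTIPLIER = 2         # > 1, so a failure costs more than a success earns
REMOVAL_LEVEL = 0.5            # a node whose reliability drops below this is replaced
MIN_RELIABILITY_LEVEL = 0.7    # below this, no result is trusted (backward recovery)


@dataclass
class Node:
    """A processing node (virtual machine) running one design-diverse variant."""
    name: str
    reliability: float = INITIAL_RELIABILITY


def update_reliability(node: Node, passed: bool) -> None:
    """Adapt a node's reliability after one computing cycle (hypothetical rule)."""
    if passed:
        node.reliability = min(MAX_RELIABILITY,
                               node.reliability * (1 + RELIABILITY_FACTOR))
    else:
        node.reliability *= 1 - RELIABILITY_FACTOR * FAILURE_MULTIPLIER


def run_cycle(nodes: List[Node],
              results: Dict[str, Tuple[Any, bool]],
              acceptance_test: Callable[[Any], bool],
              spawn: Callable[[str], Node]) -> Optional[Any]:
    """Run one computing cycle over all nodes.

    `results` maps a node name to (value, produced_on_time). Returns the value
    chosen by forward recovery, or None to signal that backward recovery or
    safety measures are needed.
    """
    accepted = []
    for node in nodes:
        value, on_time = results[node.name]
        # A late result fails without running the acceptance test.
        passed = on_time and acceptance_test(value)
        update_reliability(node, passed)
        if passed:
            accepted.append((node.reliability, value))

    # Nodes that keep failing sink below REMOVAL_LEVEL and are replaced.
    for i, node in enumerate(nodes):
        if node.reliability < REMOVAL_LEVEL:
            nodes[i] = spawn(node.name)

    # Forward recovery: take the result of the most reliable passing node,
    # but only if it meets the minimum reliability level.
    if accepted:
        best_reliability, best_value = max(accepted, key=lambda pair: pair[0])
        if best_reliability >= MIN_RELIABILITY_LEVEL:
            return best_value
    return None


def in_range(value: Any) -> bool:
    """Hypothetical acceptance test: a simple plausibility (range) check."""
    return value is not None and 0 <= value <= 100


# Usage: vm1 is correct and on time, vm2 fails the acceptance test, vm3 is late.
nodes = [Node("vm1"), Node("vm2"), Node("vm3")]
first = run_cycle(nodes,
                  {"vm1": (42, True), "vm2": (-1, True), "vm3": (None, False)},
                  in_range, spawn=Node)
print(first, [round(n.reliability, 2) for n in nodes])  # 42 [0.84, 0.72, 0.72]
```

Making the failure penalty a multiple of the success reward (`FAILURE_MULTIPLIER > 1`) reproduces the asymmetry the abstract describes. Treating a drop below `REMOVAL_LEVEL` as "continues to fail" is one possible reading, chosen here for simplicity; with these constants a fresh node is replaced after five consecutive failures.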

Cited literature: 14 references
Contributor: Sheheryar Malik
Submitted on: Thursday, November 10, 2011, 10:41:57 AM
Last modification on: Saturday, June 25, 2022, 11:06:53 PM
Long-term archiving on: Saturday, February 11, 2012, 2:22:29 AM


Publisher files allowed on an open archive




Sheheryar Malik, Fabrice Huet. Adaptive Fault Tolerance in Real Time Cloud Computing. 2011 IEEE World Congress on Services, Jul 2011, Washington DC, United States. pp.280-287, ⟨10.1109/SERVICES.2011.108⟩. ⟨hal-00639904⟩


