Adaptive Fault Tolerance in Real Time Cloud Computing

Sheheryar Malik 1, * Fabrice Huet 1
* Corresponding author
1 OASIS - Active objects, semantics, Internet and security
CRISAM - Inria Sophia Antipolis - Méditerranée, COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués
Abstract: With the increasing demand for and benefits of cloud computing infrastructure, real-time computing can be performed on cloud infrastructure. A real-time system can take advantage of the intensive computing capabilities and the scalable virtualized environment of cloud computing to execute real-time tasks. In most real-time cloud applications, processing is done on remote cloud computing nodes, so errors are more likely because of undetermined latency and loose control over the computing nodes. On the other hand, most real-time systems are also safety critical and must be highly reliable. There is therefore an increased need for fault tolerance to achieve reliability for real-time computing on cloud infrastructure. In this paper, a fault tolerance model for real-time cloud computing is proposed. In the proposed model, the system tolerates faults and makes its decision on the basis of the reliability of the processing nodes, i.e. virtual machines. The reliability of a virtual machine is adaptive and changes after every computing cycle. If a virtual machine produces a correct result within the time limit, its reliability increases; if it fails to produce a result in time, or produces an incorrect result, its reliability decreases. A metric model is given for the reliability assessment. In the model, a decrease in reliability is larger than an increase. If a node continues to fail, it is removed and a new node is added. There is also a minimum reliability level: if no processing node achieves that level, the system performs backward recovery or safety measures. The proposed technique is based on executing design-diverse variants on multiple virtual machines and assigning a reliability to the results produced by the variants. The virtual machine instances can be of the same type or of different types. The system provides both forward and backward recovery mechanisms, but the main focus is on forward recovery.
The essence of the proposed technique is the adaptive behavior of the reliability weights assigned to each processing node, together with the addition and removal of nodes on the basis of reliability.
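
The adaptive reliability scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's metric model: the constants (`INCREASE`, `DECREASE`, `MIN_RELIABILITY`, `MAX_CONSECUTIVE_FAILURES`), the `Node` class, and the function names are hypothetical and chosen only for illustration. The sketch also assumes that the result of the most reliable passing node is selected, and that backward recovery is triggered when no passing node reaches the minimum level.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

# Illustrative assumptions only; the abstract gives no numeric values.
INCREASE = 0.05               # reward for a correct result delivered in time
DECREASE = 0.10               # penalty for a late or incorrect result (larger than the reward)
MAX_RELIABILITY = 1.0
MIN_RELIABILITY = 0.5         # minimum level required for a result to be accepted
MAX_CONSECUTIVE_FAILURES = 3  # failures in a row before a node is replaced


@dataclass
class Node:
    """A processing node (virtual machine) with an adaptive reliability weight."""
    name: str
    reliability: float = 1.0
    consecutive_failures: int = 0


def update_reliability(node: Node, correct: bool, on_time: bool) -> None:
    """Adjust the node's reliability after one computing cycle."""
    if correct and on_time:
        node.reliability = min(MAX_RELIABILITY, node.reliability + INCREASE)
        node.consecutive_failures = 0
    else:
        node.reliability = max(0.0, node.reliability - DECREASE)
        node.consecutive_failures += 1


def needs_replacement(node: Node) -> bool:
    """A node that keeps failing is removed and replaced by a new one."""
    return node.consecutive_failures >= MAX_CONSECUTIVE_FAILURES


def run_cycle(outcomes: List[Tuple[Node, Any, bool, bool]]) -> Optional[Any]:
    """Update every node, then return the result of the most reliable passing node.

    Each outcome is (node, value, correct, on_time). Returning None signals
    that the caller should fall back to backward recovery or safety measures.
    """
    passed = []
    for node, value, correct, on_time in outcomes:
        update_reliability(node, correct, on_time)
        if correct and on_time:
            passed.append((node, value))
    if not passed:
        return None
    best_node, best_value = max(passed, key=lambda pair: pair[0].reliability)
    if best_node.reliability < MIN_RELIABILITY:
        return None
    return best_value
```

In this sketch the penalty is twice the reward, so a node needs two successful cycles to recover from one failure, which mirrors the abstract's statement that reliability decreases faster than it increases.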
Document type:
Conference paper
2011 IEEE World Congress on Services, Jul 2011, Washington DC, United States. IEEE, pp.280-287, 2011, 2011 IEEE World Congress on Services (SERVICES 2011). <10.1109/SERVICES.2011.108>


https://hal.inria.fr/hal-00639904
Contributor: Sheheryar Malik
Submitted on: Thursday, November 10, 2011 - 10:41:57
Last modified on: Thursday, January 19, 2012 - 16:27:45
Document(s) archived on: Saturday, February 11, 2012 - 02:22:29

File

Adaptive_Fault_Tolerance_in_Re...
Publisher files allowed on an open archive

Citation

Sheheryar Malik, Fabrice Huet. Adaptive Fault Tolerance in Real Time Cloud Computing. 2011 IEEE World Congress on Services, Jul 2011, Washington DC, United States. IEEE, pp.280-287, 2011, 2011 IEEE World Congress on Services (SERVICES 2011). <10.1109/SERVICES.2011.108>. <hal-00639904>
