Adaptive Fault Tolerance in Real Time Cloud Computing

Sheheryar Malik; Fabrice Huet

doi:10.1109/SERVICES.2011.108

Conference paper. Year: 2011

Sheheryar Malik

Role: Corresponding author
PersonId: 913601

Active objects, semantics, Internet and security

Fabrice Huet

Role: Author
PersonId: 1352
IdHAL: fabrice-huet
IdRef: 076390829

Active objects, semantics, Internet and security

Abstract

With the growing demand for, and benefits of, cloud computing infrastructure, real-time computing can now be performed on the cloud. A real-time system can take advantage of the intensive computing capabilities and scalable virtualized environment of cloud computing to execute real-time tasks. In most real-time cloud applications, processing is done on remote cloud computing nodes, so errors are more likely because of unpredictable latency and loose control over the computing nodes. At the same time, most real-time systems are also safety-critical and must be highly reliable. Fault tolerance is therefore increasingly needed to achieve reliability for real-time computing on cloud infrastructure. In this paper, a fault tolerance model for real-time cloud computing is proposed. In the proposed model, the system tolerates faults and makes decisions on the basis of the reliability of the processing nodes, i.e. virtual machines. The reliability of each virtual machine is adaptive and changes after every computing cycle: if a virtual machine produces a correct result within the time limit, its reliability increases; if it fails to produce a correct result, or fails to produce it in time, its reliability decreases. A metric model is given for the reliability assessment, in which a decrease in reliability is larger than an increase. If a node continues to fail, it is removed and a new node is added. There is also a minimum reliability level: if no processing node reaches that level, the system performs backward recovery or safety measures. The proposed technique is based on executing design-diverse variants on multiple virtual machines and assigning reliability to the results produced by the variants. The virtual machine instances can be of the same type or of different types. The system provides both forward and backward recovery mechanisms, but the main focus is on forward recovery.
The essence of the proposed technique is the adaptive behavior of the reliability weights assigned to each processing node, and the addition and removal of nodes on the basis of reliability.
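
The abstract specifies the reliability mechanism only qualitatively. The Python sketch below shows one way a single computing cycle could be implemented under stated assumptions; it is not the metric model from the paper. The multiplicative update rule, the constants `RF`, `PENALTY`, `MIN_RELIABILITY` and `MAX_FAILURES`, the starting weight of 0.8, and the names `Node`, `update_reliability` and `run_cycle` are all hypothetical. Each node's output is checked against the time limit and an acceptance test, reliabilities are updated so that a failure costs more than a success gains, nodes that fail repeatedly are replaced, and the passing result from the most reliable node is returned only if that node clears the minimum reliability level (forward recovery); otherwise `None` signals backward recovery.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical illustration values: the abstract only states that a failure
# lowers reliability by more than a success raises it.
RF = 0.05              # relative increase applied after a correct, timely result
PENALTY = 2.0          # a failure costs PENALTY times the relative increase
MAX_RELIABILITY = 1.0  # upper cap on the reliability weight
MIN_RELIABILITY = 0.6  # minimum acceptable reliability level
MAX_FAILURES = 3       # consecutive failures before a node is replaced


@dataclass
class Node:
    """A processing node (virtual machine) running one design-diverse variant."""
    name: str
    reliability: float = 0.8  # hypothetical starting weight
    consecutive_failures: int = 0


def update_reliability(node: Node, passed: bool) -> None:
    """Raise reliability on success (capped); lower it by a larger step on failure."""
    if passed:
        node.reliability = min(MAX_RELIABILITY, node.reliability * (1 + RF))
        node.consecutive_failures = 0
    else:
        node.reliability *= 1 - RF * PENALTY  # stays positive since RF * PENALTY < 1
        node.consecutive_failures += 1


def run_cycle(
    nodes: list[Node],
    outputs: dict[str, tuple[object, float]],  # name -> (result, elapsed seconds)
    deadline: float,
    acceptance_test: Callable[[object], bool],
    new_node: Callable[[], Node],
) -> Optional[object]:
    """Run one computing cycle.

    Returns the selected result (forward recovery), or None to signal that
    backward recovery or a safety measure is needed.
    """
    passing: list[Node] = []
    for node in nodes:
        # A node that produced no output is treated as having missed the deadline.
        result, elapsed = outputs.get(node.name, (None, float("inf")))
        ok = elapsed <= deadline and acceptance_test(result)
        update_reliability(node, ok)
        if ok:
            passing.append(node)

    # Replace nodes that keep failing with fresh ones (mutates the list in place).
    # Passing nodes have zero consecutive failures, so none of them is replaced.
    for i, node in enumerate(nodes):
        if node.consecutive_failures >= MAX_FAILURES:
            nodes[i] = new_node()

    if not passing:
        return None
    best = max(passing, key=lambda n: n.reliability)
    if best.reliability < MIN_RELIABILITY:
        return None
    return outputs[best.name][0]
```

For example, with three variants where `vm1` returns a correct result in time, `vm2` is late and `vm3` returns a wrong value, `run_cycle` would raise `vm1` to 0.84, lower the other two to 0.72, and return the result from `vm1`. A multiplicative rule keeps the weight strictly positive and makes each step proportional to the current reliability; an additive rule would serve equally well for illustration.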

Keywords

fault tolerance; reliability; cloud computing

Subjects

Performance and reliability [cs.PF]; Parallel, distributed and shared computing [cs.DC]

Main file

Adaptive_Fault_Tolerance_in_Real_Time_Cloud_Computing.pdf (662.13 KB)

Origin: Publisher files authorized for deposit in an open archive

https://inria.hal.science/hal-00639904

Submitted on: Thursday, 10 November 2011, 10:41:57

Last modified on: Monday, 26 February 2024, 11:22:07

Long-term archiving on: Saturday, 11 February 2012, 02:22:29

Dates and versions

hal-00639904, version 1 (10-11-2011)

Identifiers

HAL Id: hal-00639904, version 1
DOI: 10.1109/SERVICES.2011.108

Cite

Sheheryar Malik, Fabrice Huet. Adaptive Fault Tolerance in Real Time Cloud Computing. 2011 IEEE World Congress on Services, Jul 2011, Washington DC, United States. pp.280-287, ⟨10.1109/SERVICES.2011.108⟩. ⟨hal-00639904⟩

Collections

CNRS INRIA I3S INRIA2 UNIV-COTEDAZUR