dFault: Fault Localization in Large-Scale Peer-to-Peer Systems

Abstract: Distributed hash tables (DHTs) have been adopted as a building block for large-scale distributed systems. The upshot of this success is that their robust operation becomes even more important as mission-critical applications begin to be layered on them. Even though DHTs can detect and heal around unresponsive hosts and disconnected links, several hidden faults and performance bottlenecks go undetected, resulting in unanswered queries and delayed responses. In this paper, we propose dFault, a system that helps large-scale DHTs localize such faults. Given a log of failed queries, called symptoms, and some available information about the hosts in the DHT, dFault identifies the potential root causes (hosts and overlay links) that, with high likelihood, contributed to those symptoms. Its design is based on the recently proposed dependency-graph modeling and inference approach to fault localization. We describe the design of dFault and show, using a real prototype deployed on PlanetLab, that it can accurately localize the root causes of faults with a modest amount of information collected from individual nodes.
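The abstract does not spell out dFault's inference procedure, so the following is only a minimal sketch of the general idea of inferring root causes from a dependency graph. It assumes each symptom (failed query) is mapped to the set of hosts and overlay links it depended on, and it uses a greedy set-cover heuristic, a common choice for this kind of problem that is substituted here for illustration. The function name `localize_faults` and the component labels are hypothetical.

```python
from collections import defaultdict


def localize_faults(symptoms):
    """Greedy set cover over a symptom -> components dependency map.

    symptoms: dict mapping a failed-query id to the set of components
    (hosts or overlay links) that the query depended on.
    Returns suspected components, most-implicated first.
    NOTE: an illustrative heuristic, not the inference used by dFault.
    """
    unexplained = set(symptoms)
    suspects = []
    while unexplained:
        # Count how many still-unexplained symptoms depend on each component.
        counts = defaultdict(int)
        for query in unexplained:
            for component in symptoms[query]:
                counts[component] += 1
        if not counts:
            break  # remaining symptoms have no known dependencies
        # Pick the most-implicated component; ties go to the smallest name.
        best = max(sorted(counts), key=lambda c: counts[c])
        suspects.append(best)
        # Symptoms that depend on the chosen component are now explained.
        unexplained = {q for q in unexplained if best not in symptoms[q]}
    return suspects


# Hypothetical log: two failed queries share hostB, a third is unrelated.
symptoms = {
    "q1": {"hostA", "link:A-B", "hostB"},
    "q2": {"hostC", "link:C-B", "hostB"},
    "q3": {"hostD", "link:D-E", "hostE"},
}
suspects = localize_faults(symptoms)
```

In this toy log, `hostB` is chosen first because it is the only component shared by two symptoms. A probabilistic inference would additionally weigh how likely each component is to fail, which this sketch ignores.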
Document type:
Conference paper
Indranil Gupta; Cecilia Mascolo. ACM/IFIP/USENIX 11th International Middleware Conference (MIDDLEWARE), Nov 2010, Bangalore, India. Springer, Lecture Notes in Computer Science, LNCS-6452, pp.252-272, 2010, Middleware 2010. 〈10.1007/978-3-642-16955-7_13〉

Cited literature: 24 references

https://hal.inria.fr/hal-01055280
Contributor: Hal Ifip
Submitted on: Tuesday, August 12, 2014 - 11:09:09
Last modified on: Wednesday, August 16, 2017 - 17:20:53
Document(s) archived on: Wednesday, November 26, 2014 - 22:45:58

File

paper.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Pawan Prakash, Ramana Rao Kompella, Venugopalan Ramasubramanian, Ranveer Chandra. dFault: Fault Localization in Large-Scale Peer-to-Peer Systems. Indranil Gupta; Cecilia Mascolo. ACM/IFIP/USENIX 11th International Middleware Conference (MIDDLEWARE), Nov 2010, Bangalore, India. Springer, Lecture Notes in Computer Science, LNCS-6452, pp.252-272, 2010, Middleware 2010. 〈10.1007/978-3-642-16955-7_13〉. 〈hal-01055280〉
