When Amdahl Meets Young/Daly

Aurélien Cavelan; Jiafan Li; Yves Robert; Hongyang Sun

Rapport (Rapport De Recherche) Année : 2016

When Amdahl Meets Young/Daly

Quand Amdahl rencontre Young/Daly

(1, 2) , (3) , (1, 2, 4) , (1, 2)

1
2
3
4

Aurélien Cavelan

Fonction : Auteur

École normale supérieure de Lyon

Optimisation des ressources : modèles, algorithmes et ordonnancement

Jiafan Li

Fonction : Auteur

East China Normal University [Shangaï]

Yves Robert

Fonction : Auteur
PersonId : 739318
IdHAL : yves-robert
ORCID : 0000-0003-2361-055X
IdRef : 029813611

École normale supérieure de Lyon

Optimisation des ressources : modèles, algorithmes et ordonnancement

Innovative Computing Laboratory [Knoxville]

Hongyang Sun

Fonction : Auteur
PersonId : 987755
ORCID : 0000-0002-4379-4467
IdRef : 234090510

École normale supérieure de Lyon

Optimisation des ressources : modèles, algorithmes et ordonnancement

Résumé

This paper investigates the optimal number of processors to execute a parallel job, whose speedup profile obeys Amdahl's law, on a large-scale platform subject to fail-stop and silent errors. We combine the traditional checkpointing and rollback recovery strategies with verification mechanisms to cope with both error sources. We provide an exact formula to express the execution overhead incurred by a periodic checkpointing pattern of length $T$ and with $P$ processors, and we give first-order approximations for the optimal values $T^{*}$ and $P^{*}$ as a function of the individual processor failure rate $\lambda_{\mathrm{ind}}$. A striking result is that $P^{*}$ is of the order $\lambda_{\mathrm{ind}}^{-1/4}$ when the checkpointing cost grows linearly with the number of processors, and of the order $\lambda_{\mathrm{ind}}^{-1/3}$ when the checkpointing cost stays bounded for any $P$. We conduct an extensive set of simulations to support the theoretical study. The results confirm the accuracy of the first-order approximation under a wide range of parameter settings.

Cet article étudie le nombre optimal de processeurs pour exécuter un travail parallèle dont le profil d'accélération obéit à la loi d'Amdahl, sur une plateforme à grande échelle exposée aux pannes et aux erreurs silencieuses. Nous combinons l'approche traditionnelle de checkpointing/recovery avec des mécanismes de vérification pour faire face aux deux types d'erreurs. Nous fournissons une formule exacte pour mesurer le surcoût du temps d'exécution induit par un motif de checkpoint périodique de longueur $T$ et avec $P$ processeurs, et nous donnons une approximation au premier ordre des valeurs optimales de $T^{*}$ et $P^{*}$ en fonction du taux d'erreur individuel d'un processeur $\lambda_{\mathrm{ind}}$ Un résultat frappant est que $P^{*}$ est de l'ordre de $\lambda_{\mathrm{ind}}^{-1/4}$ quand le coût de checkpoint croît linéairement avec le nombre de processeurs, et de l'ordre de $\lambda_{\mathrm{ind}}^{-1/3}$ quand le coût de checkpoint reste borné par $P$. Nous menons une large campagne de simulations pour appuyer l'étude théorique. Les résultats confirmes la précision de l'approximation au premier ordre pour une large gamme de paramètres.

Mots clés

fail-stop errors silent errors optimal checkpointing period optimal processor allocation Amdahl's law

vérification facteur d'accélération erreur silencieuse erreur fatale résilience checkpoint loi d'Amdahl schéma de calcul optimal

Domaines

Informatique [cs] Calcul parallèle, distribué et partagé [cs.DC] Modélisation et simulation

Fichier principal

RR-8871_update.pdf (1.46 Mo)

Origine : Fichiers produits par l'(les) auteur(s)

Equipe Roma : Connectez-vous pour contacter le contributeur

https://inria.hal.science/hal-01280004

Soumis le : samedi 14 mai 2016-20:37:24

Dernière modification le : mardi 6 février 2024-11:09:05

Archivage à long terme le : mercredi 16 novembre 2016-05:10:14

Dates et versions

hal-01280004 , version 1 (28-02-2016)

hal-01280004 , version 2 (04-03-2016)

hal-01280004 , version 3 (14-05-2016)

hal-01280004 , version 4 (06-07-2016)

Identifiants

HAL Id : hal-01280004 , version 3

Citer

Aurélien Cavelan, Jiafan Li, Yves Robert, Hongyang Sun. When Amdahl Meets Young/Daly. [Research Report] RR-8871, ENS Lyon, CNRS & INRIA. 2016. ⟨hal-01280004v3⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

201 Consultations

294 Téléchargements

When Amdahl Meets Young/Daly

Quand Amdahl rencontre Young/Daly

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Partager