Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing

Abstract: Cache-coherent Non-Uniform Memory Access (ccNUMA) platforms based on multi-core chips are now a common resource in High Performance Computing. To overcome scalability issues on such platforms, the shared memory is physically distributed among several memory banks, so memory access costs vary with the distance between processing units and data. The main challenge on a ccNUMA platform is to efficiently manage threads, data distribution, and communication across all the nodes of the machine. Charm++ is a parallel programming system that provides a portable programming model for platforms based on shared and distributed memory. In this work, we revisit some of the implementation decisions currently featured in Charm++ in the context of ccNUMA platforms. First, we study the impact of the new shared-memory-based inter-object communication scheme used by Charm++ and show how this approach can affect the performance of Charm++ on ccNUMA machines. Second, we conduct a performance evaluation of the CPU and memory affinity mechanisms provided by Charm++ on ccNUMA platforms. Results show that SMP optimizations and affinity support can improve the overall performance of our benchmarks by up to 75%. Finally, in light of these studies, we designed and implemented a NUMA-aware load balancing algorithm that addresses the issues found. The performance evaluation of our prototype shows results as good as those obtained with GreedyLB and significant improvements over GreedyCommLB.
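As a rough illustration of the greedy, load-based placement strategy that balancers such as GreedyLB follow, and of how a NUMA distance penalty could be folded into it, the C++ sketch below gives a minimal example. It is not the authors' implementation: the data structures (Obj), the node layout (pesPerNode), and the penalty model (numaFactor) are all illustrative assumptions.

    // Hypothetical sketch (not the paper's algorithm): greedy, NUMA-penalized
    // object-to-PE mapping in the spirit of Charm++'s centralized load balancers.
    #include <algorithm>
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct Obj { int id; double load; int currentPE; };

    // Heaviest objects first; each is placed on the PE whose effective cost
    // (current load + object load + a penalty when the target PE lies on a
    // different NUMA node than the object's current PE) is smallest.
    std::vector<int> numaGreedyMap(std::vector<Obj> objs, int numPEs,
                                   int pesPerNode, double numaFactor) {
        std::sort(objs.begin(), objs.end(),
                  [](const Obj& a, const Obj& b) { return a.load > b.load; });

        std::vector<double> peLoad(numPEs, 0.0);
        std::vector<int> mapping(objs.size(), -1);

        for (const Obj& o : objs) {
            int homeNode = o.currentPE / pesPerNode;
            int bestPE = 0;
            double bestCost = std::numeric_limits<double>::max();
            for (int pe = 0; pe < numPEs; ++pe) {
                // Remote NUMA nodes pay an extra cost proportional to the object's load.
                double penalty = (pe / pesPerNode == homeNode) ? 0.0 : numaFactor * o.load;
                double cost = peLoad[pe] + o.load + penalty;
                if (cost < bestCost) { bestCost = cost; bestPE = pe; }
            }
            mapping[o.id] = bestPE;
            peLoad[bestPE] += o.load;
        }
        return mapping;
    }

    int main() {
        // Toy example: 6 objects, 4 PEs split across 2 NUMA nodes (2 PEs per node).
        std::vector<Obj> objs = {{0, 5, 0}, {1, 3, 0}, {2, 8, 1},
                                 {3, 2, 2}, {4, 7, 3}, {5, 1, 3}};
        std::vector<int> map = numaGreedyMap(objs, 4, 2, 0.25);
        for (size_t i = 0; i < map.size(); ++i)
            std::printf("object %zu -> PE %d\n", i, map[i]);
        return 0;
    }

In Charm++ itself, such strategies are written as centralized load balancers that consume the runtime's per-object load statistics; the example above simply prints a mapping for a toy input.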

Identifiers

  • HAL Id : hal-00788893, version 1
  • URL : https://hal.inria.fr/hal-00788893

Citation

Laércio L. Pilla, Christiane Vilaca Pousa Ribeiro, Daniel Cordeiro, Jean-François Mehaut. Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing. The fourth workshop of the INRIA-Illinois Joint Laboratory on Petascale Computing, 2010, Urbana, United States. ⟨hal-00788893⟩
