Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing

Abstract: Cache-coherent Non-Uniform Memory Access (ccNUMA) platforms based on multi-core chips are now a common resource in High Performance Computing. To overcome scalability issues on such platforms, the shared memory is physically distributed among several memory banks, so memory access costs vary with the distance between processing units and data. The main challenge on a ccNUMA platform is to efficiently manage threads, data distribution, and communication across all the nodes of the machine. Charm++ is a parallel programming system that provides a portable programming model for platforms based on shared and distributed memory. In this work, we revisit some of the implementation decisions currently featured in Charm++ in the context of ccNUMA platforms. First, we study the impact of the new shared-memory-based inter-object communication scheme used by Charm++ and show how this approach can affect the performance of Charm++ on ccNUMA machines. Second, we conduct a performance evaluation of the CPU and memory affinity mechanisms provided by Charm++ on ccNUMA platforms. Results show that SMP optimizations and affinity support can improve the overall performance of our benchmarks by up to 75%. Finally, in light of these studies, we designed and implemented a NUMA-aware load balancing algorithm that addresses the issues found. The performance evaluation of our prototype shows results as good as those obtained with GreedyLB and significant improvements over GreedyCommLB.
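As a rough illustration of the greedy, load-based placement strategy that balancers such as GreedyLB follow, and of how a NUMA distance penalty could be folded into it, the C++ sketch below gives a minimal example. It is not the authors' implementation: the data structures (Obj), the node layout (pesPerNode), and the penalty model (numaFactor) are all illustrative assumptions.

    // Hypothetical sketch (not the paper's algorithm): greedy, NUMA-penalized
    // object-to-PE mapping in the spirit of Charm++'s centralized load balancers.
    #include <algorithm>
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct Obj { int id; double load; int currentPE; };

    // Heaviest objects first; each is placed on the PE whose effective cost
    // (current load + object load + a penalty when the target PE lies on a
    // different NUMA node than the object's current PE) is smallest.
    std::vector<int> numaGreedyMap(std::vector<Obj> objs, int numPEs,
                                   int pesPerNode, double numaFactor) {
        std::sort(objs.begin(), objs.end(),
                  [](const Obj& a, const Obj& b) { return a.load > b.load; });

        std::vector<double> peLoad(numPEs, 0.0);
        std::vector<int> mapping(objs.size(), -1);

        for (const Obj& o : objs) {
            int homeNode = o.currentPE / pesPerNode;
            int bestPE = 0;
            double bestCost = std::numeric_limits<double>::max();
            for (int pe = 0; pe < numPEs; ++pe) {
                // Remote NUMA nodes pay an extra cost proportional to the object's load.
                double penalty = (pe / pesPerNode == homeNode) ? 0.0 : numaFactor * o.load;
                double cost = peLoad[pe] + o.load + penalty;
                if (cost < bestCost) { bestCost = cost; bestPE = pe; }
            }
            mapping[o.id] = bestPE;
            peLoad[bestPE] += o.load;
        }
        return mapping;
    }

    int main() {
        // Toy example: 6 objects, 4 PEs split across 2 NUMA nodes (2 PEs per node).
        std::vector<Obj> objs = {{0, 5, 0}, {1, 3, 0}, {2, 8, 1},
                                 {3, 2, 2}, {4, 7, 3}, {5, 1, 3}};
        std::vector<int> map = numaGreedyMap(objs, 4, 2, 0.25);
        for (size_t i = 0; i < map.size(); ++i)
            std::printf("object %zu -> PE %d\n", i, map[i]);
        return 0;
    }

In Charm++ itself, such strategies are written as centralized load balancers that consume the runtime's per-object load statistics; the example above simply prints a mapping for a toy input.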

Identifiers

  • HAL Id : hal-00788893, version 1
  • URL : https://hal.inria.fr/hal-00788893

Citation

Laércio L. Pilla, Christiane Vilaca Pousa Ribeiro, Daniel Cordeiro, Jean-François Mehaut. Charm++ on NUMA Platforms: the impact of SMP Optimizations and a NUMA-aware Load Balancing. The fourth workshop of the INRIA-Illinois Joint Laboratory on Petascale Computing, 2010, Urbana, United States. ⟨hal-00788893⟩
