High-Performance RMA-Based Broadcast on the Intel SCC

Abstract: Many-core chips with more than 1000 cores are expected by the end of the decade. To overcome the scalability issues of cache coherence at such a scale, one of the main research directions is to leverage the message-passing programming model. The Intel Single-Chip Cloud Computer (SCC) is a prototype of a message-passing many-core chip. It can move data between on-chip Message Passing Buffers (MPBs) using Remote Memory Access (RMA). The performance of message-passing applications is directly affected by the efficiency of collective operations such as broadcast. In this paper, we study how to use the MPBs to implement an efficient broadcast algorithm for the SCC. We propose OC-Bcast (On-Chip Broadcast), a pipelined k-ary tree algorithm tailored to exploit the parallelism provided by on-chip RMA. Using a LogP-based model, we present an analytical evaluation that compares our algorithm to the state-of-the-art broadcast algorithms implemented for the SCC. As predicted by the model, experimental results show that OC-Bcast attains almost three times higher throughput and improves latency by at least 27%. Furthermore, the analytical evaluation highlights the benefits of our approach: OC-Bcast takes direct advantage of RMA, unlike the other broadcast algorithms considered, which are built on a higher-level send/receive interface. We conclude that RMA-based collective operations are needed to take full advantage of the hardware features of future message-passing many-core architectures.
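The abstract names a pipelined k-ary tree without spelling out why pipelining helps. The Python sketch below illustrates the generic idea under a deliberately simplified cost model; it is not OC-Bcast itself and does not model RMA, MPB capacity, or the paper's LogP parameters. The heap-style tree layout, the helper names (`children`, `depth`, `pipelined_schedule`, `completion_steps`, `unpipelined_steps`) and the one-level-per-step cost model are hypothetical choices made only for illustration.

```python
from typing import Dict, List


def children(node: int, k: int, n: int) -> List[int]:
    """Children of `node` in a k-ary tree laid out heap-style (root = core 0)."""
    first = k * node + 1
    return [c for c in range(first, first + k) if c < n]


def depth(node: int, k: int) -> int:
    """Number of tree edges between `node` and the root."""
    d = 0
    while node > 0:
        node = (node - 1) // k  # heap-style parent index
        d += 1
    return d


def pipelined_schedule(n: int, k: int, num_chunks: int) -> Dict[int, List[int]]:
    """Step at which each core holds each chunk (hypothetical cost model).

    Assumption: the root holds every chunk at step 0, a chunk crosses one tree
    level per step, and a parent forwards chunk c+1 while its children are
    already forwarding chunk c, so chunk c reaches depth d at step c + d.
    """
    return {node: [c + depth(node, k) for c in range(num_chunks)] for node in range(n)}


def completion_steps(n: int, k: int, num_chunks: int) -> int:
    """Pipelined: steps until the deepest core holds the last chunk."""
    max_depth = max(depth(node, k) for node in range(n))
    return (num_chunks - 1) + max_depth


def unpipelined_steps(n: int, k: int, num_chunks: int) -> int:
    """Store-and-forward baseline: each level waits for the whole message."""
    max_depth = max(depth(node, k) for node in range(n))
    return num_chunks * max_depth


if __name__ == "__main__":
    # 48 = number of cores on the SCC; k = 7 is an arbitrary example arity.
    print(completion_steps(48, 7, 10), unpipelined_steps(48, 7, 10))
```

For example, with the SCC's 48 cores and a hypothetical arity k = 7, the tree has depth 2, so a message split into 10 chunks completes in 11 steps with pipelining versus 20 when each level waits for the full message. The gap widens as the message is split into more chunks, which is the intuition behind the throughput gains that pipelined trees target.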
Document type:
Conference paper
24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA2012), 2012, Pittsburgh, United States. 〈10.1145/2312005.2312029〉

Contributor: Thomas Ropars
Submitted on: Monday, 2 March 2015, 21:42:32
Last modified on: Monday, 2 October 2017, 16:06:04
Archived on: Tuesday, 2 June 2015, 09:56:01


Files produced by the author(s)



Darko Petrović, Omid Shahmirzadi, Thomas Ropars, André Schiper. High-Performance RMA-Based Broadcast on the Intel SCC. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA2012), 2012, Pittsburgh, United States. 〈10.1145/2312005.2312029〉. 〈hal-01121943〉


