Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs

Teng Ma; George Bosilca; Aurélien Bouteiller; Brice Goglin; Jeffrey M. Squyres; Jack J. Dongarra

Report (Research Report), Year: 2010


Teng Ma

Role: Author

Innovative Computing Laboratory [Knoxville]

George Bosilca

Role: Author

Innovative Computing Laboratory [Knoxville]

Aurélien Bouteiller

Role: Author

Innovative Computing Laboratory [Knoxville]

Brice Goglin

Role: Author
PersonId: 1244
IdHAL: brice-goglin
ORCID: 0000-0002-8671-4615
IdRef: 104493887

Laboratoire Bordelais de Recherche en Informatique

Efficient runtime systems for parallel architectures

Jeffrey M. Squyres

Role: Author

Cisco Systems

Jack J. Dongarra

Role: Author

Innovative Computing Laboratory [Knoxville]

Abstract

Deeper memory hierarchies, NUMA architectures, and network-style interconnects are widely used in modern many-core CPU designs to achieve performance scalability. Because the Message Passing Interface (MPI) is a leading intra-node programming model, MPI implementations must exploit these architectures to provide reliable performance portability. These new architectures require not only specialized MPI point-to-point messaging protocols but also carefully designed and tuned algorithms for MPI collective operations. Multiple issues must be taken into account: 1) minimizing the number of copies required, 2) minimizing traffic to "remote" NUMA memory, and 3) carefully avoiding memory bottlenecks for "rooted" collective operations. In this paper, we present a kernel-assisted intra-node collective module that addresses these three issues on many-core systems. A kernel-level inter-process memory copy module, called KNEM, is used by a novel Open MPI collective module to implement several improved strategies that decrease the number of intermediate memory copies and improve locality, reducing both the pressure on the memory banks and cache pollution. The collective topology is mapped onto the NUMA topology to minimize cross traffic on inter-socket links. Experiments show that the KNEM-enabled Open MPI collective module can achieve up to a threefold speedup on synthetic benchmarks, which translates into a 12% improvement for a parallel graph shortest-path discovery application.
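The idea of mapping the collective topology onto the NUMA topology can be illustrated with a small model. The sketch below is not the report's algorithm and is not Open MPI or KNEM code: the function names, the two-level tree shape, and the 4-node, 16-core machine are all hypothetical choices made for illustration. It only counts how many broadcast transfers cross a NUMA node boundary when the root sends to everyone directly, compared with a tree in which one leader per NUMA node receives the data and forwards it locally.

```python
from collections import defaultdict


def flat_tree(root, ranks):
    """Flat broadcast: the root sends directly to every other rank."""
    return [(root, r) for r in ranks if r != root]


def numa_aware_tree(root, numa_of):
    """Two-level broadcast: root -> one leader per NUMA node -> local ranks.

    numa_of maps each rank to the (hypothetical) NUMA node it is bound to.
    """
    by_node = defaultdict(list)
    for rank in sorted(numa_of):
        by_node[numa_of[rank]].append(rank)

    edges = []
    for node, members in by_node.items():
        # The root leads its own node; other nodes use their lowest rank.
        leader = root if node == numa_of[root] else members[0]
        if leader != root:
            edges.append((root, leader))  # the only transfer entering this node
        edges.extend((leader, r) for r in members if r != leader)
    return edges


def cross_node_transfers(edges, numa_of):
    """Count transfers whose source and destination sit on different nodes."""
    return sum(1 for src, dst in edges if numa_of[src] != numa_of[dst])


# Assumed layout: 4 NUMA nodes with 4 cores each, rank i bound to node i // 4.
numa_of = {rank: rank // 4 for rank in range(16)}
flat = flat_tree(0, numa_of)
aware = numa_aware_tree(0, numa_of)

print("flat tree, cross-node transfers:", cross_node_transfers(flat, numa_of))
print("NUMA-aware tree, cross-node transfers:", cross_node_transfers(aware, numa_of))
```

In this toy layout the flat tree sends 12 messages across node boundaries while the two-level tree sends 3, one per remote node. The model ignores message size, pipelining, and the memory bottleneck at the root, so it says nothing about the speedups reported in the abstract.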

Keywords

MPI; Multicore; Shared memory; NUMA; Kernel; Collective communication

Domains

Operating Systems [cs.OS]

Main file

kernel-assisted-comm-multicore.pdf (304.17 KB)

Origin: Files produced by the author(s)

Contributor: Brice Goglin

https://inria.hal.science/inria-00544872

Submitted on: Thursday, December 9, 2010, 10:40:00

Last modified on: Friday, March 24, 2023, 14:52:53

Long-term archiving on: Monday, November 5, 2012, 12:55:23

Dates and versions

inria-00544872, version 1 (09-12-2010)

Identifiers

HAL Id: inria-00544872, version 1

Cite

Teng Ma, George Bosilca, Aurélien Bouteiller, Brice Goglin, Jeffrey M. Squyres, et al. Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs. [Research Report] 2010, pp.11. ⟨inria-00544872⟩


Collections

CNRS INRIA LABRI INRIA2 LARA

714 views

144 downloads
