Conference papers

Process Affinity, Metrics and Impact on Performance: an Empirical Study

Cyril Bordage 1 Emmanuel Jeannot 1
1 TADAAM - Topology-Aware System-Scale Data Management for High-Performance Computing
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest
Abstract: Process placement, also called topology mapping, is a well-known strategy to improve parallel program execution by reducing the communication cost between processes. It requires two inputs: the topology of the target machine and a measure of the affinity between processes. In the literature, the dominant affinity measure is the communication matrix, which describes the amount of communication between processes. The goal of this paper is to study the accuracy of the communication matrix as a measure of affinity. We have run an extensive set of tests on two fat-tree machines and a 3D-torus machine to evaluate several hypotheses that are often made in the literature and to discuss their validity. First, we check the correlation between algorithmic metrics and the performance of the application. Then, we check whether a good generic process placement algorithm never degrades performance. Finally, we examine whether the structure of the communication matrix can be used to predict gain.

I. INTRODUCTION

We are currently seeing a deepening of the hierarchy of high-performance computing systems. Nodes are composed of multicore processors with different levels of memory (standard DRAM, non-volatile memory, faster but smaller MCDRAM for KNL, etc.), and the network interconnecting these nodes can also be highly intricate, with a complex topology and a high diameter. As a consequence of these architectural features, the performance of parallel applications depends heavily on the nodes allocated to the job as well as on the mapping of these jobs.

Process placement (also known as topology mapping) is an active field of research that deals with the development of strategies that improve the performance of parallel applications by carefully allocating processes onto the resources [14]. The goal is to reduce the communication cost by mapping the processes that communicate the most close to each other.
The communication time depends on the algorithm implemented in the application, since the algorithm determines the quantity of data to be exchanged. Moreover, since not all computing resources are directly connected, it also depends on the distance between the running processes as well as on the speed of the different links. Figure 1 shows an example of the distances (in number of hops) between cores in a fat-tree machine with 6 nodes of 24 cores each (two processors, each made of two NUMA nodes with 6 cores). Blocks of equal distance are clearly visible. Hence, it seems natural to place two processes that communicate heavily close to each other in order to reduce the communication cost. To this end, we need to adapt the execution of parallel applications to the target machine according to its specific topology.
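The block structure of the distances and its effect on communication cost can be illustrated with a minimal sketch. The machine shape (6 nodes, 2 processors per node, 2 NUMA nodes per processor, 6 cores per NUMA node) comes from the text. The hop counts per level, the assumption that all nodes sit at the same distance from each other, the function names `hops` and `mapping_cost`, and the toy communication matrix are illustrative assumptions, not values or methods taken from the paper. The cost used here (communication volume weighted by hop distance) is one simple metric of this kind, not necessarily one of those the paper evaluates.

```python
# Machine shape taken from the text.
NODES = 6
SOCKETS_PER_NODE = 2   # "two processors"
NUMA_PER_SOCKET = 2    # "two NUMA nodes" per processor
CORES_PER_NUMA = 6

CORES_PER_SOCKET = NUMA_PER_SOCKET * CORES_PER_NUMA
CORES_PER_NODE = SOCKETS_PER_NODE * CORES_PER_SOCKET
TOTAL_CORES = NODES * CORES_PER_NODE

# Hop counts per hierarchy level: ASSUMED for illustration only.
HOPS_SAME_NUMA = 1
HOPS_SAME_SOCKET = 2
HOPS_SAME_NODE = 3
HOPS_OTHER_NODE = 5    # assumes every pair of nodes is equally distant


def hops(a, b):
    """Distance in hops between cores a and b (cores numbered consecutively)."""
    if a == b:
        return 0
    if a // CORES_PER_NUMA == b // CORES_PER_NUMA:
        return HOPS_SAME_NUMA
    if a // CORES_PER_SOCKET == b // CORES_PER_SOCKET:
        return HOPS_SAME_SOCKET
    if a // CORES_PER_NODE == b // CORES_PER_NODE:
        return HOPS_SAME_NODE
    return HOPS_OTHER_NODE


def mapping_cost(comm, mapping):
    """Sum of comm[i][j] * hops between the cores hosting processes i and j."""
    n = len(comm)
    return sum(
        comm[i][j] * hops(mapping[i], mapping[j])
        for i in range(n)
        for j in range(n)
    )


# Toy communication matrix (hypothetical): processes 0-1 and 2-3 exchange
# a lot of data, all other pairs exchange little.
comm = [
    [0, 100, 1, 1],
    [100, 0, 1, 1],
    [1, 1, 0, 100],
    [1, 1, 100, 0],
]

close_mapping = [0, 1, 24, 25]    # heavy pairs share a NUMA node
spread_mapping = [0, 24, 1, 25]   # heavy pairs are split across nodes

print(mapping_cost(comm, close_mapping), mapping_cost(comm, spread_mapping))
```

Under these assumptions, the mapping that keeps heavily communicating pairs within one NUMA node yields a lower cost than the one that splits them across nodes. Whether such a metric actually tracks application performance is precisely the question the paper investigates.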

https://hal.inria.fr/hal-01901988
Contributor : Emmanuel Jeannot
Submitted on : Tuesday, October 23, 2018 - 1:49:14 PM
Last modification on : Thursday, May 16, 2019 - 6:46:02 PM
Document(s) archived on : Thursday, January 24, 2019 - 2:46:27 PM

File

article.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01901988, version 1

Citation

Cyril Bordage, Emmanuel Jeannot. Process Affinity, Metrics and Impact on Performance: an Empirical Study. 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (IEEE/ACM CCGrid), May 2018, Washington DC, United States. ⟨hal-01901988⟩
