Co-scheduling HPC workloads on cache-partitioned CMP platforms - Archive ouverte HAL
Research Report, 2018

Co-scheduling HPC workloads on cache-partitioned CMP platforms


Abstract

With the recent advent of many-core architectures such as chip multiprocessors (CMP), the number of processing units accessing a global shared memory is constantly increasing. Co-scheduling techniques are used to improve application throughput on such architectures, but sharing resources often generates critical interferences. In this paper, we focus on the interferences in the last level of cache (LLC) and use the Cache Allocation Technology (CAT) recently provided by Intel to partition the LLC and give each co-scheduled application its own cache area. We consider m iterative HPC applications running concurrently and answer the following questions: (i) how to precisely model the behavior of these applications on the cache-partitioned platform? and (ii) how many cores and which cache fractions should be assigned to each application to maximize the platform efficiency? Here, platform efficiency is defined as maximizing the performance either globally, or as guaranteeing a fixed ratio of iterations per second for each application. Through extensive experiments using CAT, we demonstrate the impact of cache partitioning when multiple HPC applications are co-scheduled onto CMP platforms.
This report studies cache-partitioning techniques for the co-scheduling of scientific applications on multicore platforms. We focus on interferences in the last-level cache and use the Cache Allocation Technology (CAT) recently introduced by Intel to partition the LLC and allocate to each application its own cache area. We consider m iterative applications running concurrently and answer the following questions: (i) how to precisely model the behavior of these applications; (ii) how many cores and which cache fraction to allocate to each application? Our objective is to maximize performance when a relative ratio of iterations per application is imposed, which amounts to maximizing the smallest application throughput (weighted by these ratios). Then, through a complete set of experiments with CAT, we show the impact of cache-partitioning techniques in this context and quantify the gain that can be expected.
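Question (ii) above, deciding how many cores and how large a cache fraction each of the m applications should receive, can be illustrated with a toy brute-force search. The throughput model below is purely hypothetical (the report derives its model from measured application behavior); the names `throughput`, `best_assignment`, and the parameter `alpha` are invented for this sketch. The objective implemented is the one stated in the abstract: maximize the smallest application throughput weighted by the imposed iteration ratios.

```python
from itertools import product

def throughput(p, x, alpha=0.5):
    # Hypothetical model (NOT the report's): iterations/s grow linearly
    # with the core count p, damped by the cache fraction x in [0, 1].
    return p * (alpha + (1 - alpha) * x)

def best_assignment(m, cores, cache_ways, weights):
    """Brute-force search over core counts and cache ways for m apps,
    maximizing the minimum weighted throughput (each app must get at
    least one core and one cache way, as with Intel CAT way masks)."""
    best, best_val = None, -1.0
    for ps in product(range(1, cores), repeat=m):
        if sum(ps) != cores:
            continue
        for ws in product(range(1, cache_ways), repeat=m):
            if sum(ws) != cache_ways:
                continue
            # throughput of each app, normalized by its imposed ratio
            val = min(throughput(p, w / cache_ways) / r
                      for p, w, r in zip(ps, ws, weights))
            if val > best_val:
                best_val, best = val, (ps, ws)
    return best, best_val

# Two identical applications on 8 cores with a 16-way LLC:
(ps, ws), v = best_assignment(2, 8, 16, [1.0, 1.0])
```

Under this symmetric toy model the search unsurprisingly returns an even split (4 cores and 8 ways each); the interest of the report's approach is that, with realistic per-application models, the optimal split is generally uneven.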
Main file: RR-9154.pdf (1.16 MB). Origin: files produced by the author(s).

Dates and versions

hal-01719728 , version 1 (28-02-2018)


Cite

Guillaume Aupy, Anne Benoit, Brice Goglin, Loïc Pottier, Yves Robert. Co-scheduling HPC workloads on cache-partitioned CMP platforms. [Research Report] RR-9154, Inria. 2018. ⟨hal-01719728⟩
199 views, 142 downloads
