Co-scheduling HPC workloads on cache-partitioned CMP platforms

With the recent advent of many-core architectures such as chip multiprocessors (CMP), the number of processing units accessing a global shared memory is constantly increasing. Co-scheduling techniques are used to improve application throughput on such architectures, but sharing resources often generates critical interference. In this paper, we focus on interference in the last-level cache (LLC) and use the Cache Allocation Technology (CAT) recently provided by Intel to partition the LLC and give each co-scheduled application its own cache area. We consider m iterative HPC applications running concurrently and answer the following questions: (i) how to precisely model the behavior of these applications on the cache-partitioned platform? and (ii) how many cores and cache fractions should be assigned to each application to maximize the platform efficiency? Here, maximizing platform efficiency means either maximizing global performance or guaranteeing a fixed ratio of iterations per second for each application. Through extensive experiments using CAT, we demonstrate the impact of cache partitioning when multiple HPC applications are co-scheduled onto CMP platforms.

This report studies cache-partitioning techniques for co-scheduling scientific applications on multi-core platforms. We focus on interference in the last-level cache and use the Cache Allocation Technology (CAT) recently provided by Intel to partition the LLC and allocate to each application its own cache area. We consider m iterative applications running concurrently and answer the following questions: (i) how to precisely model the behavior of these applications; (ii) how many cores and what cache fraction should be allocated to each application? Our objective is to maximize performance when a relative ratio of iterations per application is imposed, which amounts to maximizing the smallest application throughput (weighted by these ratios). Then, through a comprehensive set of experiments with CAT, we show the impact of cache-partitioning techniques in this context and quantify the gain that can be expected.
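
As an illustration of what giving each application its own cache area can look like, CAT expresses a partition as a capacity bitmask over cache ways, and the set bits of a mask must be contiguous. The sketch below only computes such masks; it does not program any hardware and is not taken from the report. The function name `contiguous_way_masks`, the 20-way LLC, and the 10/6/4 split are hypothetical values chosen for the example.

```python
def contiguous_way_masks(ways_per_app, total_ways):
    """Return one contiguous, non-overlapping bitmask per application.

    ways_per_app: number of cache ways requested by each application
                  (hypothetical input, not a value from the report).
    total_ways:   number of ways in the LLC (assumed, platform dependent).
    """
    if any(w < 1 for w in ways_per_app):
        raise ValueError("each application needs at least one cache way")
    if sum(ways_per_app) > total_ways:
        raise ValueError("requested ways exceed the LLC capacity")

    masks = []
    offset = 0
    for ways in ways_per_app:
        # 'ways' consecutive one-bits, shifted past the ways already assigned.
        masks.append(((1 << ways) - 1) << offset)
        offset += ways
    return masks


# Example: three co-scheduled applications on an assumed 20-way LLC.
masks = contiguous_way_masks([10, 6, 4], total_ways=20)
print([f"{m:05x}" for m in masks])  # ['003ff', '0fc00', 'f0000']
```

Because the masks are disjoint, no application can evict another's cache lines, which is the isolation property the abstract relies on.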

Keywords

HPC application, chip multiprocessor (CMP), cache partitioning, scientific application

Domains

Computer Science [cs]

Main file

RR-9154.pdf (1.16 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01719728

Submitted on: Wednesday, February 28, 2018, 14:01:36

Last modified on: Thursday, May 11, 2023, 11:56:10

Long-term archiving on: Monday, May 28, 2018, 14:36:29

Dates and versions

hal-01719728, version 1 (28-02-2018)

Identifiers

HAL Id: hal-01719728, version 1

Cite

Guillaume Aupy, Anne Benoit, Brice Goglin, Loïc Pottier, Yves Robert. Co-scheduling HPC workloads on cache-partitioned CMP platforms. [Research Report] RR-9154, Inria. 2018. ⟨hal-01719728⟩

Collections

ENS-LYON CNRS INRIA UNIV-LYON1 INRIA-RRRT INRIA2 LARA UDL
