XFOR: Filling the Gap between Automatic Loop Optimization and Peak Performance

Imen Fassi 1, 2 Philippe Clauss 1, 3, 2
1 CAMUS - Compilation pour les Architectures MUlti-coeurS
Inria Nancy - Grand Est, ICube - Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie
Abstract: We propose a new loop structure named xfor, which offers programmers explicit control over the interactions between statements inside a loop nest. An xfor simultaneously represents several for-loops and several statements, and maps their respective iteration domains onto each other according to two parameters, called "grain" and "offset". Grains and offsets respectively "stretch" and "shift" iteration domains relative to an implicit, global referential domain. We show that such a programming structure makes it possible to fill important optimization gaps left by automatic loop optimizers. We highlight five such gaps: insufficient data locality optimization, an excess of conditional branches in the generated code, overly verbose code with too many machine instructions, data locality optimization that results in processor stalls, and missed vectorization opportunities. We describe programming strategies in which xfor-loops help produce efficient code, and present a set of benchmark programs rewritten with xfor that show significant, and sometimes dramatic, execution-time speed-ups.
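To make the "stretch" and "shift" description concrete, here is a minimal Python sketch that emulates how grains and offsets could place several for-loops on a shared referential domain. It is not the xfor syntax and not the authors' compiler. The function name `xfor_schedule`, the tuple layout, the unit stride, the mapping `t = offset + grain * (i - lower)`, and the rule that statements meeting at the same referential point run in label order are all assumptions made for illustration.

```python
def xfor_schedule(loops):
    """Emulate the placement of several for-loops on one referential domain.

    `loops` is a list of (lower, upper, grain, offset) tuples, one per loop,
    with an exclusive upper bound and a unit stride (assumed for simplicity).
    Returns (t, label, i) triples in execution order, where t is the
    referential iteration, label the loop number, and i the loop's own index.
    """
    events = []
    for label, (lower, upper, grain, offset) in enumerate(loops):
        for i in range(lower, upper):
            # Assumed mapping: the grain stretches the domain, the offset shifts it.
            t = offset + grain * (i - lower)
            events.append((t, label, i))
    # Assumed tie-break: at the same referential point, lower labels run first.
    events.sort()
    return events


# Loop 0: four iterations, grain 1, offset 0.
# Loop 1: two iterations, grain 2 (stretched), offset 1 (shifted by one).
schedule = xfor_schedule([(0, 4, 1, 0), (0, 2, 2, 1)])
for t, label, i in schedule:
    print(f"t={t}: statement {label} with index {i}")
```

In this sketch, loop 1 runs at every second referential iteration starting from the second one, so its two iterations interleave with iterations 1 and 3 of loop 0.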
Document type:
Conference paper
IEEE. 14th International Symposium on Parallel and Distributed Computing, Jun 2015, Limassol, Cyprus. 2015, 〈10.1109/ISPDC.2015.19〉

https://hal.inria.fr/hal-01155144
Contributor: Philippe Clauss
Submitted on: Friday, October 7, 2016 - 15:24:06
Last modified on: Wednesday, May 24, 2017 - 01:02:45
Document(s) archived on: Sunday, January 8, 2017 - 12:15:23

File

paper.pdf
Files produced by the author(s)

Citation

Imen Fassi, Philippe Clauss. XFOR: Filling the Gap between Automatic Loop Optimization and Peak Performance. IEEE. 14th International Symposium on Parallel and Distributed Computing, Jun 2015, Limassol, Cyprus. 2015, 〈10.1109/ISPDC.2015.19〉. 〈hal-01155144〉
