On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors

Abstract: Usual cache optimisation techniques for high-performance computing are difficult to apply to embedded VLIW applications. First, embedded applications are not always well structured, and few regular loop nests exist. Real-world embedded applications contain hot loops with pointers, indirect array accesses, function calls, indirect function calls, non-constant-stride accesses, etc. Consequently, loop transformations for reducing cache misses are impossible to apply, especially at the back-end level. Second, because of indirect accesses, the strides of memory accesses do not appear to be constant at the source-code level, so usual prefetching techniques are not applicable. Third, embedded VLIW processors are "cheap" products with limited dynamic hardware mechanisms compared to high-performance processors: no out-of-order execution, reduced memory hierarchies, small direct-mapped caches, lower clock frequencies, etc. Consequently, code optimisation methods must be simple and must take code size into account. This article presents a back-end code optimisation for tolerating non-blocking cache effects at the instruction level (not at the loop level). Our method is based on a robust combination of memory pre-loading and data prefetching, allowing us to optimise both regular and irregular applications at the assembly level. Our experiments with the MediaBench and SPEC2000 benchmark suites on the ST231 VLIW processor show a positive performance gain compared to code generated with the -O3 compiler optimisation flag. Our method induces negligible code size growth (less than 3.9% in the extreme case).
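The abstract names two mechanisms, memory pre-loading and data prefetching, without showing what they look like. The C code below is a minimal source-level sketch of both ideas on an indirect-access loop of the kind the abstract describes. It is not the authors' implementation, which works on ST231 assembly inside the compiler back end. The functions `sum_indirect` and `sum_indirect_opt` and the constant `PREFETCH_DIST` (an assumed prefetch distance) are hypothetical; `__builtin_prefetch` is the GCC/Clang prefetch builtin, so the sketch assumes one of those compilers.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations; a real back end would tune it. */
#define PREFETCH_DIST 8

/* Baseline: a[idx[i]] is consumed right after it is loaded, so on an
   in-order VLIW core the use stalls until a cache miss is serviced. */
long sum_indirect(const long *a, const int *idx, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[idx[i]];
    return s;
}

/* Illustrative transformed loop combining pre-loading and prefetching.
   It computes the same sum as sum_indirect. */
long sum_indirect_opt(const long *a, const int *idx, size_t n) {
    if (n == 0)
        return 0;
    long s = 0;
    long next = a[idx[0]];               /* pre-load the first element before the loop */
    for (size_t i = 0; i < n; i++) {
        long cur = next;                 /* value whose load was issued one iteration earlier */
        if (i + PREFETCH_DIST < n)       /* prefetch hint for a later element; returns no value */
            __builtin_prefetch(&a[idx[i + PREFETCH_DIST]], 0, 1);
        if (i + 1 < n)
            next = a[idx[i + 1]];        /* pre-load: issue the next load before cur is used */
        s += cur;
    }
    return s;
}
```

In this sketch, pre-loading issues the load of the next element one iteration before its value is needed, so an in-order core can keep executing independent instructions while a non-blocking cache services a miss; the prefetch reaches further ahead and typically occupies no destination register, which suits the small register and code-size budgets the abstract mentions. An optimising compiler may already reorder such loads on its own, so the example only illustrates the intent of the two techniques.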
Document type:
Conference paper
11th IEEE International Conference on High Performance Computing and Communications, 2009 (HPCC '09), Jun 2009, Seoul, South Korea. IEEE, pp. 273-279, 2009, doi:10.1109/HPCC.2009.32

Cited literature: 14 references

https://hal.inria.fr/inria-00636852
Contributor: Sid Touati
Submitted on: Friday, 28 October 2011, 14:33:11
Last modified on: Thursday, 11 January 2018, 06:21:30
Document(s) archived on: Monday, 30 January 2012, 11:17:48

File

On_Instruction-Level.pdf
Files produced by the author(s)

Citation

Samir Ammenouche, Sid Touati, William Jalby. On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors. 11th IEEE International Conference on High Performance Computing and Communications, 2009 (HPCC '09), Jun 2009, Seoul, South Korea. IEEE, pp. 273-279, 2009, doi:10.1109/HPCC.2009.32. HAL: inria-00636852.
