Improving load/store queues usage in scientific computing - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2004

Improving load/store queues usage in scientific computing

Abstract

Memory disambiguation mechanisms, coupled with load/store queues in out-of-order processors, are crucial for increasing instruction-level parallelism (ILP), especially in memory-bound scientific codes. Ideal memory disambiguation is too complex to implement because it would require precise comparison of all address bits; modern microprocessors therefore implement simplified, imprecise mechanisms that perform only partial address comparisons. In this paper, we study the impact of such simplifications on the sustained performance of real processors such as the Alpha 21264, Power 4 and Itanium 2. Despite all the advanced features of these processors, we demonstrate that memory address disambiguation mechanisms can cause significant performance loss: even if data are located in low cache levels and enough ILP exists, execution can be up to 21 times slower if no care is taken over the order in which independent memory addresses are accessed. Instead of proposing a hardware solution to improve load/store queues, as done in [G. Chrysos et al. (1998), S. Sethumadhavan et al. (2003), I. Park et al. (2003), A. Yoaz et al. (1999), S. Onder (2002)], we show that a software (compilation) technique is possible. This solution is based on classical (and robust) ld/st vectorization. Our experiments highlight the effectiveness of this method on BLAS 1 codes, which are representative of vector scientific loops.
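
To make the idea of ordering independent memory accesses concrete, here is a minimal sketch of one possible reading of load/store vectorization on a BLAS 1 style loop (y = a*x + y). It is an illustration only, not the code from the paper: the function name `daxpy_grouped`, the unroll factor of 4, and the grouping order (all loads from `x`, then all loads from `y`, then all stores to `y`) are assumptions made for this example, and an optimizing compiler is free to reorder these accesses.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i], unrolled by 4 (assumed factor).
 * Accesses to each array are issued in consecutive runs instead of being
 * interleaved, so that loads and stores to different arrays are not mixed.
 * Assumption: x and y do not overlap (hence restrict). */
void daxpy_grouped(size_t n, double a,
                   const double *restrict x, double *restrict y)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Group 1: all loads from x. */
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];

        /* Group 2: all loads from y. */
        double y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];

        /* Group 3: all stores to y. */
        y[i]     = y0 + a * x0;
        y[i + 1] = y1 + a * x1;
        y[i + 2] = y2 + a * x2;
        y[i + 3] = y3 + a * x3;
    }

    /* Remainder iterations when n is not a multiple of 4. */
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Whether the grouped order survives to the generated machine code depends on the compiler and its flags, so checking the emitted assembly would be needed before drawing any performance conclusion.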
Main file
Improving_Load.pdf (868.6 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00637256 , version 1 (31-10-2011)

Identifiers

Cite

Christophe Lemuet, William Jalby, Sid Touati. Improving load/store queues usage in scientific computing. International Conference on Parallel Processing (ICPP 2004), Aug 2004, Montréal, Canada. pp.38-45, ⟨10.1109/ICPP.2004.1327902⟩. ⟨inria-00637256⟩

Collections

CNRS UVSQ
101 Views
138 Downloads
