Abstract: To sustain a high degree of instruction-level parallelism (ILP), the Itanium2 L2 cache system uses a complex organization that combines load/store queues, banking, and interleaving. In this paper, we study the impact of this cache system on memory instruction scheduling. We demonstrate that for scientific codes, "memory access vectorization" makes it possible to generate very efficient code (up to the maximum of 4 loads per cycle). We analyze the impact of such "vectorization" on register pressure, and we propose and evaluate various register allocation schemes.
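The Python sketch below illustrates why banking and interleaving constrain the scheduling of memory instructions. It models a banked cache with simple low-order interleaving and counts bank conflicts within one group of four loads issued in the same cycle. The bank count, the interleaving granularity, and the helper names (`bank_of`, `group_conflicts`, `strided_group`) are hypothetical. This is a toy model, not a description of the documented Itanium2 hardware or of the paper's scheduling technique.

```python
# Toy banked-cache model. NUM_BANKS and BANK_BYTES are illustrative
# assumptions, not values taken from the paper or the Itanium2 manuals.
NUM_BANKS = 16        # assumed number of cache banks
BANK_BYTES = 16       # assumed interleaving granularity in bytes
ELEM_BYTES = 8        # size of one double-precision array element
LOADS_PER_CYCLE = 4   # peak load rate quoted in the abstract


def bank_of(addr: int) -> int:
    """Return the bank selected by low-order interleaving of a byte address."""
    return (addr // BANK_BYTES) % NUM_BANKS


def group_conflicts(addrs: list[int]) -> int:
    """Count loads in one same-cycle group that hit a bank already in use."""
    seen: set[int] = set()
    conflicts = 0
    for a in addrs:
        b = bank_of(a)
        if b in seen:
            conflicts += 1
        else:
            seen.add(b)
    return conflicts


def strided_group(first: int, stride: int) -> list[int]:
    """Byte addresses of elements first, first+stride, ... (one issue group)."""
    return [(first + k * stride) * ELEM_BYTES for k in range(LOADS_PER_CYCLE)]


if __name__ == "__main__":
    for stride in (1, 2, 32):
        addrs = strided_group(0, stride)
        print(f"stride {stride:2d}: banks {[bank_of(a) for a in addrs]}, "
              f"conflicts {group_conflicts(addrs)}")
```

Under these assumptions, a stride of 2 elements spreads the four loads over four distinct banks and causes no conflict. A stride of 32 elements sends all four loads to the same bank. The model deliberately ignores effects such as merging of accesses that fall in the same bank line, so the stride-1 result should not be read as a claim about the real hardware.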
https://hal.inria.fr/hal-00647124
Contributor: Sid Touati
Submitted on: Thursday, December 1, 2011 - 3:14:44 PM
Last modification on: Wednesday, October 20, 2021 - 12:24:14 AM
Long-term archiving on: Friday, March 2, 2012 - 2:30:28 AM
William Jalby, Christophe Lemuet, Sid Touati. Efficient Code Optimization Technique for Itanium2 Cache System and Scientific Computing. Workshop on Compilers for Parallel Computers, Jan 2003, Amsterdam, Netherlands. ⟨hal-00647124⟩