A Path to Complexity-Effective Wide-Issue Superscalar Processors

André Seznec 1
1 CAPS - Compilation, parallel architectures and system
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Advances in integration allow very wide-issue superscalar processors to be implemented on a single chip. Aggressive speculative execution as well as simultaneous multithreading can exploit such wide-issue superscalar processors. Unfortunately, as issue width increases, processor designers face new difficulties in sustaining a high clock frequency and in controlling silicon area and power consumption. For performance reasons, when doubling the issue width from 4 to 8 instructions per cycle on a superscalar processor, one must also double the number of physical registers. Combined with the doubling of the number of register ports, this leads to an eightfold increase in the silicon area devoted to the register file in a conventional monolithic register file architecture, while the silicon area devoted to functional units only doubles. At the same time, the peak power consumption of the register file also rises quasi-quadratically with the issue width. Moreover, read operations on the register file have to be deeply pipelined. The wake-up logic as well as the bypass network in the processor also become limiting factors as the issue width increases. In this paper, we present three mechanisms to reduce the number of read and write ports on every individual physical register in a wide-issue clustered superscalar processor, namely limited read port arbitration, register write specialization and register read specialization. We then show that, by combining register write specialization and register read specialization, one can build an 8-way 4-cluster superscalar processor in which each individual physical register is implemented as four identical (2-read, 2-write) registers instead of the single (16-read, 8-write) register of conventional designs. This dramatically reduces the silicon area, the peak power consumption and the access time of the register file.
As a side effect, the complexities of the bypass network and of the wake-up logic are also significantly reduced. In particular, fast-forwarding is simplified on an 8-way 4-cluster processor. Limited read port arbitration can be used to further reduce the complexity of the register file. Such a complexity reduction does not come for free: it costs some degrees of freedom in the policy for allocating instructions to clusters, and some extra complexity in the register renaming process.
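The area figures quoted in the abstract can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the report: it assumes a first-order model in which the area of a register cell grows with the square of its total port count, and the register counts (128 and 256) and the function name `register_file_area` are hypothetical values chosen for the example.

```python
def register_file_area(num_registers: int, read_ports: int,
                       write_ports: int, copies: int = 1) -> int:
    """Relative register file area in arbitrary units.

    Assumption (not from the report): each cell's area scales with the
    square of its total number of ports, since every extra port adds a
    wire in both dimensions of the cell.
    """
    ports = read_ports + write_ports
    return num_registers * copies * ports ** 2


# Hypothetical register counts: 128 for the 4-way design, doubled for 8-way.
# Ports assume 2 reads and 1 write per issued instruction.
four_way = register_file_area(128, read_ports=8, write_ports=4)
eight_way = register_file_area(256, read_ports=16, write_ports=8)

# Same 256 registers, each held as four (2-read, 2-write) copies.
specialized = register_file_area(256, read_ports=2, write_ports=2, copies=4)

print(eight_way / four_way)     # 8.0: twice the registers, four times the cell area
print(eight_way / specialized)  # 9.0 under this simple model
```

Under these assumptions the eightfold growth follows directly from doubling both the register count and the port count. The ratio for the specialized organization depends entirely on the quadratic cell model and should be read as an order-of-magnitude indication, not as a result from the report.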

Contributor : Rapport de Recherche Inria
Submitted on : Tuesday, May 23, 2006 - 8:28:49 PM
Last modification on : Thursday, January 7, 2021 - 4:28:53 PM
Long-term archiving on : Sunday, April 4, 2010 - 11:04:46 PM


  • HAL Id : inria-00072345, version 1


André Seznec. A Path to Complexity-Effective Wide-Issue Superscalar Processors. [Research Report] RR-4242, INRIA. 2001. ⟨inria-00072345⟩


