Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, ACM Transactions on Architecture and Code Optimization, Year: 2015

Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters

Pierre Michaud
  • Role: Author
  • PersonId : 738135
  • IdHAL : pmichaud
Andrea Mondelli
  • Role: Author
  • PersonId : 969677
André Seznec

Abstract

During the past 10 years, the clock frequency of high-end superscalar processors has not increased. Performance keeps growing mainly by integrating more cores on the same chip and by introducing new instruction set extensions. However, this benefits only some applications and requires rewriting and/or recompiling these applications. A more general way to accelerate applications is to increase the IPC, the number of instructions executed per cycle. Although the focus of academic microarchitecture research moved away from IPC techniques, the IPC of commercial processors was continuously improved during these years. We argue that some of the benefits of technology scaling should be used to raise the IPC of future superscalar cores further. Starting from microarchitecture parameters similar to recent commercial high-end cores, we show that an effective way to increase the IPC is to allow the out-of-order engine to issue more micro-ops per cycle. But this must be done without impacting the clock cycle. We propose combining two techniques: clustering and register write specialization. Past research on clustered microarchitectures focused on narrow issue clusters, as the emphasis at that time was on allowing high clock frequencies. Instead, in this study, we consider wide issue clusters, with the goal of increasing the IPC under a constant clock frequency. We show that on a wide issue dual cluster, a very simple steering policy that sends 64 consecutive instructions to the same cluster, the next 64 instructions to the other cluster, and so forth, permits tolerating an intercluster delay of three cycles. We also propose a method for decreasing the energy cost of sending results from one cluster to the other cluster.
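
The steering policy in the abstract is described only in words. The following is a minimal sketch of it, assuming instructions are numbered from 0 in program order and the two clusters are numbered 0 and 1. The names `steer` and `count_intercluster_deps`, and the representation of dependences as index pairs, are illustrative assumptions and do not come from the paper.

```python
from typing import Iterable, Tuple


def steer(instr_index: int, chunk: int = 64, num_clusters: int = 2) -> int:
    """Return the cluster for the instruction at position instr_index.

    Each run of `chunk` consecutive instructions goes to one cluster,
    the next run goes to the following cluster, and so forth.
    """
    if instr_index < 0 or chunk <= 0 or num_clusters <= 0:
        raise ValueError("index must be >= 0; chunk and num_clusters must be > 0")
    return (instr_index // chunk) % num_clusters


def count_intercluster_deps(
    deps: Iterable[Tuple[int, int]], chunk: int = 64, num_clusters: int = 2
) -> int:
    """Count (producer, consumer) pairs whose two ends land on different clusters.

    Only these pairs would pay the intercluster delay.
    """
    return sum(
        1
        for producer, consumer in deps
        if steer(producer, chunk, num_clusters) != steer(consumer, chunk, num_clusters)
    )


if __name__ == "__main__":
    # Instructions 0..63 -> cluster 0, 64..127 -> cluster 1, 128..191 -> cluster 0.
    print([steer(i) for i in (0, 63, 64, 127, 128)])
    # Hypothetical dependences, given as (producer index, consumer index).
    print(count_intercluster_deps([(10, 20), (60, 70), (0, 130)]))
```

Under this policy, a dependence crosses clusters only when its producer and consumer fall in chunks assigned to different clusters, so short dependence chains mostly stay inside a single cluster.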
Main file
halversion2.pdf (422.98 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01193178 , version 1 (13-10-2015)

Identifiers

Cite

Pierre Michaud, Andrea Mondelli, André Seznec. Revisiting Clustered Microarchitecture for Future Superscalar Cores: A Case for Wide Issue Clusters. ACM Transactions on Architecture and Code Optimization, 2015, 13 (3), pp.22. ⟨10.1145/2800787⟩. ⟨hal-01193178⟩
744 views
1194 downloads
