

### Control and Power Management in Presence of Workload Variations

Radu Marculescu

#### ▶ To cite this version:

Radu Marculescu. Control and Power Management in Presence of Workload Variations. ISCA tutorial on "Multi-domain Processors: Challenges, Design Methods, and Recent Developments", Jun 2010, Saint Malo, France. inria-00493906

#### HAL Id: inria-00493906 https://inria.hal.science/inria-00493906

Submitted on 21 Jun 2010

**HAL** is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

System Level Design Group, Carnegie Mellon

## **Power Management Unit**



RGM2-ISCA'10

[Rusu, A-SSCC 2009]

### Outline

- VFI partitioning
  - Multi-VFI NoC designs
  - Partitioning and voltage assignment
  - Examples
- On-line control
  - State-based model construction
  - Feedback control architecture
  - Stability issues
- Summary



**On-line:** Dynamic Voltage and Frequency Scaling (DVFS)

# **VFI Partitioning Problem**

Given

- NoC architecture and a schedule for the driver application
- Maximum number of allowed VFIs, $M$, and physical constraints

Find

- VFI partitioning (i.e., optimum number of VFIs, $n \le M$)
- Assignment of the supply and threshold voltages to each island

Such that the *total energy consumption* is minimized:

$$E_{Total} = E_{App} + \sum_{i=1}^{n} E_{VFI}(i)$$

where $n$ is the number of VFIs, $E_{App}$ is the application (useful) energy consumption (computation + communication), and $E_{VFI}(i)$ is the overhead of the *i*th VFI:

$$E_{VFI} = E_{ClkGen} + E_{Vconv} + E_{MixClkFifo}$$

# **Voltage/Frequency Assignment Problem**

Given a VFI partitioning

Find supply ($V_i$) and threshold ($V_{ti}$) voltage assignments

Such that the application energy consumption is minimized:

$$\min E_{App} = \sum_{\forall i \in T} E_i(V_i, V_{ti}) + \sum_{\forall i \in T} \sum_{\forall j \in T} vol(i, j)\, E_{bit}(i, j)$$

The first term is the energy consumed when task $i$ is executed at $(V_i, V_{ti})$; the second term is the communication energy.

$$E_i(V_i, V_{ti}) = R_i C_i V_i^2 + T_i k_i V_i e^{-\frac{V_{ti}}{S_t}}$$

Subject to the following deadline constraint for each task $t$:

$$\frac{x_t}{f_t} + t_{Comm}^t \leq \text{deadline}_t - \text{start\_time}_t$$

where $x_t / f_t$ is the execution time and $t_{Comm}^t$ is the communication delay:

$$t_{comm}(src, dst) = \sum_{i \in P} \frac{\mu_s}{f_i} + t_{fifo} \left\lceil \frac{vol(src, dst)}{W} \right\rceil$$
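The energy objective and the per-task deadline check above can be evaluated directly once the parameters are known. The following is a minimal sketch, not the tutorial's own tool: the function names and all numeric values are hypothetical, and the parameters `R`, `C`, `T`, `k`, `St` simply mirror the symbols $R_i$, $C_i$, $T_i$, $k_i$, $S_t$ of the formula, whose physical meaning is not spelled out on the slide. The communication delay is taken as an input rather than computed.

```python
import math

def task_energy(R, C, V, T, k, Vt, St):
    """E_i(V_i, V_ti) = R_i*C_i*V_i^2 + T_i*k_i*V_i*exp(-V_ti/S_t)."""
    return R * C * V ** 2 + T * k * V * math.exp(-Vt / St)

def app_energy(tasks, vol, e_bit):
    """E_App: sum of task energies plus vol(i, j) * E_bit(i, j) over all task pairs."""
    computation = sum(task_energy(**task) for task in tasks)
    n = len(tasks)
    communication = sum(vol[i][j] * e_bit[i][j] for i in range(n) for j in range(n))
    return computation + communication

def total_energy(e_app, vfi_overheads):
    """E_Total = E_App + sum over islands of (E_ClkGen + E_Vconv + E_MixClkFifo)."""
    return e_app + sum(clk + vconv + fifo for clk, vconv, fifo in vfi_overheads)

def meets_deadline(x, f, t_comm, deadline, start_time):
    """Check x_t / f_t + t_Comm^t <= deadline_t - start_time_t for one task."""
    return x / f + t_comm <= deadline - start_time

# Hypothetical two-task example; every number below is illustrative only.
tasks = [
    dict(R=2.0, C=3.0, V=1.0, T=0.0, k=1.0, Vt=0.3, St=0.1),
    dict(R=1.0, C=1.0, V=2.0, T=0.0, k=1.0, Vt=0.3, St=0.1),
]
vol = [[0, 10], [0, 0]]             # task 0 sends 10 bits to task 1
e_bit = [[0.0, 0.5], [0.0, 0.0]]    # energy per bit for each (i, j) pair
e_app = app_energy(tasks, vol, e_bit)
e_total = total_energy(e_app, [(1.0, 0.5, 0.5), (1.0, 0.5, 0.5)])  # two islands
```

A partitioning is only worthwhile when the drop in `e_app` from running islands at lower voltages exceeds the per-island overhead added in `total_energy`, which is why the number of islands is itself part of the optimization.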

# **VFI Partitioning and Voltage Assignment Algorithm**



### **Voltage Assignment Algorithm**






# **Experiments with Realistic Benchmarks**

- Several E3S benchmarks (consumer, network, auto-industry, telecom)
- Applications scheduled to NoCs ranging from 3×3 to 5×5




## Why On-line Control?



## **Distributed Power Management in Magali**



# **Local Control in Multi Clock Domain Processors**



- Previously proposed PID controllers for voltage/frequency control use only local queue information
  - This ignores interactions among multiple queues
  - It works well if a frequency change in one clock domain has negligible impact on other domains
- For an MCD processor with arbitrary partitions and strong interactions among multiple queues, a centralized on-line DVFS scheme may be needed
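To make the local scheme concrete, here is a minimal sketch of a discrete PID loop that sets one domain's frequency from the occupancy of its own input queue only. The class name, gains, and frequency limits are hypothetical and are not taken from the cited prior work; the point is that nothing in `update` sees any other queue.

```python
class LocalQueuePID:
    """Discrete PID controller driven only by one local queue (illustrative)."""

    def __init__(self, kp, ki, kd, f_nominal, f_min, f_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.f_nominal = f_nominal
        self.f_min, self.f_max = f_min, f_max
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, occupancy, reference):
        """Return the next frequency for the domain that drains the queue."""
        error = occupancy - reference          # positive when the queue is too full
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        f = (self.f_nominal
             + self.kp * error
             + self.ki * self.integral
             + self.kd * derivative)
        # Clamp to the range the clock generator can deliver (assumed limits).
        return min(self.f_max, max(self.f_min, f))


pid = LocalQueuePID(kp=0.1, ki=0.0, kd=0.0, f_nominal=1.0, f_min=0.25, f_max=2.0)
f_fuller = pid.update(occupancy=8, reference=4)    # queue above target: speed up
f_emptier = pid.update(occupancy=0, reference=4)   # queue below target: slow down
```

Because each controller reacts to its own queue in isolation, a frequency change it makes can disturb a neighbouring queue without either controller accounting for it, which is the interaction problem noted above.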


#### **Design Methodology for Multi-VFI NoCs**

- Traditionally, PID controllers are used due to their simplicity
- However, state-space modeling brings new opportunities
  - Precise controllability and stability analysis
  - Pole placement, linear quadratic regulator, robust controller

Feedback loop (figure): the desired utilizations of the interface FIFOs are the reference input of a voltage-frequency controller, which sets $(V_1, f_1), (V_2, f_2), \ldots, (V_N, f_N)$ for the NoC under control; the actual utilization of the interface FIFOs is returned to the controller as state feedback.


### **Formal Feedback Control**



# **Step-by-step Model Construction** (one queue)



### **Step-by-step Model Construction** (three queues)



- The topology of the VFIs determines the matrix B
- An algorithm automatically constructs B
- The structure of the model is the same regardless of B

$$Q(k)_{N \times 1} = Q(k-1)_{N \times 1} + TB_{N \times M}F(k-1)_{M \times 1}$$
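The update equation can be stepped numerically once $B$ is known. The sketch below is an illustration under stated assumptions, not the construction algorithm from the tutorial: it assumes each interface queue has one producer island and one consumer island, and that the corresponding row of $B$ holds a positive write rate in the producer's column and a negative read rate in the consumer's column. The function names and the link format are hypothetical.

```python
def build_b(num_queues, num_islands, links):
    """Build an N x M matrix B from (queue, producer, consumer, write_rate, read_rate).

    Assumed rule: the producer island fills the queue, the consumer island drains it.
    """
    B = [[0.0] * num_islands for _ in range(num_queues)]
    for queue, producer, consumer, write_rate, read_rate in links:
        B[queue][producer] += write_rate
        B[queue][consumer] -= read_rate
    return B


def step(q, B, f, T):
    """One update Q(k) = Q(k-1) + T * B * F(k-1)."""
    return [qi + T * sum(b * fj for b, fj in zip(row, f)) for qi, row in zip(q, B)]


# Hypothetical pipeline of three islands joined by two interface queues.
links = [(0, 0, 1, 1.0, 1.0), (1, 1, 2, 1.0, 1.0)]
B = build_b(num_queues=2, num_islands=3, links=links)
q_next = step(q=[4.0, 4.0], B=B, f=[2.0, 1.0, 1.0], T=1.0)
```

In this example island 0 runs twice as fast as island 1, so the first queue grows by one item per sampling period while the second stays constant. Changing the topology changes only `links`, and hence $B$; `step` is unchanged, matching the remark that the model structure does not depend on $B$.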

# **System Controllability**

In a multiple voltage-frequency island system with $M$ islands, the utilization of at most $M$ queues can be controlled.

The system is controllable iff $\mathrm{rank}(B) = N$ (i.e., the number of controlled queues).
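The rank condition is easy to test for a candidate topology. Since the state matrix of the queue model is the identity, the controllability matrix reduces to repeated copies of $B$, so its rank equals $\mathrm{rank}(B)$. The sketch below checks this with exact Gaussian elimination; the helper names and the example matrices are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


def is_controllable(B):
    """True iff rank(B) equals N, the number of controlled queues (rows of B)."""
    return matrix_rank(B) == len(B)


# Two queues, three islands: full row rank, so both queues can be controlled.
B_ok = [[1, -1, 0], [0, 1, -1]]
# Three queues, two islands: rank is at most M = 2 < N = 3.
B_bad = [[1, -1], [1, -1], [0, 1]]
```

The second matrix illustrates the bound stated above: with only two islands, no choice of entries can raise the rank to three.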



### **Feedback Control Architecture**




#### By the final value theorem, gain matrix $K_0 = K$
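A short simulation shows what $K_0 = K$ buys. The sketch assumes the standard state-feedback law $F(k) = K_0 Q_{ref} - K Q(k)$ applied to the queue model $Q(k) = Q(k-1) + T B F(k-1)$; this law, the matrices, and the gain choice are assumptions made for illustration, with $F$ read as a deviation from a nominal operating point. With $K_0 = K$ the tracking error obeys $e(k) = (I - TBK)\,e(k-1)$ and decays to zero whenever all eigenvalues of $I - TBK$ lie inside the unit circle.

```python
def mat_vec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def simulate_closed_loop(B, K, q0, q_ref, T, steps):
    """Iterate Q(k) = Q(k-1) + T*B*F(k-1) with F = K0*Q_ref - K*Q and K0 = K."""
    q = list(q0)
    history = [list(q)]
    for _ in range(steps):
        error = [r - x for r, x in zip(q_ref, q)]
        f = mat_vec(K, error)            # K0 = K, so F = K * (Q_ref - Q)
        dq = mat_vec(B, f)
        q = [x + T * d for x, d in zip(q, dq)]
        history.append(list(q))
    return history


# Hypothetical 2-queue, 2-island system; B is invertible here.
B = [[-1.0, 0.0], [1.0, -1.0]]
# K = 0.5 * inverse(B) / T, so that T*B*K = 0.5*I and the error halves each step.
K = [[-0.5, 0.0], [-0.5, -0.5]]
q_ref = [0.5, 0.25]                      # desired FIFO utilizations
history = simulate_closed_loop(B, K, q0=[0.0, 0.0], q_ref=q_ref, T=1.0, steps=10)
final_error = [abs(r - x) for r, x in zip(q_ref, history[-1])]
```

Choosing $K$ as a scaled inverse of $B$ is only a convenient way to get a known decay rate in a toy case; pole placement or a linear quadratic regulator would pick $K$ systematically.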

#### Possible extensions

- Adaptive techniques, such as gain scheduling
  - Monitor the workload and compute *K*, or use values computed off-line

### **Experiments with MPEG-2 Encoder**

- The encoder is divided into three VFI islands and mixed clock FIFOs are used at the interfaces
- The frequency of Variable Length Encoder is set to achieve the desired encoding rate



# **Frequency Tracking Capabilities**



$f_1$ is set to meet the target; $f_2$ and $f_3$ follow $f_1$



# **Results on FPGA prototyping**



- Inter-domain communication
  - Delay Locked Loops (DLLs) used to generate individual clock signals
  - Block-RAM based mixed-clock FIFOs
  - Voltage conversion not supported yet by Xilinx boards

- MPEG-2 encoder design divided into three VFIs
  - Synchronous design utilizes 16966 LUTs
  - Design with three VFIs utilizes 19161 LUTs
  - 13% overhead
- Power consumption obtained using XPower
  - Without voltage scaling, power drops from 277W to 259W
  - Consistent with simulations


### **Clock Control Architecture**




## Summary

- Energy issues in multi-VFI NoCs are crucial
  - VFI synthesis via partitioning and voltage allocation
  - Other formulations are possible
- Dynamic V/F control yields significant power savings over static approaches while being robust to workload variations
  - DVFS controller smooths out variations in workload characteristics
  - Precise controllability and stability conditions can be defined
- More work needed to address
  - Adaptive techniques for VFI control
  - Run-time optimizations for multiple applications
  - Impact of dynamic traffic on overall DVFS-based power management


## Outline

- Part I: Multi-Domain Processors Design Overview (2:00-2:45PM)
  - Multi-domain server, cell phone, and media processors
  - Power management techniques
- Part II: Router Design and Synchronization Issues (2:45-3:30PM)
  - Asynchronous router design
  - Quality of Service and virtual channels in QNoC
- Part III: Control and Power Management in Presence of Workload Variations (4:00-4:45PM)
  - VFI partitioning and voltage assignment
  - Workload modeling and dynamic control of multi-VFI designs
- Part IV: DVFS in Presence of Process Variations (4:45-5:30PM)
  - Impact of process variations on DVFS controller performance
  - Technology-driven limits on DVFS controllability

# **References (Part III)**

- U. Y. Ogras, et al., 'Design and Management of Voltage-Frequency Island Partitioned Networks-on-Chip,' in IEEE Trans. on VLSI, March 2009.
- P. Choudhary, D. Marculescu, 'Power Management of Voltage/Frequency Island-Based Systems Using Hardware Based Methods,' in IEEE Trans. on VLSI, March 2009.
- T. Simunic, S. P. Boyd, P. Glynn, 'Managing power consumption in networks on chips,' in IEEE Trans. on VLSI, Jan. 2004.
- A. Alimonda, et al., 'Feedback-Based Approach to DVFS in Data-Flow Applications,' in IEEE Trans. on CAD of Integrated Circuits and Systems, 28(11): 1691-1704, 2009.
- E. Beigne, et al., 'Dynamic voltage and frequency scaling architecture for units integration within a GALS NoC,' in Proc. Int. Symp. Netw. Chip, 2008, pp. 129-138.
- C.-L. Chou and R. Marculescu, 'Energy- and performance-aware incremental mapping for networks on chip with multiple voltage levels,' IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 10, pp. 1866-1879, Oct. 2008.
- Q. Wu, et al., 'Formal Online Methods for Voltage/Frequency Control in Multiple Clock Domain Microprocessors,' in Proc. ASPLOS 2004.
- G. Semeraro, et al., 'Energy efficient processor design using multiple clock domains with dynamic voltage and frequency scaling,' in Proc. HPCA 2002.
- U. Y. Ogras, et al., 'Challenges and Promising Results in NoC Prototyping Using FPGAs,' in IEEE Micro, September/October 2007.
- Clermidy, et al., 'A 477mW NoC-Based Digital Baseband for MIMO 4G SDR,' in IEEE ISSCC, February 2010.
- P. Juang, et al., 'Coordinated, distributed, formal energy management of chip multiprocessors,' in Proc. ISLPED 2005...

# **References (Part IV)**

- Y. Abulafia and A. Kornfeld. Estimation of FMAX and ISB in microprocessors. IEEE Trans. on VLSI Systems, 13(10), Oct 2006.
- A. Bonnoit, S. Herbert, D. Marculescu and L. Pileggi. Integrating Dynamic Voltage/Frequency Scaling and Adaptive Body Biasing using Test-time Voltage Selection. In Proc. of IEEE/ACM ISLPED, Aug. 2009.
- K. Bowman, S. Duvall, and J. Meindl. Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration. IEEE Journal of Solid-State Circuits, 37(2), Feb 2002.
- S. Garg, D. Marculescu. System-Level Mitigation of WID Leakage Variations using Body-Bias Islands. In Proc. ACM/IEEE CODES+ISSS, Atlanta, GA, October 2008.
- S. Garg, D. Marculescu, R. Marculescu and U. Ogras. Technology-driven Limits on DVFS Controllability of Multiple Voltage-Frequency Island Designs. In Proc. of IEEE/ACM Design Automation Conference (DAC), Jul. 2009.
- S. Herbert and D. Marculescu. Analysis of dynamic voltage/frequency scaling in chip-multiprocessors. In ISLPED '07: Proc. of the 2007 ISLPED, 2007.
- S. Herbert and D. Marculescu. Variation-Aware Dynamic Voltage/Frequency Scaling. In Proc. of the 15th HPCA, Feb. 2009.
- C. Isci, A. Buyuktosunoglu, C.-Y. Cher, P. Bose, and M. Martonosi. An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget. In MICRO '06, 2006.
- S.R. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari and J. Torrellas. VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects. IEEE Transactions on Semiconductor Manufacturing (IEEE TSM), February 2008.
- R. Teodorescu and J. Torrellas. Variation-aware application scheduling and power management for chip multiprocessors. In ISCA'08: Proc. of the 35th ISCA, 2008.
- J. Tschanz, J.T. Cao, S.G. Narendra, R. Nair, D.A. Antoniadis, A.P. Chandrakasan, V. De. Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage. IEEE Journal of Solid-State Circuits, Vol. 37, No. 11, Nov. 2002.
- W. Zhao and Y. Cao. New generation of predictive technology model for sub-45nm early design exploration. IEEE Trans. Electron Devices, vol. 53, no. 11, pp. 2816-2823, Nov. 2006.
- This list of references is NOT exhaustive. There are many good contributions not mentioned here due to involuntary omissions or space limitations.