OoO Execution Cores: 2 GHz; 2-wide front-end; 16 B fetch block; 14 stages (3-fetch, 3-decode, 3-rename, 2-dispatch, 3-commit)

24-entry fetch buffer, 32-entry decode buffer, 32-entry ROB

INT: 2 ALUs, 1 multiplier and 1 divider; FP: 1 ALU, 1 multiplier and 1 divider

1 load and 1 store functional unit (1 cycle each); MOB entries: 10 read and 10 write

Branch Predictor: 1 branch per fetch; 4K-entry, 4-way set-associative BTB

Two-level PAs predictor; 16K-entry BHT, 2-bit prediction

L1 Data + Inst. Cache: 32 KB, 8-way, 2-cycle; 64 B line; LRU policy

MSHR entries: 10 request, 8 write-back; Stride Prefetcher: 1-degree, 16-entry stride table

L2 Cache: 256 KB, shared by every 2 cores; 8-way, 4-cycle; 64 B line; LRU policy; MSHR entries: 10 request, 6 write-back; inclusive LLC; MOESI coherence

Stream Prefetcher: 2-degree, 16 prefetch distance, 32 streams

Low-Power DDR3-1600 Controller and Interconnection: bi-directional ring; 1 to 8 channels; 8 LP-DRAM banks, 8 KB row buffer per bank (1 KB per device), burst length 8

Open-row policy; CAS and RP latencies

HMC Module and Interconnection: bi-directional ring; 1 to 4 links @ 8 GHz; 32 vaults, 16 LP-DRAM banks per vault @ 800 MHz, 256 B row buffer per bank, burst length 2

Closed-row policy; CAS and RP latencies
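The cache and BTB parameters above fix the lookup geometry of each structure. A minimal Python sketch of that arithmetic follows, assuming conventional bit-sliced set indexing (no hashing) and reading the L1 row as 32 KB for each of the data and instruction caches; the helper name `geometry` is illustrative only.

```python
"""Set counts and address-bit splits derived from the cache/BTB parameters.

Assumptions (not stated in the table): plain bit-sliced indexing, and the
L1 row describing 32 KB for each of the data and instruction caches.
"""


def geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Return line count, set count and offset/index bit widths."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    # Bit-slicing only works when both quantities are powers of two.
    assert sets & (sets - 1) == 0 and line_bytes & (line_bytes - 1) == 0
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


KB = 1024
l1 = geometry(32 * KB, 8, 64)    # one L1 cache (data or instruction)
l2 = geometry(256 * KB, 8, 64)   # L2, shared by two cores
btb_sets = (4 * KB) // 4         # 4K-entry, 4-way BTB

if __name__ == "__main__":
    print(l1)
    print(l2)
    print(btb_sets)
```

Under these assumptions each L1 has 64 sets (6 offset bits, 6 index bits), the L2 has 512 sets (9 index bits), and the BTB has 1024 sets.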
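A two-level PAs predictor keeps a per-address history register for each branch (first level) and uses that history, together with a few address bits that select one of several pattern tables, to index 2-bit saturating counters (second level). The table only gives the 16K-entry, 2-bit structure; the sketch below reads it as the second-level counters and hypothetically assumes 1024 history registers of 10 bits and 16 pattern-table sets (16 × 2^10 = 16K counters). It illustrates the technique and is not the simulator's implementation.

```python
class PAsPredictor:
    """Sketch of a two-level PAs predictor (per-address history, per-set tables).

    Hypothetical sizing: 1024 local history registers of 10 bits each and
    16 pattern-table sets of 1024 two-bit counters (16K counters in total).
    """

    def __init__(self, hist_entries=1024, hist_bits=10, set_bits=4):
        self.hist_entries = hist_entries
        self.hist_bits = hist_bits
        self.set_bits = set_bits
        self.hist_mask = (1 << hist_bits) - 1
        # First level: one local history register per (hashed) branch address.
        self.histories = [0] * hist_entries
        # Second level: 2-bit saturating counters, initialised weakly not-taken (1).
        self.counters = [1] * ((1 << set_bits) << hist_bits)

    def _indices(self, pc):
        # Dropping the two low PC bits is a hypothetical hashing choice.
        h_idx = (pc >> 2) % self.hist_entries
        history = self.histories[h_idx]
        pht_set = (pc >> 2) & ((1 << self.set_bits) - 1)
        c_idx = (pht_set << self.hist_bits) | history
        return h_idx, c_idx

    def predict(self, pc):
        _, c_idx = self._indices(pc)
        return self.counters[c_idx] >= 2  # counter value 2 or 3 means "taken"

    def update(self, pc, taken):
        h_idx, c_idx = self._indices(pc)
        c = self.counters[c_idx]
        self.counters[c_idx] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the resolved outcome into this branch's local history.
        self.histories[h_idx] = ((self.histories[h_idx] << 1) | int(taken)) & self.hist_mask
```

Because the history is per branch, a loop branch with a fixed taken/taken/taken/not-taken pattern maps each position in the pattern to its own counter and is predicted perfectly once those counters have trained.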
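A PC-indexed stride prefetcher remembers, for each load instruction, the last address it accessed and the last observed stride; once the same non-zero stride repeats, it prefetches the next address(es) along that stride. The sketch below uses the table's degree of 1 and 16-entry stride table. LRU replacement of table entries and a confidence threshold of two confirmations are hypothetical choices, and addresses are plain integers rather than cache-line numbers.

```python
from collections import OrderedDict


class StridePrefetcher:
    """Sketch of a PC-indexed stride prefetcher: 16-entry table, degree 1.

    Hypothetical details: LRU replacement of table entries and a
    confidence threshold of 2 matching strides before prefetching.
    """

    def __init__(self, entries=16, degree=1, threshold=2):
        self.entries = entries
        self.degree = degree
        self.threshold = threshold
        self.table = OrderedDict()  # pc -> [last_addr, stride, confidence]

    def access(self, pc, addr):
        """Record a load and return the list of addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            if len(self.table) >= self.entries:
                self.table.popitem(last=False)  # evict least recently used PC
            self.table[pc] = [addr, 0, 0]
            return []
        self.table.move_to_end(pc)  # mark this PC as most recently used
        last_addr, stride, conf = entry
        new_stride = addr - last_addr
        if new_stride == stride and stride != 0:
            conf = min(conf + 1, self.threshold)
        else:
            conf = 0
        self.table[pc] = [addr, new_stride, conf]
        if conf >= self.threshold:
            return [addr + new_stride * k for k in range(1, self.degree + 1)]
        return []
```

For a load at PC 0x10 walking addresses 1000, 1064, 1128, 1192 and 1256, the first prefetch (1256) is issued on the fourth access, after the 64 B stride has been confirmed twice.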
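A few derived quantities make the DDR3 and HMC rows easier to compare. The sketch below assumes x8 DDR3 devices (inferred from 8 KB per bank at 1 KB per device, i.e. eight devices) forming a 64-bit channel with a single rank; under those assumptions a burst of 8 transfers exactly one 64 B cache line, and one DDR3-1600 channel peaks at 1600 MT/s × 8 B = 12.8 GB/s. The helper name `ddr3_channel` is illustrative only.

```python
"""Derived DDR3 and HMC quantities.

Assumptions (not stated in the table): x8 DDR3 devices, 64-bit channel,
one rank per channel.
"""

LINE_BYTES = 64  # cache line size from the table


def ddr3_channel(mt_per_s=1600e6, devices=8, device_width_bits=8, burst_length=8):
    """Bus width, bytes per burst and peak GB/s of one DDR3 channel."""
    bus_bytes = devices * device_width_bits // 8
    return {
        "bus_bytes": bus_bytes,
        "burst_bytes": bus_bytes * burst_length,
        "peak_gb_per_s": mt_per_s * bus_bytes / 1e9,
    }


ch = ddr3_channel()
peak_8_channels = 8 * ch["peak_gb_per_s"]     # upper end of the 1 to 8 channel range
ddr3_lines_per_row = 8 * 1024 // LINE_BYTES   # 8 KB row buffer per bank
hmc_banks = 32 * 16                           # 32 vaults x 16 banks per vault
hmc_lines_per_row = 256 // LINE_BYTES         # 256 B row buffer per bank

if __name__ == "__main__":
    print(ch, peak_8_channels, ddr3_lines_per_row, hmc_banks, hmc_lines_per_row)
```

An 8 KB DDR3 row holds 128 cache lines, which favours an open-row policy, whereas a 256 B HMC row holds only 4 lines, in line with the closed-row policy listed for the HMC; the HMC configuration exposes 32 × 16 = 512 banks.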

References

Altera, Hybrid memory cube controller IP core user guide, 2015.

M. A. Alves, Increasing Energy Efficiency of Processor Caches via Line Usage Predictors, 2014.

M. A. Alves, M. Diener, and F. B. Moreira, SiNUCA: A validated microarchitecture simulator, High Performance Computation Conf., 2015.

Hybrid Memory Cube Consortium, Hybrid memory cube specification rev., 2011.

Hybrid Memory Cube Consortium, Hybrid memory cube specification rev., 2013.

B. T. Davis, Modern DRAM architectures, 2001.

J. L. Henning, SPEC CPU2006 benchmark descriptions, ACM SIGARCH Computer Architecture News, vol. 34, issue 4, pp. 1-17, 2006. DOI: 10.1145/1186736.1186737

Intel, Intel Atom Processor E3800 Product Family, Tech. rep., 2015.

B. Jacob, S. Ng, and D. Wang, Memory systems: cache, DRAM, disk, 2008.

J. Jeddeloh and B. Keeth, Hybrid memory cube new DRAM architecture increases density and performance, 2012 Symposium on VLSI Technology (VLSIT), pp. 87-88, 2012. DOI: 10.1109/VLSIT.2012.6242474

J. Leidel and Y. Chen, HMC-Sim: A simulation framework for hybrid memory cube devices, Int. Parallel and Distributed Processing Symp. Workshops, pp. 1465-1474, 2014.

Micron, 1Gb: x4, x8, x16 DDR3 SDRAM features, 1Gb DDR3 SDRAM Rev., 2006.

J. Van Olmen, A. Mercha, and G. Katti, 3D stacked IC demonstration using a through silicon via first approach, Int. Electron Devices Meeting, 2008.

H. Patil, R. Cohn, and M. Charney, Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation, 37th International Symposium on Microarchitecture (MICRO-37'04), pp. 81-92, 2004. DOI: 10.1109/MICRO.2004.28

J. Pawlowski, Hybrid memory cube (HMC), 2011 IEEE Hot Chips 23 Symposium (HCS), 2011. DOI: 10.1109/HOTCHIPS.2011.7477494

P. Rosenfeld, Performance Exploration of the Hybrid Memory Cube, 2014.

P. Rosenfeld, E. Cooper-Balis, T. Farrell, D. Resnick, and B. Jacob, Peering over the memory wall: Design space and performance analysis of the hybrid memory cube, 2012.

H. Saito, G. Gaertner, and W. Jones, Large System Performance of SPEC OMP2001 Benchmarks, Int. Symp. on High Performance Computing, pp. 370-379, 2006. DOI: 10.1007/3-540-47847-7_34

T. Thanh-Hoang, A. Shambayati, C. Deutschbein, H. Hoffmann, and A. Chien, Performance and energy limits of a processor-integrated FFT accelerator, 2014 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6, 2014. DOI: 10.1109/HPEC.2014.7040951

T. Yoshida, M. Hondou, and T. Tabata, SPARC64™ XIfx: Fujitsu's next generation processor for HPC, 2014 IEEE Hot Chips 26 Symposium (HCS), pp. 6-14, 2015. DOI: 10.1109/HOTCHIPS.2014.7478806

Z. Zhu, Z. Zhang, and X. Zhang, Fine-grain priority scheduling on multi-channel memory systems, Int. Symp. on High-Performance Computer Architecture, pp.107-116, 2002.