OoO Execution Cores: 2 GHz cores; 2-wide front-end; 16 B fetch block size; 14 stages (3 fetch, 3 decode, 3 rename, 2 dispatch, 3 commit); 24-entry fetch buffer, 32-entry decode buffer, 32-entry ROB; INT: 2 ALU, 1 MUL, 1 DIV; FP: 1 ALU, 1 MUL, 1 DIV; 1 load and 1 store functional units (1-1 cycle); MOB entries: 10 read and 10 write

Branch Predictor: 1 branch per fetch; 4 K-entry, 4-way set-associative BTB; two-level PAs predictor; 16 K-entry BHT, 2-bit prediction

Cache: 32 KB, 8-way, 2-cycle; 64-byte line; LRU policy; MSHR entries: 10 request, 8 write-back; stride prefetcher: degree 1, 16-stride table

L2 Cache: 256 KB shared by every 2 cores; 8-way, 4-cycle; 64-byte line; LRU policy; MSHR entries: 10 request, 6 write-back; inclusive LLC; MOESI coherence; stream prefetcher: degree 2, prefetch distance 16, p.32

Low Power DDR3-1600 Controller and Interconnection: bidirectional ring, 1 to 8 channels; 8 LP-DRAM banks, 8 KB row buffer per bank (1 KB per device), burst length 8

HMC Module and Interconnection: bidirectional ring, 1 to 4 links @ 8 GHz; 32 vaults, 16 LP-DRAM banks per vault @ 800 MHz, 256 B row buffer per bank, burst length 2
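As a quick sanity check on the parameters above, the following minimal sketch derives the number of sets implied by the listed cache and BTB geometries and confirms that the per-stage pipeline depths add up to the stated 14 stages. The helper name `cache_sets` is hypothetical, and the sketch assumes 1 KB = 1024 bytes and 4 K = 4096 entries; it is an illustration only, not part of the simulator configuration.

```python
KB = 1024  # assumption: KB means 1024 bytes


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets in a set-associative cache."""
    total_lines = size_bytes // line_bytes
    return total_lines // ways


# 32 KB, 8-way cache with 64-byte lines
l1_sets = cache_sets(32 * KB, ways=8, line_bytes=64)

# 256 KB, 8-way L2 cache with 64-byte lines
l2_sets = cache_sets(256 * KB, ways=8, line_bytes=64)

# 4 K-entry, 4-way set-associative BTB (assumption: 4 K = 4096 entries)
btb_sets = (4 * 1024) // 4

# Pipeline depth: fetch, decode, rename, dispatch, commit
pipeline_stages = sum([3, 3, 3, 2, 3])

print(l1_sets, l2_sets, btb_sets, pipeline_stages)
```

Under these assumptions the 32 KB cache has 64 sets, the L2 cache has 512 sets, and the BTB has 1024 sets.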