The PARSEC benchmark suite, Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pp.72-81, 2008.
DOI : 10.1145/1454115.1454128
Simultaneous branch and warp interweaving for sustained GPU performance, ACM SIGARCH Computer Architecture News, vol.40, issue.3, pp.49-60, 2012.
DOI : 10.1145/2366231.2337166
URL : https://hal.archives-ouvertes.fr/ensl-00649650
Dynamically Controlled Resource Allocation in SMT Processors, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.171-182, 2004.
DOI : 10.1109/MICRO.2004.17
Stack-less SIMT reconvergence at low cost, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00622654
Benjamin Ashbaugh, and Subramaniam Maiyuran. SIMD re-convergence at thread frontiers, MICRO 44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011.
Front-end policies for improved issue efficiency in SMT processors, Proceedings of the Ninth International Symposium on High-Performance Computer Architecture, HPCA-9 2003, pp.31-40, 2003.
DOI : 10.1109/HPCA.2003.1183522
A memory-level parallelism aware fetch policy for SMT processors, 13th International Symposium on High-Performance Computer Architecture, HPCA-13 2007, pp.10-14, 2007.
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware, ACM Trans. Archit. Code Optim., vol.6, issue.2, article 7, pp.1-37, 2009.
Branch prediction and simultaneous multithreading, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques, pp.169-173, 1996.
DOI : 10.1109/PACT.1996.552664
URL : https://hal.archives-ouvertes.fr/inria-00073847
Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading, Proceedings Fifth International Symposium on High-Performance Computer Architecture, pp.64-67, 1999.
DOI : 10.1109/HPCA.1999.744331
URL : https://hal.archives-ouvertes.fr/inria-00073298
HARP, ACM Transactions on Embedded Computing Systems, vol.13, issue.3s, p.114, 2014.
DOI : 10.1145/2567938
Multimedia extensions for general-purpose processors, 1997 IEEE Workshop on Signal Processing Systems. SiPS 97 Design and Implementation formerly VLSI Signal Processing, pp.9-23, 1997.
DOI : 10.1109/SIPS.1997.625683
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators, ACM Transactions on Computer Systems (TOCS), issue.3, p.316, 2013.
McPAT, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-42, pp.469-480, 2009.
DOI : 10.1145/1669112.1669172
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010.
DOI : 10.1109/MICRO.2010.41
Pin, ACM SIGPLAN Notices, vol.40, issue.6, pp.190-200, 2005.
DOI : 10.1145/1064978.1065034
Boosting SMT performance by speculation control, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001, p.2, 2001.
DOI : 10.1109/IPDPS.2001.924929
Balancing throughput and fairness in SMT processors, IEEE International Symposium on Performance Analysis of Systems and Software, pp.164-171, 2001.
An Evaluation of Vectorizing Compilers, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.372-382, 2011.
DOI : 10.1109/PACT.2011.68
Execution Drafting: Energy Efficiency through Computation Deduplication, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.432-444, 2014.
DOI : 10.1109/MICRO.2014.43
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling, 2009 IEEE International Conference on Computer Design, pp.282-288, 2009.
DOI : 10.1109/ICCD.2009.5413143
Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010.
DOI : 10.1145/1816038.1815992
iGPU, ACM SIGARCH Computer Architecture News, pp.72-83, 2012.
DOI : 10.1145/2366231.2337168
Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads, Parallel Computing, vol.40, issue.9, pp.548-558, 2014.
DOI : 10.1016/j.parco.2014.03.006
URL : https://hal.archives-ouvertes.fr/hal-01087054
Improving GPU performance via large warps and two-level warp scheduling, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155656
The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41
Advanced compiler optimizations for supercomputers, Commun. ACM, vol.29, issue.12, pp.1184-1201, 1986.
Compiling C* programs for a hypercube multicomputer, ACM SIGPLAN Notices, vol.23, pp.57-65, 1988.
The CRAY-1 computer system, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978.
DOI : 10.1145/359327.359336
A new case for the TAGE branch predictor, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.117-127, 2011.
DOI : 10.1145/2155620.2155635
URL : https://hal.archives-ouvertes.fr/hal-00639193
Design tradeoffs for the Alpha EV8 conditional branch predictor, 29th International Symposium on Computer Architecture, pp.25-29, 2002.
Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor, ACM SIGARCH Computer Architecture News, vol.24, pp.191-202, 1996.
Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd Annual International Symposium on Computer Architecture, pp.191-202, 1996.
Simultaneous multithreading: Maximizing on-chip parallelism, Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA '95, pp.392-403, 1995.
Quantifying sources of error in McPAT and potential impacts on architectural studies, 21st IEEE International Symposium on High Performance Computer Architecture, HPCA 2015, pp.577-589, 2015.