C. Bienia, S. Kumar, K. Singh, and . Li, The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.72-81, 2008.
DOI : 10.1145/1454115.1454128

N. Brunie, S. Collange, and G. Diamos, Simultaneous branch and warp interweaving for sustained GPU performance, ACM SIGARCH Computer Architecture News, vol.40, issue.3, pp.49-60, 2012.
DOI : 10.1145/2366231.2337166
URL : https://hal.archives-ouvertes.fr/ensl-00649650

F. J. Cazorla, A. Ramírez, M. Valero, and E. Fernández, Dynamically Controlled Resource Allocation in SMT Processors, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.171-182, 2004.
DOI : 10.1109/MICRO.2004.17

S. Collange, Stack-less simt reconvergence at low cost, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00622654

G. Diamos, A. Kerr, H. Wu, and S. Yalamanchili, Benjamin Ashbaugh, and Subramaniam Maiyuran. SIMD re-convergence at thread frontiers, MICRO 44: Proceedings of the 44th annual IEEE/ACM International Symposium on Microarchitecture, 2011.

A. El-moursy and D. H. Albonesi, Front-end policies for improved issue efficiency in SMT processors, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings., pp.31-40, 2003.
DOI : 10.1109/HPCA.2003.1183522

S. Eyerman and L. Eeckhout, A memory-level parallelism aware fetch policy for SMT processors, 13st International Conference on High-Performance Computer Architecture (HPCA-13 2007, pp.10-14, 2007.

W. L. Wilson, I. Fung, G. Sham, T. M. Yuan, and . Aamodt, Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware, ACM Trans. Archit. Code Optim, vol.67, pp.1-7, 2009.

S. Hily and A. Seznec, Branch prediction and simultaneous multithreading, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, pp.169-173, 1996.
DOI : 10.1109/PACT.1996.552664
URL : https://hal.archives-ouvertes.fr/inria-00073847

S. Hily and A. Seznec, Out-of-order execution may not be cost-effective on processors featuring simultaneous multithreading, Proceedings Fifth International Symposium on High-Performance Computer Architecture, pp.64-67, 1999.
DOI : 10.1109/HPCA.1999.744331
URL : https://hal.archives-ouvertes.fr/inria-00073298

A. Lashgar, A. Khonsari, and A. Baniasadi, HARP, ACM Transactions on Embedded Computing Systems, vol.13, issue.3s, p.114, 2014.
DOI : 10.1145/2567938

R. Lee, Multimedia extensions for general-purpose processors, 1997 IEEE Workshop on Signal Processing Systems. SiPS 97 Design and Implementation formerly VLSI Signal Processing, pp.9-23, 1997.
DOI : 10.1109/SIPS.1997.625683

Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart et al., Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators, ACM Transactions on Computer Systems (TOCS), issue.3, p.316, 2013.

S. Li, J. Ho-ahn, D. Richard, . Strong, B. Jay et al., McPAT, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.42-469, 2009.
DOI : 10.1145/1669112.1669172

G. Long, D. Franklin, S. Biswas, P. Ortiz, J. Oberg et al., Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010.
DOI : 10.1109/MICRO.2010.41

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin, ACM SIGPLAN Notices, vol.40, issue.6, pp.190-200, 2005.
DOI : 10.1145/1064978.1065034

K. Luo, M. Franklin, S. S. Mukherjee, and A. Seznec, Boosting SMT performance by speculation control, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001, p.2, 2001.
DOI : 10.1109/IPDPS.2001.924929

K. Luo, J. Gummaraju, and M. Franklin, Balancing thoughput and fairness in SMT processors, IEEE International Symposium on Performance Analysis of Systems and Software, pp.164-171, 2001.

S. Maleki, Y. Gao, M. J. Garzaran, T. Wong, A. David et al., An Evaluation of Vectorizing Compilers, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.372-382, 2011.
DOI : 10.1109/PACT.2011.68

M. Mckeown, J. Balkind, and D. Wentzlaff, Execution Drafting: Energy Efficiency through Computation Deduplication, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp.432-444, 2014.
DOI : 10.1109/MICRO.2014.43

J. Meng and K. Skadron, Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling, 2009 IEEE International Conference on Computer Design, pp.282-288, 2009.
DOI : 10.1109/ICCD.2009.5413143

J. Meng, D. Tarjan, and K. Skadron, Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010.
DOI : 10.1145/1816038.1815992

J. Menon, M. D. Kruijf, and K. Sankaralingam, iGPU, ACM SIGARCH Computer Architecture News, pp.72-83, 2012.
DOI : 10.1145/2366231.2337168

T. Milanez, S. Collange, F. M. , Q. Pereira, W. Meira et al., Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads, Parallel Computing, vol.40, issue.9, pp.548-558, 2014.
DOI : 10.1016/j.parco.2014.03.006
URL : https://hal.archives-ouvertes.fr/hal-01087054

C. Veynu-narasiman, M. Lee, R. Shebanow, O. Miftakhutdinov, Y. N. Mutlu et al., Improving GPU performance via large warps and two-level warp scheduling, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155656

J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41

A. David, M. J. Padua, and . Wolfe, Advanced compiler optimizations for supercomputers, Commun. ACM, vol.29, issue.12, pp.1184-1201, 1986.

J. Michael, . Quinn, J. Philip, . Hatcher, C. Karen et al., Compiling c* programs for a hypercube multicomputer, In ACM SIGPLAN Notices, vol.23, pp.57-65, 1988.

R. M. Russell, The CRAY-1 computer system, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978.
DOI : 10.1145/359327.359336

A. Seznec, A new case for the TAGE branch predictor, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.117-127, 2011.
DOI : 10.1145/2155620.2155635
URL : https://hal.archives-ouvertes.fr/hal-00639193

A. Seznec, S. Felix, V. Krishnan, and Y. Sazeides, Design tradeoffs for the alpha EV8 conditional branch predictor, 29th International Symposium on Computer Architecture, pp.25-29, 2002.

M. Dean, . Tullsen, J. Susan, . Eggers, S. Joel et al., Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor, In ACM SIGARCH Computer Architecture News, vol.24, pp.191-202, 1996.

. Stamm, Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd Annual International Symposium on Computer Architecture, pp.191-202, 1996.

M. Dean, S. J. Tullsen, H. M. Eggers, and . Levy, Simultaneous multithreading: Maximizing on-chip parallelism, Proceedings of the 22nd Annual International Symposium on Computer Architecture, ISCA '95, pp.392-403, 1995.

W. University, H. M. Sam-likun-xi, P. Jacobson, . Bose, . Gu-yeon et al., Benchmark package Quantifying sources of error in mcpat and potential impacts on architectural studies, 21st IEEE International Symposium on High Performance Computer Architecture, HPCA 2015, pp.577-589, 2006.