A. Al-Dujaili, F. Deragisch, A. Hagiescu, and W. Wong, Guppy: A GPU-like soft-core processor, Field-Programmable Technology (FPT), 2012 International Conference on. IEEE, pp.57-60, 2012.

K. Andryc, M. Merchant, and R. Tessier, FlexGrip: A soft GPGPU for FPGAs, Field-Programmable Technology (FPT), 2013 International Conference on. IEEE, pp.230-237, 2013.

A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, T. M. Aamodt, Analyzing CUDA Workloads Using a Detailed GPU Simulator, Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.163-174, 2009.

R. Balasubramanian, V. Gangadhar, Z. Guo, C. Ho, C. Joseph et al., Enabling GPGPU low-level hardware explorations with MIAOW: an open-source RTL implementation of a GPGPU, ACM Transactions on Architecture and Code Optimization (TACO), vol.12, p.21, 2015.

N. Brunie, S. Collange, and G. Diamos, Simultaneous Branch and Warp Interweaving for Sustained GPU Performance, 39th Annual International Symposium on Computer Architecture (ISCA), pp.49-60, 2012.
URL : https://hal.archives-ouvertes.fr/ensl-00649650

J. Bush, P. Dexter, T. N. Miller, and A. Carpenter, Nyami: a synthesizable GPU architectural model for general-purpose and graphics-specific workloads, Performance Analysis of Systems and Software (ISPASS), pp.173-182, 2015.

G. Chrysos, Intel® Xeon Phi™ Coprocessor - the Architecture, Intel Whitepaper, 2014.

C. Collange, Stack-less SIMT reconvergence at low cost, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00622654

C. Collange and N. Brunie, Path list traversal: a new class of SIMT flow tracking mechanisms, Inria Rennes - Bretagne Atlantique, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01533085

S. Collange, M. Daumas, D. Defour, and D. Parello, Barra: a Parallel Functional Simulator for GPGPU, IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pp.351-360, 2010.

A. ElTantawy and T. M. Aamodt, MIMD Synchronization on SIMT Architectures, 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.

X. Gong, R. Ubal, and D. Kaeli, Multi2Sim Kepler: A detailed architectural GPU simulator, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.269-278, 2017.

J. L. Hennessy and D. A. Patterson, Computer architecture: a quantitative approach, 2011.

S. Kalathingal, S. Collange, B. N. Swamy, and A. Seznec, Dynamic Inter-Thread Vectorization Architecture: extracting DLP from TLP, IEEE International Symposium on Computer Architecture and High-Performance Computing, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01356202

H. Kim, J. Lee, N. B. Lakshminarayana, J. Sim et al., MacSim: A CPU-GPU heterogeneous simulation framework user guide, 2012.

J. Kingyens and J. G. Steffan, A GPU-inspired soft processor for high-throughput acceleration, Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), pp.1-8, 2010.

C. Kozyrakis and D. Patterson, Overcoming the limitations of conventional vector processors, ACM SIGARCH Computer Architecture News, vol.31, pp.399-409, 2003.

Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart et al., Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators, ACM Transactions on Computer Systems (TOCS), vol.31, p.6, 2013.

Y. Lee, A. Waterman, R. Avizienis, H. Cook, C. Sun et al., A 45nm 1.3 GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators, European Solid State Circuits Conference (ESSCIRC), pp.199-202, 2014.

J. E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, pp.39-55, 2008.

S. Liu, J. E. Lindholm, M. Y. Siu, B. W. Coon et al., 2010.

J. Meng and K. Skadron, A reconfigurable simulator for large-scale heterogeneous multicore architectures, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.119-120, 2011.

J. Meng, D. Tarjan, and K. Skadron, Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, pp.235-246, 2010.

J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, pp.56-69, 2010.

A. Severance, J. Edwards, H. Omidian, and G. Lemieux, Soft vector processors with streaming pipelines, Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays, pp.117-126, 2014.

A. Waterman, Y. Lee, D. A. Patterson, and K. Asanović, The RISC-V Instruction Set Manual, vol.1, 2014.