, AMD. Southern Islands Series Instruction Set Architecture, 2012.

N. Brunie and S. Collange, Reconvergence de contrôle implicite pour les architectures SIMT, Revue des Sciences et Technologies de l'Information -Série TSI : Technique et Science Informatiques, vol.32, pp.153-178, 2013.

. Inria,

N. Brunie, G. Sylvain-collange, and . Diamos, Simultaneous Branch and Warp Interweaving for Sustained GPU Performance, 39th Annual International Symposium on Computer Architecture (ISCA), pp.49-60, 2012.
URL : https://hal.archives-ouvertes.fr/ensl-00649650

L. Chen, Executing subroutines in a multi-threaded processing system, US Patent, vol.9, p.721, 2016.

C. Collange, Stack-less SIMT reconvergence at low cost, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00622654

C. Collange, Simty: a synthesizable general-purpose SIMT processor, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01351689

M. Sylvain-collange, D. Daumas, D. Defour, and . Parello, Étude comparée et simulation d'algorithmes de branchements pour le GPGPU, SYMPosium en Architectures nouvelles de machines (SYMPA), 2009.

G. Frederick-diamos, R. C. Johnson, V. Grover, O. Giroux, H. Jack et al., Execution of divergent threads using a convergence barrier, vol.265, 2015.

A. Eltantawy, M. Tor, and . Aamodt, MIMD synchronization on SIMT architectures, 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.

A. Eltantawy, J. W. Ma, M. O. Connor, and T. Aamodt, A scalable multi-path microarchitecture for efficient GPU control flow, International Symposium on High Performance Computer Architecture (HPCA), 2014.

W. L. Wilson, I. Fung, G. Sham, T. Yuan, and . Aamodt, Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware, ACM Transactions on Architecture and Code Optimization (TACO), vol.6, issue.2, p.7, 2009.

M. Harris, CUDA 9 Features Revealed: Volta, Cooperative Groups and More. NVIDIA Parallel ForAll, 2017.

R. Holm and D. Mansell, Scheduling program instructions with a runner-up execution position, US Patent, vol.9, p.473, 2016.

W. Hwu, Heterogeneous System Architecture: A new compute platform infrastructure, 2015.

A. Lashgar, A. Khonsari, and A. Baniasadi, HARP: Harnessing inactive threads in many-core processors, ACM TECS, vol.13, issue.3s, 2014.

Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart et al., Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators, In ACM SIGARCH Computer Architecture News, vol.39, pp.129-140, 2011.

A. Levinthal and T. Porter, Chap -a SIMD graphics processor, Proceedings of the 11th annual conference on Computer graphics and interactive techniques, SIGGRAPH '84, pp.77-82, 1984.

J. Meng, D. Tarjan, and K. Skadron, Dynamic warp subdivision for integrated branch and memory divergence tolerance, SIGARCH Comput. Archit. News, vol.38, issue.3, pp.235-246, 2010.

M. Rhu and M. Erez, The dual-path execution model for efficient GPU control flow, International Symposium on High Performance Computer Architecture (HPCA2013), pp.591-602, 2013.