N. Brunie, S. Collange, and G. Diamos, Simultaneous branch and warp interweaving for sustained GPU performance, ISCA '12: Proceedings of the 39th Annual International Symposium on Computer Architecture, 2012.
URL : https://hal.archives-ouvertes.fr/ensl-00649650

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, IEEE International Symposium on Workload Characterization, pp.44-54, 2009.

S. Collange, Analyse de l'architecture GPU Tesla, Technical report no. hal-00443875, HAL-CCSD, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00443875

S. Collange, Une architecture unifiée pour traiter la divergence de contrôle et la divergence mémoire en SIMT, SYMPosium en Architectures, Saint-Malo, France, 2011.

S. Collange, M. Daumas, D. Defour, and D. Parello, Étude comparée et simulation d'algorithmes de branchements pour le GPGPU, Symposium en architectures nouvelles de machines (SympA), 2009.

S. Collange, M. Daumas, D. Defour, and D. Parello, Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43

S. Collange, D. Defour, and A. Tisserand, Power Consumption of GPUs from a Software Perspective, ICCS 2009, pp.922-931, 2009.
DOI : 10.1007/978-3-642-01970-8_92
URL : https://hal.archives-ouvertes.fr/hal-00348672

S. Collange, D. Defour, and Y. Zhang, Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, Euro-Par 3rd Workshop on Highly Parallel Processing on a Chip (HPPC), pp.46-55, 2009.
DOI : 10.1007/978-3-642-14122-5_8
URL : https://hal.archives-ouvertes.fr/hal-00396719

J. D. Collins, D. M. Tullsen, and H. Wang, Control Flow Optimization Via Dynamic Reconvergence Prediction, 37th International Symposium on Microarchitecture (MICRO-37'04), pp.129-140, 2004.
DOI : 10.1109/MICRO.2004.13
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.6434

B. W. Coon and J. E. Lindholm, System and method for managing divergent threads in a SIMD architecture, 2008.

B. W. Coon, J. R. Nickolls, J. E. Lindholm, and S. D. Tzvetkov, Structured programming control flow in a SIMD architecture, 2011.

E. Demers, Evolution of AMD's graphics core, and preview of Graphics Core Next, AMD Fusion Developer Summit keynote, 2011.

G. Diamos, A. Kerr, H. Wu, S. Yalamanchili, B. Ashbaugh et al., SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, 2011.
DOI : 10.1145/2155620.2155676

W. Fung and T. Aamodt, Thread block compaction for efficient SIMT control flow, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp.25-36, 2011.
DOI : 10.1109/HPCA.2011.5749714

W. W. Fung, I. Sham, G. Yuan, and T. M. Aamodt, Dynamic warp formation, ACM Transactions on Architecture and Code Optimization, vol.6, issue.2, pp.7-8, 2009.
DOI : 10.1145/1543753.1543756

M. Garland, S. Le Grand, J. Nickolls, J. Anderson, J. Hardwick et al., Parallel Computing Experiences with CUDA, IEEE Micro, vol.28, issue.4, pp.13-27, 2008.
DOI : 10.1109/MM.2008.57

M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler, W. J. Dally et al., Energy-efficient mechanisms for managing thread context in throughput processors, Proceedings of the 38th Annual International Symposium on Computer Architecture, pp.235-246, 2011.

R. Keryell and N. Paris, Activity counter: New optimization for the dynamic scheduling of SIMD control flow, Proceedings of the 1993 International Conference on Parallel Processing, pp.184-187, 1993.

D. H. Lawrie, T. Layman, D. Baer, and J. M. Randal, Glypnir: a programming language for Illiac IV, Communications of the ACM, vol.18, issue.3, pp.157-164, 1975.
DOI : 10.1145/360680.360687

A. Levinthal and T. Porter, Chap - a SIMD graphics processor, Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp.77-82, 1984.

J. E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, vol.28, issue.2, pp.39-55, 2008.
DOI : 10.1109/MM.2008.31

R. A. Lorie and H. R. Strong, A SIMD data processing system, 1984.

J. Meng, D. Tarjan, and K. Skadron, Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010.
DOI : 10.1145/1816038.1815992

Mesa, Gallium3D nvfx graphics driver, Software manual, Source code, 2011.

J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41

M. J. Quinn, P. J. Hatcher, and K. C. Jourdenais, Compiling C* programs for a hypercube multicomputer, ACM SIGPLAN Notices, vol.23, issue.9, pp.57-65, 1988.
DOI : 10.1145/62116.62122

Y. Takahashi, A mechanism for SIMD execution of SPMD programs, Proceedings High Performance Computing on the Information Superhighway. HPC Asia '97, pp.529-534, 1997.
DOI : 10.1109/HPC.1997.592203

F. Zhang and E. H. D'Hollander, Using hammock graphs to structure programs, IEEE Transactions on Software Engineering, vol.30, issue.4, pp.231-245, 2004.
DOI : 10.1109/TSE.2004.1274043