G. Blelloch and S. Chatterjee, Vcode: a data-parallel intermediate language, [1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation, pp.471-480, 1990.
DOI : 10.1109/FMPC.1990.89498

S. Boudier, Memory system on Fusion APUs, 2011.

L. Bougé and J. Levaire, Control structures for data-parallel SIMD languages: semantics and implementation, Future Generation Computer Systems, vol.8, issue.4, pp.363-378, 1992.
DOI : 10.1016/0167-739X(92)90069-N

W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall, A. H. Sameh et al., The Illiac IV system, Proceedings of the IEEE, vol.60, issue.4, pp.369-388, 1972.
DOI : 10.1109/PROC.1972.8647

P. Briggs, K. D. Cooper, and L. Torczon, Rematerialization, PLDI, pp.311-321, 1992.
DOI : 10.1145/143095.143143

K. Brockmann and R. Wanka, Efficient oblivious parallel sorting on the MasPar MP-1, Proceedings of the Thirtieth Hawaii International Conference on System Sciences, p.200, 1997.
DOI : 10.1109/HICSS.1997.667215

Z. Budimlic, K. D. Cooper, T. J. Harvey, K. Kennedy, T. S. Oberg et al., Fast copy coalescing and live-range identification, ACM SIGPLAN Notices, vol.37, issue.5, pp.25-32, 2002.
DOI : 10.1145/543552.512534

D. Callahan, K. D. Cooper, K. Kennedy, and L. Torczon, Interprocedural constant propagation, ACM SIGPLAN Notices, vol.21, issue.7, pp.152-161, 1986.
DOI : 10.1145/13310.13327

S. Carrillo, J. Siegel, and X. Li, A control-structure splitting optimization for GPGPU, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.147-150, 2009.
DOI : 10.1145/1531743.1531766

D. Cederman and P. Tsigas, GPU-Quicksort, Journal of Experimental Algorithmics, vol.14, pp.4-5, 2010.
DOI : 10.1145/1498698.1564500

G. J. Chaitin, Register allocation & spilling via graph coloring, Proceedings of the 1982 SIGPLAN symposium on Compiler construction, SIGPLAN '82, pp.98-105, 1982.
DOI : 10.1145/872726.806984

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797

J. Choi, R. Cytron, and J. Ferrante, Automatic construction of sparse data flow evaluation graphs, Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '91, pp.55-66, 1991.
DOI : 10.1145/99583.99594

S. Collange, D. Defour, and Y. Zhang, Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, Proceedings of the 2009 international conference on Parallel processing, Euro-Par'09, pp.46-55, 2010.
DOI : 10.1007/978-3-642-14122-5_8

URL : https://hal.archives-ouvertes.fr/hal-00396719

P. Cousot and R. Cousot, Abstract interpretation, Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, POPL '77, pp.238-252, 1977.
DOI : 10.1145/512950.512973

URL : https://hal.archives-ouvertes.fr/inria-00528590

B. Coutinho, D. Sampaio, F. M. Q. Pereira, and W. Meira Jr., Performance Debugging of GPGPU Applications with the Divergence Map, 2010 22nd International Symposium on Computer Architecture and High Performance Computing, pp.33-40, 2010.
DOI : 10.1109/SBAC-PAD.2010.38

B. Coutinho, D. Sampaio, F. M. Q. Pereira, and W. Meira Jr., Divergence Analysis and Optimizations, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.320-329, 2011.
DOI : 10.1109/PACT.2011.63

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, 1991.
DOI : 10.1145/115372.115320

F. Darema, D. A. George, V. A. Norton, and G. F. Pfister, A single-program-multiple-data computational model for EPEX/FORTRAN, Parallel Computing, vol.7, issue.1, pp.11-24, 1988.
DOI : 10.1016/0167-8191(88)90094-4

G. Diamos, B. Ashbaugh, S. Maiyuran, A. Kerr, H. Wu et al., SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.477-488, 2011.
DOI : 10.1145/2155620.2155676

G. F. Diamos, A. R. Kerr, S. Yalamanchili, and N. Clark, Ocelot, Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT '10, pp.353-364, 2010.
DOI : 10.1145/1854273.1854318

C. A. Farrell and D. H. Kieronska, Formal specification of parallel SIMD execution, Theoretical Computer Science, vol.169, issue.1, pp.39-65, 1996.
DOI : 10.1016/S0304-3975(96)00113-2

URL : https://doi.org/10.1016/s0304-3975(96)00113-2

J. Ferrante, K. J. Ottenstein, and J. D. Warren, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems, vol.9, issue.3, pp.319-349, 1987.
DOI : 10.1145/24039.24041

M. J. Flynn, Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972.
DOI : 10.1109/TC.1972.5009071

W. W. Fung, I. Sham, G. Yuan, and T. M. Aamodt, Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.407-420, 2007.
DOI : 10.1109/MICRO.2007.30

M. Garland and D. B. Kirk, Understanding throughput-oriented architectures, Communications of the ACM, vol.53, issue.11, pp.58-66, 2010.
DOI : 10.1145/1839676.1839694

URL : http://dl.acm.org/ft_gateway.cfm?id=1839694&type=pdf

A. Habermaier and A. Knapp, On the Correctness of the SIMT Execution Model of GPUs, ESOP, pp.316-335, 2012.
DOI : 10.1007/978-3-642-28869-2_16

S. Hack and G. Goos, Optimal register allocation for SSA-form programs in polynomial time, Information Processing Letters, vol.98, issue.4, pp.150-155, 2006.
DOI : 10.1016/j.ipl.2006.01.008

T. D. Han and T. S. Abdelrahman, Reducing branch divergence in GPU programs, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pp.1-3, 2011.
DOI : 10.1145/1964179.1964184

J. L. Hennessy and D. A. Patterson, Computer Architecture, Fourth Edition: A Quantitative Approach, 2006.

W. D. Hillis and G. L. Steele Jr., Data parallel algorithms, Communications of the ACM, vol.29, issue.12, pp.1170-1183, 1986.
DOI : 10.1145/7902.7903

P. Hoogvorst, R. Keryell, P. Matherat, and N. Paris, POMP or how to design a massively parallel machine with small developments, PARLE (1), pp.83-100, 1991.
URL : https://hal.archives-ouvertes.fr/hal-01166357

B. Jang, D. Schaa, P. Mistry, and D. Kaeli, Static memory access pattern analysis on a massively parallel GPU, 2010.

R. Karrenberg and S. Hack, Whole-function vectorization, Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp.141-150, 2011.
DOI : 10.1007/978-3-658-10113-8_6

A. Kerr, G. Diamos, and S. Yalamanchili, Dynamic compilation of data-parallel kernels for vector processors, Proceedings of the Tenth International Symposium on Code Generation and Optimization, CGO '12, pp.23-32, 2012.

Khronos OpenCL Working Group, The OpenCL Specification, 2011.

S. Kung, K. S. Arun, R. J. Gal-Ezer, and D. V. Bhaskar Rao, Wavefront array processor: Language, architecture, and applications, IEEE Transactions on Computers, vol.31, issue.11, pp.1054-1066, 1982.

A. Lashgar and A. Baniasadi, Performance in GPU architectures: Potentials and distances, 9th Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD11), in conjunction with ISCA-38, pp.75-81, 2011.

D. H. Lawrie, T. Layman, D. Baer, and J. M. Randal, Glypnir: a programming language for Illiac IV, Communications of the ACM, vol.18, issue.3, pp.157-164, 1975.
DOI : 10.1145/360680.360687

S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU, ACM SIGPLAN Notices, vol.44, issue.4, pp.101-110, 2009.
DOI : 10.1145/1594835.1504194

Y. Lee, R. Avizienis, A. Bishara, R. Xia, D. Lockhart et al., Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators, ACM SIGARCH Computer Architecture News, vol.39, issue.3, pp.129-140, 2011.

R. Leissa, S. Hack, and I. Wald, Extending a C-like language for portable SIMD programming, ACM SIGPLAN Notices, vol.47, issue.8, pp.65-74, 2012.

J. Meng, D. Tarjan, and K. Skadron, Dynamic warp subdivision for integrated branch and memory divergence tolerance, ACM SIGARCH Computer Architecture News, vol.38, issue.3, pp.235-246, 2010.
DOI : 10.1145/1816038.1815992

S. Mu, X. Zhang, N. Zhang, J. Lu, Y. S. Deng et al., IP routing processing with graphic processors, Proceedings of the Conference on Design, Automation and Test in Europe, DATE '10, pp.93-98, 2010.

V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu et al., Improving GPU performance via large warps and two-level warp scheduling, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.308-317, 2011.
DOI : 10.1145/2155620.2155656

J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41

F. Nielson, H. R. Nielson, and C. Hankin, Principles of Program Analysis, 1999.
DOI : 10.1007/978-3-662-03811-6

NVIDIA Corporation, NVIDIA CUDA C Programming Guide.

K. J. Ottenstein, R. A. Ballance, and A. B. MacCabe, The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages, ACM SIGPLAN Notices, vol.25, issue.6, pp.257-271, 1990.

D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface (Revised 4th Edition), The Morgan Kaufmann Series in Computer Architecture and Design, 2012.

R. H. Perrott, A Language for Array and Vector Processors, ACM Transactions on Programming Languages and Systems, vol.1, issue.2, pp.177-195, 1979.
DOI : 10.1145/357073.357075

M. Pharr and W. Mark, ispc: A SPMD compiler for high-performance CPU programming, 2012 Innovative Parallel Computing (InPar), 2012.
DOI : 10.1109/InPar.2012.6339601

M. Poletto and V. Sarkar, Linear scan register allocation, ACM Transactions on Programming Languages and Systems, vol.21, issue.5, pp.895-913, 1999.
DOI : 10.1145/330249.330250

T. Prabhu, S. Ramalingam, M. Might, and M. Hall, EigenCFA, ACM SIGPLAN Notices, vol.46, issue.1, pp.511-522, 2011.
DOI : 10.1145/1925844.1926445

T. G. Rogers, M. O'Connor, and T. M. Aamodt, Cache-Conscious Wavefront Scheduling, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012.
DOI : 10.1109/MICRO.2012.16

S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk et al., Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, pp.73-82, 2008.
DOI : 10.1145/1345206.1345220

B. Saha, X. Zhou, H. Chen, Y. Gao, S. Yan et al., Programming model for a heterogeneous x86 platform, ACM SIGPLAN Notices, vol.44, issue.6, pp.431-440, 2009.
DOI : 10.1145/1543135.1542525

M. Samadi, A. Hormati, M. Mehrara, J. Lee, and S. Mahlke, Adaptive input-aware compilation for graphics engines, ACM SIGPLAN Notices, vol.47, issue.6, pp.13-22, 2012.
DOI : 10.1145/1806596.1806606

E. F. Sandes and A. C. de Melo, CUDAlign, ACM SIGPLAN Notices, vol.45, issue.5, pp.137-146, 2010.
DOI : 10.1145/1837853.1693473

J. A. Stratton, V. Grover, J. Marathe, B. Aarts, M. Murphy et al., Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, CGO '10, pp.111-119, 2010.
DOI : 10.1145/1772954.1772971

P. Tu and D. Padua, Efficient building and placing of gating functions, ACM SIGPLAN Notices, vol.30, issue.6, pp.47-55, 1995.
DOI : 10.1145/207110.207115

M. Weiser, Program Slicing, Proceedings of the 5th international conference on Software engineering, ICSE '81, pp.439-449, 1981.
DOI : 10.1109/TSE.1984.5010248

Y. Yang, P. Xiang, J. Kong, and H. Zhou, A GPGPU compiler for memory optimization and parallelism management, ACM SIGPLAN Notices, vol.45, issue.6, pp.86-97, 2010.
DOI : 10.1145/1809028.1806606

E. Z. Zhang, Y. Jiang, Z. Guo, and X. Shen, Streamlining GPU applications on the fly, Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, pp.115-126, 2010.
DOI : 10.1145/1810085.1810104

E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, On-the-fly elimination of dynamic irregularities for GPU computing, ACM SIGARCH Computer Architecture News, vol.39, issue.1, pp.369-380, 2011.
DOI : 10.1145/1961295.1950408

Y. Zhang and J. D. Owens, A quantitative performance analysis model for GPU architectures, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp.382-393, 2011.
DOI : 10.1109/HPCA.2011.5749745
