J. McCarthy, Recursive functions of symbolic expressions and their computation by machine, Part I, Communications of the ACM, vol.3, issue.4, pp.184-195, 1960.
DOI : 10.1145/367177.367199

L. Dagum and R. Menon, OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998.
DOI : 10.1109/99.660313

A. Sukumaran-Rajam, J. M. Martinez, W. Wolff, A. Jimborean, and P. Clauss, Speculative Program Parallelization with Scalable and Decentralized Runtime Verification, LNCS, vol.8734, pp.124-139, 2014.
DOI : 10.1007/978-3-319-11164-3_11
URL : https://hal.archives-ouvertes.fr/hal-01070610

J. A. Stratton, C. Rodrigues, I. Sung, N. Obeid, L. Chang et al., The Parboil technical report, tech. rep., IMPACT Technical Report, pp.12-13, 2012.

C. Bienia and K. Li, PARSEC 2.0: A new benchmark suite for chip-multiprocessors, Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, 2009.

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer et al., Rodinia: A benchmark suite for heterogeneous computing, 2009 IEEE International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.
DOI : 10.1109/IISWC.2009.5306797

H. L. van der Spek, E. M. Bakker, and H. A. Wijshoff, SPARK00: A benchmark package for the compiler evaluation of irregular/sparse codes, p.805, 2008.

L.-N. Pouchet, PolyBench: The polyhedral benchmark suite, 2010.

A. Sukumaran-Rajam, L. E. Campostrini, J. M. Martinez, and P. Clauss, Speculative Runtime Parallelization of Loop Nests: Towards Greater Scope and Efficiency, 20th International Workshop on High-level Parallel Programming Models and Supportive Environments, held in conjunction with the 29th IEEE International Parallel & Distributed Processing Symposium, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01155172

J. Cohen, P. Cohen, S. G. West, and L. S. Aiken, Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Routledge, 2002.

G. Fursin and O. Temam, Collective optimization, ACM Transactions on Architecture and Code Optimization, vol.7, issue.4, pp.20:1-20:29, 2010.
DOI : 10.1145/1880043.1880047
URL : https://hal.archives-ouvertes.fr/inria-00445326

K. Barker, T. Benson, D. Campbell, D. Ediger, R. Gioiosa et al., PERFECT (Power Efficiency Revolution For Embedded Computing Technologies) Benchmark Suite Manual, Pacific Northwest National Laboratory and Georgia Tech Research Institute, 2013.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, PLDI '08, 2008.

P. Feautrier and C. Lengauer, Polyhedron model, Encyclopedia of Parallel Computing, pp.1581-1592, 2011.

R. M. Karp, R. E. Miller, and S. Winograd, The Organization of Computations for Uniform Recurrence Equations, Journal of the ACM, vol.14, issue.3, pp.563-590, 1967.
DOI : 10.1145/321406.321418

P. Quinton, The systematic design of systolic arrays, Centre National De Recherche Scientifique on Automata Networks in Computer Science: Theory and Applications, pp.229-260, 1987.
URL : https://hal.archives-ouvertes.fr/inria-00076342

U. K. Banerjee, Dependence Analysis for Supercomputing, 1988.
DOI : 10.1007/978-1-4684-6894-6

X. Kong, D. Klappholz, and K. Psarris, The I test: an improved dependence test for automatic parallelization and vectorization, IEEE Transactions on Parallel and Distributed Systems, vol.2, issue.3, pp.342-349, 1991.
DOI : 10.1109/71.86109

W. Pugh, A practical algorithm for exact array dependence analysis, Communications of the ACM, vol.35, issue.8, pp.102-114, 1992.
DOI : 10.1145/135226.135233

M. Wolfe and C. W. Tseng, The power test for data dependence, IEEE Transactions on Parallel and Distributed Systems, vol.3, issue.5, pp.591-601, 1992.
DOI : 10.1109/71.159042

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, 1992.
DOI : 10.1007/BF01379404

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992.
DOI : 10.1007/BF01379404

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991.
DOI : 10.1007/BF01407931

S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello et al., Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, International Journal of Parallel Programming, vol.34, issue.3, pp.261-317, 2006.
DOI : 10.1007/s10766-006-0012-3
URL : https://hal.archives-ouvertes.fr/hal-01257288

D. A. Padua and M. J. Wolfe, Advanced compiler optimizations for supercomputers, Communications of the ACM, vol.29, issue.12, pp.1184-1201, 1986.
DOI : 10.1145/7902.7904

P. Feautrier, Array expansion, Proceedings of the 2nd International Conference on Supercomputing, ICS '88, pp.429-441, 1988.
URL : https://hal.archives-ouvertes.fr/hal-01099746

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, 1991.
DOI : 10.1007/BF01407931

A. Cohen, S. Girbal, and O. Temam, A Polyhedral Approach to Ease the Composition of Program Transformations, Euro-Par 2004 Parallel Processing, pp.292-303, 2004.
DOI : 10.1007/978-3-540-27866-5_38
URL : https://hal.archives-ouvertes.fr/hal-01257301

C. Bastoul, A. Cohen, S. Girbal, S. Sharma, and O. Temam, Putting Polyhedral Loop Transformations to Work, 16th Intl. Workshop on Languages and Compilers for Parallel Computing (LCPC'03), College Station, pp.209-225, 2003.
DOI : 10.1007/978-3-540-24644-2_14
URL : https://hal.archives-ouvertes.fr/inria-00071681

Pluto: An automatic parallelizer and locality optimizer for multicores.

P. Feautrier, Toward automatic partitioning of arrays on distributed memory computers, Proceedings of the 7th International Conference on Supercomputing, ICS '93, pp.175-184, 1993.
DOI : 10.1145/165939.165968

P. Feautrier, Parametric integer programming, RAIRO - Operations Research, vol.22, issue.3, pp.243-268, 1988.
DOI : 10.1051/ro/1988220302431
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.9957

P. Clauss and V. Loechner, Parametric analysis of polyhedral iteration spaces, Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96, pp.179-194, 1998.
DOI : 10.1109/ASAP.1996.542833
URL : https://hal.archives-ouvertes.fr/inria-00534840

A. I. Barvinok, Computing the volume, counting integral points, and exponential sums, Discrete & Computational Geometry, vol.10, issue.2, pp.123-141, 1993.
DOI : 10.1007/BF02573970

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, Mathematical Software -ICMS 2010, pp.299-302, 2010.
DOI : 10.1007/978-3-642-15582-6_49

C. Bastoul, Extracting polyhedral representation from high level languages, tech. rep, 2008.

C. Bastoul, Openscop: A specification and a library for data exchange in polyhedral compilation tools, tech. rep, 2011.

C. Bastoul, Contributions to High-Level Program Optimization, 2012.

I. Fassi and P. Clauss, XFOR: Filling the Gap between Automatic Loop Optimization and Peak Performance, 2015 14th International Symposium on Parallel and Distributed Computing, 2015.
DOI : 10.1109/ISPDC.2015.19
URL : https://hal.archives-ouvertes.fr/hal-01155144

P. Clauss, I. Fassi, and A. Jimborean, Software-controlled processor stalls for time and energy efficient data locality optimization, 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), pp.199-206, 2014.
DOI : 10.1109/SAMOS.2014.6893212
URL : https://hal.archives-ouvertes.fr/hal-01003228

R. T. Mullapudi, V. Vasista, and U. Bondhugula, PolyMage, ACM SIGARCH Computer Architecture News, vol.43, issue.1, pp.429-443, 2015.
DOI : 10.1145/2786763.2694364

C. Bastoul, Generating loops for scanning polyhedra, Tech. Rep, vol.23, 2002.

T. Grosser, A. Größlinger, and C. Lengauer, Polly: Performing polyhedral optimizations on a low-level intermediate representation, Parallel Processing Letters, 2012.
DOI : 10.1142/S0129626412500107

S. Pop, G. Silber, A. Cohen, C. Bastoul, S. Girbal et al., GRAPHITE: Polyhedral analyses and optimizations for GCC, Contribution to the GNU Compilers Collection Developers Summit, 2006.

M. Griebl and C. Lengauer, The loop parallelizer LooPo, Proc. Sixth Workshop on Compilers for Parallel Computers, pp.311-320, 1996.

L. Pouchet, C. Bastoul, and A. Cohen, LetSee: the LEgal Transformation SpacE Explorator, Third International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES'07), L'Aquila, Italy, pp.247-251, 2007.

C. Chen, J. Chame, and M. Hall, CHiLL: A framework for composing high-level loop transformations, 2008.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.54:1-54:23, 2013.
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677

R. Lethin, P. Mattson, E. Schweitz, A. Leung, V. Litvinov et al., R-stream 3.0: Technologies for high level embedded application mapping, Proceedings of the 8th Annual High Performance Embedded Computing (HPEC) Workshops, 2004.

E. Schweitz, R. Lethin, A. Leung, and B. Meister, A parametric high level compiler, Proceedings of the High Performance Embedded Computing Workshop (HPEC), 2006.

J. Dollinger and V. Loechner, Adaptive Runtime Selection for GPU, 2013 42nd International Conference on Parallel Processing, pp.70-79, 2013.
DOI : 10.1109/ICPP.2013.16
URL : https://hal.archives-ouvertes.fr/hal-00869652

J. Dollinger and V. Loechner, CPU+GPU Load Balance Guided by Execution Time Prediction, Fifth International Workshop on Polyhedral Compilation Techniques, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01095890

J. F. Martínez and J. Torrellas, Speculative synchronization: Applying thread-level speculation to explicitly parallel applications, Proceedings of the Tenth Symposium on Architectural Support for Programming Languages and Operating Systems, 2002.

P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas, iWatcher, 31st Int. Symp. on Computer Architecture (ISCA), pp.224-237, 2004.
DOI : 10.1145/1028176.1006720

M. Prvulovic and J. Torrellas, ReEnact, ACM SIGARCH Computer Architecture News, vol.31, issue.2, pp.110-121, 2003.
DOI : 10.1145/871656.859632

Intel, Intel® 64 and IA-32 architectures software developer's manual, https://www-ssl.intel.com/content/dam-ia-32-architectures-software-developer-instruction-set-reference-manual-325383, 2015.

R. Haring, M. Ohmacht, T. Fox, M. Gschwind, D. Satterfield et al., The IBM Blue Gene/Q Compute Chip, IEEE Micro, vol.32, issue.2, pp.48-60, 2012.
DOI : 10.1109/MM.2011.108

S. Chaudhry, R. Cypher, M. Ekman, M. Karlsson, A. Landin et al., Rock: A High-Performance Sparc CMT Processor, IEEE Micro, vol.29, issue.2, pp.6-16, 2009.
DOI : 10.1109/MM.2009.34

J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry, A scalable approach to thread-level speculation, ACM SIGARCH Computer Architecture News, vol.28, issue.2, pp.1-12, 2000.
DOI : 10.1145/342001.339650

J. Renau, K. Strauss, L. Ceze, W. Liu, S. Sarangi et al., Thread-Level Speculation on a CMP can be energy efficient, Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pp.219-228, 2005.
DOI : 10.1145/1088149.1088178

J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry, The STAMPede approach to thread-level speculation, ACM Transactions on Computer Systems, vol.23, issue.3, pp.253-300, 2005.
DOI : 10.1145/1082469.1082471

L. Rauchwerger, N. M. Amato, and D. A. Padua, A scalable method for run-time loop parallelization, International Journal of Parallel Programming, vol.4, issue.1, pp.26-32, 1995.
DOI : 10.1007/BF02577866

J. Barreto, A. Dragojevic, P. Ferreira, R. Filipe, and R. Guerraoui, Unifying Thread-Level Speculation and Transactional Memory, Proceedings of the 13th International Middleware Conference, pp.187-207, 2012.
DOI : 10.1109/SRDS.2011.16

L. Rauchwerger and D. Padua, The LRPD Test: Speculative Run-time Parallelization of Loops with Privatization and Reduction Parallelization, Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, PLDI '95, pp.218-232, 1995.

F. Dang, H. Yu, and L. Rauchwerger, The R-LRPD test: speculative parallelization of partially parallel loops, Proceedings 16th International Parallel and Distributed Processing Symposium, 2002.
DOI : 10.1109/IPDPS.2002.1015493

A. Jimborean, Adapting the polytope model for dynamic and speculative parallelization, 2012.
URL : https://hal.archives-ouvertes.fr/tel-00733850

A. Jimborean, L. Mastrangelo, V. Loechner, and P. Clauss, VMAD: An Advanced Dynamic Program Analysis and Instrumentation Framework, Proceedings of the 21st International Conference on Compiler Construction, pp.220-239, 2012.
DOI : 10.1007/978-3-642-28652-0_12

S. Aldea, D. Llanos, and A. González-Escribano, Support for Thread-Level Speculation into OpenMP, OpenMP in a Heterogeneous World, pp.275-278, 2012.
DOI : 10.1007/978-3-642-30961-8_25

A. Jimborean, P. Clauss, J. M. Martinez, and A. Sukumaran-Rajam, Online Dynamic Dependence Analysis for Speculative Polyhedral Parallelization, Euro-Par 2013, pp.191-202, 2013.
DOI : 10.1007/978-3-642-40047-6_21
URL : https://hal.archives-ouvertes.fr/hal-00825744

W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss et al., POSH, Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '06, 2006.
DOI : 10.1145/1122971.1122997

T. A. Johnson, R. Eigenmann, and T. N. Vijaykumar, Speculative thread decomposition through empirical optimization, Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '07, 2007.
DOI : 10.1145/1229428.1229474

E. Raman, N. Vachharajani, R. Rangan, and D. I. , Spice, Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '08, 2008.
DOI : 10.1145/1356058.1356082

M. Ravishankar, J. Eisenlohr, L. Pouchet, J. Ramanujam, A. Rountev et al., Code generation for parallel execution of a class of irregular loops on distributed memory systems, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.12, 2012.
DOI : 10.1109/SC.2012.30

E. D. Berger and B. G. Zorn, DieHard: probabilistic memory safety for unsafe languages, PLDI '06, pp.158-168, 2006.

G. Novark and E. D. Berger, DieHarder, Proceedings of the 17th ACM conference on Computer and communications security, CCS '10, pp.573-584, 2010.
DOI : 10.1145/1866307.1866371

M. Griebl and J. Collard, Generation of synchronous code for automatic parallelization of while loops, Euro-Par '95 Parallel Processing, First International Euro-Par Conference Proceedings, pp.315-326, 1995.
DOI : 10.1007/BFb0020474

J. Collard, Automatic parallelization of while-loops using speculative execution, International Journal of Parallel Programming, vol.634, issue.1, pp.191-219, 1995.
DOI : 10.1007/BF02577789

S. J. Geuns, M. J. Bekooij, T. Bijlsma, and H. Corporaal, Parallelization of while loops in nested loop programs for shared-memory multiprocessor systems, 2011 Design, Automation & Test in Europe, pp.1-6, 2011.
DOI : 10.1109/DATE.2011.5763118

C. Lengauer and M. Griebl, On the parallelization of loop nests containing while loops, Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, 1994.
DOI : 10.1109/AISPAS.1995.401360

J. Collard, D. Barthou, and P. Feautrier, Fuzzy Array Dataflow Analysis, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '95, pp.92-101, 1995.

A. Venkat, M. Shantharam, M. Hall, and M. M. Strout, Non-affine Extensions to Polyhedral Code Generation, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.185-194, 2014.
DOI : 10.1145/2581122.2544141