M. Abadi, M. Isard, and D. G. Murray, A computational model for TensorFlow: an introduction, Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp.1-7, 2017.

J. R. Allen and K. Kennedy, Automatic loop interchange, Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, SIGPLAN '84, pp.233-246, 1984.

R. Allen and K. Kennedy, Automatic translation of Fortran programs to vector form, ACM Trans. Program. Lang. Syst, vol.9, issue.4, pp.491-542, 1987.

J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom et al., OpenTuner: An extensible framework for program autotuning, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pp.303-316, 2014.

U. Bondhugula, A. Acharya, and A. Cohen, The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests, ACM Transactions on Programming Languages and Systems, vol.38, issue.3, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01425546

Baidu Research, DeepBench.

C. Bastoul, Code Generation in the Polyhedral Model Is Easier Than You Think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT '04, pp.7-16, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00017260

S. G. Bhaskaracharya and U. Bondhugula, PolyGLoT: a polyhedral loop transformation framework for a graphical dataflow language, International Conference on Compiler Construction, pp.123-143, 2013.

R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser et al., PENCIL: A platform-neutral compute intermediate language for accelerator programming, Proc. Parallel Architectures and Compilation Techniques (PACT'15), 2015.

U. Bondhugula, M. Baskaran, S. Krishnamoorthy et al., Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, Compiler Construction, pp.132-146, 2008.

R. Baghdadi, A. Cohen, T. Grosser, S. Verdoolaege, A. Lokhmotov, J. Absar, S. van Haastregt, A. Kravets, and A. Donaldson, PENCIL Language Specification, INRIA, 2015.

U. Bondhugula, O. Gunluk, S. Dash, and L. Renganarayanan, A Model for Fusion and Code Motion in an Automatic Parallelizing Compiler, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp.343-352, 2010.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A Practical Automatic Polyhedral Parallelizer and Locality Optimizer, ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol.43, issue.6, pp.101-113, 2008.

C. Chen, J. Chame, and M. Hall, CHiLL: A framework for composing high-level loop transformations, 2008.

T. Chen, T. Moreau, Z. Jiang et al., TVM: End-to-end optimization stack for deep learning.

S. J. Deitz, B. L. Chamberlain, and L. Snyder, High-level language support for user-defined reductions, The Journal of Supercomputing, vol.23, issue.1, pp.23-37, 2002.

J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Commun. ACM, vol.51, issue.1, pp.107-113, 2008.

J. Doerfert, K. Streit, S. Hack, and Z. Benaissa, Polly's polyhedral scheduling in the presence of reductions, 2015.

P. Feautrier, Parametric Integer Programming, Revue française d'automatique, d'informatique et de recherche opérationnelle, vol.22, pp.243-268, 1988.

P. Feautrier, Dataflow Analysis of Array and Scalar References, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991.

P. Feautrier, Some Efficient Solutions to the Affine Scheduling Problem. Part I. One-Dimensional Time, International Journal of Parallel Programming, vol.21, pp.313-347, 1992.

P. Feautrier, Some Efficient Solutions to the Affine Scheduling Problem. Part II. Multidimensional Time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992.

M. Frigo, P. Halpern, C. E. Leiserson, and S. Lewin-Berlin, Reducers and other Cilk++ hyperobjects, Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '09, pp.79-90, 2009.

P. Feautrier and C. Lengauer, Polyhedron Model, Encyclopedia of Parallel Computing, pp.1581-1592, 2011.

T. Grosser, A. Groesslinger, and C. Lengauer, Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation, Parallel Processing Letters, vol.22, issue.04, p.1250010, 2012.

Google, TensorFlow XLA.

G. Gupta and S. V. Rajopadhye, Simplifying reductions, POPL, vol.6, pp.30-41, 2006.

T. Henretty, R. Veras, F. Franchetti, L. Pouchet, J. Ramanujam et al., A stencil compiler for short-vector SIMD architectures, Proceedings of the 27th international ACM conference on International conference on supercomputing, pp.13-24, 2013.

A. Handa, T. Whelan, J. McDonald, and A. Davison, A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM, Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp.1524-1531, 2014.

The ANSI C standard (C99), 1999.

F. Irigoin and R. Triolet, Supernode Partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988.

A. Jangda and U. Bondhugula, An effective fusion and tile size model for optimizing image processing pipelines, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.261-275

P. Jouvelot and B. Dehbonei, A unified semantic approach for the vectorization and parallelization of generalized reductions, Proceedings of the 3rd international conference on Supercomputing, pp.186-194, 1989.

P. Jouvelot, Parallelization by semantic detection of reductions, ESOP 86, pp.223-236, 1986.

K. Kennedy and J. R. Allen, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002.

K. Kennedy and K. McKinley, Maximizing loop parallelism and improving data locality via loop fusion and distribution, Languages and Compilers for Parallel Computing, pp.301-320, 1993.

W. Kelly and W. Pugh, A unifying framework for iteration reordering transformations, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing, vol.1, pp.153-162, 1995.

M. Kong, R. Veras, K. Stock, F. Franchetti, L. Pouchet et al., When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Notices, vol.48, pp.127-138, 2013.

T. Li, M. Gharbi, A. Adams, F. Durand, and J. Ragan-Kelley, Differentiable programming for image processing and deep learning in Halide, ACM Transactions on Graphics (TOG), vol.18, issue.4, p.139, 2018.

M. Harris, Optimizing parallel reduction in CUDA.

R. T. Mullapudi, A. Adams, D. Sharlet, J. Ragan-Kelley, and K. Fatahalian, Automatically scheduling Halide image processing pipelines, ACM Transactions on Graphics (TOG), vol.35, issue.4, p.83, 2016.

Microsoft.

Message Passing Interface Forum (MPIF), MPI-2: Extensions to the Message-Passing Interface, 1996.

R. T. Mullapudi, V. Vasista, and U. Bondhugula, PolyMage: Automatic optimization for image processing pipelines, ACM SIGARCH Computer Architecture News, vol.43, pp.429-443, 2015.

L. Nardi, B. Bodin, M. Zia, J. Mawer, A. Nisbet et al., Introducing SLAMBench, a performance and accuracy benchmarking methodology for, Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on, pp.127-136, 2011.

NVIDIA, CUB's collective primitives.

NVIDIA, Thrust C++ library.

NVIDIA forum, Faster parallel reductions on Kepler.

OpenMP 3.0 specification.

OpenMP forum.

L. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam et al., Combined iterative and model-driven optimization in an automatic parallelization framework, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.549-562, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551067

L. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache, Iterative optimization in the polyhedral model: Part I, one-dimensional time, Code Generation and Optimization, 2007. CGO'07. International Symposium on, pp.179-197, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01257281

A. Paszke, S. Chintala, R. Collobert, K. Kavukcuoglu, C. Farabet et al., PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration, 2017.

B. Pottenger and R. Eigenmann, Idiom recognition in the Polaris parallelizing compiler, Proceedings of the 9th international conference on Supercomputing, pp.444-448, 1995.

S. S. Pinter and R. Y. Pinter, Program optimization and parallelization using idioms, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.16, issue.3, pp.305-327, 1994.

W. Pugh and D. Wonnacott, Static Analysis of Upper and Lower Bounds on Dependences and Parallelism, ACM Trans. Program. Lang. Syst, vol.16, issue.4, pp.1248-1278, 1994.

J. Reinders, Intel Threading Building Blocks, 2007.

X. Redon and P. Feautrier, Detection of recurrences in sequential programs with loops, PARLE'93 Parallel Architectures and Languages Europe, pp.132-145, 1993.

X. Redon and P. Feautrier, Scheduling reductions, Proceedings of the 8th international conference on Supercomputing, pp.117-125, 1994.

X. Redon and P. Feautrier, Detection of scans, Parallel Algorithms and Applications, vol.15, issue.3-4, pp.229-263, 2000.

J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand et al., Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, ACM SIGPLAN Notices, vol.48, issue.6, pp.519-530, 2013.

L. Rauchwerger and D. A. Padua, The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.2, pp.160-180, 1999.

V. Sarkar, Automatic Selection of High Order Transformations in the IBM XL Fortran Compilers, IBM J. Res. & Dev, vol.41, issue.3, 1997.

K. Stock, M. Kong, T. Grosser, L. Pouchet, F. Rastello et al., A framework for enhancing data reuse via associative reordering, ACM SIGPLAN Notices, vol.49, pp.65-76, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016093

T. Suganuma, H. Komatsu, and T. Nakatani, Detection and global optimization of reduction operations for distributed parallel machines, Proceedings of the 10th international conference on Supercomputing, pp.18-25, 1996.

E. Schweitz, R. Lethin, A. Leung, and B. Meister, R-Stream: A parametric high level compiler, Proceedings of HPEC, 2006.

J. Shirako, L. N. Pouchet, and V. Sarkar, Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.287-298, 2014.

A. Sbîrlea, J. Shirako, L. Pouchet, and V. Sarkar, Graphite two years after: First lessons learned from real-world polyhedral compilation, International Workshop on Languages and Compilers for Parallel Computing, pp.57-72, 2010.

The Portland Group, PGI Accelerator compilers with OpenACC directives.

K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen, Polyhedral-model guided loop-nest auto-vectorization, Parallel Architectures and Compilation Techniques, 2009. PACT'09. 18th International Conference on, pp.327-337, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00645325

S. Verdoolaege and A. Cohen, Live Range Reordering, 6th Workshop on Polyhedral Compilation Techniques (IMPACT, Associated with HiPEAC), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01257224

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, Mathematical Software - ICMS 2010, number 6327 in Lecture Notes in Computer Science, pp.299-302, 2010.

S. Verdoolaege, Counting Affine Calculator and Applications, First International Workshop on Polyhedral Compilation Techniques (IMPACT'11), 2011.

S. Verdoolaege, S. Guelton, T. Grosser, and A. Cohen, Schedule Trees, 4th Workshop on Polyhedral Compilation Techniques (IMPACT, Associated with HiPEAC), p.9, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911894

S. Verdoolaege and G. Janssens, Scheduling for PPCG, Report CW, vol.706, 2017.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization (TACO), vol.9, issue.4, p.54, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00786677

N. Vasilache, B. Meister, M. Baskaran, and R. Lethin, Joint scheduling and layout optimization to enable multi-level vectorization, IMPACT-2: 2nd International Workshop on Polyhedral Compilation Techniques, 2012.

S. Verdoolaege, R. Seghir, K. Beyls, V. Loechner, and M. Bruynooghe, Counting integer points in parametric polytopes using Barvinok's rational functions, Algorithmica, vol.48, issue.1, pp.37-66, 2007.

N. Vasilache, O. Zinenko et al., Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions.

A. Venkat, M. Shantharam, M. Hall, and M. M. Strout, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, p.185, 2014.

M. Wolfe, Loop skewing: The wavefront method revisited, Int. J. Parallel Program, vol.15, issue.4, pp.279-293, 1986.

M. Wolfe, Iteration space tiling for memory hierarchies, Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp.357-361, 1989.

M. Wolfe, High Performance Compilers for Parallel Computing, 1995.

S. Wienke, P. Springer, C. Terboven, and D. an Mey, OpenACC - first experiences with real-world applications, Euro-Par 2012 Parallel Processing, pp.859-870, 2012.

D. N. Xu, S. Khoo, and Z. Hu, PType system: A featherweight parallelizability detector, Programming Languages and Systems, pp.197-212, Springer, 2004.

Y. Zou and S. Rajopadhye, Scan detection and parallelization in inherently sequential nested loop programs, Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp.74-83, 2012.