C. Alias, A. Darte, and A. Plesco, Optimizing DDR-SDRAM communications at C-level for automatically-generated hardware accelerators: an experience with the Altera C2H HLS tool, ASAP 2010, 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, pp.329-332, 2010.
DOI : 10.1109/ASAP.2010.5540967

URL : https://hal.archives-ouvertes.fr/inria-00482035

C. Alias, A. Darte, and A. Plesco, Kernel offloading with optimized remote accesses, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00611179

C. Alias, A. Darte, and A. Plesco, Optimizing remote accesses for offloaded kernels: Application to HLS for FPGA, Design, Automation and Test in Europe (DATE'13), pp.575-580, 2013.

M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev et al., Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 13th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP'08), pp.1-10, 2008.

M. M. Baskaran, N. Vasilache, B. Meister, and R. Lethin, Automatic communication optimizations through memory reuse strategies, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'12), pp.277-278, 2012.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, ACM International Conference on Programming Languages Design and Implementation (PLDI'08), pp.101-113, 2008.
DOI : 10.1145/1375581.1375595

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5126

S. Boppu, F. Hannig, and J. Teich, Loop program mapping and compact code generation for programmable hardware accelerators, 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, pp.10-17, 2013.
DOI : 10.1109/ASAP.2013.6567544

M. Bourgoin, E. Chailloux, and J. L. Lamotte, Efficient Abstractions for GPGPU Programming, International Journal of Parallel Programming, vol.34, issue.5, pp.583-600, 2014.
DOI : 10.1007/s10766-013-0261-x

URL : https://hal.archives-ouvertes.fr/hal-01216144

B. Creusillet and F. Irigoin, Interprocedural array region analyses, International Workshop on Languages and Compilers for Parallel Computing (LCPC'96), pp.46-60, 1996.
URL : https://hal.archives-ouvertes.fr/hal-00752611

A. Darte and A. Isoard, Parametric tiling with inter-tile data reuse, 4th International Workshop on Polyhedral Compilation Techniques (IMPACT'14), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00915831

A. Darte and A. Isoard, Exact and Approximated Data-Reuse Optimizations for Tiling with Parametric Sizes, 24th International Conference on Compiler Construction (CC'15), part of ETAPS'15, 2015.
DOI : 10.1007/978-3-662-46663-6_8

URL : https://hal.archives-ouvertes.fr/hal-01099017

A. Darte, R. Schreiber, and G. Villard, Lattice-Based Memory Allocation, IEEE Transactions on Computers, vol.54, issue.10, pp.1242-1257, 2005.
DOI : 10.1109/TC.2005.167

URL : https://hal.archives-ouvertes.fr/hal-01272969

P. Feautrier, Parametric integer programming, RAIRO Recherche Opérationnelle, vol.22, issue.3, pp.243-268, 1988. Corresponding software tool: PIP.
DOI : 10.1051/ro/1988220302431

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.9957

P. Feautrier and C. Lengauer, The polyhedron model, Encyclopedia of Parallel Programming, 2011.

G. I. Goumas, M. Athanasaki, and N. Koziris, An efficient code generation technique for tiled iteration spaces, IEEE Transactions on Parallel and Distributed Systems, vol.14, issue.10, pp.1021-1034, 2003.

A. Größlinger, Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes, 18th International Conference on Compiler Construction (CC'09), pp.236-250, 2009.
DOI : 10.1007/978-3-642-00722-4_17

S. Guelton, M. Amini, and B. Creusillet, Beyond Do Loops: Data Transfer Generation with Convex Array Regions
DOI : 10.1007/978-3-642-37658-0_17

URL : https://hal.archives-ouvertes.fr/hal-00742583

S. Guelton, R. Keryell, and F. Irigoin, Compilation pour cible hétérogènes: automatisation des analyses, transformations et décisions nécessaires, 20ème Rencontres Françaises du Parallélisme (Renpar'11), 2011.

A. Hartono, M. M. Baskaran, J. Ramanujam, and P. Sadayappan, DynTile: Parametric tiled loop generation for parallel execution on multicore processors, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2010.
DOI : 10.1109/IPDPS.2010.5470459

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988.
DOI : 10.1145/73560.73588

I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt, DRDU: A data reuse analysis technique for efficient scratch-pad memory management, ACM Transactions on Design Automation of Electronic Systems, vol.12, issue.2, 2007.
DOI : 10.1145/1230800.1230807

M. Kandemir, I. Kadayif, A. Choudhary, J. Ramanujam, and I. Kolcu, Compiler-directed scratch pad memory optimization for embedded multiprocessors, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.12, issue.3, pp.281-287, 2004.
DOI : 10.1109/TVLSI.2004.824299

J. Kim, H. Kim, J. H. Lee, and J. Lee, Achieving a single compute device image in OpenCL for multiple GPUs, 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'11), pp.277-288, 2011.

S. Lee and R. Eigenmann, OpenMPC: Extended OpenMP programming and tuning for GPUs, ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), pp.1-11, 2010.

V. Lefebvre and P. Feautrier, Automatic storage management for parallel programs, Parallel Computing, vol.24, issue.3-4, pp.649-671, 1998.
DOI : 10.1016/S0167-8191(98)00029-5

S. Pai, R. Govindarajan, and M. J. Thazhuthaveetil, Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, PACT '12, pp.33-42, 2012.
DOI : 10.1145/2370816.2370824

L. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, Polyhedral-based data reuse optimization for configurable computing, ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'13), pp.29-38, 2013.
DOI : 10.1145/2435264.2435273

URL : http://cadlab.cs.ucla.edu/~cong/papers/fpga13_2.pdf

L. Pouchet, PolyBench/C, the polyhedral benchmark suite

L. Renganarayanan, D. Kim, S. V. Rajopadhye, and M. M. Strout, Parameterized tiled loops for free, ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI'07), pp.405-414, 2007.
DOI : 10.1145/1273442.1250780

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.4572

R. Upadrasta and A. Cohen, Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra, The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'13), pp.483-496, 2013.

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, Mathematical Software - ICMS 2010, pp.299-302, 2010.
DOI : 10.1007/978-3-642-15582-6_49

S. Verdoolaege, Counting affine calculator and applications, 1st International Workshop on Polyhedral Compilation Techniques (IMPACT'11), 2011.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, p.54, 2013.
DOI : 10.1145/2400682.2400713

URL : https://hal.archives-ouvertes.fr/hal-00786677

M. Wolf and M. Lam, A data locality optimizing algorithm, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91), pp.30-44, 1991.

J. Xue, On Tiling as a Loop Transformation, Parallel Processing Letters, vol.07, issue.04, pp.409-424, 1997.
DOI : 10.1142/S0129626497000401

J. Xue, Loop Tiling for Parallelism, 2000.