Optimizing DDR-SDRAM communications at C-level for automatically-generated hardware accelerators: an experience with the Altera C2H HLS tool, ASAP 2010, 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, pp.329-332, 2010. ,
DOI : 10.1109/ASAP.2010.5540967
URL : https://hal.archives-ouvertes.fr/inria-00482035
Kernel offloading with optimized remote accesses, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00611179
Optimizing remote accesses for offloaded kernels: Application to HLS for FPGA, Design, Automation and Test in Europe (DATE'13), pp.575-580, 2013. ,
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 13th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP'08), pp.1-10, 2008. ,
Automatic communication optimizations through memory reuse strategies, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'12), pp.277-278, 2012. ,
A practical automatic polyhedral parallelizer and locality optimizer, ACM International Conference on Programming Languages Design and Implementation (PLDI'08), pp.101-113, 2008. ,
DOI : 10.1145/1375581.1375595
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5126
Loop program mapping and compact code generation for programmable hardware accelerators, 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, pp.10-17, 2013. ,
DOI : 10.1109/ASAP.2013.6567544
Efficient Abstractions for GPGPU Programming, International Journal of Parallel Programming, vol.42, issue.4, pp.583-600, 2014. ,
DOI : 10.1007/s10766-013-0261-x
URL : https://hal.archives-ouvertes.fr/hal-01216144
Interprocedural array region analyses, International Workshop on Languages and Compilers for Parallel Computing (LCPC'96), pp.46-60, 1996. ,
URL : https://hal.archives-ouvertes.fr/hal-00752611
Parametric tiling with inter-tile data reuse, 4th International Workshop on Polyhedral Compilation Techniques (IMPACT'14), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00915831
Exact and Approximated Data-Reuse Optimizations for Tiling with Parametric Sizes, 24th International Conference on Compiler Construction (CC'15), part of ETAPS'15, 2015. ,
DOI : 10.1007/978-3-662-46663-6_8
URL : https://hal.archives-ouvertes.fr/hal-01099017
Lattice-Based Memory Allocation, IEEE Transactions on Computers, vol.54, issue.10, pp.1242-1257, 2005. ,
DOI : 10.1109/TC.2005.167
URL : https://hal.archives-ouvertes.fr/hal-01272969
Parametric integer programming (corresponding software tool: PIP), RAIRO Recherche Opérationnelle, vol.22, issue.3, pp.243-268, 1988. ,
DOI : 10.1051/ro/1988220302431
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.9957
The polyhedron model, Encyclopedia of Parallel Programming, 2011. ,
An efficient code generation technique for tiled iteration spaces, IEEE Transactions on Parallel and Distributed Systems, vol.14, issue.10, pp.1021-1034, 2003. ,
Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes, 18th International Conference on Compiler Construction (CC'09), pp.236-250, 2009. ,
DOI : 10.1007/978-3-642-00722-4_17
Beyond Do Loops: Data Transfer Generation with Convex Array Regions ,
DOI : 10.1007/978-3-642-37658-0_17
URL : https://hal.archives-ouvertes.fr/hal-00742583
Compilation pour cible hétérogènes: automatisation des analyses, transformations et décisions nécessaires, 20ème Rencontres Françaises du Parallélisme (Renpar'11), 2011. ,
DynTile: Parametric tiled loop generation for parallel execution on multicore processors, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2010. ,
DOI : 10.1109/IPDPS.2010.5470459
Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988. ,
DOI : 10.1145/73560.73588
DRDU: A data reuse analysis technique for efficient scratch-pad memory management, ACM Transactions on Design Automation of Electronic Systems, vol.12, issue.2, 2007. ,
DOI : 10.1145/1230800.1230807
Compiler-directed scratch pad memory optimization for embedded multiprocessors, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.12, issue.3, pp.281-287, 2004. ,
DOI : 10.1109/TVLSI.2004.824299
Achieving a single compute device image in OpenCL for multiple GPUs, 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'11), pp.277-288, 2011. ,
OpenMPC: Extended OpenMP programming and tuning for GPUs, ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), pp.1-11, 2010. ,
Automatic storage management for parallel programs, Parallel Computing, vol.24, issue.3-4, pp.649-671, 1998. ,
DOI : 10.1016/S0167-8191(98)00029-5
Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, PACT '12, pp.33-42, 2012. ,
DOI : 10.1145/2370816.2370824
Polyhedral-based data reuse optimization for configurable computing, ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'13), pp.29-38, 2013. ,
DOI : 10.1145/2435264.2435273
URL : http://cadlab.cs.ucla.edu/~cong/papers/fpga13_2.pdf
PolyBench/C, the polyhedral benchmark suite ,
Parameterized tiled loops for free, ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI'07), pp.405-414, 2007. ,
DOI : 10.1145/1273442.1250780
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.4572
Sub-polyhedral scheduling using (unit-) two-variable-per-inequality polyhedra, The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'13), pp.483-496, 2013. ,
isl: An Integer Set Library for the Polyhedral Model, Mathematical Software - ICMS 2010, pp.299-302, 2010. ,
DOI : 10.1007/978-3-642-15582-6_49
Counting affine calculator and applications, 1st International Workshop on Polyhedral Compilation Techniques (IMPACT'11), 2011. ,
Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, p.54, 2013. ,
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677
A data locality optimizing algorithm, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91), pp.30-44, 1991. ,
On Tiling as a Loop Transformation, Parallel Processing Letters, vol.7, issue.4, pp.409-424, 1997. ,
DOI : 10.1142/S0129626497000401
Loop Tiling for Parallelism, 2000. ,