C. Alias, Improving communication patterns in polyhedral process networks, Sixth International Workshop on High Performance Energy Efficient Embedded Systems, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01725143

C. Alias, Program Optimization by Template Recognition and Replacement, 2005.
URL : https://hal.archives-ouvertes.fr/tel-01892198

C. Alias, Tema : an efficient tool to find high-performance library patterns in source code, International Workshop on Patterns in High-Performance Computing (PatHPC'05), 2005.
URL : https://hal.archives-ouvertes.fr/ensl-01663997

C. Alias, FIFO recovery by depth-partitioning is complete on data-aware process networks, INRIA, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01818585

C. Alias, F. Baray, and A. Darte, Bee+Cl@k: An implementation of lattice-based array contraction in the source-to-source translator ROSE, ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'07), 2007.

C. Alias and D. Barthou, Algorithm recognition based on demand-driven dataflow analysis, IEEE Working Conference on Reverse Engineering (WCRE'03), 2003.
URL : https://hal.archives-ouvertes.fr/ensl-01663748

C. Alias and D. Barthou, On the recognition of algorithm templates, International Workshop on Compiler Optimization meets Compiler Verification (COCV'03), 2003.

C. Alias and D. Barthou, Deciding where to call performance libraries, European Conference on Parallel Processing (Euro-Par'05), 2005.
URL : https://hal.archives-ouvertes.fr/hal-00141074

C. Alias and D. Barthou, On domain specific languages re-engineering, IEEE/ACM International Conference on Generative Programming and Component Engineering (GPCE'05), 2005.

C. Alias, A. Darte, P. Feautrier, and L. Gonnord, Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs, International Static Analysis Symposium (SAS'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00523298

C. Alias, A. Darte, P. Feautrier, and L. Gonnord, Rank: A tool to check program termination and computational complexity, International Workshop on Constraints in Software Testing Verification and Analysis (CSTVA'13), p.238, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00801571

C. Alias, A. Darte, and A. Plesco, Optimizing DDR-SDRAM communications at C-level for automatically-generated hardware accelerators. An experience with the Altera C2H HLS tool, IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'10), 2010.
URL : https://hal.archives-ouvertes.fr/hal-01664033

C. Alias, A. Darte, and A. Plesco, Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00761533

C. Alias, A. Darte, and A. Plesco, Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA, 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT'12), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00761533

C. Alias, A. Darte, and A. Plesco, Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA, ACM SIGDA Intl. Conference on Design, Automation and Test in Europe (DATE'13), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00761533

C. Alias, C. Fuhs, and L. Gonnord, Estimation of Parallel Complexity with Rewriting Techniques, 15th International Workshop on Termination (WST'16), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01345914

C. Alias, B. Pasca, and A. Plesco, Automatic generation of FPGA-specific pipelined accelerators, International Symposium on Applied Reconfigurable Computing (ARC'11), 2011.
URL : https://hal.archives-ouvertes.fr/ensl-00549682

C. Alias, B. Pasca, and A. Plesco, FPGA-specific synthesis of loop-nests with pipeline computational cores, Microprocessors and Microsystems, vol.36, issue.8, pp.606-619, 2012.

C. Alias and A. Plesco, Method of automatic synthesis of circuits, device and computer program associated therewith. Patent FR1453308, 2014.

C. Alias and A. Plesco, Data-aware Process Networks, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01158726

C. Alias and A. Plesco, Optimizing Affine Control with Semantic Factorizations, ACM Transactions on Architecture and Code Optimization (TACO), vol.14, issue.4, p.27, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01470873

G. Andrieu, C. Alias, and L. Gonnord, SToP: Scalable termination analysis of (C) programs (tool presentation), International Workshop on Tools for Automatic Program Analysis (TAPAS'12), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00760926

G. Iooss, S. Rajopadhye, and C. Alias, Semantic tiling, Workshop on Leveraging Abstractions and Semantics in High-performance Computing (LASH-C'13), 2013.
URL : https://hal.archives-ouvertes.fr/hal-01664051

G. Iooss, C. Alias, and S. Rajopadhye, On program equivalence with reductions, 21st International Static Analysis Symposium (SAS'14), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01096110

G. Iooss, C. Alias, and S. Rajopadhye, Monoparametric tiling of polyhedral programs, INRIA, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01952593

G. Iooss, S. Rajopadhye, C. Alias, and Y. Zou, CART: Constant aspect ratio tiling, 4th International Workshop on Polyhedral Compilation Techniques (IMPACT'14), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00915827

Q. Lu, C. Alias, U. Bondhugula, T. Henretty, S. Krishnamoorthy, J. Ramanujam, A. Rountev, P. Sadayappan, Y. Chen, H. Lin, and T.-F. Ngai, Data layout transformations for enhancing data locality on NUCA chip multiprocessors, ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'09), 2009.

S. Rus, G. He, C. Alias, and L. Rauchwerger, Region array SSA, ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'06), 2006.

General references

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, p.12037, 2009.

C. Ancourt and F. Irigoin, Scanning polyhedra with DO loops, Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pp.39-50, 1991.
URL : https://hal.archives-ouvertes.fr/hal-00752774

E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel et al., LAPACK Users' Guide. Society for Industrial and Applied Mathematics, 1999.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

F. Baader and W. Snyder, Unification theory, chapter 8, 2001.

V. Bandishti, I. Pananilath, and U. Bondhugula, Tiling stencil computations to maximize parallelism, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pp.1-11, 2012.

J. Baras and G. Theodorakopoulos, Path Problems in Networks, 2010.

D. Barthou, P. Feautrier, and X. Redon, On the Equivalence of Two Systems of Affine Recurrence Equations (Research Note), Proceedings of the 8th International EuroPar Conference on Parallel Processing, pp.309-313, 2002.

M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'08), pp.1-10, 2008.

M. M. Baskaran, A. Hartono, S. Tavarageri, T. Henretty, J. Ramanujam, and P. Sadayappan, Parameterized tiling revisited, Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '10, pp.200-209, 2010.

M. M. Baskaran, N. Vasilache, B. Meister, and R. Lethin, Automatic communication optimizations through memory reuse strategies, ACM SIGPLAN Notices, vol.47, pp.277-278, 2012.

C. Bastoul, A. Cohen, S. Girbal, S. Sharma, and O. Temam, Putting polyhedral loop transformations to work, International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2003.
URL : https://hal.archives-ouvertes.fr/inria-00071681

C. Bastoul, Efficient code generation for automatic parallelization and optimization, 2nd International Symposium on Parallel and Distributed Computing (ISPDC 2003), pp.23-30, 2003.

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pp.7-16, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00017260

C. Bastoul, A. Cohen, S. Girbal, S. Sharma, and O. Temam, Putting polyhedral loop transformations to work, LCPC, pp.209-225, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00071681

S. Bayliss and G. A. Constantinides, Optimizing SDRAM bandwidth for custom FPGA loop accelerators, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp.195-204, 2012.

M. Bednara and J. Teich, Automatic synthesis of FPGA processor arrays from loop algorithms, The Journal of Supercomputing, vol.26, issue.2, pp.149-165, 2003.

M. Benabderrahmane, L. Pouchet, A. Cohen, and C. Bastoul, The polyhedral model is more widely applicable than you think, International Conference on Compiler Construction, pp.283-303, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551087

S. Bhansali and J. R. Hagemeister, A pattern-matching approach for reusing software libraries in parallel systems, First International Workshop on Knowledge-based Systems for the ReUse of Program Libraries, 1995.

J. Bilmes, K. Asanovic, C. Chin, and J. Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th International Conference on Supercomputing, pp.340-347, 1997.

M. Blott, Reconfigurable future for HPC, High Performance Computing & Simulation (HPCS), 2016 International Conference on, pp.130-131, 2016.

B. Boigelot, Symbolic methods for exploring infinite state spaces, 1998.

U. Bondhugula, V. Bandishti, and I. Pananilath, Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, 2016.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, pp.101-113, 2008.

P. Boulet and P. Feautrier, Scanning polyhedra without Do-loops, IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'98), pp.4-9, 1998.
URL : https://hal.archives-ouvertes.fr/inria-00564990

J. Bu, E. F. Deprettere, and P. Dewilde, A design methodology for fixed-size systolic arrays, Proceedings of the International Conference on, pp.591-602, 1990.

Altera C2H: Nios II C-to-hardware acceleration compiler

P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar, X10: an object-oriented approach to non-uniform cluster computing, ACM SIGPLAN Notices, vol.40, pp.519-538, 2005.

D. Chavarría-Miranda and J. Mellor-Crummey, Effective communication coalescing for data-parallel applications, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'05), pp.14-25, 2005.

W. Chen, C. Iancu, and K. Yelick, Communication optimizations for fine-grained UPC applications, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05), pp.267-278, 2005.

H. Chitsaz, R. Salari, S. C. Sahinalp, and R. Backofen, A partition function algorithm for interacting nucleic acid strands, Bioinformatics, vol.25, issue.12, pp.365-373, 2009.

J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov et al., ScaLAPACK: A portable linear algebra library for distributed memory computers - design issues and performance, Computer Physics Communications, vol.97, issue.1-2, pp.1-15, 1996.

P. Clauss, Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs, Proceedings of the 10th International Conference on Supercomputing, pp.278-285, 1996.
URL : https://hal.archives-ouvertes.fr/hal-01100306

J. Cong, H. Huang, C. Liu, and Y. Zou, A reuse-aware prefetching scheme for scratchpad memory, Proceedings of the 48th annual Design Automation Conference (DAC'11), pp.960-965, 2011.

NVidia Corporation, CUDA

OpenACC Non-Profit Corporation, The OpenACC application programming interface version 2, 2013.

P. Coussy and A. Morawiec, High-Level Synthesis: From Algorithm to Digital Circuit, 2008.

B. da Silva, A. Braeken, E. H. D'Hollander, and A. Touhafi, Performance modeling for FPGAs: extending the roofline model with high-level synthesis tools, International Journal of Reconfigurable Computing, issue.7, 2013.

A. Darte, Regular partitioning for synthesizing fixed-size systolic arrays, INTEGRATION, the VLSI journal, vol.12, issue.3, pp.293-304, 1991.

A. Darte and A. Isoard, Exact and approximated data-reuse optimizations for tiling with parametric sizes, CC, pp.151-170, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01099017

A. Darte, R. Schreiber, and G. Villard, Lattice-based memory allocation, Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES '03, pp.298-308, 2003.
URL : https://hal.archives-ouvertes.fr/hal-02101912

A. Darte, R. Schreiber, B. R. Rau, and F. Vivien, Constructing and exploiting linear schedules with prescribed parallelism, ACM Transactions on Design Automation of Electronic Systems (ACM TODAES), vol.7, issue.1, pp.159-172, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00807410

A. Darte, R. Schreiber, and G. Villard, Lattice-based memory allocation, IEEE Transactions on Computers, vol.54, issue.10, pp.1242-1257, 2005.
URL : https://hal.archives-ouvertes.fr/hal-02101912

F. de Dinechin and B. Pasca, Designing custom arithmetic data paths with FloPoCo, 2011.

F. de Dinechin, P. Quinton, and T. Risset, Structuration of the Alpha language, Massively Parallel Programming Models, pp.18-24, 1995.

E. De Greef, F. Catthoor, and H. De Man, Memory size reduction through storage order optimization for embedded parallel multimedia applications, Parallel Computing, vol.23, pp.1811-1837, 1997.

E. F. Deprettere, E. Rijpkema, P. Lieverse, and B. Kienhuis, Compaan: Deriving process networks from Matlab for embedded signal processing architectures, 8th International Workshop on Hardware/Software Codesign (CODES'2000), 2000.

S. Derrien and S. Rajopadhye, Loop tiling for reconfigurable accelerators, International Conference on Field Programmable Logic and Applications, pp.398-408, Springer, 2001.

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multi-core parallel programming environment, Workshop on General Purpose Processing on Graphics Processing Units, vol.28, 2007.

D. Ebner, F. Brandner, B. Scholz, A. Krall, P. Wiedermann et al., Generalized instruction selection using SSA-graphs, ACM SIGPLAN Notices, vol.43, pp.31-40, 2008.

P. Feautrier, Parametric integer programming, RAIRO-Operations Research, vol.22, issue.3, pp.243-268, 1988.

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991.

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-348, 1992.

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992.

X. Feng and A. J. Hu, Cutpoints for Formal Equivalence Verification of Embedded Software, Proceedings of the 5th ACM International Conference on Embedded Software, pp.307-316, 2005.

J. Fifield, R. Keryell, H. Ratigner, H. Styles, and J. Wu, Optimizing OpenCL applications on Xilinx FPGA, Proceedings of the 4th International Workshop on OpenCL, 2016.

M. Frigo and S. G. Johnson, FFTW: An adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on, vol.3, pp.1381-1384, 1998.

A. Geist and D. A. Reed, A survey of high-performance computing scaling challenges, International Journal of High Performance Computing Applications, p.1094342015597083, 2015.

B. Godlin and O. Strichman, Regression Verification, Proceedings of the 46th Annual Design Automation Conference, pp.466-471, 2009.

L. Gonnord, Accélération abstraite pour l'amélioration de la précision en Analyse des Relations Linéaires, 2007.

Mentor Graphics, Mentor CatapultC high-level synthesis

T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan, and S. Verdoolaege, Hybrid hexagonal/classical tiling for GPUs, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.66-75, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911177

T. Grosser, H. Zheng, R. Aloor, A. Simbürger, A. Größlinger, and L.-N. Pouchet, Polly - polyhedral optimization in LLVM, 1st International Workshop on Polyhedral Compilation Techniques (IMPACT), pp.1-6, 2011.

A. Größlinger, Precise management of scratchpad memories for localizing array accesses in scientific codes, International Conference on Compiler Construction (CC'09), vol.5501, pp.236-250, 2009.

S. Guelton, F. Irigoin, and R. Keryell, Compilation for heterogeneous computing: Automating analysis, transformations, and decisions, 2011.

G. Gupta and S. Rajopadhye, The Z-polyhedral model, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.237-248, 2007.

S. Hack, D. Grund, and G. Goos, Register allocation for programs in SSA-form, International Conference on Compiler Construction, pp.247-262, 2006.

T. Hoare, The Verifying Compiler: A Grand Challenge for Computing Research, Proceedings of the 2003 Joint Modular Languages Conference, pp.25-35, 2003.

G. Iooss, Detection of linear algebra operations in polyhedral programs, 2016.
URL : https://hal.archives-ouvertes.fr/tel-01370553

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL'88, pp.319-329, 1988.

I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt, DRDU: A data reuse analysis technique for efficient scratch-pad memory management, ACM Transactions on Design Automation of Electronic Systems (ACM TODAES), vol.12, issue.2, 2007.

JEDEC, Double data rate (DDR) SDRAM specification JESD79F

G. Kahn, The semantics of a simple language for parallel programming, IFIP Congress 74, pp.471-475, 1974.

C. Karfa, K. Banerjee, D. Sarkar, and C. Mandal, Verification of loop and arithmetic transformations of array-intensive behaviors, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.32, issue.11, pp.1787-1800, 2013.

R. M. Karp, R. E. Miller, and S. Winograd, The organization of computations for uniform recurrence equations, Journal of the ACM, vol.14, issue.3, pp.563-590, 1967.

W. Kelly, W. Pugh, E. Rosser, and T. Shpeisman, Transitive closure of infinite graphs and its applications, International Journal of Parallel Programming, vol.24, issue.6, pp.579-598, 1996.

C. W. Kessler, Pattern-driven automatic parallelization, Scientific Programming, vol.5, issue.3, pp.251-274, 1996.

D. Kim and S. Rajopadhye, Efficient tiled loop generation: D-tiling, Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing, LCPC'09, pp.293-307, 2010.

S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev et al., Effective automatic parallelization of stencil computations, ACM SIGPLAN Conference on Programming Language Design and Implementation, vol.42, pp.235-244, 2007.

M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala et al., Optimistic parallelism requires abstractions, Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '07, pp.211-222, 2007.

S. Kundu, Z. Tatlock, and S. Lerner, Proving Optimizations Correct using Parameterized Program Equivalence, Proceedings of the 30th ACM SIGPLAN conference on Programming Language Design and Implementation, pp.327-337, 2009.

M. D. Lam, E. E. Rothberg, and M. E. Wolf, The cache performance and optimizations of blocked algorithms, ACM SIGARCH Computer Architecture News, vol.19, pp.63-74, 1991.

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Trans. Math. Softw., vol.5, issue.3, pp.308-323, 1979.

H. Le Verge, C. Mauras, and P. Quinton, The ALPHA language and its use for the design of systolic arrays, Journal of VLSI Signal Processing, vol.3, issue.3, pp.173-182, 1991.

V. Lefebvre and P. Feautrier, Automatic storage management for parallel programs, Parallel Computing, vol.24, pp.649-671, 1998.

A. Leung, N. Vasilache, B. Meister, M. M. Baskaran, D. Wohlford, C. Bastoul, and R. Lethin, A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction, 3rd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'10), Held with ASPLOS XVI, pp.51-61, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00551084

E. Martin, O. Sentieys, H. Dubois, and J. Philippe, GAUT: An architectural synthesis tool for dedicated signal processors, Design Automation Conference with EURO-VHDL'93 (EURO-DAC), 1993.

C. Mauras, ALPHA: un Langage Équationnel pour la Conception et la Programmation d'Architectures Parallèles Synchrones, 1989.

S. Meijer, Transformations for Polyhedral Process Networks, 2010.

R. Metzger and Z. Wen, Automatic Algorithm Recognition: A New Approach to Program Optimization, 2000.

M. Kandemir and A. Choudhary, Compiler-directed scratch pad memory hierarchy design and management, Proceedings of the 39th annual Design Automation Conference (DAC'02), pp.628-633, 2002.

D. I. Moldovan and J. A. Fortes, Partitioning and mapping algorithms into fixed size systolic arrays, IEEE transactions on computers, issue.1, pp.1-12, 1986.

A. Munshi, The OpenCL specification, Hot Chips 21 Symposium (HCS), pp.1-314, 2009.

G. C. Necula, Translation Validation for an Optimizing Compiler, Proceedings of the 21st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.83-95, 2000.

J. M. Perez, R. M. Badia, and J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, IEEE International Conference on, pp.142-151, 2008.

S. S. Pinter and R. Y. Pinter, Program optimization and parallelization using idioms, Proceedings of the 18th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL'91, pp.79-92, 1991.

A. Plesco, Program Transformations and Memory Architecture Optimizations for High-Level Synthesis of Hardware Accelerators, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00544349

A. Pnueli, M. Siegel, and F. Singerman, Translation Validation, Proceedings of the 4th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp.151-166, 1998.

S. Pop, A. Cohen, C. Bastoul, S. Girbal, G.-A. Silber, and N. Vasilache, GRAPHITE: Loop optimizations based on the polyhedral model for GCC, Proceedings of the 4th GCC Developers' Summit, pp.1-18, 2006.

L. Pouchet, PolyBench: The polyhedral benchmark suite, 2012.

L. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, Polyhedral-based data reuse optimization for configurable computing, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA '13, pp.29-38, 2013.

W. Pugh and D. Wonnacott, Going beyond integer programming with the Omega test to eliminate false data dependences, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.2, pp.204-211, 1995.

M. Püschel, J. M. F. Moura, B. Singer, J. Xiong et al., Spiral: A generator for platform-adapted libraries of signal processing algorithms, The International Journal of High Performance Computing Applications, vol.18, issue.1, pp.21-45, 2004.

F. Quilleré and S. Rajopadhye, Optimizing memory usage in the polyhedral model, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.22, issue.5, pp.773-815, 2000.

F. Quilleré, S. Rajopadhye, and D. Wilde, Generation of efficient nested loops from polyhedra, International Journal of Parallel Programming, vol.28, issue.5, pp.469-498, 2000.

P. Quinton and V. Van Dongen, Journal of VLSI signal processing systems for signal, image and video technology, vol.1, pp.95-113, 1989.

S. V. Rajopadhye, S. Purushothaman, and R. M. Fujimoto, On synthesizing systolic arrays from recurrence equations with linear dependencies, International Conference on Foundations of Software Technology and Theoretical Computer Science, pp.488-503, Springer, 1986.

F. Rastello and T. Dauxois, Efficient tiling for an ODE discrete integration program: Redundant tasks instead of trapezoidal shaped-tiles, 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), pp.15-19, 2002.

D. A. Reed, L. M. Adams, and M. L. Patrick, Stencils and problem partitionings: Their influence on the performance of multiple processor systems, IEEE Transactions on Computers, vol.36, issue.7, pp.845-858, 1987.

L. Renganarayanan, D. Kim, S. V. Rajopadhye, and M. M. Strout, Parameterized loop tiling, ACM Trans. Program. Lang. Syst., vol.34, issue.1, p.3, 2012.

E. Rijpkema, E. F. Deprettere, and B. Kienhuis, Deriving process networks from nested loop algorithms, Parallel Processing Letters, vol.10, issue.02n03, pp.165-176, 2000.

R. Schreiber, S. Aditya, B. R. Rau, V. Kathail, S. Mahlke et al., High-level synthesis of nonprogrammable hardware accelerators, Application-Specific Systems, Architectures, and Processors, pp.113-124, 2000.

R. Schreiber and J. J. Dongarra, Automatic blocking of nested loops, 1990.

W. Shang and J. A. Fortes, Independent partitioning of algorithms with uniform dependencies, IEEE Transactions on Computers, vol.41, issue.2, pp.190-206, 1992.

K. C. Shashidhar, M. Bruynooghe, F. Catthoor, and G. Janssens, Verification of source code transformations by program equivalence checking, Lecture Notes in Computer Science, vol.3443, pp.221-236, 2005.

M. M. Strout, A. Lamielle, L. Carter, J. Ferrante, B. Kreaseck et al., An approach for code generation in the sparse polyhedral framework, Parallel Computing, vol.53, pp.32-57, 2016.

J. M. Tarela and M. V. Martínez, Region configurations for realizability of lattice piecewise-linear models, Mathematical and Computer Modelling, vol.30, pp.17-27, 1999.

J. Teich, A. Tanase, and F. Hannig, Symbolic parallelization of loop programs for massively parallel processor arrays, Application-Specific Systems, Architectures and Processors (ASAP), pp.1-9, 2013.

J. Teich, A. Tanase, and F. Hannig, Symbolic mapping of loop programs onto processor arrays, Journal of Signal Processing Systems, vol.77, issue.1-2, pp.31-59, 2014.

J. Teich and L. Thiele, Partitioning of processor arrays: A piecewise regular approach, Integration, the VLSI journal, vol.14, pp.297-332, 1993.

J. Teich, L. Thiele, and L. Z. Zhang, Journal of VLSI signal processing systems for signal, image and video technology, vol.17, pp.5-20, 1997.

W. F. Thies, Language and compiler support for stream programs, 2009.

K. Trifunovic and A. Cohen, Enabling more optimizations in GRAPHITE: ignoring memory-based dependences, Proceedings of the 8th GCC Developers' Summit, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551509

A. Turjan, Compiling nested loop programs to process networks, 2007.

A. Turjan, B. Kienhuis, and E. Deprettere, Realizations of the extended linearization model. Domain-specific processors: systems, architectures, modeling, and simulation, pp.171-191, 2002.

A. Turjan, B. Kienhuis, and E. Deprettere, Classifying interprocess communication in process network representation of nested-loop programs, ACM Transactions on Embedded Computing Systems (TECS), vol.6, issue.2, p.13, 2007.

UGH: User-guided high-level synthesis

S. van Haastregt and B. Kienhuis, Enabling automatic pipeline utilization improvement in polyhedral process network implementations, Application-Specific Systems, Architectures and Processors (ASAP), pp.173-176, 2012.

A. Venkat, M. Hall, and M. Strout, Loop and data transformations for sparse matrix code, ACM SIGPLAN Notices, vol.50, pp.521-532, 2015.

S. Verdoolaege, ISL: An integer set library for the polyhedral model, ICMS, vol.6327, pp.299-302, 2010.

S. Verdoolaege, Polyhedral process networks, Handbook of Signal Processing Systems, pp.931-965, 2010.

S. Verdoolaege, Integer set coalescing, 5th International Workshop on Polyhedral Compilation Techniques (IMPACT'15), 2015.

S. Verdoolaege, A. Cohen, and A. Beletska, Transitive closures of affine integer tuple relations and their overapproximations, International Static Analysis Symposium, pp.216-232, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00578052

S. Verdoolaege, G. Janssens, and M. Bruynooghe, Equivalence checking of static affine programs using widening to handle recurrences, ACM Transactions on Programming Languages and Systems, vol.34, issue.3, pp.1-35, 2012.

S. Verdoolaege, H. Nikolov, and T. Stefanov, Pn: a tool for improved derivation of process networks, EURASIP journal on Embedded Systems, issue.1, pp.19-19, 2007.

Xilinx, Vivado HLS

D. B. West, Introduction to Graph Theory, 1999.

R. C. Whaley and J. J. Dongarra, Automatically tuned linear algebra software, Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, SC '98, pp.1-27, 1998.

S. Williams, A. Waterman, and D. Patterson, Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.

L. M. Wills, Automated Program Recognition by Graph Parsing. PhD thesis, MIT, 1992.

M. E. Wolf and M. S. Lam, A data locality optimizing algorithm, ACM SIGPLAN Notices, vol.26, pp.30-44, 1991.

M. Wolfe, Iteration space tiling for memory hierarchies, Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp.357-361, 1989.

J. Xue, Loop Tiling for Parallelism, 2000.

A. YarKhan, J. Kurzak, and J. Dongarra, QUARK users' guide, 2011.

T. Yuki, G. Gupta, D. Kim, T. Pathan, and S. V. Rajopadhye, AlphaZ: A system for design space exploration in the polyhedral model, Languages and Compilers for Parallel Computing, 25th International Workshop, LCPC 2012, pp.17-31, 2012.

C. Zhang, D. Wu, J. Sun, G. Sun, G. Luo et al., Energy-efficient CNN implementation on a deeply pipelined FPGA cluster, Proceedings of the 2016 International Symposium on Low Power Electronics and Design, pp.326-331, 2016.

X. Zhou, J.-P. Giacalone, M. J. Garzarán, R. H. Kuhn, Y. Ni et al., Hierarchical overlapped tiling, Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp.207-218, 2012.

C. Zissulescu, A. Turjan, B. Kienhuis, and E. Deprettere, Solving out of order communication using CAM memory: an implementation, 13th Annual Workshop on Circuits, Systems and Signal Processing, 2002.

L. Zuck, A. Pnueli, B. Goldberg, C. Barrett, Y. Fang et al., Translation and Run-Time Validation of Loop Transformations, Formal Methods in System Design, vol.27, issue.3, pp.335-360, 2005.