Improving communication patterns in polyhedral process networks, Sixth International Workshop on High Performance Energy Efficient Embedded Systems, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01725143
Program Optimization by Template Recognition and Replacement, 2005. ,
URL : https://hal.archives-ouvertes.fr/tel-01892198
Tema : an efficient tool to find high-performance library patterns in source code, International Workshop on Patterns in High-Performance Computing (PatHPC'05), 2005. ,
URL : https://hal.archives-ouvertes.fr/ensl-01663997
FIFO recovery by depth-partitioning is complete on data-aware process networks, INRIA, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01818585
Bee+Cl@k: An implementation of lattice-based array contraction in the source-to-source translator ROSE, ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'07), 2007. ,
Algorithm recognition based on demand-driven dataflow analysis, IEEE Working Conference on Reverse Engineering (WCRE'03), 2003. ,
URL : https://hal.archives-ouvertes.fr/ensl-01663748
On the recognition of algorithm templates, International Workshop on Compiler Optimization meets Compiler Verification (COCV'03), 2003. ,
Deciding where to call performance libraries, European Conference on Parallel Processing (Euro-Par'05), 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-00141074
On domain specific languages re-engineering, IEEE/ACM International Conference on Generative Programming and Component Engineering (GPCE'05), 2005. ,
Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs, International Static Analysis Symposium (SAS'10), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00523298
Rank: A tool to check program termination and computational complexity, International Workshop on Constraints in Software Testing Verification and Analysis (CSTVA'13), p.238, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00801571
Optimizing DDR-SDRAM communications at C-level for automatically-generated hardware accelerators. An experience with the Altera C2H HLS tool, IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'10), 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01664033
Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00761533
Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA, 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT'12), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00761533
Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA, ACM SIGDA Intl. Conference on Design, Automation and Test in Europe (DATE'13), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00761533
Estimation of Parallel Complexity with Rewriting Techniques, 15th International Workshop on Termination (WST'16), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01345914
Automatic generation of FPGA-specific pipelined accelerators, International Symposium on Applied Reconfigurable Computing (ARC'11), 2011. ,
URL : https://hal.archives-ouvertes.fr/ensl-00549682
FPGA-specific synthesis of loop-nests with pipeline computational cores, Microprocessors and Microsystems, vol.36, issue.8, pp.606-619, 2012. ,
Method of automatic synthesis of circuits, device and computer program associated therewith. Patent FR1453308, 2014. ,
Data-aware Process Networks, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01158726
Optimizing Affine Control with Semantic Factorizations, ACM Transactions on Architecture and Code Optimization (TACO), vol.14, issue.4, p.27, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01470873
SToP: Scalable termination analysis of (C) programs (tool presentation), International Workshop on Tools for Automatic Program Analysis (TAPAS'12), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00760926
Semantic tiling, Workshop on Leveraging Abstractions and Semantics in High-performance Computing (LASH-C'13), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01664051
On program equivalence with reductions, 21st International Static Analysis Symposium (SAS'14), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01096110
Monoparametric tiling of polyhedral programs, INRIA, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01952593
CART: Constant aspect ratio tiling, 4th International Workshop on Polyhedral Compilation Techniques (IMPACT'14), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00915827
Haibo Lin, and Tin-Fook Ngai. Data layout transformations for enhancing data locality on NUCA chip multiprocessors, ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'09), 2009. ,
Region array SSA, ACM/IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'06), 2006. ,
General references
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, p.12037, 2009. ,
Scanning polyhedra with DO loops, Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pp.39-50, 1991. ,
URL : https://hal.archives-ouvertes.fr/hal-00752774
, LAPACK Users' Guide. Society for Industrial and Applied Mathematics, 1999.
, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363
Unification theory, chapter 8, 2001. ,
Tiling stencil computations to maximize parallelism, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pp.1-11, 2012. ,
Path Problems in Networks, 2010. ,
On the Equivalence of Two Systems of Affine Recurrence Equations (Research Note), Proceedings of the 8th International EuroPar Conference on Parallel Processing, pp.309-313, 2002. ,
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'08), pp.1-10, 2008. ,
Parameterized tiling revisited, Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '10, pp.200-209, 2010. ,
Automatic communication optimizations through memory reuse strategies, ACM SIGPLAN Notices, vol.47, pp.277-278, 2012. ,
Putting polyhedral loop transformations to work, International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2003. ,
URL : https://hal.archives-ouvertes.fr/inria-00071681
Efficient code generation for automatic parallelization and optimization, 2nd International Symposium on Parallel and Distributed Computing (ISPDC 2003), pp.23-30, 2003. ,
Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pp.7-16, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00017260
Putting polyhedral loop transformations to work, LCPC, pp.209-225, 2003. ,
URL : https://hal.archives-ouvertes.fr/inria-00071681
Optimizing SDRAM bandwidth for custom FPGA loop accelerators, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp.195-204, 2012. ,
Automatic synthesis of FPGA processor arrays from loop algorithms, The Journal of Supercomputing, vol.26, issue.2, pp.149-165, 2003. ,
The polyhedral model is more widely applicable than you think, International Conference on Compiler Construction, pp.283-303, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00551087
A pattern-matching approach for reusing software libraries in parallel systems, First International Workshop on Knowledge-based Systems for the ReUse of Program Libraries, 1995. ,
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th International Conference on Supercomputing, pp.340-347, 1997. ,
Reconfigurable future for HPC, High Performance Computing & Simulation (HPCS), 2016 International Conference on, pp.130-131, 2016. ,
Symbolic methods for exploring infinite state spaces, 1998. ,
Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, 2016. ,
A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, pp.101-113, 2008. ,
Scanning polyhedra without Do-loops, IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT'98), pp.4-9, 1998. ,
URL : https://hal.archives-ouvertes.fr/inria-00564990
A design methodology for fixed-size systolic arrays, Proceedings of the International Conference on, pp.591-602, 1990. ,
, Altera C2H: Nios II C-to-hardware acceleration compiler
Kemal Ebcioglu, Christoph Von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing, ACM SIGPLAN Notices, vol.40, pp.519-538, 2005. ,
Effective communication coalescing for data-parallel applications, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'05), pp.14-25, 2005. ,
Communication optimizations for fine-grained UPC applications, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05), pp.267-278, 2005. ,
A partition function algorithm for interacting nucleic acid strands, Bioinformatics, vol.25, issue.12, pp.365-373, 2009. ,
ScaLAPACK: A portable linear algebra library for distributed memory computers-design issues and performance, Computer Physics Communications, vol.97, issue.1-2, pp.1-15, 1996. ,
Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs, Proceedings of the 10th International Conference on Supercomputing, pp.278-285, 1996. ,
URL : https://hal.archives-ouvertes.fr/hal-01100306
A reuse-aware prefetching scheme for scratchpad memory, Proceedings of the 48th annual Design Automation Conference (DAC'11), pp.960-965, 2011. ,
, NVIDIA Corporation. CUDA
, OpenACC Non-Profit Corporation. The OpenACC application programming interface version 2, 2013.
High-Level Synthesis: From Algorithm to Digital Circuit, 2008. ,
Performance modeling for FPGAs: extending the roofline model with high-level synthesis tools, International Journal of Reconfigurable Computing, issue.7, 2013. ,
Regular partitioning for synthesizing fixed-size systolic arrays, INTEGRATION, the VLSI journal, vol.12, issue.3, pp.293-304, 1991. ,
Exact and approximated data-reuse optimizations for tiling with parametric sizes, CC, pp.151-170, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01099017
Lattice-based memory allocation, Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES '03, pp.298-308, 2003. ,
URL : https://hal.archives-ouvertes.fr/hal-02101912
Constructing and exploiting linear schedules with prescribed parallelism, ACM Transactions on Design Automation of Electronic Systems (ACM TODAES), vol.7, issue.1, pp.159-172, 2002. ,
URL : https://hal.archives-ouvertes.fr/hal-00807410
Lattice-based memory allocation, IEEE Transactions on Computers, vol.54, issue.10, pp.1242-1257, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-02101912
Designing custom arithmetic data paths with FloPoCo, 2011. ,
Structuration of the Alpha language, Massively Parallel Programming Models, pp.18-24, 1995. ,
Memory size reduction through storage order optimization for embedded parallel multimedia applications, Parallel Computing, vol.23, pp.1811-1837, 1997. ,
Compaan: Deriving process networks from Matlab for embedded signal processing architectures, 8th International Workshop on Hardware/Software Codesign (CODES'2000), 2000. ,
Loop tiling for reconfigurable accelerators, International Conference on Field Programmable Logic and Applications, pp.398-408, 2001. ,
HMPP: A hybrid multi-core parallel programming environment, Workshop on General Purpose Processing on Graphics Processing Units, vol.28, 2007. ,
Generalized instruction selection using SSA-graphs, ACM SIGPLAN Notices, vol.43, pp.31-40, 2008. ,
Parametric integer programming, RAIRO-Operations Research, vol.22, issue.3, pp.243-268, 1988. ,
Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991. ,
Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-348, 1992. ,
Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992. ,
Cutpoints for Formal Equivalence Verification of Embedded Software, Proceedings of the 5th ACM International Conference on Embedded Software, pp.307-316, 2005. ,
Optimizing OpenCL applications on Xilinx FPGA, Proceedings of the 4th International Workshop on OpenCL, 2016. ,
FFTW: An adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on, vol.3, pp.1381-1384, 1998. ,
A survey of high-performance computing scaling challenges, International Journal of High Performance Computing Applications, p.1094342015597083, 2015. ,
Regression Verification, Proceedings of the 46th Annual Design Automation Conference, pp.466-471, 2009. ,
Accélération abstraite pour l'amélioration de la précision en Analyse des Relations Linéaires, 2007. ,
Mentor CatapultC high-level synthesis ,
Hybrid hexagonal/classical tiling for GPUs, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.66-75, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00911177
Armin Größlinger, and Louis-Noël Pouchet. Polly - polyhedral optimization in LLVM, 1st International Workshop on Polyhedral Compilation Techniques (IMPACT), pp.1-6, 2011. ,
Precise management of scratchpad memories for localizing array accesses in scientific codes, International Conference on Compiler Construction (CC'09), vol.5501, pp.236-250, 2009. ,
Compilation for heterogeneous computing: Automating analysis, transformations, and decisions, 2011. ,
The Z-polyhedral model, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.237-248, 2007. ,
Register allocation for programs in SSA-form, International Conference on Compiler Construction, pp.247-262, 2006. ,
The Verifying Compiler: A Grand Challenge for Computing Research, Proceedings of the 2003 Joint Modular Languages Conference, pp.25-35, 2003. ,
Detection of linear algebra operations in polyhedral programs, 2016. ,
URL : https://hal.archives-ouvertes.fr/tel-01370553
Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL'88, pp.319-329, 1988. ,
DRDU: A data reuse analysis technique for efficient scratch-pad memory management, ACM Transactions on Design Automation of Electronic Systems (ACM TODAES), vol.12, issue.2, 2007. ,
, Double data rate (DDR) SDRAM specification JESD79F, JEDEC
The semantics of a simple language for parallel programming, IFIP Congress 74, pp.471-475, 1974. ,
Verification of loop and arithmetic transformations of array-intensive behaviors, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.32, issue.11, pp.1787-1800, 2013. ,
The organization of computations for uniform recurrence equations, Journal of the ACM, vol.14, issue.3, pp.563-590, 1967. ,
Transitive closure of infinite graphs and its applications, International Journal of Parallel Programming, vol.24, issue.6, pp.579-598, 1996. ,
Pattern-driven automatic parallelization, Scientific Programming, vol.5, issue.3, pp.251-274, 1996. ,
Efficient tiled loop generation: D-tiling, Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing, LCPC'09, pp.293-307, 2010. ,
Effective automatic parallelization of stencil computations. SIGPLAN Conference on Programming Language Design and Implementation, vol.42, pp.235-244, 2007. ,
Optimistic parallelism requires abstractions, Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '07, pp.211-222, 2007. ,
Proving Optimizations Correct using Parameterized Program Equivalence, Proceedings of the 30th ACM SIGPLAN conference on Programming Language Design and Implementation, pp.327-337, 2009. ,
The cache performance and optimizations of blocked algorithms, ACM SIGARCH Computer Architecture News, vol.19, pp.63-74, 1991. ,
Basic linear algebra subprograms for fortran usage, ACM Trans. Math. Softw, vol.5, issue.3, pp.308-323, 1979. ,
The ALPHA language and its use for the design of systolic arrays, Journal of VLSI Signal Processing, vol.3, issue.3, pp.173-182, 1991. ,
Automatic storage management for parallel programs, Parallel Computing, vol.24, pp.649-671, 1998. ,
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction, 3rd Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'10), Held with ASPLOS XVI, pp.51-61, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00551084
Gaut: An architectural synthesis tool for dedicated signal processors, Design Automation Conference with EURO-VHDL'93 (EURO-DAC), 1993. ,
ALPHA: un Langage Équationnel pour la Conception et la Programmation d'Architectures Parallèles Synchrones, 1989. ,
Transformations for Polyhedral Process Networks, 2010. ,
Automatic Algorithm Recognition: A New Approach to Program Optimization, 2000. ,
Compiler-directed scratch pad memory hierarchy design and management, Proceedings of the 39th annual Design Automation Conference (DAC'02), pp.628-633, 2002. ,
Partitioning and mapping algorithms into fixed size systolic arrays, IEEE Transactions on Computers, issue.1, pp.1-12, 1986. ,
The OpenCL specification, Hot Chips 21 Symposium (HCS), pp.1-314, 2009. ,
Translation Validation for an Optimizing Compiler, Proceedings of the 21st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.83-95, 2000. ,
A dependency-aware task-based programming environment for multi-core architectures, IEEE International Conference on, pp.142-151, 2008. ,
Program optimization and parallelization using idioms, Proceedings of the 18th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL'91, pp.79-92, 1991. ,
Program Transformations and Memory Architecture Optimizations for High-Level Synthesis of Hardware Accelerators, 2010. ,
URL : https://hal.archives-ouvertes.fr/tel-00544349
Translation Validation, Proceedings of the 4th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp.151-166, 1998. ,
Georges-André Silber, and Nicolas Vasilache. GRAPHITE: Loop optimizations based on the polyhedral model for GCC, Proceedings of the 4th GCC Developers' Summit, pp.1-18, 2006. ,
Polybench: The polyhedral benchmark suite, 2012. ,
Polyhedral-based data reuse optimization for configurable computing, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA '13, pp.29-38, 2013. ,
Going beyond integer programming with the omega test to eliminate false data dependences, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.2, pp.204-211, 1995. ,
Spiral: A generator for platform-adapted libraries of signal processing algorithms, The International Journal of High Performance Computing Applications, vol.18, issue.1, pp.21-45, 2004. ,
Optimizing memory usage in the polyhedral model, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.22, issue.5, pp.773-815, 2000. ,
Generation of efficient nested loops from polyhedra, International Journal of Parallel Programming, vol.28, issue.5, pp.469-498, 2000. ,
Journal of VLSI signal processing systems for signal, image and video technology, vol.1, pp.95-113, 1989. ,
On synthesizing systolic arrays from recurrence equations with linear dependencies, International Conference on Foundations of Software Technology and Theoretical Computer Science, pp.488-503, 1986. ,
Efficient tiling for an ODE discrete integration program: Redundant tasks instead of trapezoidal shaped-tiles, 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), pp.15-19, 2002. ,
Stencils and problem partitionings: Their influence on the performance of multiple processor systems, IEEE Transactions on Computers, vol.36, issue.7, pp.845-858, 1987. ,
, Parameterized loop tiling. ACM Trans. Program. Lang. Syst, vol.34, issue.1, p.3, 2012.
Deriving process networks from nested loop algorithms, Parallel Processing Letters, vol.10, issue.02n03, pp.165-176, 2000. ,
High-level synthesis of nonprogrammable hardware accelerators, Application-Specific Systems, Architectures, and Processors, pp.113-124, 2000. ,
Automatic blocking of nested loops, 1990. ,
Independent partitioning of algorithms with uniform dependencies, IEEE Transactions on Computers, vol.41, issue.2, pp.190-206, 1992. ,
Verification of source code transformations by program equivalence checking, Lecture Notes in Computer Science, vol.3443, pp.221-236, 2005. ,
An approach for code generation in the sparse polyhedral framework, Parallel Computing, vol.53, pp.32-57, 2016. ,
Region configurations for realizability of lattice piecewise-linear models, Mathematical and Computer Modelling, vol.30, pp.17-27, 1999. ,
Symbolic parallelization of loop programs for massively parallel processor arrays, Application-Specific Systems, Architectures and Processors (ASAP), pp.1-9, 2013. ,
Symbolic mapping of loop programs onto processor arrays, Journal of Signal Processing Systems, vol.77, issue.1-2, pp.31-59, 2014. ,
Partitioning of processor arrays: A piecewise regular approach. Integration, the VLSI journal, vol.14, pp.297-332, 1993. ,
Journal of VLSI signal processing systems for signal, image and video technology, vol.17, pp.5-20, 1997. ,
Language and compiler support for stream programs, 2009. ,
Enabling more optimizations in GRAPHITE: ignoring memory-based dependences, Proceedings of the 8th GCC Developers' Summit, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00551509
Compiling nested loop programs to process networks, 2007. ,
Realizations of the extended linearization model. Domain-specific processors: systems, architectures, modeling, and simulation, pp.171-191, 2002. ,
Classifying interprocess communication in process network representation of nested-loop programs, ACM Transactions on Embedded Computing Systems (TECS), vol.6, issue.2, p.13, 2007. ,
, Ugh: User-guided high-level synthesis
Enabling automatic pipeline utilization improvement in polyhedral process network implementations, Application-Specific Systems, Architectures and Processors (ASAP), pp.173-176, 2012. ,
Loop and data transformations for sparse matrix code, ACM SIGPLAN Notices, vol.50, pp.521-532, 2015. ,
ISL: An integer set library for the polyhedral model, ICMS, vol.6327, pp.299-302, 2010. ,
Handbook of Signal Processing Systems, Polyhedral Process Networks, pp.931-965, 2010. ,
Integer set coalescing, 5th International Workshop on Polyhedral Compilation Techniques (IMPACT'15), 2015. ,
Transitive closures of affine integer tuple relations and their overapproximations, International Static Analysis Symposium, pp.216-232, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00578052
Equivalence checking of static affine programs using widening to handle recurrences, ACM Transactions on Programming Languages and Systems, vol.34, issue.3, pp.1-35, 2012. ,
Pn: a tool for improved derivation of process networks, EURASIP Journal on Embedded Systems, issue.1, pp.19-19, 2007. ,
Introduction to Graph Theory, 1999. ,
Automatically tuned linear algebra software, Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, SC '98, pp.1-27, 1998. ,
Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009. ,
Automated Program Recognition by Graph Parsing. PhD thesis, MIT, 1992. ,
A data locality optimizing algorithm, ACM SIGPLAN Notices, vol.26, pp.30-44, 1991. ,
Iteration space tiling for memory hierarchies, Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp.357-361, 1989. ,
Loop Tiling for Parallelism, 2000. ,
QUARK users' guide, 2011. ,
AlphaZ: A system for design space exploration in the polyhedral model, Languages and Compilers for Parallel Computing, 25th International Workshop, LCPC 2012, pp.17-31, 2012. ,
Energy-efficient CNN implementation on a deeply pipelined FPGA cluster, Proceedings of the 2016 International Symposium on Low Power Electronics and Design, pp.326-331, 2016. ,
Hierarchical overlapped tiling, Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp.207-218, 2012. ,
Solving out of order communication using CAM memory: an implementation, 13th Annual Workshop on Circuits, Systems and Signal Processing, 2002. ,
Translation and Run-Time Validation of Loop Transformations, Formal Methods in System Design, vol.27, issue.3, pp.335-360, 2005. ,