, Compilers: Principles, Techniques, and Tools, 2007.
Optimal Code Generation for Expression Trees, Proceedings of Seventh Annual ACM Symposium on Theory of Computing (STOC '75), pp.207-217, 1975. ,
DOI : 10.1145/800116.803770
Code Generation for Expressions with Common Subexpressions, J. ACM, vol.24, pp.146-160, 1977. ,
DOI : 10.1145/321992.322001
Generalization of the Sethi-Ullman Algorithm for Register Allocation, Softw. Pract. Exper, vol.17, issue.6, pp.417-421, 1987. ,
Tiling Stencil Computations to Maximize Parallelism, Proceedings of the International Conference on High Performance Computing, 2012. ,
DOI : 10.1109/sc.2012.107
Compiler-Directed Transformation for Higher-Order Stencils, Parallel and Distributed Processing Symposium (IPDPS), pp.313-323, 2015. ,
DOI : 10.1109/ipdps.2015.103
URL : https://cloudfront.escholarship.org/dist/prd/content/qt2vh6s0wb/qt2vh6s0wb.pdf?t=ooy3al
Integrated Instruction Scheduling and Register Allocation Techniques, Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing (LCPC '98), pp.247-262, 1999. ,
DOI : 10.1007/3-540-48319-5_16
URL : http://www.cs.pitt.edu/~soffa/research/Comp/lcpc98.ps
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08), pp.101-113, 2008. ,
DOI : 10.1145/1375581.1375595
URL : http://www.cse.ohio-state.edu/~bondhugu/publications/uday-pldi08.pdf
Rematerialization, Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation (PLDI '92), pp.311-321, 1992. ,
DOI : 10.1145/143103.143143
Improvements to Graph Coloring Register Allocation, ACM Trans. Program. Lang. Syst, vol.16, issue.3, pp.428-455, 1994. ,
DOI : 10.1145/177492.177575
URL : http://www.cs.rice.edu/~grosul/612s01/toplas94.pdf
Register Allocation & Spilling via Graph Coloring, Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction (SIGPLAN '82), pp.98-105, 1982. ,
DOI : 10.1145/872726.806984
A unified modulo scheduling and register allocation technique for clustered processors, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques, pp.175-184, 2001. ,
DOI : 10.1109/pact.2001.953298
URL : http://upcommons.upc.edu/bitstream/2117/101361/1/00953298.pdf
Graph-coloring and treescan register allocation using repairing, 2011 Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), pp.45-54, 2011. ,
DOI : 10.1145/2038698.2038708
URL : http://www1.cs.ucr.edu/faculty/philip/papers/conferences/cases11/cases11-treescan.pdf
Introducing the Semi-stencil Algorithm, Proceedings of the 8th International Conference on Parallel Processing and Applied Mathematics: Part I (PPAM'09), pp.496-506, 2010. ,
Eliminating Redundancies in Sum-of-product Array Computations, Proceedings of the 15th International Conference on Supercomputing (ICS '01), pp.65-77, 2001. ,
Register Allocation and Promotion Through Combined Instruction Scheduling and Loop Unrolling, Proceedings of the 25th International Conference on Compiler Construction, pp.143-151, 2016. ,
, ExaCT: Center for Exascale Simulation of Combustion in Turbulence: Proxy App Software, 2013.
The Design and Implementation of FFTW3, Proc. IEEE, vol.93, issue.2, pp.216-231, 2005. ,
Anatomy of High-performance Matrix Multiplication, ACM Trans. Math. Softw, vol.34, 2008. ,
Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs, Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS '01), pp.26-33, 2001. ,
Hybrid Hexagonal/Classical Tiling for GPUs, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14), vol.66, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00911177
MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures, Proceedings of the 29th ACM on International Conference on Supercomputing (ICS '15), pp.177-186, 2015. ,
Loop Transformation Recipes for Code Generation and Auto-tuning, Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing (LCPC'09), pp.50-64, 2010. ,
Orion: A Framework for GPU Occupancy Tuning, Proceedings of the 17th International Middleware Conference (Middleware '16), vol.18, p.13, 2016. ,
A Stencil Compiler for Short-vector SIMD Architectures, Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (ICS '13), pp.13-24, 2013. ,
, High-Performance Geometric Multigrid, HPGMG 2016, 2016.
I/O Complexity: The Red-blue Pebble Game, Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing (STOC '81), pp.326-333, 1981. ,
Libra: An Automated Code Generation and Tuning Framework for Register-limited Stencils on GPUs, Proceedings of the ACM International Conference on Computing Frontiers (CF '16), pp.92-99, 2016. ,
A Global Progressive Register Allocator, Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '06), pp.204-215, 2006. ,
FFT Compiler Techniques, Compiler Construction: 13th International Conference, pp.217-231, 2004. ,
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO '04), p.75, 2004. ,
Critical points based register-concurrency autotuning for GPUs, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.1273-1278, 2016. ,
Fusion-based Register Allocation, ACM Trans. Program. Lang. Syst, vol.22, issue.3, pp.431-470, 2000. ,
Linear Scan Register Allocation in the Context of SSA Form and Register Constraints, pp.229-246, 2002. ,
Combining Register Allocation and Instruction Scheduling, 1995. ,
PolyMage: Automatic Optimization for Image Processing Pipelines, Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15), pp.429-443, 2015. ,
A scheduler-sensitive global register allocator, Supercomputing '93. Proceedings, pp.804-813, 1993. ,
, NVIDIA CUDA Compiler Driver NVCC. docs.nvidia.com/cuda/cuda-compiler-driver-nvcc, NVCC 2017, 2017.
, NVIDIA Profiler, 2017.
Register Allocation with Instruction Scheduling, Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI '93), pp.248-257, 1993. ,
Linear Scan Register Allocation, ACM Trans. Program. Lang. Syst, vol.21, pp.895-913, 1999. ,
Register Allocation by Puzzle Solving, Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08), pp.216-226, 2008. ,
Forma: A DSL for Image Processing Applications to Target GPUs and Multi-core CPUs, Proc. 8th Workshop on General Purpose Processing Using GPUs, pp.109-120, 2015. ,
Resource Conscious Reuse-Driven Tiling for GPUs, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT '16), pp.99-111, 2016. ,
Tree Register Allocation, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pp.67-77, 2009. ,
Extended Linear Scan: An Alternate Foundation for Global Register Allocation, Proceedings of the 16th International Conference on Compiler Construction (CC'07), pp.141-155, 2007. ,
The Generation of Optimal Code for Arithmetic Expressions, J. ACM, vol.17, pp.715-728, 1970. ,
A Generalized Algorithm for Graph-coloring Register Allocation, Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI '04), pp.277-288, 2004. ,
Using The GNU Compiler Collection: A GNU Manual For GCC, 2009. ,
A Framework for Enhancing Data Reuse via Associative Reordering, Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '14), pp.65-76, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016093
, SW4 2014. Seismic Wave Modelling (SW4)-Computational Infrastructure for Geodynamics, 2014.
Early Periodic Register Allocation on ILP Processors, Parallel Processing Letters, vol.14, issue.2, pp.287-313, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00130623
Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality, Proceedings of the 21st International Conference on Compiler Construction (CC'12), pp.21-40, 2012. ,
Scalable Kernel Fusion for Memory-bound GPU Applications, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '14), pp.191-202, 2014. ,
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications, Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15), pp.259-270, 2015. ,
Software Pipelining with Register Allocation and Spilling, Proceedings of the 27th Annual International Symposium on Microarchitecture (MICRO 27), pp.95-99, 1994. ,
gpucc: An Open-source GPGPU Compiler, Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO '16), pp.105-116, 2016. ,
Enabling Coordinated Register Allocation and Thread-level Parallelism Optimization for GPUs, Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48), pp.395-406, 2015. ,
On Tiling as a Loop Transformation, Parallel Processing Letters, vol.7, pp.409-424, 1997. ,