J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom et al., OpenTuner: An Extensible Framework for Program Autotuning, Proc. of the 23rd Intl. Conf. on Parallel Architectures and Compilation (PACT '14), pp.303-316, 2014.

M. Aubry, S. Paris, S. W. Hasinoff, J. Kautz, and F. Durand, Fast Local Laplacian Filters: Theory and Applications, ACM Trans. Graph, vol.33, p.167, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01063419

D. F. Bacon, S. L. Graham, and O. J. Sharp, Compiler Transformations for High-performance Computing, ACM Comput. Surv, vol.26, pp.345-420, 1994.

V. Bandishti, I. Pananilath, and U. Bondhugula, Tiling Stencil Computations to Maximize Parallelism, Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis (SC '12), vol.40, 2012.

U. Bondhugula, V. Bandishti, and A. Cohen, Tiling and Optimizing Time-iterated Computations on Periodic Domains, Proc. of the 23rd Intl. Conf. on Parallel Architectures and Compilation (PACT '14), pp.39-50, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01257240

U. Bondhugula, V. Bandishti, and I. Pananilath, Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, vol.28, pp.1285-1298, 2017.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A Practical Automatic Polyhedral Parallelizer and Locality Optimizer, Proc. of the 29th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI '08), pp.101-113, 2008.

P. J. Burt and E. H. Adelson, A Multiresolution Spline with Application to Image Mosaics, ACM Trans. Graph, vol.2, issue.4, pp.217-236, 1983.

J. Chen, S. Paris, and F. Durand, Real-time Edge-aware Image Processing with the Bilateral Grid, ACM SIGGRAPH 2007 Papers (SIGGRAPH '07), 2007.

E. C. Davis, M. M. Strout, and C. Olschanowsky, Transforming Loop Chains via Macro Dataflow Graphs, Proc. of the 2018 Intl. Symp. on Code Generation and Optimization, pp.265-277, 2018.

H. Eissfeller and S. Muller, The Triangle Method for Saving Startup Time in Parallel Computers, Proc. of the Fifth Distributed Memory Computing Conf., pp.568-572, 1990.

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, Intl. Journal of Parallel Programming, vol.21, pp.389-420, 1992.

P. Ghysels and W. Vanroose, Modeling the performance of geometric multigrid stencils on multicore computer architectures, SIAM Journal on Scientific Computing, vol.37, pp.194-216, 2015.

T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan, and S. Verdoolaege, Hybrid Hexagonal/Classical Tiling for GPUs, Proc. of Annual IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO '14), vol.66, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911177

T. Grosser, A. Cohen, P. H. J. Kelly, J. Ramanujam, P. Sadayappan et al., Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles, Proc. of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, pp.24-31, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00786812

T. Grosser, S. Verdoolaege, and A. Cohen, Polyhedral AST Generation Is More Than Scanning Polyhedra, ACM Trans. Program. Lang. Syst, vol.37, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01257239

T. Grosser, S. Verdoolaege, A. Cohen, and P. Sadayappan, The relation between diamond tiling and hexagonal tiling, Parallel Processing Letters, vol.24, p.1441002, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01257248

C. Harris and M. Stephens, A Combined Corner and Edge Detector, Alvey Vision Conference, vol.15, 1988.

J. Holewinski, L. Pouchet, and P. Sadayappan, High-performance Code Generation for Stencil Computations on GPU Architectures, Proc. of the 26th ACM Intl. Conf. on Supercomputing (ICS '12), pp.311-320, 2012.

D. Kim, L. Renganarayanan, D. Rostron, S. Rajopadhye, and M. M. Strout, Multi-level Tiling: M for the Price of One, Proc. of the 2007 ACM/IEEE Conf. on Supercomputing (SC '07), vol.51, 2007.

S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev et al., Effective Automatic Parallelization of Stencil Computations, Proc. of the 28th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI '07), pp.235-244, 2007.

T. M. Malas, G. Hager, H. Ltaief, and D. E. Keyes, Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations, ACM Trans. Parallel Comput, vol.4, 2017.

R. T. Mullapudi, A. Adams, D. Sharlet, J. Ragan-Kelley, and K. Fatahalian, Automatically Scheduling Halide Image Processing Pipelines, ACM Trans. Graph, vol.35, 2016.

R. T. Mullapudi, V. Vasista, and U. Bondhugula, PolyMage: Automatic Optimization for Image Processing Pipelines, Proc. of the Twentieth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15), pp.429-443, 2015.

D. A. Orozco and G. R. Gao, Mapping the FDTD Application to Many-Core Chip Architectures, Intl. Conf. on Parallel Processing, pp.309-316, 2009.

I. Pananilath, A. Acharya, V. Vasista, and U. Bondhugula, An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations, ACM Trans. Archit. Code Optim, vol.12, issue.14, 2015.

S. Paris and S. W. Hasinoff, Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid, Commun. ACM, vol.58, pp.81-91, 2015.

S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, Bilateral filtering: Theory and applications. Foundations and Trends® in Computer Graphics and Vision, vol.4, pp.1-73, 2009.

W. Pugh and D. Wonnacott, Static Analysis of Upper and Lower Bounds on Dependences and Parallelism, ACM Trans. Program. Lang. Syst, vol.16, issue.4, pp.1248-1278, 1994.

J. Ragan-Kelley, A. Adams, S. Paris, M. Levoy, S. Amarasinghe et al., Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines, ACM Trans. Graph, vol.31, issue.4, 2012.

J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand et al., Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines, Proc. of the 34th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI '13), pp.519-530, 2013.

F. Rastello and Y. Robert, Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles, IEEE Trans. Parallel Distrib. Syst, vol.13, pp.460-470, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00807408

S. Shrestha, G. R. Gao, J. Manzano, A. Marquez, and J. Feo, Locality Aware Concurrent Start for Stencil Applications, Proc. of the 13th Annual IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO '15), pp.157-166, 2015.

R. Strzodka, M. Shaheen, D. Pajak, and H. Seidel, Cache Accurate Time Skewing in Iterative Stencil Computations, Proc. of the 2011 Intl. Conf. on Parallel Processing (ICPP '11), pp.571-581, 2011.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado, and F. Catthoor, Polyhedral Parallel Code Generation for CUDA, ACM Trans. Archit. Code Optim, vol.9, p.54, 2013.

X. Zhou, M. J. Garzarán, and D. A. Padua, Optimal Parallelogram Selection for Hierarchical Tiling, ACM Trans. Archit. Code Optim, vol.11, p.58, 2015.

X. Zhou, J. Giacalone, M. J. Garzarán, R. H. Kuhn, Y. Ni et al., Hierarchical Overlapped Tiling, Proc. of the Tenth Intl. Symp. on Code Generation and Optimization (CGO '12), pp.207-218, 2012.

O. Zinenko, S. Verdoolaege, C. Reddy, J. Shirako, T. Grosser et al., Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling, Proc. of the 27th Intl. Conf. on Compiler Construction, pp.3-13, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01751823