OpenTuner: An Extensible Framework for Program Autotuning, Proc. of the 23rd Intl. Conf. on Parallel Architectures and Compilation (PACT '14), pp.303-316, 2014.
Fast Local Laplacian Filters: Theory and Applications, ACM Trans. Graph., vol.33, p.167, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01063419
Compiler Transformations for High-performance Computing, ACM Comput. Surv., vol.26, pp.345-420, 1994.
Tiling Stencil Computations to Maximize Parallelism, Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis (SC '12), vol.40, 2012.
Tiling and Optimizing Time-iterated Computations on Periodic Domains, Proc. of the 23rd Intl. Conf. on Parallel Architectures and Compilation (PACT '14), pp.39-50, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01257240
Diamond tiling: Tiling techniques to maximize parallelism for stencil computations, IEEE Transactions on Parallel and Distributed Systems, vol.28, pp.1285-1298, 2017.
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer, Proc. of the 29th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI '08), pp.101-113, 2008.
A Multiresolution Spline with Application to Image Mosaics, ACM Trans. Graph., vol.2, issue.4, pp.217-236, 1983.
Real-time Edge-aware Image Processing with the Bilateral Grid, ACM SIGGRAPH 2007 Papers (SIGGRAPH '07), 2007.
Transforming Loop Chains via Macro Dataflow Graphs, Proc. of the 2018 Intl. Symp. on Code Generation and Optimization, pp.265-277, 2018.
The Triangle Method for Saving Startup Time in Parallel Computers, Proc. of the Fifth Distributed Memory Computing Conf., pp.568-572, 1990.
Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, Intl. Journal of Parallel Programming, vol.21, pp.389-420, 1992.
Modeling the performance of geometric multigrid stencils on multicore computer architectures, SIAM Journal on Scientific Computing, vol.37, pp.194-216, 2015.
Hybrid Hexagonal/Classical Tiling for GPUs, Proc. of Annual IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO '14), vol.66, 2014.
URL: https://hal.archives-ouvertes.fr/hal-00911177
Split Tiling for GPUs: Automatic Parallelization Using Trapezoidal Tiles, Proc. of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, pp.24-31, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00786812
Polyhedral AST Generation Is More Than Scanning Polyhedra, ACM Trans. Program. Lang. Syst., vol.37, 2015.
URL: https://hal.archives-ouvertes.fr/hal-01257239
The relation between diamond tiling and hexagonal tiling, Parallel Processing Letters, vol.24, p.1441002, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01257248
A combined corner and edge detector, Proc. of the Alvey Vision Conf., pp.147-151, 1988.
High-performance Code Generation for Stencil Computations on GPU Architectures, Proc. of the 26th ACM Intl. Conf. on Supercomputing (ICS '12), pp.311-320, 2012.
Multi-level Tiling: M for the Price of One, Proc. of the 2007 ACM/IEEE Conf. on Supercomputing (SC '07), vol.51, 2007.
Effective Automatic Parallelization of Stencil Computations, Proc. of the 28th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI '07), pp.235-244, 2007.
Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations, ACM Trans. Parallel Comput., vol.4, 2017.
Automatically Scheduling Halide Image Processing Pipelines, ACM Trans. Graph., vol.35, 2016.
PolyMage: Automatic Optimization for Image Processing Pipelines, Proc. of the Twentieth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15), pp.429-443, 2015.
Mapping the FDTD Application to Many-Core Chip Architectures, Intl. Conf. on Parallel Processing, pp.309-316, 2009.
An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations, ACM Trans. Archit. Code Optim., vol.12, issue.14, 2015.
Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid, Commun. ACM, vol.58, pp.81-91, 2015.
Bilateral filtering: Theory and applications, Foundations and Trends® in Computer Graphics and Vision, vol.4, pp.1-73, 2009.
Static Analysis of Upper and Lower Bounds on Dependences and Parallelism, ACM Trans. Program. Lang. Syst., vol.16, issue.4, pp.1248-1278, 1994.
Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines, ACM Trans. Graph., vol.31, issue.4, 2012.
Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines, Proc. of the 34th ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI '13), pp.519-530, 2013.
Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles, IEEE Trans. Parallel Distrib. Syst., vol.13, pp.460-470, 2002.
URL: https://hal.archives-ouvertes.fr/hal-00807408
Locality Aware Concurrent Start for Stencil Applications, Proc. of the 13th Annual IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO '15), pp.157-166, 2015.
Cache Accurate Time Skewing in Iterative Stencil Computations, Proc. of the 2011 Intl. Conf. on Parallel Processing (ICPP '11), pp.571-581, 2011.
José Ignacio Gómez, Christian Tenllado, and Francky Catthoor, ACM Trans. Archit. Code Optim., vol.9, p.54, 2013.
Optimal Parallelogram Selection for Hierarchical Tiling, ACM Trans. Archit. Code Optim., vol.11, p.58, 2015.
Hierarchical Overlapped Tiling, Proc. of the Tenth Intl. Symp. on Code Generation and Optimization (CGO '12), pp.207-218, 2012.
Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling, Proc. of the 27th Intl. Conf. on Compiler Construction, pp.3-13, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01751823