A computational model for TensorFlow: an introduction, Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp.1-7, 2017. ,
Automatic loop interchange, Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, SIGPLAN '84, pp.233-246, 1984. ,
Automatic translation of Fortran programs to vector form, ACM Trans. Program. Lang. Syst., vol.9, issue.4, pp.491-542, 1987. ,
OpenTuner: An extensible framework for program autotuning, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, pp.303-316, 2014. ,
The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests, ACM Transactions on Programming Languages and Systems, vol.38, issue.3, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01425546
Code Generation in the Polyhedral Model Is Easier Than You Think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, PACT '04, pp.7-16, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00017260
Polyglot: a polyhedral loop transformation framework for a graphical dataflow language, International Conference on Compiler Construction, pp.123-143, 2013. ,
Pencil: A platform-neutral compute intermediate language for accelerator programming, Proc. Parallel Architectures and Compilation Techniques (PACT'15), 2015. ,
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, Compiler Construction, pp.132-146, 2008. ,
Javed Absar, Sven Van Haastregt, Alexey Kravets, and Alastair Donaldson. PENCIL Language Specification, INRIA, 2015. ,
A Model for Fusion and Code Motion in an Automatic Parallelizing Compiler, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp.343-352, 2010. ,
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer, ACM SIGPLAN Notices, vol.43, issue.6, pp.101-113, 2008. ,
CHiLL: A framework for composing high-level loop transformations, 2008. ,
TVM: End-to-end optimization stack for deep learning ,
High-level language support for user-defined reductions, The Journal of Supercomputing, vol.23, issue.1, pp.23-37, 2002. ,
MapReduce: Simplified data processing on large clusters, Commun. ACM, vol.51, issue.1, pp.107-113, 2008. ,
Polly's polyhedral scheduling in the presence of reductions, 2015. ,
Parametric Integer Programming, Revue française d'automatique, d'informatique et de recherche opérationnelle, vol.22, pp.243-268, 1988. ,
Dataflow Analysis of Array and Scalar References, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991. ,
Some Efficient Solutions to the Affine Scheduling Problem. I. One-Dimensional Time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-347, 1992. ,
Some Efficient Solutions to the Affine Scheduling Problem. Part II. Multidimensional Time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992. ,
Reducers and other Cilk++ hyperobjects, Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '09, pp.79-90, 2009. ,
Polyhedron Model, Encyclopedia of Parallel Computing, pp.1581-1592, 2011. ,
Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation, Parallel Processing Letters, vol.22, issue.04, p.1250010, 2012. ,
TensorFlow XLA ,
Simplifying reductions, POPL, vol.6, pp.30-41, 2006. ,
A stencil compiler for short-vector SIMD architectures, Proceedings of the 27th International ACM Conference on Supercomputing, pp.13-24, 2013. ,
A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM, Robotics and automation (ICRA), 2014 IEEE international conference on, pp.1524-1531, 2014. ,
The ANSI C standard (C99), 1999. ,
Supernode Partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '88, pp.319-329, 1988. ,
An effective fusion and tile size model for optimizing image processing pipelines, Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.261-275 ,
A unified semantic approach for the vectorization and parallelization of generalized reductions, Proceedings of the 3rd international conference on Supercomputing, pp.186-194, 1989. ,
Parallelization by semantic detection of reductions, ESOP 86, pp.223-236, 1986. ,
Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002. ,
Maximizing loop parallelism and improving data locality via loop fusion and distribution, Languages and Compilers for Parallel Computing, pp.301-320, 1993. ,
A unifying framework for iteration reordering transformations, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing, vol.1, pp.153-162, 1995. ,
When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Notices, vol.48, pp.127-138, 2013. ,
Differentiable programming for image processing and deep learning in Halide, ACM Transactions on Graphics (TOG), vol.37, issue.4, p.139, 2018. ,
Optimizing parallel reduction in CUDA ,
Automatically scheduling Halide image processing pipelines, ACM Transactions on Graphics (TOG), vol.35, issue.4, p.83, 2016. ,
MPI-2: Extensions to the message-passing interface, 1996. ,
PolyMage: Automatic optimization for image processing pipelines, ACM SIGARCH Computer Architecture News, vol.43, pp.429-443, 2015. ,
Introducing SLAMBench, a performance and accuracy benchmarking methodology for, Mixed and augmented reality (ISMAR), 2011 10th IEEE international symposium on, pp.127-136, 2011. ,
CUB's collective primitives ,
NVIDIA. Thrust C++ library ,
NVIDIA forum. Faster parallel reductions on Kepler ,
OpenMP 3.0 specification ,
OpenMP forum ,
Combined iterative and model-driven optimization in an automatic parallelization framework, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.549-562, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00551067
Iterative optimization in the polyhedral model: Part I, one-dimensional time, Code Generation and Optimization, 2007. CGO'07. International Symposium on, pp.179-197, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-01257281
PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration, 2017. ,
Idiom recognition in the Polaris parallelizing compiler, Proceedings of the 9th international conference on Supercomputing, pp.444-448, 1995. ,
Program optimization and parallelization using idioms, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.16, issue.3, pp.305-327, 1994. ,
Static Analysis of Upper and Lower Bounds on Dependences and Parallelism, ACM Trans. Program. Lang. Syst, vol.16, issue.4, pp.1248-1278, 1994. ,
Intel Threading Building Blocks, 2007. ,
Detection of recurrences in sequential programs with loops, PARLE'93 Parallel Architectures and Languages Europe, pp.132-145, 1993. ,
Scheduling reductions, Proceedings of the 8th international conference on Supercomputing, pp.117-125, 1994. ,
Detection of scans, Parallel Algorithms and Applications, vol.15, issue.3-4, pp.229-263, 2000. ,
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, ACM SIGPLAN Notices, vol.48, issue.6, pp.519-530, 2013. ,
The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.2, pp.160-180, 1999. ,
Automatic Selection of High Order Transformations in the IBM XL Fortran Compilers, IBM J. Res. & Dev, vol.41, issue.3, 1997. ,
A framework for enhancing data reuse via associative reordering, ACM SIGPLAN Notices, vol.49, pp.65-76, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016093
Detection and global optimization of reduction operations for distributed parallel machines, Proceedings of the 10th international conference on Supercomputing, pp.18-25, 1996. ,
R-Stream: A parametric high level compiler, Proceedings of HPEC, 2006. ,
Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.287-298, 2014. ,
Graphite two years after: First lessons learned from real-world polyhedral compilation, International Workshop on Languages and Compilers for Parallel Computing, pp.57-72, 2010. ,
PGI accelerator compilers with OpenACC directives ,
Polyhedral-model guided loop-nest auto-vectorization, Parallel Architectures and Compilation Techniques, 2009. PACT'09. 18th International Conference on, pp.327-337, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00645325
Live Range Reordering, 6th Workshop on Polyhedral Compilation Techniques (IMPACT, Associated with HiPEAC), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01257224
Isl: An Integer Set Library for the Polyhedral Model, Mathematical Software - ICMS 2010, number 6327 in Lecture Notes in Computer Science, pp.299-302, 2010. ,
Counting Affine Calculator and Applications, First International Workshop on Polyhedral Compilation Techniques (IMPACT'11), 2011. ,
Schedule Trees, 4th Workshop on Polyhedral Compilation Techniques (IMPACT, Associated with HiPEAC), p.9, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00911894
Scheduling for PPCG, Report CW, vol.706, 2017. ,
Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization (TACO), vol.9, issue.4, p.54, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00786677
Joint scheduling and layout optimization to enable multi-level vectorization, IMPACT-2: 2nd International Workshop on Polyhedral Compilation Techniques, 2012. ,
Counting integer points in parametric polytopes using Barvinok's rational functions, Algorithmica, vol.48, issue.1, pp.37-66, 2007. ,
Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, p.185, 2014. ,
Loop skewing: The wavefront method revisited, Int. J. Parallel Program, vol.15, issue.4, pp.279-293, 1986. ,
Iteration space tiling for memory hierarchies, Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp.357-361, 1989. ,
High Performance Compilers for Parallel Computing, 1995. ,
OpenACC - first experiences with real-world applications, Euro-Par 2012 Parallel Processing, pp.859-870, 2012. ,
Ptype system: A featherweight parallelizability detector, Programming Languages and Systems, pp.197-212, 2004. ,
Scan detection and parallelization in inherently sequential nested loop programs, Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp.74-83, 2012. ,