PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, 2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. ,
DOI : 10.1109/PACT.2015.17
URL : https://hal.archives-ouvertes.fr/hal-01257236
PENCIL Language Specification Available: https, 2015. ,
A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN conference on Programming Language Design and Implementation, pp.101-113, 2008. ,
DOI : 10.1145/1375581.1375595
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5126
MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008. ,
DOI : 10.1145/1327452.1327492
High-level language support for user-defined reductions, The Journal of Supercomputing, vol.23, issue.1, pp.23-37, 2002. ,
DOI : 10.1023/A:1015781018449
Polly's polyhedral scheduling in the presence of reductions, 2015. ,
Reducers and other Cilk++ hyperobjects, Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, SPAA '09, pp.79-90, 2009. ,
DOI : 10.1145/1583991.1584017
Simplifying reductions, ACM Symposium on Principles of Programming Languages (POPL), pp.30-41, 2006. ,
Towards Metaprogramming for Parallel Systems on a Chip, Proceedings of the 2009 International Conference on Parallel Processing, ser. Euro-Par, pp.36-45, 2010. ,
DOI : 10.1007/978-3-642-14122-5_7
Semantical interprocedural parallelization: An overview of the PIPS project, ACM International Conf. on Supercomputing (ICS), 1991. ,
URL : https://hal.archives-ouvertes.fr/hal-00984684
Parallelization by semantic detection of reductions, ESOP 86, pp.223-236, 1986. ,
DOI : 10.1007/3-540-16442-1_17
A unified semantic approach for the vectorization and parallelization of generalized reductions, Proceedings of the 3rd international conference on Supercomputing , ICS '89, pp.186-194, 1989. ,
DOI : 10.1145/318789.318810
Optimizing parallel reduction in CUDA Available: https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf ,
Microsoft Parallel patterns library Available: https://msdn.microsoft.com/en-us/library/dd470426 ,
MPI-2: Extensions to the message-passing interface, 1996. ,
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM, 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015. ,
DOI : 10.1109/ICRA.2015.7140009
URL : http://arxiv.org/abs/1410.2167
CUB's collective primitives Available: https://nvlabs.github ,
Thrust C++ library Available: https://developer.nvidia.com/thrust/ ,
Nvidia forum Faster parallel reductions on Kepler ,
Program optimization and parallelization using idioms, ACM Transactions on Programming Languages and Systems, vol.16, issue.3, pp.305-327, 1994. ,
DOI : 10.1145/177492.177494
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.2706
Idiom recognition in the Polaris parallelizing compiler, Proceedings of the 9th international conference on Supercomputing , ICS '95, pp.444-448, 1995. ,
DOI : 10.1145/224538.224655
Static analysis of upper and lower bounds on dependences and parallelism, ACM Transactions on Programming Languages and Systems, vol.16, issue.4, pp.1248-1278, 1994. ,
DOI : 10.1145/183432.183525
The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.2, pp.160-180, 1999. ,
Detection of recurrences in sequential programs with loops, Parallel Architectures and Languages Europe (PARLE), pp.132-145, 1993. ,
DOI : 10.1007/3-540-56891-3_11
Scheduling reductions, Proceedings of the 8th international conference on Supercomputing , ICS '94, pp.117-125, 1994. ,
DOI : 10.1145/181181.181319
Detection of scans, Parallel Algorithms and Applications, vol.9, issue.3-4, pp.229-263, 2000. ,
Intel Threading Building Blocks, 2007. ,
The design, implementation, and evaluation of Jade, ACM Transactions on Programming Languages and Systems, vol.20, issue.3, pp.483-545, 1998. ,
DOI : 10.1145/291889.291893
A framework for enhancing data reuse via associative reordering, ACM SIGPLAN Notices, pp.65-76, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016093
Detection and global optimization of reduction operations for distributed parallel machines, Proceedings of the 10th international conference on Supercomputing , ICS '96, pp.18-25, 1996. ,
DOI : 10.1145/237578.237581
PGI accelerator compilers with OpenACC directives ,
Non-affine Extensions to Polyhedral Code Generation, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, p.185, 2014. ,
DOI : 10.1145/2581122.2544141
Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, 2013. ,
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677
OpenACC - First Experiences with Real-World Applications, Euro-Par 2012 Parallel Processing, pp.859-870, 2012. ,
DOI : 10.1007/978-3-642-32820-6_85
PType System: A Featherweight Parallelizability Detector, Programming Languages and Systems, pp.197-212, 2004. ,
DOI : 10.1007/978-3-540-30477-7_14
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.512