R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser, M. Kruse et al., PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, 2015 International Conference on Parallel Architecture and Compilation (PACT), 2015.
DOI : 10.1109/PACT.2015.17
URL : https://hal.archives-ouvertes.fr/hal-01257236

R. Baghdadi, A. Cohen, T. Grosser, S. Verdoolaege, A. Lokhmotov et al., PENCIL Language Specification, 2015. Available: https

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp.101-113, 2008.
DOI : 10.1145/1375581.1375595
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5126

J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

S. J. Deitz, B. L. Chamberlain, and L. Snyder, High-level language support for user-defined reductions, The Journal of Supercomputing, vol.23, issue.1, pp.23-37, 2002.
DOI : 10.1023/A:1015781018449

J. Doerfert, K. Streit, S. Hack, and Z. Benaissa, Polly's polyhedral scheduling in the presence of reductions, 2015.

M. Frigo, P. Halpern, C. E. Leiserson, and S. Lewin-Berlin, Reducers and other Cilk++ hyperobjects, Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, SPAA '09, pp.79-90, 2009.
DOI : 10.1145/1583991.1584017

G. Gupta and S. V. Rajopadhye, Simplifying reductions, ACM Symposium on Principles of Programming Languages (POPL), pp.30-41, 2006.

L. Howes, A. Lokhmotov, A. F. Donaldson, and P. H. Kelly, Towards Metaprogramming for Parallel Systems on a Chip, Proceedings of the 2009 International Conference on Parallel Processing (Euro-Par '09), pp.36-45, 2010.
DOI : 10.1007/978-3-642-14122-5_7

F. Irigoin, P. Jouvelot, and R. Triolet, Semantical interprocedural parallelization: An overview of the PIPS project, ACM International Conference on Supercomputing (ICS), 1991.
URL : https://hal.archives-ouvertes.fr/hal-00984684

P. Jouvelot, Parallelization by semantic detection of reductions, ESOP '86, pp.223-236, 1986.
DOI : 10.1007/3-540-16442-1_17

P. Jouvelot and B. Dehbonei, A unified semantic approach for the vectorization and parallelization of generalized reductions, Proceedings of the 3rd International Conference on Supercomputing, ICS '89, pp.186-194, 1989.
DOI : 10.1145/318789.318810

M. Harris, Optimizing parallel reduction in CUDA. Available: https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf

Microsoft, Parallel Patterns Library. Available: https://msdn.microsoft.com/en-us/library/dd470426

MPI-2: Extensions to the Message-Passing Interface, 1996.

L. Nardi, B. Bodin, M. Z. Zia, J. Mawer, A. Nisbet et al., Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM, 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015.
DOI : 10.1109/ICRA.2015.7140009
URL : http://arxiv.org/abs/1410.2167

Nvidia, CUB's collective primitives. Available: https://nvlabs.github

Nvidia, Thrust C++ library. Available: https://developer.nvidia.com/thrust/

Nvidia forum, Faster parallel reductions on Kepler.

S. S. Pinter and R. Y. Pinter, Program optimization and parallelization using idioms, ACM Transactions on Programming Languages and Systems, vol.16, issue.3, pp.305-327, 1994.
DOI : 10.1145/177492.177494
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.2706

B. Pottenger and R. Eigenmann, Idiom recognition in the Polaris parallelizing compiler, Proceedings of the 9th International Conference on Supercomputing, ICS '95, pp.444-448, 1995.
DOI : 10.1145/224538.224655

W. Pugh and D. Wonnacott, Static analysis of upper and lower bounds on dependences and parallelism, ACM Transactions on Programming Languages and Systems, vol.16, issue.4, pp.1248-1278, 1994.
DOI : 10.1145/183432.183525

L. Rauchwerger and D. A. Padua, The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization, IEEE Transactions on Parallel and Distributed Systems, vol.10, issue.2, pp.160-180, 1999.

X. Redon and P. Feautrier, Detection of recurrences in sequential programs with loops, Parallel Architectures and Languages Europe (PARLE), pp.132-145, 1993.
DOI : 10.1007/3-540-56891-3_11

X. Redon and P. Feautrier, Scheduling reductions, Proceedings of the 8th International Conference on Supercomputing, ICS '94, pp.117-125, 1994.
DOI : 10.1145/181181.181319

X. Redon and P. Feautrier, Detection of scans, Parallel Algorithms and Applications, vol.9, issue.3-4, pp.229-263, 2000.

J. Reinders, Intel Threading Building Blocks, 2007.

M. C. Rinard and M. S. Lam, The design, implementation, and evaluation of Jade, ACM Transactions on Programming Languages and Systems, vol.20, issue.3, pp.483-545, 1998.
DOI : 10.1145/291889.291893

K. Stock, M. Kong, T. Grosser, L. Pouchet, F. Rastello et al., A framework for enhancing data reuse via associative reordering, ACM SIGPLAN Notices, pp.65-76, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016093

T. Suganuma, H. Komatsu, and T. Nakatani, Detection and global optimization of reduction operations for distributed parallel machines, Proceedings of the 10th International Conference on Supercomputing, ICS '96, pp.18-25, 1996.
DOI : 10.1145/237578.237581

The Portland Group, PGI accelerator compilers with OpenACC directives.

A. Venkat, M. Shantharam, M. Hall, and M. M. Strout, Non-affine Extensions to Polyhedral Code Generation, Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, p.185, 2014.
DOI : 10.1145/2581122.2544141

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, 2013.
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677

S. Wienke, P. Springer, C. Terboven, and D. an Mey, OpenACC: First Experiences with Real-World Applications, Euro-Par 2012 Parallel Processing, pp.859-870, 2012.
DOI : 10.1007/978-3-642-32820-6_85

D. N. Xu, S. Khoo, and Z. Hu, PType System: A Featherweight Parallelizability Detector, Programming Languages and Systems, pp.197-212, 2004.
DOI : 10.1007/978-3-540-30477-7_14
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.512