C. Addison, J. LaGrone, L. Huang, and B. Chapman, OpenMP 3.0 tasking implementation in OpenUH, 2009.

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, pp.2608-2629, 2016.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, 2009.

E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin et al., The design of OpenMP tasks, Transactions on Parallel and Distributed Systems, 2009.

P. Blanchard, O. Coulaud, and E. Darve, Fast hierarchical algorithms for generating Gaussian random fields, Research Report, vol.8811, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01228519

G. Bosilca, A. Bouteiller, A. Danalis, T. Hérault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, 2012.

J. Bueno, X. Martorell, R. M. Badia, E. Ayguadé, and J. Labarta, Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th ACM International Conference on Supercomputing, ICS '13, 2013.
DOI : 10.1145/2464996.2465017

H. Casanova, A. Legrand, and Y. Robert, Parallel algorithms, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00789466

J. Choi, A. Chandramowlishwaran, K. Madduri, and R. Vuduc, GPU hybrid implementation and model-driven scheduling of the fast multipole method, Proceedings of Workshop on General Purpose Processing Using GPUs, pp.64-64.

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, Conference on Programming Language Design and Implementation, 1998.

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, Locality-aware work stealing on multi-CPU and multi-GPU architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890

L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987.
DOI : 10.1016/0021-9991(87)90140-9

C. Liao, D. J. Quinlan, T. Panas, and B. R. de Supinski, A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries, Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, 2010.
DOI : 10.1007/978-3-642-13217-9_2

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, pp.1935-1946, 2013.

R. Yokota and L. A. Barba, A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems, International Journal of High Performance Computing Applications, vol.26, issue.4, pp.337-346, 2012.
DOI : 10.1177/1094342011429952