L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987.
DOI : 10.1016/0021-9991(87)90140-9

F. Sullivan and J. Dongarra, Guest editors' introduction: The top 10 algorithms, Computing in Science & Engineering, vol.2, issue.1, pp.22-23, 2000.

T. Hamada, T. Narumi, R. Yokota, K. Yasuoka, K. Nitadori et al., 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ser. SC '09, 2009.

A. Rahimian, I. Lashuk, S. K. Veerapaneni, A. Chandramowlishwaran, D. Malhotra et al., Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.42

J. Milthorpe, A. P. Rendell, and T. Huber, PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language, Concurrency and Computation: Practice and Experience, 2013.
DOI : 10.1002/cpe.3039

M. Abduljabbar, R. Yokota, and D. Keyes, Asynchronous Execution of the Fast Multipole Method Using Charm++, 2014.

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11, pp.1935-1946, 2014.

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

E. Agullo, O. Aumage, B. Bramas, O. Coulaud, and S. Pitoiset, Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, Inria, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01372022

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, vol.28, issue.9, pp.2608-2629, 2016.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-00974674

L. Greengard and V. Rokhlin, A new version of the Fast Multipole Method for the Laplace equation in three dimensions, Acta Numerica, vol.6, pp.229-269, 1997.
DOI : 10.1017/S0962492900002725

M. S. Warren and J. K. Salmon, Astrophysical N-body simulations using hierarchical tree data structures, Proceedings Supercomputing '92, pp.570-576, 1992.
DOI : 10.1109/SUPERC.1992.236647

S. Ogata, T. J. Campbell, R. K. Kalia, A. Nakano, P. Vashishta et al., Scalable and portable implementation of the fast multipole method on parallel computers, Computer Physics Communications, vol.153, issue.3, pp.445-461, 2003.
DOI : 10.1016/S0010-4655(03)00246-7

J. Kurzak and B. M. Pettitt, Massively parallel implementation of a fast multipole method for distributed memory machines, Journal of Parallel and Distributed Computing, vol.65, issue.7, pp.870-881, 2005.
DOI : 10.1016/j.jpdc.2005.02.001

F. A. Cruz, M. G. Knepley, and L. A. Barba, PetFMM-A dynamically load-balancing parallel fast multipole library, International Journal for Numerical Methods in Engineering, vol.85, issue.4, pp.403-428, 2011.
DOI : 10.1002/nme.2972

O. Coulaud, P. Fortin, and J. Roman, Hybrid MPI-Thread Parallelization of the Fast Multipole Method, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), p.52, 2007.
DOI : 10.1109/ISPDC.2007.29
URL : https://hal.archives-ouvertes.fr/inria-00131001

A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros et al., Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2010.
DOI : 10.1109/IPDPS.2010.5470415

D. Malhotra, A. Gholami, and G. Biros, A Volume Integral Equation Stokes Solver for Problems with Variable Coefficients, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.92-102, 2014.
DOI : 10.1109/SC.2014.13

T. Ishiyama, K. Nitadori, and J. Makino, 4.45 Pflops astrophysical N-body simulation on K computer -- The gravitational trillion-body problem, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-510, 2012.
DOI : 10.1109/SC.2012.3

J. Choi, A. Chandramowlishwaran, K. Madduri, and R. Vuduc, A CPU: GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method, Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU-7, pp.64-71, 2014.
DOI : 10.1145/2588768.2576787

M. S. Warren and J. K. Salmon, A parallel hashed Oct-Tree N-body algorithm, Proceedings of the 1993 ACM/IEEE conference on Supercomputing, Supercomputing '93, pp.12-21, 1993.
DOI : 10.1145/169627.169640

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002.

J. Yu and R. Buyya, A taxonomy of scientific workflow systems for grid computing, ACM SIGMOD Record, vol.34, issue.3, pp.44-49, 2005.
DOI : 10.1145/1084805.1084814

M. Cosnard and M. Loi, Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995.

Z. Budimlić, M. Burke, V. Cavé, K. Knobe, G. Lowney et al., Concurrent Collections, Scientific Programming, vol.18, issue.3-4, pp.203-217, 2010.
DOI : 10.1155/2010/521797

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Distributed Dense Numerical Linear Algebra Algorithms on massively parallel architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW'11), PDSEC 2011, pp.1432-1441, 2011.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009, vol.23, issue.2, pp.187-198, 2011.

A. Duran, J. M. Perez, E. Ayguadé, R. M. Badia, and J. Labarta, Extending the OpenMP Tasking Model to Allow Dependent Tasks, OpenMP in a New Era of Parallelism, 4th International Workshop, pp.111-122, 2008.
DOI : 10.1007/978-3-540-79561-2_10

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

E. Tejedor, M. Farreras, D. Grove, R. M. Badia, G. Almasi et al., A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, 2012.

A. Yarkhan, Dynamic task execution on shared and distributed memory architectures, 2012.

E. Agullo, O. Aumage, M. Faverge, N. Furmento, F. Pruvost et al., Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, 2016.

C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010.
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937

W. Fong and E. Darve, The black-box fast multipole method, Journal of Computational Physics, vol.228, issue.23, pp.8712-8725, 2009.
DOI : 10.1016/j.jcp.2009.08.031