A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987. ,
A parallel version of the fast multipole method, Computers & Mathematics with Applications, vol.20, issue.7, pp.63-71, 1990. ,
DOI : 10.1016/0898-1221(90)90349-O
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-15, 2010. ,
DOI : 10.1109/IPDPS.2010.5470415
PetFMM-A dynamically load-balancing parallel fast multipole library, International Journal for Numerical Methods in Engineering, vol.85, issue.4, pp.403-428, 2011. ,
DOI : 10.1002/nme.2972
The fast multipole method on parallel clusters, multicore processors, and graphics processing units, Comptes Rendus Mécanique, vol.339, issue.2-3, pp.185-193, 2011. ,
DOI : 10.1016/j.crme.2010.12.005
Fast multipole methods on a cluster of GPUs for the meshless simulation of turbulence, Computer Physics Communications, vol.180, issue.11, pp.2066-2078, 2009. ,
DOI : 10.1016/j.cpc.2009.06.009
Fast multipole methods on graphics processors, Journal of Computational Physics, vol.227, issue.18, pp.8290-8313, 2008. ,
DOI : 10.1016/j.jcp.2008.05.023
URL : http://drum.lib.umd.edu/bitstream/1903/7549/1/paper_gpu_fmm_revised_final.pdf
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, pp.62:1-62:12, 2009. ,
Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '11, pp.105-133, 2011. ,
DOI : 10.1002/nme.3240
A massively parallel adaptive fast-multipole method on heterogeneous architectures, Proceedings of the 2009 ACM/IEEE conference on Supercomputing, pp.1-11, 2009. ,
Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00547847
LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp.217-224, 2011. ,
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009. ,
DOI : 10.1145/1594835.1504196
Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008. ,
DOI : 10.1109/PDP.2008.37
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, pp.1573-1590, 2008. ,
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, pp.15-44, 2010. ,
DOI : 10.1145/1377612.1377615
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294
Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, IPDPS Workshops. IEEE, pp.1432-1441, 2011. ,
PLASMA users' guide, parallel linear algebra software for multicore architectures, 2009. ,
MAGMA users' guide, version 0.2, 2009. ,
The libflame library for dense matrix computations, Computing in Science and Engineering, vol.11, issue.6, pp.56-63, 2009. ,
Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11, 1203. ,
DOI : 10.1002/cpe.3132
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
The black-box fast multipole method, Journal of Computational Physics, vol.228, issue.23, pp.8712-8725, 2009. ,
DOI : 10.1016/j.jcp.2009.08.031
A kernel-independent adaptive fast multipole algorithm in two and three dimensions, Journal of Computational Physics, vol.196, issue.2, pp.591-626, 2004. ,
DOI : 10.1016/j.jcp.2003.11.021
Fast directional multilevel summation for oscillatory kernels based on Chebyshev interpolation, Journal of Computational Physics, vol.231, issue.4, pp.1175-1196, 2012. ,
DOI : 10.1016/j.jcp.2011.09.027
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
Harmony, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.197-200, 2008. ,
DOI : 10.1145/1383422.1383447
Sequoia: Programming the memory hierarchy, ACM/IEEE SC'06 Conference, 2006. ,
Scaling hierarchical N-body simulations on GPU clusters, SC'10 USB Key, 2010. ,
A parallel hashed Oct-Tree N-body algorithm, Proceedings of the 1993 ACM/IEEE conference on Supercomputing, Supercomputing '93, pp.12-21, 1993. ,
DOI : 10.1145/169627.169640
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009. ,