A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, pp.1573-1590, 2008.
DOI : 10.1007/978-3-540-68111-3_67
URL : http://arxiv.org/abs/0707.3548

G. Quintana-Ortí, E. S. Quintana-Ortí, E. Chan, F. G. Van Zee, and R. A. van de Geijn, Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008.
DOI : 10.1109/PDP.2008.37

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, GPU Computing Gems, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547847

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, and S. Tomov, LU factorization for accelerator-based systems, The 9th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA, pp.217-224, 2011.

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009.
DOI : 10.1145/1594835.1504196

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp.1432-1441, 2011.
DOI : 10.1109/IPDPS.2011.299

F. G. Van Zee, E. Chan, R. A. van de Geijn, E. S. Quintana-Ortí, and G. Quintana-Ortí, The libflame Library for Dense Matrix Computations, Computing in Science and Engineering, vol.11, issue.6, pp.56-63, 2009.

X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014.
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611

E. Agullo, L. Giraud, A. Guermouche, S. Nakov, and J. Roman, Task-based Conjugate-Gradient for multi-GPUs platforms
URL : https://hal.archives-ouvertes.fr/hal-00767368

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11
DOI : 10.1002/cpe.3132

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

B. Lize, G. Sylvand, E. Agullo, and S. Thibault, A task-based H-matrix solver for acoustic and electromagnetic problems on multicore architectures, SciCADE, the International Conference on Scientific Computation and Differential Equations, 2013.

R. Kriemann, H-LU Factorization on Many-Core Systems, Preprint 5, Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig, 2014.

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.

M. Cosnard, E. Jeannot, and T. Yang, Symbolic Partitioning and Scheduling of Parameterized Task Graphs, IEEE International Conference on Parallel And Distributed Systems (ICPADS'98), 1998.
URL : https://hal.archives-ouvertes.fr/inria-00098553

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

A. Duran, J. M. Perez, E. Ayguadé, R. M. Badia, and J. Labarta, Extending the OpenMP Tasking Model to Allow Dependent Tasks, OpenMP in a New Era of Parallelism, 4th International Workshop, pp.111-122, 2008.
DOI : 10.1007/978-3-540-79561-2_10

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, and R. van de Geijn, SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Herault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

S. Delcourte, L. Fezoui, and N. Glinsky-Olivier, A high-order Discontinuous Galerkin method for the seismic wave propagation, ESAIM: Proceedings, pp.70-89, 2009.
DOI : 10.1051/proc/2009020
URL : https://hal.archives-ouvertes.fr/hal-00868418

J. Virieux, P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference method, Geophysics, vol.51, issue.4, pp.889-901, 1986.
DOI : 10.1190/1.1442147

M. Bernacki, S. Lanteri, and S. Piperno, Time-Domain Parallel Simulation of Heterogeneous Wave Propagation on Unstructured Grids Using Explicit, Nondiffusive, Discontinuous Galerkin Methods, Journal of Computational Acoustics, vol.14, issue.01, pp.57-81, 2006.
DOI : 10.1142/S0218396X06002937
URL : https://hal.archives-ouvertes.fr/hal-00607725

C. Baldassari, Modélisation et simulation numérique pour la migration terrestre par équation d'ondes, 2009.

J. Jeffers and J. Reinders, Intel Xeon Phi Coprocessor High-Performance Programming, 2013.