E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.30, pp.12037-12066, 2009.
DOI : 10.1088/1742-6596/180/1/012037

URL : http://iopscience.iop.org/article/10.1088/1742-6596/180/1/012037/pdf

E. Agullo, L. Giraud, A. Guermouche, A. Haidar, and J. Roman, Parallel algebraic domain decomposition solver for the solution of augmented systems, Advances in Engineering Software, vol.60, issue.61, pp.23-30, 2013.
DOI : 10.1016/j.advengsoft.2012.07.004

URL : https://hal.archives-ouvertes.fr/hal-00719512

E. Agullo, P. R. Amestoy, A. Buttari, A. Guermouche, J.-Y. L'Excellent et al., Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations, SIAM Journal on Scientific Computing, vol.38, issue.3, 2016.
DOI : 10.1137/130938505

URL : https://hal.archives-ouvertes.fr/hal-01334113

P. Amestoy, I. S. Duff, S. Pralet, and C. Voemel, Adapting a parallel sparse direct solver to architectures with clusters of SMPs, Parallel Computing, vol.29, issue.11-12, pp.1645-1668, 2003.
DOI : 10.1016/j.parco.2003.05.010

P. R. Amestoy, T. Davis, and I. S. Duff, An Approximate Minimum Degree Ordering Algorithm, SIAM Journal on Matrix Analysis and Applications, vol.17, issue.4, pp.886-905, 1996.
DOI : 10.1137/S0895479894278952

P. R. Amestoy, I. S. Duff, and J.-Y. L'Excellent, Multifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, vol.184, issue.2-4, pp.501-520, 2000.
DOI : 10.1016/S0045-7825(99)00242-X

URL : https://hal.archives-ouvertes.fr/hal-00856651

P. R. Amestoy, I. S. Duff, J.-Y. L'Excellent, and X. S. Li, Analysis, tuning and comparison of two general sparse solvers for distributed memory computers, ACM Transactions on Mathematical Software, vol.27, issue.4, pp.388-420, 2001.
DOI : 10.2172/776597

URL : https://hal.archives-ouvertes.fr/hal-00856654

P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent, and S. Pralet, Hybrid scheduling for the parallel solution of linear systems, Parallel Computing, vol.32, issue.2, pp.136-156, 2006.
DOI : 10.1016/j.parco.2005.07.004

URL : https://hal.archives-ouvertes.fr/hal-00358623

P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L'Excellent et al., Improving Multifrontal Methods by Means of Block Low-Rank Representations, SIAM Journal on Scientific Computing, vol.37, issue.3, pp.1451-1474, 2015.
DOI : 10.1137/120903476

URL : https://hal.archives-ouvertes.fr/hal-01237169

P. Amestoy, A. Buttari, J.-Y. L'Excellent, and T. Mary, On the Complexity of the Block Low-Rank Multifrontal Factorization, SIAM Journal on Scientific Computing, vol.39, issue.4, 2016.
DOI : 10.1137/16M1077192

URL : https://hal.archives-ouvertes.fr/hal-01672943

A. Aminfar, S. Ambikasaran, and E. Darve, A fast block low-rank dense solver with applications to finite-element matrices, Journal of Computational Physics, vol.304, pp.170-188, 2016.
DOI : 10.1016/j.jcp.2015.10.012

A. Aminfar and E. Darve, A fast, memory efficient and robust sparse preconditioner based on a multifrontal approach with applications to finite-element matrices, International Journal for Numerical Methods in Engineering, 2016.

J. Anton, C. Ashcraft, and C. Weisbecker, A Block Low-Rank Multithreaded Factorization for Dense BEM Operators, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP 2016), pp.72-78, 2016.

G. Antoniu, L. Bougé, and R. Namyst, An efficient and transparent thread migration scheme in the PM2 runtime system, pp.496-510, 1999.
DOI : 10.1007/BFb0097934

URL : https://hal.archives-ouvertes.fr/inria-00073068

J. Antony, P. P. Janes, and A. P. Rendell, Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, HiPC, pp.338-352, 2006.
DOI : 10.1007/11945918_35

URL : http://cs.anu.edu.au/~Alistair.Rendell/papers/ThreadAndMemoryPlacement.Springer.pdf

D. L. Applegate, R. E. Bixby, V. Chvatal, and W. J. Cook, The Traveling Salesman Problem: A Computational Study (Princeton Series in Applied Mathematics), 2007.

C. Ashcraft, S. C. Eisenstat, and J. W. Liu, A Fan-In Algorithm for Distributed Sparse Numerical Factorization, SIAM Journal on Scientific and Statistical Computing, vol.11, issue.3, pp.593-599, 1990.
DOI : 10.1137/0911033

C. Ashcraft, S. C. Eisenstat, J. W. Liu, and A. Sherman, A comparison of three column based distributed sparse factorization schemes, Proc. Fifth SIAM Conf. on Parallel Processing for Scientific Computing, 1991.
DOI : 10.21236/ADA228143

C. Ashcraft, The Fan-Both Family of Column-Based Distributed Cholesky Factorization Algorithms, pp.159-190, 1993.
DOI : 10.1007/978-1-4613-8369-7_8

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.2, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

M. Barrault, B. Lathuilière, P. Ramet, and J. Roman, Efficient parallel resolution of the simplified transport equations in mixed-dual formulation, Journal of Computational Physics, vol.230, issue.5, pp.2004-2020, 2011.
DOI : 10.1016/j.jcp.2010.11.047

URL : https://hal.archives-ouvertes.fr/hal-00547406

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.30-31, 2011.
DOI : 10.1109/IPDPS.2011.299

G. Bosilca, A. Bouteiller, A. Danalis, T. Hérault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.

J. Briat, I. Ginzburg, M. Pasin, and B. Plateau, Athapascan runtime: Efficiency for irregular problems, pp.591-600, 1997.
DOI : 10.1007/BFb0002788

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

A. Buttari, Fine-grained multithreading for the multifrontal QR factorization of sparse matrices, 2013. To appear in SIAM SISC and APO technical report number RT-APO-11-6

A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek et al., The Impact of Multicore on Math Software, Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, PARA'06, pp.1-10, 2007.
DOI : 10.1007/978-3-540-75755-9_1

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

Y. Campbell and T. A. Davis, Incomplete LU factorization: A multifrontal approach, 1995.

A. Casadei, Optimizations of hybrid sparse linear solvers relying on Schur complement and domain decomposition approaches, p.43, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01228520

A. Casadei, P. Ramet, and J. Roman, An improved recursive graph bipartitioning algorithm for well balanced domain decomposition, 2014 21st International Conference on High Performance Computing (HiPC), pp.1-10, 2014.
DOI : 10.1109/HiPC.2014.7116878

URL : https://hal.archives-ouvertes.fr/hal-01056749

J. N. Chadwick and D. S. Bindel, An Efficient Solver for Sparse Linear Systems Based on Rank-Structured Cholesky Factorization, 2015.

T. F. Chan and P. S. Vassilevski, A framework for block ILU factorizations using block-size reduction, Mathematics of Computation, vol.64, issue.209, pp.129-156, 1995.

M. Chanaud, Conception d'un solveur haute performance de systèmes linéaires creux couplant des méthodes multigrilles et directes pour la résolution des équations de Maxwell 3D en régime harmonique discrétisées par éléments finis, 2009.

M. Chanaud, L. Giraud, D. Goudin, J. J. Pesqué, and J. Roman, A Parallel Full Geometric Multigrid Solver for Time Harmonic Maxwell Problems, SIAM Journal on Scientific Computing, vol.36, issue.2, pp.119-138, 2014.
DOI : 10.1137/130909512

URL : https://hal.archives-ouvertes.fr/hal-00847966

A. Chapman and Y. Saad, Deflated and augmented Krylov subspace techniques, Numerical Linear Algebra with Applications, vol.4, issue.1, pp.43-66, 1997.
DOI : 10.1002/(sici)1099-1506(199701/02)4:1<43::aid-nla99>3.0.co;2-z

URL : ftp://ftp.cs.umn.edu/dept/users/saad/reports/FILES/umsi-95-181.ps.gz

P. Charrier and J. Roman, Algorithmic study and complexity bounds for a nested dissection solver, Numerische Mathematik, vol.55, issue.4, pp.463-476, 1989.
DOI : 10.1007/BF01396049

M. Cosnard, E. Jeannot, and T. Yang, Compact DAG representation and its symbolic scheduling, Journal of Parallel and Distributed Computing, vol.64, issue.8, pp.921-935, 2004.
DOI : 10.1016/j.jpdc.2004.05.001

URL : https://hal.archives-ouvertes.fr/inria-00099958

T. Davis, Direct Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2006.
DOI : 10.1137/1.9780898718881

T. Davis, Multifrontal sparse QR factorization: Multicore, and GPU work in progress, 15th SIAM Conference on Parallel Processing for Scientific Computing, 2012.

T. A. Davis, Algorithm 915, SuiteSparseQR, ACM Transactions on Mathematical Software, vol.38, issue.1, 2011.
DOI : 10.1145/2049662.2049670

T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, vol.38, issue.1, pp.1-1, 2011.
DOI : 10.1145/2049662.2049663

T. A. Davis, S. Rajamanickam, and W. M. Sid-Lakhdar, A survey of direct methods for sparse linear systems, Acta Numerica, vol.25, pp.383-566, 2016.
DOI : 10.1137/0907081

R. Diekmann, R. Preis, F. Schlimbach, and C. Walshaw, Shape-optimized mesh partitioning and load balancing for parallel adaptive FEM, Parallel Computing, vol.26, issue.12, pp.1555-1581, 2000.
DOI : 10.1016/S0167-8191(00)00043-0

I. S. Duff, A. M. Erisman, and J. K. Reid, Direct methods for sparse matrices, 1986.
DOI : 10.1093/acprof:oso/9780198508380.001.0001

I. S. Duff and J. K. Reid, The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations, ACM Transactions on Mathematical Software, vol.9, issue.3, pp.302-325, 1983.
DOI : 10.1145/356044.356047

M. Faverge, Ordonnancement hybride statique-dynamique en algèbre linéaire creuse pour de grands clusters de machines NUMA et multi-coeurs, 2009.

M. Faverge and P. Ramet, Dynamic Scheduling for sparse direct Solver on NUMA architectures, PARA'08, pp.19-29, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00344026

J. Gaidamour, Conception d'un solveur linéaire creux parallèle hybride direct-itératif, 2009.

J. Gaidamour and P. Hénon, A Parallel Direct/Iterative Solver Based on a Schur Complement Approach, 2008 11th IEEE International Conference on Computational Science and Engineering, pp.98-105, 2008.
DOI : 10.1109/CSE.2008.36

URL : https://hal.archives-ouvertes.fr/hal-00353589

A. George, Nested Dissection of a Regular Finite Element Mesh, SIAM Journal on Numerical Analysis, vol.10, issue.2, pp.345-363, 1973.
DOI : 10.1137/0710032

A. George, M. T. Heath, J. W. Liu, and E. G. Ng, Sparse Cholesky Factorization on a Local-Memory Multiprocessor, SIAM Journal on Scientific and Statistical Computing, vol.9, issue.2, pp.327-340, 1988.
DOI : 10.1137/0909021

A. George and J. W. Liu, Computer Solution of Large Sparse Positive Definite Systems, p.7, 1981.

A. George and D. R. McIntyre, On the Application of the Minimum Degree Algorithm to Finite Element Systems, SIAM Journal on Numerical Analysis, vol.15, issue.1, pp.90-112, 1978.
DOI : 10.1137/0715006

T. George, V. Saxena, A. Gupta, A. Singh, and A. R. Choudhury, Multifrontal Factorization of Sparse SPD Matrices on GPUs, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.372-383, 2011.
DOI : 10.1109/IPDPS.2011.44

P. Ghysels, X. S. Li, F. Rouet, S. Williams, and A. Napov, An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling, SIAM Journal on Scientific Computing, vol.38, issue.5, pp.358-384, 2016.
DOI : 10.1137/15M1010117

L. Giraud, A. Haidar, and Y. Saad, Sparse approximations of the Schur complement for parallel algebraic hybrid linear solvers in 3D, p.53, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00466828

D. Goudin, P. Hénon, M. Mandallena, K. Mer, F. Pellegrini et al., Outils numériques parallèles pour la résolution de très grands problèmes d'électromagnétisme, Séminaire sur l'Algorithmique Numérique Appliquée aux Problèmes Industriels, 2003.

D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman et al., Parallel sparse linear algebra and application to structural mechanics, Numerical Algorithms, pp.371-391, 2000.
URL : https://hal.archives-ouvertes.fr/inria-00346016

L. Grasedyck, R. Kriemann, and S. Le Borne, Parallel black box $\mathcal{H}$-LU preconditioning for elliptic boundary value problems, Computing and Visualization in Science, vol.40, issue.1, pp.273-291, 2008.
DOI : 10.1007/s00791-008-0098-9

L. Grasedyck, W. Hackbusch, and R. Kriemann, Performance of $\mathcal{H}$-LU Preconditioning for Sparse Matrices, Computational Methods in Applied Mathematics, pp.336-349, 2008.
DOI : 10.2478/cmam-2008-0024

L. Grigori, J. A. Demmel, and X. S. Li, Parallel Symbolic Factorization for Sparse LU with Static Pivoting, SIAM Journal on Scientific Computing, vol.29, issue.3, pp.1289-1314, 2007.
DOI : 10.1137/050638102

A. Gupta, Recent Progress in General Sparse Direct Solvers, In Lecture Notes in Computer Science, vol.2073, pp.823-840, 2001.
DOI : 10.1007/3-540-45545-0_94

W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Series in Computational Mathematics, pp.72-83, 2015.
DOI : 10.1007/978-3-662-47324-5

URL : https://link.springer.com/content/pdf/bfm%3A978-3-662-47324-5%2F1.pdf

R. W. Hamming, Error Detecting and Error Correcting Codes, Bell System Technical Journal, vol.29, issue.2, pp.147-160, 1950.
DOI : 10.1002/j.1538-7305.1950.tb00463.x

URL : https://calhoun.nps.edu/bitstream/10945/46756/1/Hamming_1982.pdf

M. T. Heath, E. G. Ng, and B. W. Peyton, Parallel Algorithms for Sparse Linear Systems, SIAM Review, vol.33, issue.3, pp.420-460, 1991.
DOI : 10.1137/1033099

B. Hendrickson and R. Leland, A multi-level algorithm for partitioning graphs, Supercomputing Proceedings of the IEEE/ACM SC95 Conference, pp.28-28, 1995.

B. Hendrickson and E. Rothberg, Improving the Run Time and Quality of Nested Dissection Ordering, SIAM Journal on Scientific Computing, vol.20, issue.2, pp.468-489, 1998.
DOI : 10.1137/S1064827596300656

B. Hendrickson and T. G. Kolda, Graph partitioning models for parallel computing, Parallel Computing, vol.26, issue.12, pp.1519-1534, 2000.
DOI : 10.1016/S0167-8191(00)00048-X

B. Hendrickson and R. Leland, An Improved Spectral Graph Partitioning Algorithm for Mapping Parallel Computations, SIAM Journal on Scientific Computing, vol.16, issue.2, pp.452-469, 1995.
DOI : 10.1137/0916028

P. Hénon, Distribution des Données et Régulation Statique des Calculs et des Communications pour la Résolution de Grands Systèmes Linéaires Creux par Méthode Directe, pp.11-14, 2001.

P. Hénon, P. Ramet, and J. Roman, A Mapping and Scheduling Algorithm for Parallel Sparse Fan-In Numerical Factorization, Proceedings of EuroPAR'99, number 1685 in Lecture Notes in Computer Science, pp.1059-1067, 1999.
DOI : 10.1007/3-540-48311-X_148

P. Hénon, P. Ramet, and J. Roman, PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions, Proceedings of Irregular'2000, number 1800 in Lecture Notes in Computer Science, pp.519-525, 2000.
DOI : 10.1007/3-540-45591-4_70

P. Hénon, P. Ramet, and J. Roman, PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing, vol.28, issue.2, pp.301-321, 2002.
DOI : 10.1016/S0167-8191(01)00141-7

P. Hénon, P. Ramet, and J. Roman, Efficient algorithms for direct resolution of large sparse system on clusters of SMP nodes, SIAM Conference on Applied Linear Algebra, p.8, 2003.

P. Hénon, P. Ramet, and J. Roman, On finding approximate supernodes for an efficient block-ILU(k) factorization, Parallel Computing, vol.34, issue.6-8, pp.345-362, 2008.
DOI : 10.1016/j.parco.2007.12.003

P. Hénon and Y. Saad, A Parallel Multistage ILU Factorization Based on a Hierarchical Graph Decomposition, SIAM Journal on Scientific Computing, vol.28, issue.6, pp.2266-2293, 2006.
DOI : 10.1137/040608258

K. L. Ho and L. Ying, Hierarchical interpolative factorization for elliptic operators: differential equations, Communications on Pure and Applied Mathematics, 2015.

J. D. Hogg, J. K. Reid, and J. A. Scott, Design of a Multicore Sparse Cholesky Factorization Using DAGs, SIAM Journal on Scientific Computing, vol.32, issue.6, pp.3627-3649, 2010.
DOI : 10.1137/090757216

G. Huysmans, S. Pamela, E. van der Plas, and P. Ramet, Non-linear MHD simulations of edge localized modes (ELMs), Plasma Physics and Controlled Fusion, vol.51, issue.12, p.124012, 2009.

D. Hysom and A. Pothen, A Scalable Parallel Algorithm for Incomplete Factor Preconditioning, SIAM Journal on Scientific Computing, vol.22, issue.6, pp.2194-2215, 2001.
DOI : 10.1137/S1064827500376193

F. D. Igual, E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, R. A. van de Geijn et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.
DOI : 10.1016/j.jpdc.2011.10.014

D. S. Johnson and L. A. McGeoch, The Traveling Salesman Problem: A Case Study in Local Optimization, pp.215-310, 1997.

M. Joshi, G. Karypis, V. Kumar, A. Gupta, and F. Gustavson, PSPASES : Scalable Parallel Direct Solver Library for Sparse Symmetric Positive Definite Linear Systems, 1999.

G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.359-392, 1998.
DOI : 10.1137/S1064827595287997

G. Karypis and V. Kumar, MeTiS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices, Version 4.0, 1998.

G. Karypis and V. Kumar, Parallel multilevel k-way partitioning scheme for irregular graphs, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '96, pp.96-129, 1998.
DOI : 10.1145/369028.369103

URL : http://www.cs.umn.edu/~kumar/papers/mlevel_kparallel.ps

G. Karypis and V. Kumar, Parallel threshold-based ILU factorization, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '97, pp.1-24, 1997.
DOI : 10.1145/509593.509621

O. Kaya and B. Uçar, Scalable sparse tensor decompositions in distributed memory systems, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pp.1-77, 2015.
DOI : 10.1145/2807591.2807624

URL : https://hal.archives-ouvertes.fr/hal-01148202

O. Kaya and B. Uçar, High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors, 2016 45th International Conference on Parallel Processing (ICPP), pp.103-112, 2016.
DOI : 10.1109/ICPP.2016.19

URL : https://hal.archives-ouvertes.fr/hal-01354894

K. Barker, K. Davis, A. Hoisie, D. Kerbyson, M. Lang et al., Experiences in scaling scientific applications on current-generation quad-core processors, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.
DOI : 10.1109/IPDPS.2008.4536342

R. Kriemann, $\mathcal{H}$-LU factorization on many-core systems, Computing and Visualization in Science, vol.35, issue.2, pp.105-117, 2013.
DOI : 10.1145/1365490.1365500

J. Kurzak, S. Tomov, and J. Dongarra, Autotuning GEMM Kernels for the Fermi GPU, IEEE Transactions on Parallel and Distributed Systems, vol.23, issue.11, pp.2045-2057, 2012.
DOI : 10.1109/TPDS.2011.311

URL : http://www.netlib.org/utk/people/JackDongarra/PAPERS/auto-fermi-2012.pdf

X. Lacoste, Scheduling and memory optimizations for sparse direct solver on multi-core/multi-gpu cluster systems, p.5, 2015.

X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014.
DOI : 10.1109/IPDPSW.2014.9

URL : https://hal.archives-ouvertes.fr/hal-00987094

B. Lathuilière, Méthode de décomposition de domaine pour les équations du transport simplifié en neutronique, 2010.

X. S. Li, Evaluation of Sparse LU Factorization and Triangular Solution on Multicore Platforms, Lecture Notes in Computer Science, vol.29, issue.2, pp.287-300, 2008.
DOI : 10.1145/1362622.1362674

X. S. Li and J. W. Demmel, A scalable sparse direct solver using static pivoting, Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, pp.8-43, 1999.

R. J. Lipton, D. J. Rose, and R. E. Tarjan, Generalized Nested Dissection, SIAM Journal on Numerical Analysis, vol.16, issue.2, pp.9-53, 1979.
DOI : 10.1137/0716027

R. J. Lipton and R. E. Tarjan, A Separator Theorem for Planar Graphs, SIAM Journal on Applied Mathematics, vol.36, issue.2, pp.177-189, 1979.
DOI : 10.1137/0136016

URL : http://historical.ncstrl.org/litesite-data/stan/CS-TR-77-627.pdf

J. W. Liu, Modification of the minimum-degree algorithm by multiple elimination, ACM Transactions on Mathematical Software, vol.11, issue.2, pp.141-153, 1985.
DOI : 10.1145/214392.214398

J. W. Liu, The Role of Elimination Trees in Sparse Factorization, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.1, pp.134-172, 1990.
DOI : 10.1137/0611010

J. W. H. Liu, E. G. Ng, and B. W. Peyton, On finding supernodes for sparse matrix computations, SIAM Journal on Matrix Analysis and Applications, vol.14, issue.1, pp.242-252, 1993.

R. F. Lucas, G. Wagenbreth, D. M. Davis, and R. Grimes, Multifrontal Computations on GPUs and Their Multi-core Hosts, Proceedings of the 9th international conference on High performance computing for computational science, VECPAR'10, pp.71-82, 2011.
DOI : 10.1016/0167-8191(86)90019-0

M. J. Martín, I. Pardines, and F. F. Rivera, Scheduling of algorithms based on elimination trees on NUMA systems, EuroPar'99, pp.1068-1072, 1999.

H. Meyerhenke, B. Monien, and T. Sauerwald, A new diffusion-based multilevel algorithm for computing graph partitions, Journal of Parallel and Distributed Computing, vol.69, issue.9, pp.750-761, 2009.
DOI : 10.1016/j.jpdc.2009.04.005

G. L. Miller, S. Teng, and S. A. Vavasis, A unified geometric approach to graph separators, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science, pp.538-547, 1991.
DOI : 10.1109/SFCS.1991.185417

G. L. Miller and S. A. Vavasis, Density graphs and separators, Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp.331-336, 1991.

M. Magolu monga Made and H. A. van der Vorst, A generalized domain decomposition paradigm for parallel incomplete LU factorization preconditionings, High Performance Computing and Networking, pp.925-932, 2001.
DOI : 10.1016/S0167-739X(01)00034-6

S. Moustafa, Massively Parallel Cartesian Discrete Ordinates Method for Neutron Transport Simulation, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01379686

S. Moustafa, I. Dutka-malen, L. Plagne, A. Ponçot, and P. Ramet, Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver, Annals of Nuclear Energy, pp.1-10, 2014.
DOI : 10.1016/j.anucene.2014.08.034

URL : https://hal.archives-ouvertes.fr/hal-00986975

R. Nath, S. Tomov, and J. Dongarra, An Improved Magma Gemm For Fermi Graphics Processing Units, The International Journal of High Performance Computing Applications, vol.24, issue.4, pp.511-515, 2010.
DOI : 10.1016/S0167-8191(00)00087-9

NVIDIA, CUBLAS library, NVIDIA Corporation, 2008.

F. Pellegrini and J. Roman, Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs, High-Performance Computing and Networking, pp.493-498, 1996.
DOI : 10.1007/3-540-61142-8_588

F. Pellegrini and J. Roman, Sparse matrix ordering with Scotch, Proceedings of HPCN'97, pp.370-378, 1997.
DOI : 10.1007/BFb0031609

F. Pellegrini, J. Roman, and P. Amestoy, Hybridizing nested dissection and halo approximate minimum degree for efficient sparse matrix ordering, Concurrency: Practice and Experience, vol.12, issue.2-3, pp.69-84, 2000. Preliminary version published in Proceedings of Irregular'99.
DOI : 10.1002/(sici)1096-9128(200002/03)12:2/3<69::aid-cpe472>3.0.co;2-w

URL : http://www.enseeiht.fr/Recherche/Info/Numerique/Noail/Membres/../MUMPS/hamd_cpe.ps.gz

F. Pellegrini, Scotch and libScotch 5.1 User's Guide, pp.10-78, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00410332

G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman, Sparse Supernodal Solver Using Block Low-Rank Compression, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp.1138-1147, 2017.
DOI : 10.1109/IPDPSW.2017.86

URL : https://hal.archives-ouvertes.fr/hal-01450732

G. Pichon, M. Faverge, P. Ramet, and J. Roman, Reordering Strategy for Blocking Optimization in Sparse Linear Solvers, SIAM Journal on Matrix Analysis and Applications, vol.38, issue.1, pp.226-248, 2017.
DOI : 10.1137/16M1062454

URL : https://hal.archives-ouvertes.fr/hal-01485507

A. Pothen and C. Sun, A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1257, 1993.
DOI : 10.1137/0914074

H. Pouransari, P. Coulier, and E. Darve, Fast hierarchical solvers for sparse matrices using extended sparsification and low-rank approximation, arXiv preprint.
DOI : 10.1137/15m1046939

URL : http://arxiv.org/pdf/1510.07363

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288

P. Raghavan, K. Teranishi, and E. G. Ng, A latency tolerant hybrid sparse solver using incomplete Cholesky factorization, Numerical Linear Algebra with Applications, pp.541-560, 2003.
DOI : 10.1002/nla.327

S. Rajamanickam, E. G. Boman, and M. A. Heroux, ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.631-643, 2012.
DOI : 10.1109/IPDPS.2012.64

D. J. Rose, R. E. Tarjan, and G. S. Lueker, Algorithmic Aspects of Vertex Elimination on Graphs, SIAM Journal on Computing, vol.5, issue.2, pp.266-283, 1976.
DOI : 10.1137/0205021

E. Rothberg and A. Gupta, An Efficient Block-Oriented Approach to Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.15, issue.6, pp.1413-1439, 1994.
DOI : 10.1137/0915085

Y. Saad, ILUT: A dual threshold incomplete LU factorization, Numerical Linear Algebra with Applications, vol.1, issue.4, pp.387-402, 1994.
DOI : 10.1002/nla.1680010405

URL : ftp://ftp.cs.umn.edu/dept/users/saad/reports/FILES/umsi-92-38.ps.gz

Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, pp.43-44, 2003.
DOI : 10.1137/1.9780898718003

O. Schenk and K. Gärtner, Solving unsymmetric sparse systems of linear equations with PARDISO, Future Generation Computer Systems, vol.20, issue.3, pp.475-487, 2004.
DOI : 10.1016/j.future.2003.07.011

W. M. Sid-lakhdar, Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01111259

S. Smith and G. Karypis, Accelerating the Tucker Decomposition with Compressed Sparse Tensors, European Conference on Parallel Processing, 2017.
DOI : 10.1145/2783258.2783395

S. Smith, J. Park, and G. Karypis, Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2017.
DOI : 10.1109/IPDPS.2017.84

HSL, A collection of Fortran codes for large scale scientific computation.

D. A. Sushnikova and I. V. Oseledets, "Compress and eliminate" solver for symmetric positive definite sparse matrices.

G. Tan, L. Li, S. Triechle, E. Phillips, Y. Bao et al., Fast implementation of DGEMM on Fermi GPU, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-35, 2011.
DOI : 10.1145/2063384.2063431

R. Thakur and W. Gropp, Test Suite for Evaluating Performance of MPI Implementations That Support MPI_THREAD_MULTIPLE, In EuroPVM/MPI Lecture Notes in Computer Science, vol.4757, pp.46-55, 2007.
DOI : 10.1007/978-3-540-75416-9_13

F. Trahay, A. Denis, O. Aumage, and R. Namyst, Improving Reactivity and Communication Overlap in MPI Using a Generic I/O Manager, EuroPVM/MPI, pp.170-177, 2007.
DOI : 10.1007/978-3-540-75416-9_27

URL : https://hal.archives-ouvertes.fr/inria-00177167

F. G. Van Zee, E. Chan, R. A. van de Geijn, E. S. Quintana-Ortí et al., The libflame library for dense matrix computations, Computing in Science & Engineering, vol.11, issue.6, pp.56-63, 2009.

V. Volkov and J. W. Demmel, Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.31-32, 2008.
DOI : 10.1109/SC.2008.5214359

URL : http://bebop.cs.berkeley.edu/pubs/volkov2008-benchmarking.pdf

S. Wang, X. S. Li, F. Rouet, J. Xia, and M. V. de Hoop, A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure, ACM Transactions on Mathematical Software, vol.42, issue.3, pp.1-21, 2016.
DOI : 10.1002/nla.691

J. W. Watts III, A conjugate gradient-truncated direct method for the iterative solution of the reservoir simulation pressure equation, 1981.

J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Superfast Multifrontal Method for Large Structured Linear Systems of Equations, SIAM Journal on Matrix Analysis and Applications, vol.31, issue.3, pp.1382-1411, 2010.
DOI : 10.1137/09074543X

I. Yamazaki and X. S. Li, On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver, High Performance Computing for Computational Science - VECPAR 2010, pp.421-434, 2011.
DOI : 10.1145/779359.779361

I. Yamazaki, S. Tomov, and J. Dongarra, One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators, Procedia Computer Science, vol.9, pp.37-46, 2012.
DOI : 10.1016/j.procs.2012.04.005

URL : https://doi.org/10.1016/j.procs.2012.04.005

K. Yang, H. Pouransari, and E. Darve, Sparse hierarchical solvers with guaranteed convergence, arXiv preprint.

A. Yarkhan, Dynamic Task Execution on Shared and Distributed Memory Architectures

C. D. Yu, W. Wang, and D. Pierce, A CPU-GPU hybrid approach for the unsymmetric multifrontal method, Parallel Computing, vol.37, issue.12, pp.759-770, 2011.
DOI : 10.1016/j.parco.2011.09.002