Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013. ,
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611
Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures, 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp.54-63, 2015. ,
DOI : 10.1109/HiPC.2015.27
URL : https://hal.archives-ouvertes.fr/hal-01166312
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2, pp.13:1-13:22, 2016. ,
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.012037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00550877
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.66, 2012. ,
DOI : 10.1109/SC.2012.71
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013. ,
DOI : 10.1109/MCSE.2013.98
Dense linear algebra on distributed heterogeneous hardware with a symbolic DAG approach, Scalable Computing and Communications: Theory and Practice, pp.699-733, 2013. ,
Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, pp.C323-C345, 2013. ,
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, vol.38, issue.1, pp.1:1-1:25, 2011. ,
DOI : 10.1145/2049662.2049663
Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor, High Performance Computing - ISC High Performance 2016 International Workshops, pp.339-353, 2016. ,
DOI : 10.1190/geo2011-0238.1
The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations, ACM Transactions on Mathematical Software, vol.9, issue.3, pp.302-325, 1983. ,
DOI : 10.1145/356044.356047
Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010. ,
DOI : 10.1109/IPDPS.2010.5470443
URL : https://hal.archives-ouvertes.fr/inria-00548899
Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi, 2016 IEEE High Performance Extreme Computing Conference, pp.1-7, 2016. ,
A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling, ACM Transactions on Mathematical Software, vol.41, issue.1, pp.3:1-3:27, 2014. ,
DOI : 10.1145/2629641
Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014. ,
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017
Optimization of the sparse matrix-vector products of an IDR Krylov iterative solver in EMGeo for the Intel KNL manycore processor, High Performance Computing - ISC High Performance 2016 International Workshops, Revised Selected Papers, pp.378-389, 2016. ,
Evaluating the Impact of TLB Misses on Future HPC Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1010-1021, 2012. ,
DOI : 10.1109/IPDPS.2012.94
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009. ,
DOI : 10.1145/1527286.1527288
Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism, 2007. ,
A comparative study of application performance and scalability on the Intel Knights Landing processor, High Performance Computing - ISC High Performance 2016 International Workshops, Revised Selected Papers, pp.307-318, 2016. ,
Evaluating the effect of replacing CNK with Linux on the compute-nodes of Blue Gene/L, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, pp.165-174, 2008. ,
DOI : 10.1145/1375527.1375554
Knights Landing (KNL): 2nd Generation Intel® Xeon Phi processor, 2015 IEEE Hot Chips 27 Symposium (HCS), pp.1-24, 2015. ,
DOI : 10.1109/HOTCHIPS.2015.7477467