Computing the Polar Decomposition with Applications, SIAM Journal on Scientific and Statistical Computing, vol.7, issue.4, pp.1160-1174, 1986. ,
, ser. John Hopkins Studies in the Mathematical Sciences, 1996.
Iterative optimal orthogonalization of the strapdown matrix, IEEE Transactions on, issue.11, pp.30-37, 1975. ,
Linear algebra and quantum chemistry, Am. Math. Monthly, vol.98, issue.10, pp.710-718, 1991. ,
,
Optimizing Halley's Iteration for Computing the Matrix Polar Decomposition, SIAM Journal on Matrix Analysis and Applications, pp.2700-2720, 2010. ,
Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD, SIAM Journal on Scientific Computing, vol.35, issue.3, pp.1325-1349, 2013. ,
, ScaLAPACK Users' Guide. Philadelphia: Society for Industrial and Applied Mathematics, 1997.
PTG: An abstraction for unhindered parallelism, Proceedings of WOLFHPC 2014: 4th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing -Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Stor, pp.21-30, 2014. ,
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009. ,
Supermatrix out-of-order scheduling of matrix operations for SMP and multicore architectures, SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, pp.116-125, 2007. ,
, The Chameleon Project, 2018.
Hierarchical qr factorization algorithms for multicore clusters, Parallel Computing, vol.39, issue.4, pp.212-232, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00809770
SUMMA: scalable universal matrix multiplication algorithm, Concurrency: Practice and Experience, vol.9, issue.4, pp.255-274, 1997. ,
,
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, IPDPS Workshops. IEEE, pp.1432-1441, 2011. ,
Investigating Applications Portability with the Uintah DAG-based Runtime System on PetaScale Supercomputers, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, ser. SC '13, 2013. ,
, The HiCMA Library, 2018.
Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures, High Performance Computing: 32nd International Conference, pp.22-40, 2017. ,
Exploiting data sparsity for large-scale matrix computations, Euro-Par 2018: Parallel Processing, pp.721-734, 2018. ,
Numerical Linear Algebra, 1997. ,
A High Performance QDWH-SVD Solver Using Hardware Accelerators, ACM Trans. Math. Softw, vol.43, issue.1, pp.1-6, 2016. ,
, Matrix Algebra on GPU and Multicore Architectures, MAGMA, 2009.
, LAPACK User's Guide, 1999.
Elemental: A New Framework for Distributed Memory Dense Matrix Computations, ACM Trans. Math. Softw, vol.39, issue.2, p.13, 2013. ,
High Performance Polar Decomposition on Distributed Memory Systems, Euro-Par 2016: Parallel Processing -22nd International Conference on Parallel and Distributed Computing, vol.9833, pp.605-616, 2016. ,
,
A QDWH-based SVD software framework on distributed-memory manycore systems, ACM Trans. Math. Softw, vol.45, issue.2, pp.1-18, 2019. ,
,
Asynchronous taskbased polar decomposition on single node manycore architectures, IEEE Transactions on Parallel and Distributed Systems, vol.29, issue.2, pp.312-323, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01585079
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.2, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
DAGuE: A generic distributed DAG Engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.0-2012, 2012. ,
Compact dag representation and its symbolic scheduling, Journal of Parallel and Distributed Computing, vol.64, issue.8, pp.921-935, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00099958
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, pp.38-53 ,
Parallel tiled QR factorization for multicore architectures, Concurrency: Practice and Experience, vol.20, issue.13, pp.1573-1590, 2008. ,
Programming matrix algorithms-by-blocks for threadlevel parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009. ,
Communication-avoiding parallel and sequential QR and LU factorizations: theory and practice, 2008. ,
Tile QR factorization with parallel panel processing for multicore architectures, IPDPS'10, the 24st IEEE Int. Parallel and Distributed Processing Symposium, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00548899
Tiled QR factorization algorithms, SC'2011, the IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00585721
Scalable tile communication-avoiding QR factorization on multicore cluster systems, SC'10, the 2010 ACM/IEEE conference on Supercomputing, 2010. ,
Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation, IPDPS'17 -31st IEEE International Parallel and Distributed Processing Symposium, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01484113
Communication-optimal parallel 2.5d matrix multiplication and lu factorization algorithms, 2011. ,
, , pp.90-109, 2011.
Communication-optimal parallel recursive rectangular matrix multiplication, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.261-272, 2013. ,
Estimating the matrix p-norm, Numerische Mathematik, vol.62, issue.1, pp.539-555, 1992. ,
Efficient algorithms for all-to-all communications in multiport messagepassing systems, IEEE Transactions on Parallel and Distributed Systems, vol.8, issue.11, pp.1143-1156, 1997. ,
Scalable taskbased algorithm for multiplication of block-rank-sparse matrices, Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, ser. IA3 '15, vol.4, pp.1-4, 2015. ,
Installation guide for LAPACK, 1992. ,
Massively Parallel Polar Decomposition on Distributed-Memory Systems, Accepted at ACM Transactions on Parallel Computing, 2019. ,