Optimizing Halley's Iteration for Computing the Matrix Polar Decomposition, SIAM Journal on Matrix Analysis and Applications, vol.31, issue.5, pp.2700-2720, 2010.
DOI : 10.1137/090774999
Bar-Itzhack, Practical Comparison of Iterative Matrix Orthogonalization Algorithms, IEEE Transactions on Aerospace and Electronic Systems, issue.3, pp.230-235, 1977.
Linear Algebra and Quantum Chemistry, The American Mathematical Monthly, vol.98, issue.8, pp.710-718, 1991.
DOI : 10.2307/2324422
Computing the Polar Decomposition with Applications, SIAM Journal on Scientific and Statistical Computing, vol.7, issue.4, pp.1160-1174, 1986.
DOI : 10.1137/0907079
A High Performance QDWH-SVD Solver Using Hardware Accelerators, ACM Transactions on Mathematical Software, vol.43, issue.1, pp.1-6, 2016.
DOI : 10.1145/2894747
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, 2009.
DOI : 10.1088/1742-6596/180/1/012037
On Halley's Iteration Method, The American Mathematical Monthly, vol.92, issue.2, pp.131-134, 1985.
DOI : 10.2307/2322644
On Scaling Newton's Method for Polar Decomposition and the Matrix Sign Function, SIAM Journal on Matrix Analysis and Applications, vol.13, issue.3, pp.688-706, 1992.
DOI : 10.1137/0613044
A parallel algorithm for computing the polar decomposition, Parallel Computing, vol.20, issue.8, pp.1161-1173, 1994.
DOI : 10.1016/0167-8191(94)90073-6
Numerical behaviour of Higham's scaled method for polar decomposition, Numerical Algorithms, vol.32, issue.2/4, pp.105-140, 2003.
DOI : 10.1023/A:1024098014869
A New Scaling for Newton's Iteration for the Polar Decomposition and its Backward Stability, SIAM Journal on Matrix Analysis and Applications, vol.30, issue.2, pp.822-843, 2008.
DOI : 10.1137/070699895
Approximation of Matrices and a Family of Gander Methods for Polar Decomposition, BIT Numerical Mathematics, vol.46, issue.2, pp.345-366, 2006.
DOI : 10.1007/s10543-006-0053-4
Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD, SIAM Journal on Scientific Computing, vol.35, issue.3, pp.1325-1349, 2013.
DOI : 10.1137/120876605
Matrix Computations, ser. Johns Hopkins Studies in the Mathematical Sciences, 1996.
Numerical Linear Algebra, 1997.
DOI : 10.1137/1.9780898719574
Elemental, ACM Transactions on Mathematical Software, vol.39, issue.2, p.13, 2013.
DOI : 10.1145/2427023.2427030
The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, pp.3-60, 2011.
DOI : 10.1177/1094342010391989
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.1-14, 2009.
DOI : 10.1145/1527286.1527288
Comparative study of one-sided factorizations with multiple software packages on multicore hardware, SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.1-12, 2009.
QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397
OpenMP Application Program Interface, Version 4.0, 2013.
A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, International Journal of Parallel Programming, vol.26, issue.6, pp.292-305, 2009.
DOI : 10.1007/s10766-009-0101-1
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51.
DOI : 10.1016/j.parco.2011.10.003
Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.262-273, 2014.
DOI : 10.1109/SC.2014.27
Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '16, pp.1-9, 2016.
DOI : 10.1145/2929908.2929920
Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, Inria Bordeaux Sud-Ouest, Bordeaux INP, 2016.
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, pp.260-274, 2002.
DOI : 10.1109/71.993206
Exploiting Fine-Grain Parallelism in Recursive LU Factorization, PARCO, pp.429-436, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00809755
Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, ser. Mathematical Modeling and Computation, 1998.
DOI : 10.1137/1.9780898719697