Y. Nakatsukasa, Z. Bai, and F. Gygi, Optimizing Halley's Iteration for Computing the Matrix Polar Decomposition, SIAM Journal on Matrix Analysis and Applications, vol.31, issue.5, pp.2700-2720, 2010.
DOI : 10.1137/090774999

J. Meyer and I. Y. Bar-Itzhack, Practical Comparison of Iterative Matrix Orthogonalization Algorithms, IEEE Transactions on Aerospace and Electronic Systems, issue.3, pp.230-235, 1977.

J. A. Goldstein and M. Levy, Linear Algebra and Quantum Chemistry, The American Mathematical Monthly, vol.98, issue.8, pp.710-718, 1991.
DOI : 10.2307/2324422

N. J. Higham, Computing the Polar Decomposition - with Applications, SIAM Journal on Scientific and Statistical Computing, vol.7, issue.4, pp.1160-1174, 1986.
DOI : 10.1137/0907079

D. Sukkari, H. Ltaief, and D. Keyes, A High Performance QDWH-SVD Solver Using Hardware Accelerators, ACM Transactions on Mathematical Software, vol.43, issue.1, pp.1-6, 2016.
DOI : 10.1145/2894747

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, 2009.
DOI : 10.1088/1742-6596/180/1/012037

W. Gander, On Halley's Iteration Method, The American Mathematical Monthly, vol.92, issue.2, pp.131-134, 1985.
DOI : 10.2307/2322644

C. S. Kenney and A. J. Laub, On Scaling Newton's Method for Polar Decomposition and the Matrix Sign Function, SIAM Journal on Matrix Analysis and Applications, vol.13, issue.3, pp.688-706, 1992.
DOI : 10.1137/0613044

N. J. Higham and P. Papadimitriou, A parallel algorithm for computing the polar decomposition, Parallel Computing, vol.20, issue.8, pp.1161-1173, 1994.
DOI : 10.1016/0167-8191(94)90073-6

A. Kielbasinski and K. Zietak, Numerical behaviour of Higham's scaled method for polar decomposition, Numerical Algorithms, vol.32, issue.2/4, pp.105-140, 2003.
DOI : 10.1023/A:1024098014869

R. Byers and H. Xu, A New Scaling for Newton's Iteration for the Polar Decomposition and its Backward Stability, SIAM Journal on Matrix Analysis and Applications, vol.30, issue.2, pp.822-843, 2008.
DOI : 10.1137/070699895

B. Laszkiewicz and K. Zietak, Approximation of Matrices and a Family of Gander Methods for Polar Decomposition, BIT Numerical Mathematics, vol.46, issue.2, pp.345-366, 2006.
DOI : 10.1007/s10543-006-0053-4

Y. Nakatsukasa and N. J. Higham, Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD, SIAM Journal on Scientific Computing, vol.35, issue.3, pp.1325-1349, 2013.
DOI : 10.1137/120876605

G. H. Golub and C. F. Van Loan, Matrix Computations, ser. Johns Hopkins Studies in the Mathematical Sciences, 1996.

L. N. Trefethen and D. Bau, Numerical Linear Algebra, 1997.
DOI : 10.1137/1.9780898719574

J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero, Elemental, ACM Transactions on Mathematical Software, vol.39, issue.2, p.13, 2013.
DOI : 10.1145/2427023.2427030

J. Dongarra, P. Beckman, T. Moore, P. Aerts, G. Aloisio et al., The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011.
DOI : 10.1177/1094342010391989

A. Buttari, J. Langou, J. Kurzak, and J. J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, pp.1-14, 2009.
DOI : 10.1145/1527286.1527288

E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multicore hardware, SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.1-12, 2009.

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, and R. van de Geijn, SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397

OpenMP Architecture Review Board, OpenMP Application Program Interface Version 4.0, 2013.

A. Duran, R. Ferrer, E. Ayguadé, R. M. Badia, and J. Labarta, A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, International Journal of Parallel Programming, vol.37, issue.3, pp.292-305, 2009.
DOI : 10.1007/s10766-009-0101-1

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

A. Charara, H. Ltaief, D. Gratadour, D. E. Keyes, A. Sevin et al., Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.262-273, 2014.
DOI : 10.1109/SC.2014.27

H. Ltaief, D. Gratadour, A. Charara, and E. Gendron, Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '16, pp.1-9, 2016.
DOI : 10.1145/2929908.2929920

E. Agullo, O. Aumage, M. Faverge, N. Furmento, F. Pruvost et al., Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, Inria Bordeaux Sud-Ouest, Bordeaux INP, 2016.

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, pp.260-274, 2002.
DOI : 10.1109/71.993206

J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, Exploiting fine-grain parallelism in recursive LU factorization, PARCO, pp.429-436, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00809755

P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, ser. Mathematical Modeling and Computation, 1998.
DOI : 10.1137/1.9780898719697