E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-12, 2009.
DOI : 10.1145/1654059.1654080

M. W. Berry, Large-scale sparse singular value computations, Int. J. Supercomputer Appl., vol.6, issue.1, pp.13-49, 1992.

C. Bischof and C. Van Loan, The WY Representation for Products of Householder Matrices, SIAM Journal on Scientific and Statistical Computing, vol.8, issue.1, pp.2-13, 1987.
DOI : 10.1137/0908009

S. Blackford and J. J. Dongarra, Installation guide for LAPACK, 1992.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011.
DOI : 10.1109/IPDPS.2011.299

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

H. Bouwmeester, Tiled Algorithms for Matrix Computations on Multicore Architectures, 2013.

H. Bouwmeester, M. Jacquelin, J. Langou, and Y. Robert, Tiled QR factorization algorithms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.
DOI : 10.1145/2063384.2063393
URL : https://hal.archives-ouvertes.fr/inria-00585721

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.21, issue.8, pp.1573-1590, 2008.
DOI : 10.1002/cpe.1301

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

T. F. Chan, An Improved Algorithm for Computing the Singular Value Decomposition, ACM Transactions on Mathematical Software, vol.8, issue.1, pp.72-83, 1982.
DOI : 10.1145/355984.355990

J. Choi, J. J. Dongarra, and D. W. Walker, The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form, Numerical Algorithms, vol.10, issue.2, pp.379-399, 1995.
DOI : 10.1007/BF02140776

M. Cosnard, J. Muller, and Y. Robert, Parallel QR decomposition of a rectangular matrix, Numerische Mathematik, vol.48, issue.2, pp.239-249, 1986.
DOI : 10.1007/BF01389871
URL : https://hal.archives-ouvertes.fr/hal-00857127

J. Dongarra, M. Faverge, T. Herault, M. Jacquelin, J. Langou et al., Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.607-618, 2012.
DOI : 10.1109/IPDPS.2012.62
URL : https://hal.archives-ouvertes.fr/hal-00764022

J. J. Dongarra, D. C. Sorensen, and S. J. Hammarling, Block reduction of matrices to condensed forms for eigenvalue computations, Journal of Computational and Applied Mathematics, vol.27, issue.1-2, pp.215-227, 1989.
DOI : 10.1016/0377-0427(89)90367-1

Z. Drmač and K. Veselić, New Fast and Accurate Jacobi SVD Algorithm. I, SIAM Journal on Matrix Analysis and Applications, vol.29, issue.4, pp.1322-1342, 2008.
DOI : 10.1137/050639193

Z. Drmač and K. Veselić, New Fast and Accurate Jacobi SVD Algorithm. II, SIAM Journal on Matrix Analysis and Applications, vol.29, issue.4, pp.1343-1362, 2008.
DOI : 10.1137/05063920X

C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, vol.1, issue.3, pp.211-218, 1936.
DOI : 10.1007/BF02288367

G. Golub and W. Kahan, Calculating the Singular Values and Pseudo-Inverse of a Matrix, Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, vol.2, issue.2, pp.205-224, 1965.
DOI : 10.1137/0702016

B. Großer and B. Lang, Efficient parallel reduction to bidiagonal form, Parallel Computing, vol.25, issue.8, pp.969-986, 1999.
DOI : 10.1016/S0167-8191(99)00041-1

M. Gu and S. C. Eisenstat, A Divide-and-Conquer Algorithm for the Bidiagonal SVD, SIAM Journal on Matrix Analysis and Applications, vol.16, issue.1, pp.79-92, 1995.
DOI : 10.1137/S0895479892242232

J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, FLAME: Formal Linear Algebra Methods Environment, ACM Transactions on Mathematical Software, vol.27, issue.4, pp.422-455, 2001.
DOI : 10.1145/504210.504213
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.7096

A. Haidar, J. Kurzak, and P. Luszczek, An improved parallel singular value algorithm and its implementation for multicore hardware, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.1-90, 2013.
DOI : 10.1145/2503210.2503292

A. Haidar, H. Ltaief, P. Luszczek, and J. Dongarra, A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.25-35, 2012.
DOI : 10.1109/IPDPS.2012.13

M. E. Hochstenbach, A Jacobi-Davidson Type SVD Method, SIAM Journal on Scientific Computing, vol.23, issue.2, pp.606-628, 2001.
DOI : 10.1137/S1064827500372973

Z. Jia and D. Niu, An Implicitly Restarted Refined Bidiagonalization Lanczos Method for Computing a Partial Singular Value Decomposition, SIAM Journal on Matrix Analysis and Applications, vol.25, issue.1, pp.246-265, 2003.
DOI : 10.1137/S0895479802404192

B. Lang, Parallel reduction of banded matrices to bidiagonal form, Parallel Computing, vol.22, issue.1, pp.1-18, 1996.
DOI : 10.1016/0167-8191(95)00064-X

R. M. Larsen, Lanczos Bidiagonalization With Partial Reorthogonalization, DAIMI Report Series, vol.27, issue.537, 1998.
DOI : 10.7146/dpb.v27i537.7070

C. Lawson and R. Hanson, Solving Least Squares Problems, Society for Industrial and Applied Mathematics, 1974.
DOI : 10.1137/1.9781611971217

H. Ltaief, J. Kurzak, and J. Dongarra, Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures, IEEE Transactions on Parallel and Distributed Systems, vol.21, issue.4, pp.417-423, 2010.
DOI : 10.1109/TPDS.2009.79

H. Ltaief, P. Luszczek, and J. Dongarra, Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction, Parallel Processing and Applied Mathematics: 9th International Conference, Revised Selected Papers, Part I, pp.661-670, 2011.

H. Ltaief, P. Luszczek, and J. Dongarra, High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures, ACM Transactions on Mathematical Software, vol.39, issue.3, 2013.
DOI : 10.1145/2450153.2450154

Y. Nakatsukasa and N. J. Higham, Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD, SIAM Journal on Scientific Computing, vol.35, issue.3, pp.1325-1349, 2013.
DOI : 10.1137/120876605

S. Rajamanickam, Efficient algorithms for sparse singular value decomposition, 2009.

R. Schreiber and C. Van Loan, A Storage-Efficient WY Representation for Products of Householder Transformations, SIAM Journal on Scientific and Statistical Computing, vol.10, issue.1, pp.53-57, 1989.
DOI : 10.1137/0910005

L. N. Trefethen and D. Bau, Numerical linear algebra, Society for Industrial and Applied Mathematics, 1997.
DOI : 10.1137/1.9780898719574

K. E. Vipin, Significant performance improvement of symmetric eigensolvers and SVD in Intel MKL 11.2, 2014.
URL : https://software.intel.com/en-us/articles/significant-performance-improvment-of-symmetric-eigensolvers-and-svd-in-intel-mkl-112

P. R. Willems, B. Lang, and C. Vömel, Computing the Bidiagonal SVD Using Multiple Relatively Robust Representations, SIAM Journal on Matrix Analysis and Applications, vol.28, issue.4, pp.907-926, 2006.
DOI : 10.1137/050628301

L. Wu, E. Romero, and A. Stathopoulos, PRIMME SVDS: A high-performance preconditioned SVD solver for accurate large-scale computations, 2016.

L. Wu and A. Stathopoulos, A Preconditioned Hybrid SVD Method for Accurately Computing Singular Triplets of Large Matrices, SIAM Journal on Scientific Computing, vol.37, issue.5, pp.S365-S388, 2015.
DOI : 10.1137/140979381