C. Aggarwal, On k-anonymity and the curse of dimensionality, Proceedings of the 31st VLDB Conference, 2005.

D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007.

H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality, Mathematics of Operations Research, vol.35, issue.2, 2010.
DOI : 10.1287/moor.1100.0449

F. R. Bach, Sharp analysis of low-rank kernel matrix approximations, International Conference on Learning Theory (COLT), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00723365

F. R. Bach and Z. Harchaoui, DIFFRAC: a discriminative and flexible framework for clustering, Advances in Neural Information Processing Systems, pp.49-56, 2008.

H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011.
DOI : 10.1007/978-3-319-48311-5
URL : https://hal.archives-ouvertes.fr/hal-00643354

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542
URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf

J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, pp.459-494, 2014.
DOI : 10.1007/s10107-013-0701-9
URL : https://hal.archives-ouvertes.fr/hal-00916090

L. Bottou and Y. Bengio, Convergence properties of the k-means algorithms, Advances in Neural Information Processing Systems 7, pp.585-592, 1995.

F. Bunea, C. Giraud, M. Royer, and N. Verzelen, PECOK: A convex optimization approach to variable clustering, 2016.

E. J. Candès, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol.346, issue.9-10, pp.589-592, 2008.
DOI : 10.1016/j.crma.2008.03.014

E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted ℓ1 minimization, Journal of Fourier Analysis and Applications, vol.14, issue.5-6, pp.877-905, 2008.

A. Chambolle and C. Dossal, On the convergence of the iterates of "FISTA", Journal of Optimization Theory and Applications, vol.166, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01060130

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp.185-212, 2011.

P. L. Combettes and V. R. Wajs, Signal Recovery by Proximal Forward-Backward Splitting, Multiscale Modeling & Simulation, vol.4, issue.4, pp.1168-1200, 2005.
DOI : 10.1137/050626090
URL : https://hal.archives-ouvertes.fr/hal-00017649

L. Condat, Fast projection onto the simplex and the ℓ1 ball, Mathematical Programming, vol.158, issue.1-2, pp.575-585, 2016.
DOI : 10.1007/s10107-015-0946-6

L. Condat, A convex approach to k-means clustering and image segmentation, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01504799

F. De la Torre and T. Kanade, Discriminative cluster analysis, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143875

C. Ding and T. Li, Adaptive dimension reduction using discriminant analysis and K-means clustering, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.521-528, 2007.
DOI : 10.1145/1273496.1273562

D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proceedings of the National Academy of Sciences, vol.100, issue.5, pp.2197-2202, 2003.
DOI : 10.1073/pnas.0437847100
URL : http://www.pnas.org/content/100/5/2197.full.pdf

D. L. Donoho and B. F. Logan, Signal Recovery and the Large Sieve, SIAM Journal on Applied Mathematics, vol.52, issue.2, pp.577-591, 1992.
DOI : 10.1137/0152031

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, Efficient projections onto the ℓ1-ball for learning in high dimensions, Proceedings of the 25th International Conference on Machine Learning, pp.272-279, 2008.

N. Flammarion, B. Palaniappan, and F. R. Bach, Robust discriminative clustering with sparse regularizers, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01357666

T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, The entire regularization path for the support vector machine, Journal of Machine Learning Research, vol.5, pp.1391-1415, 2004.

X. He, D. Cai, and P. Niyogi, Laplacian score for feature selection, Advances in Neural Information Processing Systems 18, pp.507-514, 2006.

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, 1985.

P. Lions and B. Mercier, Splitting Algorithms for the Sum of Two Nonlinear Operators, SIAM Journal on Numerical Analysis, vol.16, issue.6, pp.964-979, 1979.
DOI : 10.1137/0716071

J. Mairal and B. Yu, Complexity analysis of the lasso regularization path, Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp.353-360, 2012.

D. G. Mixon, S. Villar, and R. Ward, Clustering subgaussian mixtures with k-means, Information and Inference, pp.1-27, 2017.
DOI : 10.1109/itw.2016.7606826

S. Mosci, L. Rosasco, M. Santoro, A. Verri, and S. Villa, Solving Structured Sparsity Regularization with Proximal Methods, Machine Learning and Knowledge Discovery in Databases, pp.418-433, 2010.
DOI : 10.1007/978-3-642-15883-4_27
URL : http://lcsl.mit.edu/papers/prox_ECML.pdf

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems 14, pp.849-856, 2002.

J. Peng and Y. Wei, Approximating K-means-type Clustering via Semidefinite Programming, SIAM Journal on Optimization, vol.18, issue.1, pp.186-205, 2007.
DOI : 10.1137/050641983
URL : http://www.cas.mcmaster.ca/~oplab/publication/report/2005-5.pdf

M. Radovanović, A. Nanopoulos, and M. Ivanović, Hubs in space: Popular nearest neighbors in high-dimensional data, Journal of Machine Learning Research, vol.11, pp.2487-2531, 2010.

S. Z. Selim and M. A. Ismail, K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.6, issue.1, pp.81-87, 1984.
DOI : 10.1109/TPAMI.1984.4767478

S. Sra, S. Nowozin, and S. J. Wright, Optimization for Machine Learning, 2012.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.
DOI : 10.1111/j.2517-6161.1996.tb02080.x

R. Tibshirani, G. Walther, and T. Hastie, Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.63, issue.2, pp.411-423, 2001.
DOI : 10.1111/1467-9868.00293

L. J. van der Maaten and G. E. Hinton, Visualizing high-dimensional data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, 2007.
DOI : 10.1007/s11222-007-9033-z

B. Wang, J. Zhu, E. Pierson, D. Ramazzotti, and S. Batzoglou, Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning, Nature Methods, vol.14, 2017.

W.-C. Chang, On using principal components before separating a mixture of two multivariate normal distributions, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol.32, issue.3, 1983.

D. Witten and R. Tibshirani, A Framework for Feature Selection in Clustering, Journal of the American Statistical Association, vol.105, issue.490, pp.713-726, 2010.
DOI : 10.1198/jasa.2010.tm09415