D. Achlioptas and F. McSherry, On spectral learning of mixtures of distributions, Learning Theory (COLT), pp.458-469, 2005.

P. Ahrendt, The Multivariate Gaussian Probability Distribution, 2005.

N. Ailon, R. Jaiswal, and C. Monteleoni, Streaming k-means approximation, Advances in Neural Information Processing Systems (NIPS), pp.10-18, 2009.

J. Anderson, M. Belkin, N. Goyal, L. Rademacher, and J. Voss, The more, the merrier: the blessing of dimensionality for learning large Gaussian mixtures, pp.1-30, 2013.

C. Andrieu and A. Doucet, Online expectation-maximization type algorithms for parameter estimation in general state space models, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2003.
DOI : 10.1109/ICASSP.2003.1201620

R. Arora, A. Cotter, K. Livescu, and N. Srebro, Stochastic optimization for PCA and PLS, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp.861-868, 2012.
DOI : 10.1109/Allerton.2012.6483308

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.411.8556

D. Arthur and S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, ACM-SIAM Symposium on Discrete Algorithms (SODA), pp.1027-1035, 2007.

F. Bach, On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions, pp.1-38, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01118276

A. Balsubramani, S. Dasgupta, and Y. Freund, The fast convergence of incremental PCA, Advances in Neural Information Processing Systems (NIPS) 26, pp.3174-3182, 2013.

R. Baraniuk, Compressive sensing, 42nd Annual Conference on Information Sciences and Systems (CISS), pp.118-121, 2008.
DOI : 10.1109/CISS.2008.4558479

URL : https://hal.archives-ouvertes.fr/hal-00452261

R. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, A Simple Proof of the Restricted Isometry Property for Random Matrices, Constructive Approximation, vol.28, issue.3, pp.253-263, 2008.
DOI : 10.1007/s00365-007-9003-x

M. Belkin and K. Sinha, Polynomial learning of distribution families, IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS), 2010.
DOI : 10.1137/13090818x

URL : http://arxiv.org/abs/1004.4864

M. Belkin and K. Sinha, Toward Learning Gaussian Mixtures with Arbitrary Separation, Conference On Learning Theory (COLT), 2010.

K. Bertin, E. Le Pennec, and V. Rivoirard, Adaptive Dantzig density estimation, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, pp.43-74, 2011.
DOI : 10.1214/09-AIHP351

URL : https://hal.archives-ouvertes.fr/hal-00381984

M. Bojarski, A. Choromanska, K. Choromanski, et al., Structured adaptive and random spinners for fast machine learning computations, arXiv, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01505587

A. Bourrier, M. E. Davies, T. Peleg, P. Pérez, and R. Gribonval, Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems, IEEE Transactions on Information Theory, vol.60, issue.12, pp.7928-7946, 2014.
DOI : 10.1109/tit.2014.2364403

URL : https://hal.archives-ouvertes.fr/hal-00908358

A. Bourrier, R. Gribonval, and P. Pérez, Compressive Gaussian Mixture estimation, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6024-6028, 2013.
DOI : 10.1109/ICASSP.2013.6638821

URL : https://hal.archives-ouvertes.fr/hal-00811819

E. J. Candès, T. Strohmer, and V. Voroninski, PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming, Communications on Pure and Applied Mathematics, vol.66, issue.8, pp.1241-1274, 2013.

E. J. Candès and C. Fernandez-Granda, Super-resolution from noisy data, Journal of Fourier Analysis and Applications, vol.19, issue.6, pp.1229-1254, 2013.

E. J. Candès, J. Romberg, and T. Tao, Stable Signal Recovery from Incomplete and Inaccurate Measurements, Communications on Pure and Applied Mathematics, vol.59, pp.1207-1223, 2006.

O. Cappé and E. Moulines, On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.3, pp.593-613, 2009.

M. Carrasco and J. Florens, Generalization of GMM to a continuum of moment conditions, Econometric Theory, vol.16, issue.6, 2000.
DOI : 10.1017/S0266466600166010

M. Carrasco and J. Florens, Efficient GMM estimation using the empirical characteristic function. IDEI Working Paper, 2002.
DOI : 10.2139/ssrn.2342174

M. Carrasco and J. Florens, On the asymptotic efficiency of GMM, Econometric Theory, vol.30, issue.2, pp.372-406, 2014.

A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, Journal of the American Mathematical Society, vol.22, issue.1, 2009.
DOI : 10.1090/S0894-0347-08-00610-3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.148.5477

G. Cormode, M. Garofalakis, P. J. Haas, and C. Jermaine, Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches, Foundations and Trends in Databases, vol.4, issue.1-3, pp.1-294, 2011.
DOI : 10.1561/1900000004

G. Cormode and M. Hadjieleftheriou, Methods for finding frequent items in data streams, The VLDB Journal, vol.19, issue.1, pp.3-20, 2010.

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.9800

G. Cormode and S. Muthukrishnan, An improved data stream summary: the count-min sketch and its applications, Journal of Algorithms, vol.55, issue.1, pp.58-75, 2005.
DOI : 10.1016/j.jalgor.2003.12.001

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.9395

T. Cover and J. Thomas, Elements of Information Theory, 1991.
DOI : 10.1002/047174882x

F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.01, pp.1-49, 2002.
DOI : 10.1090/S0273-0979-01-00923-5

S. Dasgupta and L. J. Schulman, A Two-Round Variant of EM for Gaussian Mixtures, Uncertainty in Artificial Intelligence, pp.152-159, 2000.

Y. De Castro, F. Gamboa, D. Henrion, and J.-B. Lasserre, Exact Solutions to Super Resolution on Semi-Algebraic Domains in Higher Dimensions, IEEE Transactions on Information Theory, vol.63, issue.1, 2017.
DOI : 10.1109/TIT.2016.2619368

URL : https://hal.archives-ouvertes.fr/hal-01114328

S. Dirksen, Dimensionality Reduction with Subgaussian Matrices: A Unified Theory, Foundations of Computational Mathematics, vol.14, issue.1, 2014.
DOI : 10.1007/s10208-015-9280-x

D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol.52, issue.4, pp.1289-1306, 2006.

J. Duchi, Derivations for Linear Algebra and Optimization, 2007.

J. C. Duchi, M. I. Jordan, and M. J. Wainwright, Privacy Aware Learning, 2012.

R. Dudley, Real Analysis and Probability, 2002.
DOI : 10.1017/CBO9780511755347

V. Duval and G. Peyré, Exact Support Recovery for Sparse Spikes Deconvolution, Foundations of Computational Mathematics, vol.15, issue.5, pp.1315-1355, 2015.
DOI : 10.1007/s10208-014-9228-6

URL : https://hal.archives-ouvertes.fr/hal-00839635

A. Eftekhari and M. B. Wakin, New Analysis of Manifold Embeddings and Signal Recovery from Compressive Measurements, arXiv, 2013.

K. Fan, On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations I, Proceedings of the National Academy of Sciences, vol.35, issue.11, pp.652-655, 1949.
DOI : 10.1073/pnas.35.11.652

A. A. Fedotov, P. Harremoës, and F. Topsøe, Refinements of Pinsker's Inequality, IEEE Transactions on Information Theory, vol.49, issue.6, pp.1491-1498, 2003.

D. Feldman, M. Faulkner, and A. Krause, Scalable Training of Mixture Models via Coresets, Advances in Neural Information Processing Systems (NIPS), pp.1-9, 2011.

D. Feldman and M. Langberg, A unified framework for approximating and clustering data, Proceedings of the 43rd annual ACM symposium on Theory of computing, STOC '11, pp.569-578, 2011.
DOI : 10.1145/1993636.1993712

URL : http://arxiv.org/abs/1106.1379

D. Feldman, M. Monemizadeh, C. Sohler, and D. P. Woodruff, Coresets and Sketches for High Dimensional Subspace Approximation Problems, ACM-SIAM Symposium on Discrete Algorithms (SODA), pp.630-649, 2010.
DOI : 10.1137/1.9781611973075.53

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.5460

A. Feuerverger and R. A. Mureika, The Empirical Characteristic Function and Its Applications, The Annals of Statistics, vol.5, issue.1, pp.88-97, 1977.
DOI : 10.1214/aos/1176343742

S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, 2012.
DOI : 10.1007/978-0-8176-4948-7

G. Frahling and C. Sohler, A fast k-means implementation using coresets, Proceedings of the Twenty-Second Annual Symposium on Computational Geometry (SoCG), pp.605-625, 2005.
DOI : 10.1145/1137856.1137879

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.308.1671

M. Ghashami, D. Perry, and J. M. Phillips, Streaming Kernel Principal Component Analysis, International Conference on Artificial Intelligence and Statistics, pp.1-16, 2016.

A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss, How to summarize the universe: dynamic maintenance of quantiles, Proceedings of the 28th International Conference on Very Large Data Bases (VLDB '02), pp.454-465, 2002.

R. Giryes, G. Sapiro, and A. M. Bronstein, Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?, IEEE Transactions on Signal Processing, vol.64, issue.13, 2016.
DOI : 10.1109/TSP.2016.2546221

URL : http://arxiv.org/abs/1504.08291

S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, Clustering Data Streams, IEEE Symposium on Foundations of Computer Science (FOCS), 2000.

A. R. Hall, Generalized method of moments, 2005.
DOI : 10.1002/9780470996249.ch12

P. R. Halmos, Measure Theory, 2013.

S. Har-peled and S. Mazumdar, Coresets for k-Means and k-Median Clustering and their Applications, Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp.291-300, 2004.
DOI : 10.1145/1007352.1007400

D. Hsu and S. M. Kakade, Learning mixtures of spherical Gaussians, Proceedings of the 4th Conference on Innovations in Theoretical Computer Science (ITCS '13), 2013.
DOI : 10.1145/2422436.2422439

M. Kabanava, R. Kueng, H. Rauhut, and U. Terstiege, Stable low-rank matrix recovery via null space properties, Information and Inference, vol.5, issue.4, pp.405-441, 2016.
DOI : 10.1093/imaiai/iaw014

URL : http://arxiv.org/abs/1507.07184

N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, Sketching for large-scale learning of mixture models, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
DOI : 10.1109/ICASSP.2016.7472867

URL : https://hal.archives-ouvertes.fr/hal-01208027

N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, Sketching for Large-Scale Learning of Mixture Models, arXiv preprint, pp.1-50, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01208027

N. Keriven, N. Tremblay, Y. Traonmilin, and R. Gribonval, Compressive K-means, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
DOI : 10.1109/ICASSP.2017.7953382

URL : https://hal.archives-ouvertes.fr/hal-01386077

H. J. Landau, Moments in Mathematics, 1987.

C. Levrard, Fast rates for empirical vector quantization, Electronic Journal of Statistics, vol.7, pp.1716-1746, 2013.
DOI : 10.1214/13-EJS822

URL : https://hal.archives-ouvertes.fr/hal-00664068

Q. Li and G. Tang, The Nonconvex Geometry of Low-Rank Matrix Optimizations with General Objective Functions, 2016.

M. Lucic, M. Faulkner, A. Krause, and D. Feldman, Training Mixture Models at Scale via Coresets, 2017.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online Learning for Matrix Factorization and Sparse Coding, Journal of Machine Learning Research, vol.11, issue.1, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

W. K. Newey and D. McFadden, Large sample estimation and hypothesis testing, Handbook of Econometrics, vol.4, pp.2111-2245, 1994.

G. Puy, M. E. Davies, and R. Gribonval, Recipes for Stable Linear Embeddings From Hilbert Spaces to R^m, IEEE Transactions on Information Theory, vol.63, issue.4, pp.2171-2187, 2017.
DOI : 10.1109/TIT.2017.2664858

URL : https://hal.archives-ouvertes.fr/hal-01203614

A. Rahimi and B. Recht, Random Features for Large Scale Kernel Machines, Advances in Neural Information Processing Systems (NIPS), pp.1-8, 2007.

A. Rahimi and B. Recht, Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Advances in Neural Information Processing Systems (NIPS), pp.1-8, 2009.

A. Rudi, R. Camoriano, and L. Rosasco, Less is More: Nyström Computational Regularization. arXiv, 2015.

A. J. Smola, A. Gretton, L. Song, and B. Schölkopf, A Hilbert Space Embedding for Distributions, International Conference on Algorithmic Learning Theory, pp.13-31, 2007.
DOI : 10.1007/11941439_21

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.8341

K. Sridharan, A gentle introduction to concentration inequalities, Dept Comput Sci, 2002.

B. K. Sriperumbudur and Z. Szabó, Optimal Rates for Random Fourier Features, Advances in Neural Information Processing Systems (NIPS), 2015.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet, Hilbert space embeddings and metrics on probability measures, Journal of Machine Learning Research, vol.11, pp.1517-1561, 2010.

N. Thaper, S. Guha, P. Indyk, and N. Koudas, Dynamic multidimensional histograms, Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD '02), pp.428-439, 2002.
DOI : 10.1145/564691.564741

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.132.2078

Y. Traonmilin and R. Gribonval, Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all, Applied and Computational Harmonic Analysis, 2016.
DOI : 10.1016/j.acha.2016.08.004

URL : https://hal.archives-ouvertes.fr/hal-01207987

J. A. Tropp, User-Friendly Tail Bounds for Sums of Random Matrices, Foundations of Computational Mathematics, 2011.

S. Vempala and G. Wang, A spectral algorithm for learning mixture models, Journal of Computer and System Sciences, vol.68, issue.4, pp.841-860, 2004.
DOI : 10.1016/j.jcss.2003.11.008