A. Jansen, E. Dupoux, S. Goldwater, M. Johnson, S. Khudanpur et al., A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.8111-8115, 2013.

M. Versteegh, R. Thiollière, T. Schatz, X. Kam, X. Anguera et al., The zero resource speech challenge, 2015.

E. Dunbar, X. Cao, J. Benjumea, J. Karadayi, M. Bernard et al., The zero resource speech challenge 2017, CoRR, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01664586

E. Dunbar, R. Algayres, J. Karadayi, M. Bernard, J. Benjumea et al., The zero resource speech challenge 2019: TTS without T, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02274112

H. Kamper, Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models, CoRR, 2018.

N. Holzenberger, M. Du, J. Karadayi, R. Riad, and E. Dupoux, Learning word embeddings: Unsupervised methods for fixed-size representations of variable-length speech segments, pp.2683-2687, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01888708

R. Riad, C. Dancette, J. Karadayi, N. Zeghidour, T. Schatz et al., Sampling strategies in siamese networks for unsupervised speech representation learning, CoRR, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01888725

A. L. Maas, S. D. Miller, T. M., and A. Y. Ng, Word-level acoustic modeling with convolutional vector regression.

K. Levin, K. Henry, A. Jansen, and K. Livescu, Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings, vol.12, pp.410-415, 2013.

Y. Chung, C. Wu, C. Shen, H. Lee, and L. Lee, Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder, 2016.

Y. Chung, W. Weng, S. Tong, and J. R. Glass, Unsupervised cross-modal alignment of speech and text embedding spaces, CoRR, 2018.

D. Renshaw, H. Kamper, A. Jansen, and S. Goldwater, A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge, 2015.

T. J. Hazen, W. Shen, and C. White, Query-by-example spoken term detection using phonetic posteriorgram templates, IEEE Workshop on Automatic Speech Recognition & Understanding, pp.421-426, 2009.

K. Levin, A. Jansen, and B. Van Durme, Segmental acoustic indexing for zero resource keyword search, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5828-5832, 2015.

Y. Wang, H. Lee, and L. Lee, Segmental audio word2vec: Representing utterances as sequences of vectors with applications in spoken term detection, CoRR, 2018.

A. S. Park and J. R. Glass, Unsupervised pattern discovery in speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.1, pp.186-197, 2008.

C. Lee, T. J. O'Donnell, and J. Glass, Unsupervised lexicon discovery from acoustic input, Transactions of the Association for Computational Linguistics, vol.3, pp.389-403, 2015.

O. Räsänen, G. Doyle, and M. Frank, Unsupervised word discovery from speech using automatic segmentation into syllable-like units, 2015.

S. Goldwater, T. Griffiths, and M. Johnson, A Bayesian framework for word segmentation: Exploring the effects of context, Cognition, vol.112, pp.21-54, 2009.

H. Kamper, A. Jansen, and S. Goldwater, A segmental framework for fully-unsupervised large-vocabulary speech recognition, Computer Speech and Language, 2017.

K. Kawakami, C. Dyer, and P. Blunsom, Unsupervised word discovery with segmental neural language models, CoRR, 2018.

M. A. Carlin, S. Thomas, A. Jansen, and H. Hermansky, Rapid evaluation of speech representations for spoken term discovery, INTERSPEECH, 2011.

T. Schatz, V. Peddinti, F. Bach, A. Jansen, H. Hermansky et al., Evaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline, INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00918599

S. B. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, pp.357-366, 1980.

H. Hermansky, Perceptual linear predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America, vol.87, issue.4, pp.1738-1752, 1990.

R. Thiollière, E. Dunbar, G. Synnaeve, M. Versteegh, and E. Dupoux, A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling, INTERSPEECH, 2015.

P. Last, H. A. Engelbrecht, and H. Kamper, Unsupervised feature learning for speech using correspondence and siamese networks, IEEE Signal Processing Letters, vol.27, pp.421-425, 2020.

A. Thual, C. Dancette, J. Karadayi, J. Benjumea, and E. Dupoux, A k-nearest neighbours approach to unsupervised spoken term discovery, IEEE Spoken Language Technology Workshop (SLT), pp.491-497, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01947953

J. Johnson, M. Douze, and H. Jégou, Billion-scale similarity search with GPUs, 2017.

G. K. Zipf, The psycho-biology of language, 1935.

E. Parzen, On estimation of a probability density function and mode, Ann. Math. Statist., vol.33, issue.3, pp.1065-1076, 1962.

H. Kamper, A. Jansen, S. King, and S. Goldwater, Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings, 2014 IEEE Spoken Language Technology Workshop (SLT), pp.100-105, 2014.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

J. Wu, The Uniform Effect of K-means Clustering, pp.17-35, 2012.

M. A. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability, Speech Communication, vol.45, issue.1, pp.89-95, 2005.