M. Bl, X., and E.'s research was funded by the European Research Council (ERC-2011-AdG-295810 BOOTPHON), the Agence Nationale pour la Recherche (ANR-2010-BLAN-1901-1 BOOTLANG), and the Fondation de France. It was also supported by ANR-10-IDEX-0001-02 PSL and ANR-10-LABX-0087 IEC. MJ's research was partially supported under the Australian Research Council's Discovery Projects funding scheme.

E. Amigó, J. Gonzalo, J. Artiles, and F. Verdejo, A comparison of extrinsic clustering evaluation metrics based on formal constraints, Information Retrieval, vol.12, issue.4, pp.461-486, 2009.
DOI : 10.1007/s10791-008-9066-8

N. Bernstein-Ratner, The phonology of parent-child speech, Children's Language, pp.159-174, 1987.

M. Brent and T. Cartwright, Distributional regularity and phonotactic constraints are useful for segmentation, Cognition, vol.61, issue.1-2, pp.93-125, 1996.
DOI : 10.1016/S0010-0277(96)00719-6

M. Brent, An efficient, probabilistically sound algorithm for segmentation and word discovery, Machine Learning, pp.71-105, 1999.

L. Catanese, N. Souviraà-Labastie, B. Qu, S. Campion, G. Gravier et al., MODIS: an audio motif discovery software, Proc. of INTERSPEECH 2013, pp.2675-2677, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00931227

R. Daland and J. Pierrehumbert, Learning Diphone-Based Segmentation, Cognitive Science, vol.35, issue.1, pp.119-155, 2011.
DOI : 10.1111/j.1551-6709.2010.01160.x

R. Daland and K. Zuraw, Does Korean defeat phonotactic word segmentation?, Proceedings of the 51st Annual Meeting of the ACL, pp.873-877, 2013.

M. Dredze, A. Jansen, G. Coppersmith, and K. Church, NLP on spoken documents without ASR, Proc. of EMNLP 2010, pp.460-470, 2010.

M. Elsner, S. Goldwater, and J. Eisenstein, Bootstrapping a unified model of lexical and phonetic acquisition, Proceedings of the 50th Annual Meeting of the ACL, pp.184-193, 2012.

R. Flamary, X. Anguera, and N. Oliver, Spoken WordCloud: Clustering recurrent patterns in speech, 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.133-138, 2011.
DOI : 10.1109/CBMI.2011.5972534
URL : https://hal.archives-ouvertes.fr/hal-00582799

J. Glass, Towards unsupervised speech processing, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp.1-4, 2012.
DOI : 10.1109/ISSPA.2012.6310546

S. Goldwater, T. Griffiths, and M. Johnson, A Bayesian framework for word segmentation: Exploring the effects of context, Cognition, vol.112, issue.1, pp.21-54, 2009.
DOI : 10.1016/j.cognition.2009.03.008

C. Herley, ARGOS: automatically extracting repeating objects from multimedia streams, IEEE Transactions on Multimedia, vol.8, issue.1, pp.115-129, 2006.
DOI : 10.1109/TMM.2005.861286

A. Jansen and B. Van Durme, Efficient spoken term discovery using randomized algorithms, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp.401-406, 2011.
DOI : 10.1109/ASRU.2011.6163965

A. Jansen, E. Dupoux, S. Goldwater, M. Johnson, S. Khudanpur et al., A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.8111-8115, 2013.
DOI : 10.1109/ICASSP.2013.6639245

M. Johnson and S. Goldwater, Improving nonparameteric Bayesian inference, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, pp.317-325, 2009.
DOI : 10.3115/1620754.1620800

M. Johnson, T. Griffiths, and S. Goldwater, Adaptor Grammars: A framework for specifying compositional nonparametric Bayesian models, Advances in Neural Information Processing Systems 19, pp.641-648, 2007.

C. Lee and J. Glass, A nonparametric Bayesian approach to acoustic model discovery, Proceedings of the 50th Annual Meeting of the ACL, pp.40-49, 2012.

I. Malioutov, A. Park, R. Barzilay, and J. Glass, Making sense of sound: Unsupervised topic segmentation over acoustic input, Proc. of ACL 2007, pp.504-511, 2007.

F. McInnes and S. Goldwater, Unsupervised extraction of recurring words from infant-directed speech, Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.

A. Muscariello, G. Gravier, and F. Bimbot, Zero-resource audio-only spoken term detection based on a combination of template matching techniques, Proc. of INTERSPEECH 2011, pp.921-924, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00597907

A. Muscariello, G. Gravier, and F. Bimbot, Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.7, pp.2031-2044, 2012.
DOI : 10.1109/TASL.2012.2194283
URL : https://hal.archives-ouvertes.fr/hal-00740978

G. Neubig, M. Mimura, S. Mori, and T. Kawahara, Bayesian Learning of a Language Model from Continuous Speech, Proc. of INTERSPEECH-2010, pp.1053-1056, 2010.
DOI : 10.1587/transinf.E95.D.614

A. Park and J. Glass, Unsupervised Pattern Discovery in Speech, IEEE Transactions on Audio, Speech, and Language Processing, vol.16, issue.1, pp.186-197, 2008.
DOI : 10.1109/TASL.2007.909282

M. Pitt, L. Dilley, K. Johnson, S. Kiesling, W. Raymond et al., Buckeye corpus of conversational speech, 2007.

M. Tekieli and W. Cullinan, The Perception of Temporally Segmented Vowels and Consonant-Vowel Syllables, Journal of Speech and Hearing Research, vol.22, issue.1, pp.103-121, 1979.
DOI : 10.1044/jshr.2201.103