D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, The Journal of Machine Learning Research, 2003.

J. L. Boyd-Graber and D. Blei, Syntactic topic models, Advances in Neural Information Processing Systems 21, 2009.

P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai, Class-based n-gram models of natural language, 1992.

O. Cappé and E. Moulines, On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 71, issue 3, 2009.
DOI : 10.1111/j.1467-9868.2009.00698.x

M. Ciaramita and Y. Altun, Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP '06, 2006.
DOI : 10.3115/1610075.1610158

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.7563

M.-C. de Marneffe and C. D. Manning, The Stanford typed dependencies representation, Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation, CrossParser '08, 2008.
DOI : 10.3115/1608858.1608859

M.-C. de Marneffe, B. MacCartney, and C. D. Manning, Generating typed dependency parses from phrase structure parses, Proceedings of LREC, 2006.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, 1990.
DOI : 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.8490

M. Faruqui and S. Padó, Training and evaluating a German named entity recognizer with semantic generalization, Semantic Approaches in Natural Language Processing, 2010.

D. Freitag, Trained named entity recognition using distributional clusters, Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004.

G. Haffari, M. Razavi, and A. Sarkar, An ensemble model that combines syntactic and semantic clustering for discriminative dependency parsing, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011.

T. Hofmann, Probabilistic latent semantic analysis, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.

R. Kneser and H. Ney, Improved clustering techniques for class-based statistical language modelling, Third European Conference on Speech Communication and Technology, 1993.

T. Koo, X. Carreras, and M. Collins, Simple semi-supervised dependency parsing, Proceedings of ACL-08, 2008.

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning, 2001.

W. Li and A. McCallum, Semi-supervised sequence modeling with syntactic topic models, Proceedings of the National Conference on Artificial Intelligence, 2005.

P. Liang and D. Klein, Online EM for unsupervised models, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, 2009.
DOI : 10.3115/1620754.1620843

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.218.5525

P. Liang, Semi-supervised learning for natural language, 2005.

K. Lund and C. Burgess, Producing high-dimensional semantic spaces from lexical co-occurrence, Behavior Research Methods, Instruments, & Computers, 1996.
DOI : 10.3758/bf03204766

S. Miller, J. Guinness, and A. Zamanian, Name tagging with word clusters and discriminative training, Proceedings of HLT-NAACL, 2004.

C. Pal, C. Sutton, and A. McCallum, Sparse Forward-Backward Using Minimum Divergence Beams for Fast Training of Conditional Random Fields, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1661342

S. Petrov, Coarse-to-Fine Natural Language Processing, 2009.
DOI : 10.1007/978-3-642-22743-1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.322.3808

E. Sandhaus, The New York Times Annotated Corpus, Linguistic Data Consortium, 2008.

D. Ó Séaghdha, Latent variable models of selectional preference, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.

O. Täckström, R. McDonald, and J. Uszkoreit, Cross-lingual word clusters for direct transfer of linguistic structure, Proceedings of the 2012 Conference of the North American Chapter, 2012.

S. Tratz and E. Hovy, A fast, accurate, non-projective, semantically-enriched parser, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011.

J. Turian, L. Ratinov, and Y. Bengio, Word representations: a simple and general method for semi-supervised learning, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.

J. Uszkoreit and T. Brants, Distributed word clustering for large scale class-based language modeling in machine translation, Proceedings of ACL-08, 2008.

M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, 2008.
DOI : 10.1561/2200000001

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.192.2462