M. Bansal, K. Gimpel, and K. Livescu, Tailoring Continuous Word Representations for Dependency Parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp.809-815, 2014.
DOI : 10.3115/v1/P14-2131
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.648.7587

M. Blondel, Y. Kubo, and N. Ueda, Online passive-aggressive algorithms for non-negative matrix factorization and completion, Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, pp.96-104, 2014.

Y. Chen, B. Perozzi, R. Al-Rfou, and S. Skiena, The expressive power of word embeddings. arXiv preprint, 2013.

R. Collobert and J. Weston, A unified architecture for natural language processing, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390177

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, vol.12, pp.2493-2537, 2011.

K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, Online passive-aggressive algorithms, Journal of Machine Learning Research, vol.7, pp.551-585, 2006.

C. A. Floudas and V. Visweswaran, A global optimization algorithm (GOP) for certain classes of nonconvex NLPs - I. Theory, Computers & Chemical Engineering, vol.14, issue.12, pp.1397-1417, 1990.
DOI : 10.1016/0098-1354(90)80020-C

J. Gorski, F. Pfeuffer, and K. Klamroth, Biconvex sets and optimization with biconvex functions: a survey and extensions, Mathematical Methods of Operations Research, vol.66, issue.3, pp.373-407, 2007.
DOI : 10.1007/s00186-007-0161-1

Y. Grandvalet and S. Canu, Adaptive scaling for feature selection in SVMs, Advances in Neural Information Processing Systems, 2003.

T. Koo, X. Carreras, and M. Collins, Simple semi-supervised dependency parsing, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2008.

I. Labutov and H. Lipson, Re-embedding words, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013.

R. Lebret and R. Collobert, Word embeddings through Hellinger PCA, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014.
DOI : 10.3115/v1/e14-1051
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.412.7939

R. Lebret, J. Legrand, and R. Collobert, Is deep learning really necessary for word embeddings?, NIPS Workshop on Deep Learning, 2013.

O. Levy and Y. Goldberg, Dependency-based word embeddings, ACL (2), pp.302-308, 2014.

O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization, Advances in neural information processing systems, pp.2177-2185, 2014.

X. Li and D. Roth, Learning question classifiers, Proceedings of the 19th International Conference on Computational Linguistics, pp.1-7, 2002.
DOI : 10.3115/1072228.1072378
URL : http://acl.ldc.upenn.edu/C/C02/C02-1150.pdf

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng et al., Learning word vectors for sentiment analysis, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space. arXiv preprint, 2013.

A. Mnih and G. E. Hinton, A scalable hierarchical distributed language model, Advances in neural information processing systems, pp.1081-1088, 2009.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1532-1543, 2014.
DOI : 10.3115/v1/D14-1162
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.645.8863

T. Schnabel, I. Labutov, D. Mimno, and T. Joachims, Evaluation methods for unsupervised word embeddings, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
DOI : 10.18653/v1/D15-1036
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.697.5781

J. Turian, L. Ratinov, and Y. Bengio, Word representations: a simple and general method for semi-supervised learning, Proceedings of the 48th annual meeting of the association for computational linguistics, pp.384-394, 2010.

M. Wang and C. D. Manning, Effect of non-linear deep architecture in sequence labeling, IJCNLP, pp.1285-1291, 2013.