S. Petrov and R. McDonald, Overview of the 2012 shared task on parsing the web, Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), 2012.

D. Freitag, Trained named entity recognition using distributional clusters, EMNLP, 2004.

S. Miller, J. Guinness, and A. Zamanian, Name tagging with word clusters and discriminative training, HLT-NAACL, 2004.

F. Huang and A. Yates, Distributional representations for handling sparsity in supervised sequence-labeling, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, ACL-IJCNLP '09, 2009.
DOI: 10.3115/1687878.1687948

F. Huang, A. Yates, A. Ahuja, and D. Downey, Language models as representations for weakly supervised NLP tasks, Computational Natural Language Learning (CoNLL), 2011.

M. Candito, E. Henestroza Anguiano, and D. Seddah, A word clustering approach to domain adaptation: effective parsing of biomedical texts, Proceedings of the 12th International Conference on Parsing Technologies, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00659577

D. Seddah, B. Sagot, and M. Candito, The Alpage architecture at the SANCL 2012 shared task: robust pre-processing and lexical bridging for user-generated content parsing, Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), 2012.
URL: https://hal.archives-ouvertes.fr/hal-00703124

K. Hayashi, S. Kondo, K. Duh, and Y. Matsumoto, The NAIST dependency parser for SANCL 2012 shared task, Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), 2012.

X. Wu and D. A. Smith, Semi-supervised deterministic shift-reduce parsing with word embeddings, Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), 2012.

E. Grave, G. Obozinski, and F. Bach, Hidden Markov tree models for semantic class induction, Proceedings of the Seventeenth Conference on Computational Natural Language Learning, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00833288

O. Cappé and E. Moulines, On-line expectation-maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 71, no. 3, 2009.
DOI: 10.1111/j.1467-9868.2009.00698.x

C. Pal, C. Sutton, and A. McCallum, Sparse forward-backward using minimum divergence beams for fast training of conditional random fields, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006.
DOI: 10.1109/ICASSP.2006.1661342

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning, 2001.

S. Petrov, D. Das, and R. McDonald, A universal part-of-speech tagset, Proceedings of LREC, 2012.

Y. Tateisi, A. Yakushiji, T. Ohta, and J. Tsujii, Syntax annotation for the Genia corpus, Proceedings of IJCNLP, 2005.

O. Owoputi, B. O'Connor, C. Dyer, K. Gimpel, N. Schneider et al., Improved part-of-speech tagging for online conversational text with word clusters, Proceedings of NAACL, 2013.