L. Adamic and B. Huberman, Zipf's Law and the Internet, Glottometrics, vol.3, issue.1, pp.143-150, 2002.

M. Ayenson, D. J. Wambach, A. Soltani, N. Good, and C. J. Hoofnagle, Flash Cookies and Privacy II: Now with HTML5 and ETag Respawning, Available at SSRN: http://ssrn, p.1898390, 2011.
DOI : 10.2139/ssrn.1898390

K. R. Beesley, Language identifier: A computer program for automatic natural-language identification of on-line text, Language at Crossroads: Proceedings of the 29th Annual Conference of the American Translators Association, pp.12-16, 1988.

O. Berthold, H. Federrath, and S. Köpsell, Web MIXes: a system for anonymous and unobservable Internet access, In: International workshop on Designing privacy enhancing technologies: design issues in anonymity and unobservability, pp.115-129, 2001.

S. Castillo-Perez and J. García-Alfaro, Evaluation of Two Privacy-Preserving Protocols for the DNS, 2009 Sixth International Conference on Information Technology: New Generations, pp.411-416, 2009.
DOI : 10.1109/ITNG.2009.195

W. B. Cavnar and J. M. Trenkle, N-Gram-Based Text Categorization, Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp.161-175, 1994.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

M. Damashek, Gauging Similarity with n-Grams: Language-Independent Categorization of Text, Science, vol.267, issue.5199, pp.843-848, 1995.
DOI : 10.1126/science.267.5199.843

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

R. Dingledine, N. Mathewson, and P. F. Syverson, Tor: The Second-Generation Onion Router, Proceedings of the 13th USENIX Security Symposium, pp.303-320, 2004.

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann et al., The WEKA data mining software, ACM SIGKDD Explorations Newsletter, vol.11, issue.1, pp.10-18, 2009.
DOI : 10.1145/1656274.1656278

D. Herrmann, C. Gerber, C. Banse, and H. Federrath, Analyzing Characteristic Host Access Patterns for Re-identification of Web User Sessions, Proceedings of the 15th Nordic Conference on Secure IT Systems, Lecture Notes in Computer Science, 2010.
DOI : 10.1007/978-3-540-31966-5_25

R. Kohavi, A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, International Joint Conference on Artificial Intelligence, pp.1137-1143, 1995.

M. Kumpošt and V. Matyáš, User Profiling and Re-identification: Case of University-Wide Network Analysis, TrustBus '09: Proceedings of the 6th International Conference on Trust, Privacy and Security in Digital Business, pp.1-10, 2009.
DOI : 10.1007/978-3-540-24630-5_53

E. Kushilevitz and R. Ostrovsky, Replication is not needed: single database, computationally-private information retrieval, Proceedings 38th Annual Symposium on Foundations of Computer Science, pp.364-373, 1997.
DOI : 10.1109/SFCS.1997.646125
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.2667

Y. Lu and G. Tsudik, Towards Plugging Privacy Leaks in the Domain Name System, 2010 IEEE Tenth International Conference on Peer-to-Peer Computing (P2P), pp.1-10, 2010.
DOI : 10.1109/P2P.2010.5569976

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, 2008.
DOI : 10.1017/CBO9780511809071

B. Padmanabhan and Y. Yang, Clickprints on the Web: Are there signatures in Web Browsing Data? Available at http, 2006.

B. Raghavan, T. Kohno, A. C. Snoeren, and D. Wetherall, Enlisting ISPs to Improve Online Privacy: IP Address Mixing by Default, Proceedings of the 9th International Symposium on Privacy Enhancing Technologies (PETS '09), Lecture Notes in Computer Science, LNCS 5672, pp.143-163, 2009.
DOI : 10.1007/978-3-642-03168-7_9
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.9570

K. Rieck and P. Laskov, Language models for detection of unknown attacks in network traffic, Journal in Computer Virology, vol.1, issue.1-2, pp.243-256, 2007.
DOI : 10.1007/s11416-006-0030-0

T. White, Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly, 2011.

I. H. Witten and E. Frank, Data Mining. Practical Machine Learning Tools and Techniques, 2005.

Y. Xie, F. Yu, and M. Abadi, De-anonymizing the internet using unreliable IDs, Proceedings of the ACM SIGCOMM 2009 conference on Data communication, pp.75-86, 2009.
DOI : 10.1145/1594977.1592579
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.4356

Y. Xie, F. Yu, K. Achan, E. Gillum, M. Goldszmidt et al., How dynamic are IP addresses?, Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '07), pp.301-312, 2007.
DOI : 10.1145/1282380.1282415
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.128.5470

Y. Yang, Web user behavioral profiling for user identification, Decision Support Systems, vol.49, issue.3, pp.261-271, 2010.
DOI : 10.1016/j.dss.2010.03.001

Y. Yang and B. Padmanabhan, Toward user patterns for online security: Observation time and online user identification, Decision Support Systems, vol.48, issue.4, pp.548-558, 2008.
DOI : 10.1016/j.dss.2009.11.005

F. Zhao, Y. Hori, and K. Sakurai, Analysis of Existing Privacy-Preserving Protocols in Domain Name System, IEICE Transactions on Information and Systems, vol.93, issue.5, pp.1031-1043, 2010.
DOI : 10.1587/transinf.E93.D.1031

G. K. Zipf, The psycho-biology of language. An introduction to dynamic philology. M.I, 1968.