P. J. Nero, B. Wardman, H. Copes, and G. Warner, Phishing: Crime that pays, 2011 eCrime Researchers Summit, pp.1-10, 2011.
DOI : 10.1109/eCrime.2011.6151979

K. W. Church and P. Hanks, Word association norms, mutual information, and lexicography, Proceedings of the 27th annual meeting on Association for Computational Linguistics, pp.22-29, 1990.
DOI : 10.3115/981623.981633

URL : http://acl.ldc.upenn.edu/J/J90/J90-1003.pdf

R. L. Cilibrasi and P. M. Vitanyi, The Google similarity distance, IEEE Transactions on Knowledge and Data Engineering, vol.19, issue.3, pp.370-383, 2007.

P. Kolb, Disco: A multilingual database of distributionally similar words, Proceedings of KONVENS-2008, 2008.

G. A. Miller, WordNet: a lexical database for English, Communications of the ACM, vol.38, issue.11, pp.39-41, 1995.
DOI : 10.1145/219717.219748

S. Marchal, J. François, R. State, and T. Engel, PhishScore: Hacking phishers' minds, 10th International Conference on Network and Service Management (CNSM) and Workshop, 2014.
DOI : 10.1109/CNSM.2014.7014140

URL : https://hal.archives-ouvertes.fr/hal-01094238

P. Mockapetris, RFC 1034: Domain Names - Concepts and Facilities, 1987.
DOI : 10.17487/rfc1034

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.420.1013

P. Mockapetris and K. Dunlap, Development of the domain name system, Proceedings of the 1988 ACM SIGCOMM, 1988.

S. Garera, N. Provos, M. Chew, and A. D. Rubin, A framework for detection and measurement of phishing attacks, Proceedings of the 2007 ACM workshop on Recurring malcode, WORM '07, pp.1-8, 2007.
DOI : 10.1145/1314389.1314391

T. Segaran and J. Hammerbacher, Beautiful data: the stories behind elegant data solutions, 2009.

S. Marchal, J. François, R. State, and T. Engel, Proactive Discovery of Phishing Related Domain Names, Research in Attacks, Intrusions, and Defenses, pp.190-209, 2012.
DOI : 10.1007/978-3-642-33338-5_10

URL : https://hal.archives-ouvertes.fr/hal-00748808

T. K. Landauer and S. T. Dumais, A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge, Psychological Review, vol.104, issue.2, p.211, 1997.
DOI : 10.1037/0033-295X.104.2.211

B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM, vol.13, issue.7, pp.422-426, 1970.
DOI : 10.1145/362686.362692

L. Michael, W. Nejdl, O. Papapetrou, and W. Siberski, Improving distributed join efficiency with extended bloom filter operations, 21st International Conference on Advanced Networking and Applications (AINA '07), pp.187-194, 2007.
DOI : 10.1109/AINA.2007.80

M. Khonji, Y. Iraqi, and A. Jones, Lexical url analysis for discriminating phishing and legitimate e-mail messages, 2011 International Conference for Internet Technology and Secured Transactions (ICITST), pp.422-427, 2011.
DOI : 10.1145/2030376.2030389

J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, Identifying suspicious URLs, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.681-688, 2009.
DOI : 10.1145/1553374.1553462

A. Le, A. Markopoulou, and M. Faloutsos, PhishDef: URL names say it all, 2011 Proceedings IEEE INFOCOM, pp.191-195, 2011.
DOI : 10.1109/INFCOM.2011.5934995

URL : http://arxiv.org/abs/1009.2275

P. Prakash, M. Kumar, R. Kompella, and M. Gupta, PhishNet: Predictive Blacklisting to Detect Phishing Attacks, 2010 Proceedings IEEE INFOCOM, pp.1-5, 2010.
DOI : 10.1109/INFCOM.2010.5462216

G. Forman and M. Scholz, Apples-to-apples in cross-validation studies, ACM SIGKDD Explorations Newsletter, vol.12, issue.1, pp.49-57, 2010.
DOI : 10.1145/1882471.1882479

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann et al., The WEKA data mining software, ACM SIGKDD Explorations Newsletter, vol.11, issue.1, pp.10-18, 2009.
DOI : 10.1145/1656274.1656278

L. Breiman, Random forests, Machine Learning, vol.45, issue.1, pp.5-32, 2001.
DOI : 10.1023/A:1010933404324

S. Yadav, A. K. Reddy, A. N. Reddy, and S. Ranjan, Detecting algorithmically generated malicious domain names, Proceedings of the 10th annual conference on Internet measurement, IMC '10, 2010.
DOI : 10.1145/1879141.1879148

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.1167

T. Chen, S. Dick, and J. Miller, Detecting visually similar Web pages, ACM Transactions on Internet Technology, vol.10, issue.2, 2010.
DOI : 10.1145/1754393.1754394

E. Medvet, E. Kirda, and C. Kruegel, Visual-similarity-based phishing detection, Proceedings of the 4th international conference on Security and privacy in communication networks, SecureComm '08, p.22, 2008.
DOI : 10.1145/1460877.1460905

G. Xiang and J. I. Hong, A hybrid phish detection approach by identity discovery and keywords retrieval, Proceedings of the 18th international conference on World wide web, WWW '09, pp.571-580, 2009.
DOI : 10.1145/1526709.1526786

Y. Zhang, J. I. Hong, and L. F. Cranor, Cantina, Proceedings of the 16th international conference on World Wide Web, WWW '07, pp.639-648, 2007.
DOI : 10.1145/1242572.1242659

T. Chen, T. Stepan, S. Dick, and J. Miller, An Anti-Phishing System Employing Diffused Information, ACM Transactions on Information and System Security, vol.16, issue.4, 2014.
DOI : 10.1145/2584680

M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster, Building a dynamic reputation system for DNS, 19th Usenix Security Symposium, 2010.

L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, Exposure, Proceedings of NDSS, 2011.
DOI : 10.1145/2584679

J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, Beyond blacklists, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, 2009.
DOI : 10.1145/1557019.1557153

J. Zhang, P. Porras, and J. Ullrich, Highly predictive blacklisting, Proceedings of the 17th conference on Security symposium. USENIX Association, 2008.

P. Barraclough, G. Sexton, M. Hossain, and N. Aslam, Intelligent phishing detection parameter framework for E-banking transactions based on Neuro-fuzzy, 2014 Science and Information Conference, pp.545-555, 2014.
DOI : 10.1109/SAI.2014.6918240

J. Zhang, Q. Li, Q. Wang, T. Geng, X. Ouyang et al., Parsing and detecting phishing pages based on semantic understanding of text, Journal of Information & Computational Science, issue.9, pp.1521-1534, 2012.

V. Ramanathan and H. Wechsler, phishGILLNET: phishing detection methodology using probabilistic latent semantic analysis, AdaBoost, and co-training, EURASIP Journal on Information Security, vol.2012, issue.1, pp.1-22, 2012.

L. Cao, T. Probst, and R. Kompella, PhishLive: A View of Phishing and Malware Attacks from an Edge Router, Proceedings of the 14th International Conference on Passive and Active Measurement -PAM
DOI : 10.1007/978-3-642-36516-4_24

B. Braun, M. Johns, J. Koestler, and J. Posegga, PhishSafe, Proceedings of the 4th ACM conference on Data and application security and privacy, CODASPY '14, 2014.
DOI : 10.1145/2557547.2557553

N. Spirin and J. Han, Survey on web spam detection, ACM SIGKDD Explorations Newsletter, vol.13, issue.2, pp.50-64, 2012.
DOI : 10.1145/2207243.2207252

J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski et al., Detecting influenza epidemics using search engine query data, Nature, vol.457, issue.7232, pp.1012-1014, 2009.
DOI : 10.1038/nature07634

H. Liu, J. He, Y. Gu, H. Xiong, and X. Du, Detecting and Tracking Topics and Events from Web Search Logs, ACM Transactions on Information Systems, vol.30, issue.4, pp.21:1-21:29, 2012.
DOI : 10.1145/2382438.2382440

S. Marchal, J. François, R. State, and T. Engel, Semantic based DNS forensics, Proceedings of the 2012 IEEE International Workshop on Information Forensics and Security (WIFS), pp.91-96, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00784965

S. Marchal received his M.Sc. degree in computer science from TELECOM Nancy, a leading French school in computer science, in 2011. He is currently pursuing a joint Ph.D. degree at the Interdisciplinary Centre for Security, Reliability and Trust. His interests lie in web security, network security and intrusion detection techniques.