D. Balenson, L. Tinnel, and T. Benzel, Cybersecurity Experimentation of the Future (CEF): Catalyzing a new generation of experimental cybersecurity research - final report, Tech. rep., SRI International and USC Information Sciences Institute, 2015.

M. B. Brewer, Research design and issues of validity, Handbook of research methods in social and personality psychology, pp.3-16, 2000.
DOI : 10.1017/cbo9780511996481.005

T. E. Carroll, D. Manz, T. Edgar, and F. L. Greitzer, Realizing scientific methods for cyber security, Proceedings of the 2012 Workshop on Learning from Authoritative Security Experiment Results, LASER '12, pp.19-24, 2012.
DOI : 10.1145/2379616.2379619

J. Cohen, The statistical power of abnormal-social psychological research: A review, The Journal of Abnormal and Social Psychology, vol.65, issue.3, p.145, 1962.
DOI : 10.1037/h0045186

J. Cohen, Statistical power analysis for the behavioral sciences, 1988.

J. Cohen, Things I have learned (so far), American Psychologist, vol.45, issue.12, p.1304, 1990.
DOI : 10.1037/0003-066X.45.12.1304

J. Cohen, A power primer, Psychological Bulletin, vol.112, issue.1, p.155, 1992.
DOI : 10.1037/0033-2909.112.1.155

T. D. Cook and D. T. Campbell, Quasi-experimentation: Design and analysis for field settings, 1979.

Y. Dodge, Oxford Dictionary of Statistical Terms, 2006.

P. D. Ellis, The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results, 2010.
DOI : 10.1017/CBO9780511761676

B. Everitt, Cambridge dictionary of statistics, 1998.
DOI : 10.1017/CBO9780511779633

F. Faul, E. Erdfelder, A.-G. Lang, and A. Buchner, G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences, Behavior Research Methods, vol.39, issue.2, pp.175-191, 2007.
DOI : 10.3758/BF03193146

A. Field and G. Hole, How to design and report experiments, Sage, 2003.

R. A. Fisher, Statistical methods for research workers, 1925.

C. O. Fritz, P. E. Morris, and J. J. Richler, Effect size estimates: Current use, calculations, and interpretation, Journal of Experimental Psychology: General, vol.141, issue.1, p.2, 2012.
DOI : 10.1037/a0024338

M. J. Gardner and D. G. Altman, Confidence intervals rather than P values: estimation rather than hypothesis testing, BMJ, vol.292, issue.6522, pp.746-750, 1986.
DOI : 10.1136/bmj.292.6522.746
URL : http://www.bmj.com/content/bmj/292/6522/746.full.pdf

D. C. Howell, Statistical methods for psychology, Cengage Learning, 2012.

J. P. Ioannidis, Why Most Published Research Findings Are False, PLoS Medicine, vol.2, issue.8, p.e124, 2005.
DOI : 10.1371/journal.pmed.0020124

E. L. Lehmann, The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two?, In: Selected Works of E. L. Lehmann, pp.201-208, 2012.

R. Maxion, Making Experiments Dependable, 2011.
DOI : 10.1109/MSP.2004.89
URL : http://www.cs.cmu.edu/afs/cs.cmu.edu/user/maxion/www/pubs/Maxion12.pdf

S. E. Maxwell and H. D. Delaney, Designing experiments and analyzing data: A model comparison perspective, 2004.

S. Miller, Experimental design and statistics, Routledge, 2005.

D. C. Montgomery, Design and analysis of experiments, 2012.

J. Neyman and E. S. Pearson, On the use and interpretation of certain test criteria for purposes of statistical inference: Part I, Biometrika, pp.175-240, 1928.

R. S. Nickerson, Null hypothesis significance testing: A review of an old and continuing controversy, Psychological Methods, vol.5, issue.2, p.241, 2000.
DOI : 10.1037/1082-989X.5.2.241

S. Peisert and M. Bishop, How to Design Computer Security Experiments, Fifth World Conference on Information Security Education, pp.141-148, 2007.
DOI : 10.1007/978-0-387-73269-5_19
URL : http://nob.cs.ucdavis.edu/~bishop/papers/2007-wise5-2/exper.pdf

K. Popper, The logic of scientific discovery, Routledge, 2005.

S. Wacholder, S. Chanock, M. Garcia-Closas, and N. Rothman, Assessing the Probability That a Positive Report is False: An Approach for Molecular Epidemiology Studies, JNCI Journal of the National Cancer Institute, vol.96, issue.6, pp.434-442, 2004.
DOI : 10.1093/jnci/djh075