D. Akhawe and A. P. Felt, Alice in warningland: A large-scale field study of browser security warning effectiveness, USENIX security symposium, vol.13, 2013.

American Psychological Association, Publication manual, 2009.

M. B. Brewer, Research design and issues of validity. Handbook of research methods in social and personality psychology, pp.3-16, 2000.

E. Bursztein, S. Bethard, C. Fabry, J. C. Mitchell, and D. Jurafsky, How good are humans at solving CAPTCHAs? A large scale evaluation, Security and Privacy (SP), 2010 IEEE Symposium on, pp.399-413, 2010.

I. Cherapau, I. Muslukhov, N. Asanka, and K. Beznosov, On the impact of Touch ID on iPhone passcodes, SOUPS, pp.257-276, 2015.

J. Cohen, A power primer, Psychological bulletin, vol.112, issue.1, p.155, 1992.

Open Science Collaboration, Estimating the reproducibility of psychological science, Science, vol.349, issue.6251, p.aac4716, 2015.

K. P. Coopamootoo and T. Groß, Evidence-based methods for privacy and identity management, Privacy and Identity Management. Facing up to Next Steps, pp.105-121, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01629157

K. P. Coopamootoo and T. Groß, A codebook for experimental research: The nifty nine indicators v1.0, 2017.

K. P. Coopamootoo and T. Groß, An empirical investigation of security fatigue - the case of password choice after solving a CAPTCHA, The LASER Workshop: Learning from Authoritative Security Experiment Results (LASER 2017). USENIX Association, 2017.

G. Cumming, The new statistics: Why and how, Psychological science, vol.25, issue.1, pp.7-29, 2014.

G. Cumming and R. Calin-Jageman, Introduction to the new statistics: Estimation, open science, and beyond. Routledge, 2016.

B. Everitt, Cambridge dictionary of statistics, 1998.

Guidelines for performing systematic literature reviews in software engineering, Evidence-Based Software Engineering (EBSE), 2007.

A. Field, Discovering statistics using IBM SPSS statistics, Sage, 2013.

A. Field and G. Hole, How to design and report experiments, Sage, 2003.

T. Fordyce, S. Green, and T. Groß, Investigation of the effect of fear and stress on password choice, Proceedings of the 7th ACM Workshop on Socio-Technical Aspects in Security and Trust, 2017.

C. O. Fritz, P. E. Morris, and J. J. Richler, Effect size estimates: current use, calculations, and interpretation, Journal of Experimental Psychology: General, vol.141, issue.1, 2012.

M. J. Gardner and D. G. Altman, Confidence intervals rather than p values: estimation rather than hypothesis testing, Br Med J (Clin Res Ed), vol.292, issue.6522, pp.746-750, 1986.

J. Gideon, L. Cranor, S. Egelman, and A. Acquisti, Power strips, prophylactics, and privacy, oh my!, Proceedings of the second symposium on Usable privacy and security, pp.133-144, 2006.

S. Goodman, A dirty dozen: twelve p-value misconceptions, Seminars in hematology, vol.45, pp.135-140, 2008.

T. Groß, Analysis report - Investigation of the effect of fear and stress on password choice, 2017.

T. Groß, K. Coopamootoo, and A. Al-Jabri, Effect of cognitive depletion on password choice, The LASER Workshop: Learning from Authoritative Security Experiment Results (LASER 2016), pp.55-66, 2016.

S. G. Hart and L. E. Staveland, Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research, Advances in psychology, vol.52, pp.139-183, 1988.

J. P. Ioannidis, Why most published research findings are false, PLoS Med, vol.2, issue.8, p.124, 2005.

R. E. Kirk, The importance of effect magnitude. Handbook of research methods in experimental psychology, pp.83-105, 2003.

K. A. Kluever and R. Zanibbi, Balancing usability and security in a video CAPTCHA, Proceedings of the 5th Symposium on Usable Privacy and Security, p.14, 2009.

S. Korff and R. Böhme, Too much choice: End-user privacy decisions in the context of choice proliferation, Symposium on Usable Privacy and Security (SOUPS), pp.69-87, 2014.

D. Lakens, Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs, Frontiers in psychology, vol.4, 2013.

R. Maxion, Making experiments dependable. Dependable and Historic Computing, pp.344-357, 2011.

S. E. Maxwell and H. D. Delaney, Designing experiments and analyzing data: A model comparison perspective, vol.1, 2004.

S. Miller, Experimental design and statistics. Routledge, 2005.

D. C. Montgomery, Design and analysis of experiments, 2012.

R. Moonesinghe, M. J. Khoury, and A. C. Janssens, Most published research findings are false - but a little replication goes a long way, PLoS Med, vol.4, issue.2, p.28, 2007.

R. D. Morey, R. Hoekstra, J. N. Rouder, M. D. Lee, and E. Wagenmakers, The fallacy of placing confidence in confidence intervals, Psychonomic bulletin & review, vol.23, issue.1, pp.103-123, 2016.

R. S. Nickerson, Null hypothesis significance testing: a review of an old and continuing controversy, Psychological methods, vol.5, issue.2, p.241, 2000.

U. Nwadike, T. Groß, and K. P. Coopamootoo, Evaluating users' affect states: Towards a study on privacy concerns, Privacy and Identity Management. Facing up to Next Steps, pp.248-262, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01629168

S. Peisert and M. Bishop, How to design computer security experiments, Fifth World Conference on Information Security Education, pp.141-148, 2007.

K. Popper, The logic of scientific discovery. Routledge, 2005.

J. Rottenberg, R. Ray, and J. Gross, Emotion elicitation using films. Handbook of emotion elicitation and assessment, edited by J. A. Coan and J. J. B. Allen, 2007.

G. K. Sandve, A. Nekrutenko, J. Taylor, and E. Hovig, Ten simple rules for reproducible computational research, PLoS computational biology, vol.9, issue.10, p.1003285, 2013.

S. Schechter, Common pitfalls in writing about security and privacy human subjects experiments, and how to avoid them, 2013.

C. D. Spielberger, R. L. Gorsuch, and R. E. Lushene, Manual for the state-trait anxiety inventory, 1970.

Laerd Statistics, Testing for normality.

V. Stodden, F. Leisch, and R. D. Peng, Implementing reproducible research, 2014.

D. Watson and L. A. Clark, The PANAS-X: Manual for the positive and negative affect schedule-expanded form, 1999.

R. Westermann, G. Stahl, and F. Hesse, Relative effectiveness and validity of mood induction procedures: a meta-analysis, European Journal of social psychology, vol.26, pp.557-580, 1996.

Y. Xie, knitr: a comprehensive tool for reproducible research in R, Implement Reprod Res, vol.1, p.20, 2014.