R. Abelson, Statistics as principled argument, 1995.

G. Anderson, No result is worthless: the value of negative results in science, 2012.

T. Baguley, Standardized or simple effect size: What should be reported?, British Journal of Psychology, vol.54, issue.3, pp.603-617, 2009.
DOI : 10.1348/000712608X377117

T. Baguley, Calculating and graphing within-subject confidence intervals for ANOVA, Behavior Research Methods, vol.39, issue.1, pp.158-175, 2012.
DOI : 10.3758/s13428-011-0123-7

J. Barry, Doing Bayesian Data Analysis: A Tutorial with R and BUGS, Europe's Journal of Psychology, vol.7, issue.4, pp.778-779, 2011.
DOI : 10.5964/ejop.v7i4.163

M. Beaudouin-Lafon, Interaction is the future of computing, HCI Remixed: Reflections on Works That Have Influenced the HCI Community, pp.263-266, 2008.

R. Bender and S. Lange, Adjusting for multiple testing - when and how?, Journal of Clinical Epidemiology, vol.54, issue.4, pp.343-349, 2001.
DOI : 10.1016/S0895-4356(00)00314-0

R. Beyth-Marom, F. Fidler, and G. Cumming, Statistical cognition: Towards evidence-based practice in statistics and statistics education, Statistics Education Research Journal, vol.7, issue.2, pp.20-39, 2008.

M. Brewer, Research design and issues of validity, Handbook of research methods in social and personality psychology, pp.3-16, 2000.

A. Brodeur, M. Lé, M. Sangnier, and Y. Zylberberg, Star wars: The empirics strike back, Paris School of Economics Working Paper, pp.2012-2041, 2012.
URL : https://hal.archives-ouvertes.fr/halshs-01158500

J. Carifio and R. Perla, Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes, Journal of Social Sciences, vol.3, issue.3, p.106, 2007.
DOI : 10.3844/jssp.2007.106.116

F. Chevalier, P. Dragicevic, and S. Franconeri, The not-so-staggering effect of staggered animated transitions on visual tracking, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2241-2250, 2014.

R. Coe, It's the effect size, stupid, Paper presented at the British Educational Research Association annual conference, p.14, 2002.

J. Cohen, Things I have learned (so far)., American Psychologist, vol.45, issue.12, p.1304, 1990.
DOI : 10.1037/0003-066X.45.12.1304

J. Cohen, The earth is round (p < .05)., American Psychologist, vol.49, issue.12, p.997, 1994.
DOI : 10.1037/0003-066X.49.12.997

M. Correll and M. Gleicher, Error bars considered harmful: Exploring alternate encodings for mean and error, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2142-2151, 2014.

G. Cumming, Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better, Perspectives on Psychological Science, vol.54, issue.4, pp.286-300, 2008.
DOI : 10.1037/1082-989X.1.2.115

G. Cumming, Dance of the p values, 2009.

G. Cumming, Inference by eye: Reading the overlap of independent confidence intervals, Statistics in Medicine, vol.18, issue.1, pp.205-220, 2009.
DOI : 10.1111/j.1467-9280.2007.01881.x

G. Cumming, Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. Multivariate applications series, 2012.

G. Cumming, The New Statistics, Psychological Science, vol.123, issue.1, 2013.
DOI : 10.1037/0003-066X.54.8.594

G. Cumming and S. Finch, Inference by Eye: Confidence Intervals and How to Read Pictures of Data., American Psychologist, vol.60, issue.2, p.170, 2005.
DOI : 10.1037/0003-066X.60.2.170

G. Cumming and R. Williams, Significant does not equal important: why we need the new statistics [podcast]. ABC Ockham's Razor tinyurl, 2011.

G. Cumming, F. Fidler, and D. Vaux, Error bars in experimental biology, The Journal of Cell Biology, vol.428, issue.1, pp.7-11, 2007.
DOI : 10.7326/0003-4819-126-1-199701010-00006

R. Dawkins, The tyranny of the discontinuous mind, New Statesman, vol.19, pp.54-57, 2011.

K. van Deemter, Not Exactly: in Praise of Vagueness, 2010.

Z. Dienes, Using Bayes to get the most out of non-significant results, Frontiers in Psychology, vol.14, issue.e54693, 2014.
DOI : 10.1037/a0016972

A. Downey, Think Bayes, 2013.

P. Dragicevic, My technique is 20% faster: Problems with reports of speed improvements in hci, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00739237

P. Dragicevic, F. Chevalier, and S. Huot, Running an HCI experiment in multiple parallel universes, Proceedings of the extended abstracts of the 32nd annual ACM conference on Human factors in computing systems, CHI EA '14, pp.607-618, 2014.
DOI : 10.1145/2559206.2578881

URL : https://hal.archives-ouvertes.fr/hal-00976507

G. Drummond and S. Vowler, Show the data, don't conceal them, Advances in Physiology Education, pp.130-132, 2011.

W. Duckworth and W. Stephenson, Resampling methods: Not just for statisticians anymore, Joint Statistical Meetings, 2003.

A. Ecklund, Beeswarm: the bee swarm plot, an alternative to stripchart, p.5, 2012.

E. Eich, Business Not as Usual, Psychological Science, vol.23, issue.1, pp.3-6, 2014.
DOI : 10.1177/0956797611417632

J. Fekete, J. van Wijk, J. Stasko, and C. North, The Value of Information Visualization, In: Information visualization, pp.1-18, 2008.
DOI : 10.1007/978-3-540-70956-5_1

URL : https://hal.archives-ouvertes.fr/hal-00701741

F. Fidler and G. Cumming, Teaching confidence intervals: Problems and potential solutions, Proceedings of the 55th International Statistics Institute Session, 2005.

F. Fidler and G. Loftus, Why figures with error bars should replace p values, Zeitschrift für Psychologie / Journal of Psychology, vol.217, issue.1, pp.27-37, 2009.

F. Fidler, The american psychological association publication manual sixth edition: Implications for statistics education. Data and context in statistics education: Towards an evidence based society, 2010.

R. Fisher, Statistical methods and scientific induction, Journal of the Royal Statistical Society. Series B (Methodological), pp.69-78, 1955.

V. Franz and G. Loftus, Standard errors and confidence intervals in within-subjects designs: Generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts, Psychonomic Bulletin & Review, vol.6, issue.3, pp.395-404, 2012.
DOI : 10.3758/s13423-012-0230-1

R. Frick, Interpreting statistical testing: Process and propensity, not population and random sampling, Behavior Research Methods, Instruments, & Computers, vol.6, issue.3, pp.527-535, 1998.
DOI : 10.3758/BF03200686

M. Gardner and D. Altman, Confidence intervals rather than P values: estimation rather than hypothesis testing., BMJ, vol.292, issue.6522, pp.746-750, 1986.
DOI : 10.1136/bmj.292.6522.746

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1339793

A. Gelman, Commentary, Human Development, vol.35, issue.5, pp.69-72, 2013.
DOI : 10.1159/000277221

A. Gelman, Interrogating p-values, Journal of Mathematical Psychology, vol.57, issue.5, pp.188-189, 2013.
DOI : 10.1016/j.jmp.2013.03.005

A. Gelman and J. Hill, Data analysis using regression and multilevel/hierarchical models, 2006.
DOI : 10.1017/CBO9780511790942

A. Gelman and H. Stern, The Difference Between "Significant" and "Not Significant" is not Itself Statistically Significant, The American Statistician, vol.60, issue.4, pp.328-331, 2006.
DOI : 10.1198/000313006X152649

G. Gigerenzer, Mindless statistics, The Journal of Socio-Economics, vol.33, issue.5, pp.587-606, 2004.
DOI : 10.1016/j.socec.2004.09.033

R. Giner-Sorolla, Science or Art? How Aesthetic Standards Grease the Way Through the Publication Bottleneck but Undermine Science, Perspectives on Psychological Science, vol.19, issue.6, pp.562-571, 2012.
DOI : 10.1037/a0022790

J. Gliner, N. Leech, and G. Morgan, Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say?, The Journal of Experimental Education, vol.26, issue.1, pp.83-92, 2002.
DOI : 10.1037/0003-066X.54.8.594

B. Goldacre, What doctors don't know about the drugs they prescribe, 2012.

S. Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, Annals of Internal Medicine, vol.130, issue.12, pp.995-1004, 1999.
DOI : 10.7326/0003-4819-130-12-199906150-00008

W. Hager, The examination of psychological hypotheses by planned contrasts referring to two-factor interactions in fixed-effects ANOVA, Method Psychol Res Online, vol.7, pp.49-77, 2002.

H. Haller and S. Krauss, Misinterpretations of significance: A problem students share with their teachers, Methods of Psychological Research, vol.7, issue.1, pp.1-20, 2002.

H. Hofmann, L. Follett, M. Majumder, and D. Cook, Graphical tests for power comparison of competing designs, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.12, pp.2441-2448, 2012.

K. Hornbaek, S. Sander, J. Bargas-Avila, and J. Simonsen, Is once enough?, Proceedings of the 32nd annual ACM conference on Human factors in computing systems, CHI '14, pp.3523-3532, 2014.
DOI : 10.1145/2556288.2557004

Y. Jansen, Physical and tangible information visualization, 2014.
URL : https://hal.archives-ouvertes.fr/tel-00981521

M. Kaptein and J. Robertson, Rethinking statistical analysis methods for CHI, Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, CHI '12, pp.1105-1114, 2012.
DOI : 10.1145/2207676.2208557

O. Keene, The log transformation is special, Statistics in Medicine, vol.28, issue.8, pp.811-819, 1995.
DOI : 10.1002/sim.4780140810

N. Kerr, HARKing: Hypothesizing After the Results are Known, Personality and Social Psychology Review, vol.18, issue.3, pp.196-217, 1998.
DOI : 10.1207/s15327957pspr0203_4

G. Kindlmann and C. Scheidegger, An algebraic process for visualization design, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2181-2190, 2014.

K. Kirby and D. Gerlanc, BootES: An R package for bootstrap confidence intervals on effect sizes, Behavior Research Methods, vol.54, issue.4, pp.905-927, 2013.
DOI : 10.3758/s13428-013-0330-5

R. Kirk, Promoting Good Statistical Practices: Some Suggestions, Educational and Psychological Measurement, vol.46, issue.2, pp.213-218, 2001.
DOI : 10.1177/00131640121971185

R. Kline, What's Wrong With Statistical Tests--And Where We Go From Here., 2004.
DOI : 10.1037/10693-003

J. Lai, P. Kalinowski, F. Fidler, and G. Cumming, Dichotomous thinking: A problem beyond NHST. Data and context in statistics education: Towards an evidence based society, 2010.

D. Lakens, M. Pigliucci, and J. Galef, Daniel Lakens on p-hacking and other problems in psychology research [podcast]. Rationally Speaking Podcast tinyurl, 2014.

S. Lazic, The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?, BMC Neuroscience, vol.11, issue.1, p.5, 2010.
DOI : 10.1186/1471-2202-11-5

M. Lee, The "new statistics" are built on fundamentally flawed foundations, 2014.

T. Levine, R. Weber, C. Hullett, H. Park, and L. Lindsey, A Critical Assessment of Null Hypothesis Significance Testing in Quantitative Communication Research, Human Communication Research, vol.7, issue.2, pp.171-187, 2008.
DOI : 10.2307/2280090

T. Levine, R. Weber, H. Park, and C. Hullett, A Communication Researchers' Guide to Null Hypothesis Significance Testing and Alternatives, Human Communication Research, vol.45, issue.2, pp.188-209, 2008.
DOI : 10.1037/0033-295X.110.3.526

R. MacCallum, S. Zhang, K. Preacher, and D. Rucker, On the practice of dichotomization of quantitative variables., Psychological Methods, vol.7, issue.1, p.19, 2002.
DOI : 10.1037/1082-989X.7.1.19

N. Mazar, O. Amir, and D. Ariely, The Dishonesty of Honest People: A Theory of Self-Concept Maintenance, Journal of Marketing Research, vol.45, issue.6, pp.633-644, 2008.
DOI : 10.1509/jmkr.45.6.633

P. Meehl, Theory-Testing in Psychology and Physics: A Methodological Paradox, Philosophy of Science, vol.34, issue.2, pp.103-115, 1967.
DOI : 10.1086/288135

J. Miller, Short report: Reaction time analysis with outlier exclusion: Bias varies with sample size, The Quarterly Journal of Experimental Psychology Section A, vol.43, issue.4, pp.907-912, 1991.
DOI : 10.1037/0033-295X.83.3.190

R. Newcombe, Interval estimation for the difference between independent proportions: comparison of eleven methods, Statistics in Medicine, vol.17, issue.8, pp.873-890, 1998.
DOI : 10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I

R. Newcombe, Two-sided confidence intervals for the single proportion: comparison of seven methods, Statistics in Medicine, vol.17, issue.8, pp.857-872, 1998.
DOI : 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E

G. Newman and B. Scholl, Bar graphs depicting averages are perceptually misinterpreted: The within-the-bar bias, Psychonomic Bulletin & Review, vol.27, issue.4, pp.601-607, 2012.
DOI : 10.3758/s13423-012-0247-5

D. Norman, The Design of Everyday Things, 2002.
DOI : 10.15358/9783800648108

G. Norman, Likert scales, levels of measurement and the "laws" of statistics, Advances in Health Sciences Education, pp.625-632, 2010.

R. Nuzzo, Scientific method: Statistical errors, Nature, vol.506, issue.7487, pp.150-152, 2014.
DOI : 10.1038/506150a

J. Osborne and A. Overbay, The power of outliers (and why researchers should always check for them), Practical Assessment, Research & Evaluation, vol.9, issue.6, pp.1-12, 2004.

C. Perin, P. Dragicevic, and J. Fekete, Revisiting Bertin matrices: New interactions for crafting tabular visualizations, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2082-2091, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01023890

T. Perneger, What's wrong with Bonferroni adjustments, BMJ, vol.316, issue.7139, pp.1236-1238, 1998.
DOI : 10.1136/bmj.316.7139.1236

P. Pollard and J. Richardson, On the probability of making Type I errors., Psychological Bulletin, vol.102, issue.1, p.159, 1987.
DOI : 10.1037/0033-2909.102.1.159

R. Rawls, BREAKING UP IS HARD TO DO, Chemical & Engineering News, vol.76, issue.25, pp.29-34, 1998.
DOI : 10.1021/cen-v076n025.p029

U. Reips and F. Funke, Interval-level measurement with visual analogue scales in Internet-based research: VAS Generator, Behavior Research Methods, vol.1, issue.3, pp.699-704, 2008.
DOI : 10.3758/BRM.40.3.699

R. Rensink, On the Prospects for a Science of Visualization, pp.147-175, 2014.
DOI : 10.1007/978-1-4614-7485-2_6

C. Ricketts and J. Berry, Teaching Statistics through Resampling, Teaching Statistics, vol.4, issue.2, pp.41-44, 1994.
DOI : 10.1111/j.1467-9639.1994.tb00685.x

R. Rosenthal, Artifacts in Behavioral Research: Robert Rosenthal and Ralph L. Rosnow's Classic Books, 2009.
DOI : 10.1093/acprof:oso/9780195385540.001.0001

R. Rosenthal and K. Fode, The effect of experimenter bias on the performance of the albino rat, Behavioral Science, vol.128, issue.3, pp.183-189, 1963.
DOI : 10.1002/bs.3830080302

R. Rosnow and R. Rosenthal, Statistical procedures and the justification of knowledge in psychological science., American Psychologist, vol.44, issue.10, p.1276, 1989.
DOI : 10.1037/10109-027

J. Rossi, Statistical power of psychological research: What have we gained in 20 years?, Journal of Consulting and Clinical Psychology, vol.58, issue.5, p.646, 1990.
DOI : 10.1037/0022-006X.58.5.646

J. Sauro and J. Lewis, Average task times in usability tests, Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pp.2347-2350, 2010.
DOI : 10.1145/1753326.1753679

F. Schmidt and J. Hunter, Eight common but false objections to the discontinuation of significance testing in the analysis of research data, What if there were no significance tests, pp.37-64, 1997.

J. Simmons, L. Nelson, and U. Simonsohn, False-Positive Psychology, Psychological Science, vol.47, issue.11, pp.1359-1366, 2011.
DOI : 10.1093/biomet/64.2.191

R. Smith, T. Levine, K. Lachlan, and T. Fediuk, The High Cost of Complexity in Experimental Design and Data Analysis: Type I and Type II Error Rates in Multiway ANOVA, Human Communication Research, vol.5, issue.4, pp.515-530, 2002.
DOI : 10.1037//0003-066X.54.8.594

A. Stewart-Oaten, Rules and Judgments in Statistics: Three Examples, Ecology, vol.76, issue.6, 1995.
DOI : 10.2307/1940736

B. Thompson, Statistical significance and effect size reporting: Portrait of a possible future, Research in the Schools, vol.5, issue.2, pp.33-38, 1998.

B. Thompson, Statistical Significance Tests, Effect Size Reporting and the Vain Pursuit of Pseudo-Objectivity, Theory & Psychology, vol.9, issue.2, pp.191-196, 1999.
DOI : 10.1177/095935439992007

D. Trafimow and M. Marks, Editorial, Basic and Applied Social Psychology, vol.37, issue.1, pp.1-2, 2015.
DOI : 10.1080/01973533.2014.865505

W. Tryon, Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests., Psychological Methods, vol.6, issue.4, p.371, 2001.
DOI : 10.1037/1082-989X.6.4.371

J. Tukey, We need both exploratory and confirmatory, The American Statistician, vol.34, issue.1, pp.23-25, 1980.

R. Ulrich and J. Miller, Effects of truncation on reaction time analysis., Journal of Experimental Psychology: General, vol.123, issue.1, p.34, 1994.
DOI : 10.1037/0096-3445.123.1.34

P. Velleman and L. Wilkinson, Nominal, ordinal, interval, and ratio typologies are misleading, The American Statistician, vol.47, issue.1, pp.65-72, 1993.

K. Vicente and G. Torenvliet, The Earth is spherical (p < 0.05): alternative methods of statistical inference, Theoretical Issues in Ergonomics Science, vol.1, issue.3, pp.248-271, 2000.
DOI : 10.1080/14639220110037065

B. Victor, Explorable explanations. Online, URL http, 2011.

H. Wainer, How to display data badly, The American Statistician, vol.38, issue.2, pp.137-147, 1984.

H. Wickham and L. Stryjewski, 40 years of boxplots, Am Statistician, 2011.

A. Wierdsma, What is wrong with tests of normality? Online, URL http, 2013.

R. Wilcox, How many discoveries have been lost by ignoring modern statistical methods?, American Psychologist, vol.53, issue.3, p.300, 1998.
DOI : 10.1037/0003-066X.53.3.300

L. Wilkinson, Statistical methods in psychology journals: Guidelines and explanations., American Psychologist, vol.54, issue.8, p.594, 1999.
DOI : 10.1037/0003-066X.54.8.594

W. Willett, B. Jenny, T. Isenberg, and P. Dragicevic, Lightweight Relief Shearing for Enhanced Terrain Perception on Interactive Maps, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pp.15-3563, 2015.
DOI : 10.1145/2702123.2702172

URL : https://hal.archives-ouvertes.fr/hal-01105179

W. Wilson, A note on the inconsistency inherent in the necessity to perform multiple comparisons., Psychological Bulletin, vol.59, issue.4, p.296, 1962.
DOI : 10.1037/h0040447

M. Wood, Statistical inference using bootstrap confidence intervals, Significance, vol.1, issue.4, pp.180-182, 2004.
DOI : 10.1111/j.1740-9713.2004.00067.x

M. Wood, Bootstrapped Confidence Intervals as an Approach to Statistical Inference, Organizational Research Methods, vol.3, issue.2, pp.454-470, 2005.
DOI : 10.1177/1094428105280059

J. Zacks and B. Tversky, Bars and lines: A study of graphic communication, Memory & Cognition, vol.118, issue.6, pp.1073-1079, 1999.
DOI : 10.3758/BF03201236

S. Ziliak and D. McCloskey, The cult of statistical significance, 2008.
