R. Abelson, Statistics as principled argument, Lawrence Erlbaum Associates, 1995.

R. Abelson, A retrospective on the significance test ban of 1999, What if there were no significance tests?, pp.117-141, 1997.

G. Anderson, No result is worthless: the value of negative results in science, Online, URL http://tinyurl.com/anderson-negative, 2012.

APA, The Publication manual of the APA, Washington, DC, 2010.

R. Bååth, The non-parametric bootstrap as a Bayesian model.

T. Baguley, Standardized or simple effect size: What should be reported?, British Journal of Psychology, vol.100, issue.3, pp.603-617, 2009.
DOI : 10.1348/000712608X377117

T. Baguley, Calculating and graphing within-subject confidence intervals for ANOVA, Behavior Research Methods, vol.44, issue.1, pp.158-175, 2012.
DOI : 10.3758/s13428-011-0123-7

URL : http://irep.ntu.ac.uk/9061/1/205093_8162%20Baguley%20preprint.pdf

M. Bayarri and J. Berger, The interplay of Bayesian and frequentist analysis, Statistical Science, pp.58-80, 2004.

M. Beaudouin-Lafon, Interaction is the future of computing, HCI Remixed: Reflections on Works That Have Influenced the HCI Community, pp.263-266, 2008.

R. Bender and S. Lange, Adjusting for multiple testing - when and how?, Journal of Clinical Epidemiology, vol.54, issue.4, pp.343-349, 2001.
DOI : 10.1016/S0895-4356(00)00314-0

R. Beyth-Marom, F. Fidler, and G. Cumming, Statistical cognition: Towards evidence-based practice in statistics and statistics education, Statistics Education Research Journal, vol.7, issue.2, pp.20-39, 2008.

M. Brewer, Research design and issues of validity, Handbook of research methods in social and personality psychology, pp.3-16, 2000.

A. Brodeur, M. Lé, M. Sangnier, and Y. Zylberberg, Star wars: The empirics strike back, Paris School of Economics Working Paper, pp.2012-2041, 2012.
URL : https://hal.archives-ouvertes.fr/halshs-01158500

J. Carifio and R. Perla, Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes, Journal of Social Sciences, vol.3, issue.3, p.106, 2007.
DOI : 10.3844/jssp.2007.106.116

F. Chevalier, P. Dragicevic, and S. Franconeri, The not-so-staggering effect of staggered animated transitions on visual tracking, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2241-2250, 2014.

R. Coe, It's the effect size, stupid, Paper presented at the British Educational Research Association annual conference, p.14, 2002.

J. Cohen, Things I have learned (so far)., American Psychologist, vol.45, issue.12, p.1304, 1990.
DOI : 10.1037/0003-066X.45.12.1304

J. Cohen, The earth is round (p < .05), American Psychologist, vol.49, issue.12, p.997, 1994.
DOI : 10.1037/0003-066X.49.12.997

D. Colquhoun, An investigation of the false discovery rate and the misinterpretation of p-values, Royal Society Open Science, vol.1, issue.3, p.140216, 2014.
DOI : 10.1098/rsos.140216

M. Correll and M. Gleicher, Error bars considered harmful: Exploring alternate encodings for mean and error, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2142-2151, 2014.
DOI : 10.1109/tvcg.2014.2346298

G. Cumming, Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better, Perspectives on Psychological Science, vol.3, issue.4, pp.286-300, 2008.
DOI : 10.1111/j.1745-6924.2008.00079.x

G. Cumming, Dance of the p values, 2009.

G. Cumming, Inference by eye: Reading the overlap of independent confidence intervals, Statistics in Medicine, vol.28, issue.2, pp.205-220, 2009.
DOI : 10.1002/sim.3471

G. Cumming, Inference by Eye: Confidence Intervals and How to Read Pictures of Data., American Psychologist, vol.60, issue.2, p.170, 2005.
DOI : 10.1037/0003-066X.60.2.170

G. Cumming, F. Fidler, and D. Vaux, Error bars in experimental biology, The Journal of Cell Biology, vol.177, issue.1, pp.7-11, 2007.
DOI : 10.1083/jcb.200611141

R. Dawkins, The tyranny of the discontinuous mind, New Statesman, vol.19, pp.54-57, 2011.

K. van Deemter, Not Exactly: In Praise of Vagueness, 2010.

Z. Dienes, Using Bayes to get the most out of non-significant results, Frontiers in Psychology, 2014.

P. Dragicevic, My technique is 20% faster: Problems with reports of speed improvements in HCI, Research report, 2012.

P. Dragicevic, The dance of plots.

P. Dragicevic, F. Chevalier, and S. Huot, Running an HCI experiment in multiple parallel universes, Proceedings of the extended abstracts of the 32nd annual ACM conference on Human factors in computing systems, CHI EA '14, pp.607-618, 2014.
DOI : 10.1145/2559206.2578881

URL : https://hal.archives-ouvertes.fr/hal-00976507

G. Drummond and S. Vowler, Show the data, don't conceal them, Advances in Physiology Education, pp.130-132, 2011.

W. Duckworth and W. Stephenson, Resampling methods: Not just for statisticians anymore, 2003 Joint Statistical Meetings, 2003.

A. Ecklund, Beeswarm: the bee swarm plot, an alternative to stripchart, R package version 01, 2012.

E. Eich, Business Not as Usual, Psychological Science, vol.25, issue.1, pp.3-6, 2014.
DOI : 10.1177/0956797613512465

J. Fekete, J. van Wijk, J. Stasko, and C. North, The Value of Information Visualization, In: Information visualization, pp.1-18, 2008.
DOI : 10.1007/978-3-540-70956-5_1

URL : https://hal.archives-ouvertes.fr/hal-00701741

F. Fidler, The American Psychological Association publication manual sixth edition: Implications for statistics education, Data and context in statistics education: Towards an evidence based society, 2010.

F. Fidler and G. Cumming, Teaching confidence intervals: Problems and potential solutions, Proceedings of the 55th International Statistics Institute Session, 2005.

F. Fidler and G. Loftus, Why figures with error bars should replace p values, Zeitschrift für Psychologie / Journal of Psychology, vol.217, issue.1, pp.27-37, 2009.

R. Fisher, Statistical methods and scientific induction, Journal of the Royal Statistical Society, Series B (Methodological), pp.69-78, 1955.

V. Franz and G. Loftus, Standard errors and confidence intervals in within-subjects designs: Generalizing Loftus and Masson (1994) and avoiding the biases of alternative accounts, Psychonomic Bulletin & Review, vol.19, issue.3, pp.395-404, 2012.
DOI : 10.3758/s13423-012-0230-1

R. Frick, Interpreting statistical testing: Process and propensity, not population and random sampling, Behavior Research Methods, Instruments, & Computers, vol.30, issue.3, pp.527-535, 1998.
DOI : 10.3758/BF03200686

M. Gardner and D. Altman, Confidence intervals rather than P values: estimation rather than hypothesis testing., BMJ, vol.292, issue.6522, pp.746-750, 1986.
DOI : 10.1136/bmj.292.6522.746

A. Gelman, Type 1, type 2, type S, and type M errors, 2004.

A. Gelman, Commentary, Human Development, vol.35, issue.5, pp.69-72, 2013.
DOI : 10.1159/000277221

A. Gelman, Interrogating p-values, Journal of Mathematical Psychology, vol.57, issue.5, pp.188-189, 2013.
DOI : 10.1016/j.jmp.2013.03.005

A. Gelman and H. Stern, The Difference Between "Significant" and "Not Significant" is not Itself Statistically Significant, The American Statistician, vol.60, issue.4, pp.328-331, 2006.
DOI : 10.1198/000313006X152649

G. Gigerenzer, Mindless statistics, The Journal of Socio-Economics, vol.33, issue.5, pp.587-606, 2004.
DOI : 10.1016/j.socec.2004.09.033

G. Gigerenzer, L. Kruger, J. Beatty, T. Porter, L. Daston et al., The empire of chance: How probability changed science and everyday life, 1990.
DOI : 10.1017/CBO9780511720482

R. Giner-Sorolla, Science or Art? How Aesthetic Standards Grease the Way Through the Publication Bottleneck but Undermine Science, Perspectives on Psychological Science, vol.7, issue.6, pp.562-571, 2012.
DOI : 10.1177/1745691612457576

J. Gliner, N. Leech, and G. Morgan, Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say?, The Journal of Experimental Education, vol.71, issue.1, pp.83-92, 2002.
DOI : 10.1080/00220970209602058

B. Goldacre, What doctors don't know about the drugs they prescribe, 2012.
DOI : 10.1037/e668492012-001

S. Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, Annals of Internal Medicine, vol.130, issue.12, pp.995-1004, 1999.
DOI : 10.7326/0003-4819-130-12-199906150-00008

S. Greenland and C. Poole, Living with P Values, Epidemiology, vol.24, issue.1, pp.62-68, 2013.
DOI : 10.1097/EDE.0b013e3182785741

W. Hager, The examination of psychological hypotheses by planned contrasts referring to two-factor interactions in fixed-effects ANOVA, Method Psychol Res Online, vol.7, pp.49-77, 2002.

H. Haller and S. Krauss, Misinterpretations of significance: A problem students share with their teachers, Methods of Psychological Research, vol.7, issue.1, pp.1-20, 2002.

R. Hoekstra, S. Finch, H. Kiers, and A. Johnson, Probability as certainty: Dichotomous thinking and the misuse of p values, Psychonomic Bulletin & Review, vol.13, issue.6, pp.1033-1037, 2006.
DOI : 10.3758/BF03213921

H. Hofmann, L. Follett, M. Majumder, and D. Cook, Graphical tests for power comparison of competing designs, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.12, pp.2441-2448, 2012.

K. Hornbæk, S. Sander, J. Bargas-Avila, and J. Simonsen, Is once enough? On the extent and content of replications in human-computer interaction, Proceedings of the 32nd annual ACM conference on Human factors in computing systems, CHI '14, pp.3523-3532, 2014.
DOI : 10.1145/2556288.2557004

Y. Jansen, Physical and tangible information visualization, 2014.
URL : https://hal.archives-ouvertes.fr/tel-00981521

M. Kaptein and J. Robertson, Rethinking statistical analysis methods for CHI, Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, CHI '12, pp.1105-1114, 2012.
DOI : 10.1145/2207676.2208557

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.457.1928

O. Keene, The log transformation is special, Statistics in Medicine, vol.14, issue.8, pp.811-819, 1995.
DOI : 10.1002/sim.4780140810

N. Kerr, HARKing: Hypothesizing After the Results are Known, Personality and Social Psychology Review, vol.2, issue.3, pp.196-217, 1998.
DOI : 10.1207/s15327957pspr0203_4

G. Kindlmann and C. Scheidegger, An algebraic process for visualization design, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2181-2190, 2014.
DOI : 10.1109/tvcg.2014.2346325

K. Kirby and D. Gerlanc, BootES: An R package for bootstrap confidence intervals on effect sizes, Behavior Research Methods, vol.45, issue.4, pp.905-927, 2013.
DOI : 10.3758/s13428-013-0330-5

R. Kirk, Promoting Good Statistical Practices: Some Suggestions, Educational and Psychological Measurement, vol.61, issue.2, pp.213-218, 2001.
DOI : 10.1177/00131640121971185

R. Kline, What's wrong with statistical tests - and where we go from here, 2004.

D. Lakens, M. Pigliucci, and J. Galef, Daniel Lakens on p-hacking and other problems in psychology research.

C. Lambdin, Significance tests as sorcery: Science is empirical - significance tests are not, Theory & Psychology, vol.22, issue.1, pp.67-90, 2012.
DOI : 10.1177/0959354311429854

S. Lazic, The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?, BMC Neuroscience, vol.11, issue.1, p.5, 2010.
DOI : 10.1186/1471-2202-11-5

T. Levine, R. Weber, C. Hullett, H. Park, and L. Lindsey, A Critical Assessment of Null Hypothesis Significance Testing in Quantitative Communication Research, Human Communication Research, vol.34, issue.2, pp.171-187, 2008.
DOI : 10.1111/j.1468-2958.2008.00317.x

T. Levine, R. Weber, H. Park, and C. Hullett, A Communication Researchers' Guide to Null Hypothesis Significance Testing and Alternatives, Human Communication Research, vol.34, issue.2, pp.188-209, 2008.
DOI : 10.1111/j.1468-2958.2008.00318.x

G. Loftus, A picture is worth a thousand p values: On the irrelevance of hypothesis testing in the microcomputer age, Behavior Research Methods, Instruments, & Computers, vol.25, issue.2, pp.250-256, 1993.
DOI : 10.3758/BF03204506

R. MacCallum, S. Zhang, K. Preacher, and D. Rucker, On the practice of dichotomization of quantitative variables, Psychological Methods, vol.7, issue.1, p.19, 2002.
DOI : 10.1037/1082-989X.7.1.19

N. Mazar, O. Amir, and D. Ariely, The Dishonesty of Honest People: A Theory of Self-Concept Maintenance, Journal of Marketing Research, vol.45, issue.6, pp.633-644, 2008.
DOI : 10.1509/jmkr.45.6.633

P. Meehl, Theory-Testing in Psychology and Physics: A Methodological Paradox, Philosophy of Science, vol.34, issue.2, pp.103-115, 1967.
DOI : 10.1086/288135

J. Miller, Short report: Reaction time analysis with outlier exclusion: Bias varies with sample size, The Quarterly Journal of Experimental Psychology Section A, vol.43, issue.4, pp.907-912, 1991.
DOI : 10.1080/14640749108400962

R. Morey, R. Hoekstra, J. Rouder, M. Lee, and E. Wagenmakers, The fallacy of placing confidence in confidence intervals (version 2) Online draft, URL http, 2015.

M. Nelson, You might want a tolerance interval, 2011.

R. Newcombe, Interval estimation for the difference between independent proportions: comparison of eleven methods, Statistics in Medicine, vol.17, issue.8, pp.873-890, 1998.
DOI : 10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I

R. Newcombe, Two-sided confidence intervals for the single proportion: comparison of seven methods, Statistics in Medicine, vol.17, issue.8, pp.857-872, 1998.
DOI : 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E

G. Newman and B. Scholl, Bar graphs depicting averages are perceptually misinterpreted: The within-the-bar bias, Psychonomic Bulletin & Review, vol.19, issue.4, pp.601-607, 2012.
DOI : 10.3758/s13423-012-0247-5

D. Norman, The Design of Everyday Things, 2002.
DOI : 10.15358/9783800648108

G. Norman, Likert scales, levels of measurement and the "laws" of statistics, Advances in Health Sciences Education, pp.625-632, 2010.

R. Nuzzo, Scientific method: Statistical errors, Nature, vol.506, issue.7487, pp.150-152, 2014.
DOI : 10.1038/506150a

J. Osborne and A. Overbay, The power of outliers (and why researchers should always check for them), Practical Assessment, Research & Evaluation, vol.9, issue.6, pp.1-12, 2004.

C. Perin, P. Dragicevic, and J. Fekete, Revisiting Bertin matrices: New interactions for crafting tabular visualizations, IEEE Transactions on Visualization and Computer Graphics, vol.20, issue.12, pp.2082-2091, 2014.
DOI : 10.1109/tvcg.2014.2346279

URL : https://hal.archives-ouvertes.fr/hal-01023890

P. Pollard and J. Richardson, On the probability of making Type I errors., Psychological Bulletin, vol.102, issue.1, p.159, 1987.
DOI : 10.1037/0033-2909.102.1.159

R. Rawls, BREAKING UP IS HARD TO DO, Chemical & Engineering News, vol.76, issue.25, pp.29-34, 1998.
DOI : 10.1021/cen-v076n025.p029

U. Reips and F. Funke, Interval-level measurement with visual analogue scales in Internet-based research: VAS Generator, Behavior Research Methods, vol.40, issue.3, pp.699-704, 2008.
DOI : 10.3758/BRM.40.3.699

R. Rensink, On the Prospects for a Science of Visualization, pp.147-175, 2014.
DOI : 10.1007/978-1-4614-7485-2_6

C. Ricketts and J. Berry, Teaching Statistics through Resampling, Teaching Statistics, vol.16, issue.2, pp.41-44, 1994.
DOI : 10.1111/j.1467-9639.1994.tb00685.x

R. Rosenthal and K. Fode, The effect of experimenter bias on the performance of the albino rat, Behavioral Science, vol.8, issue.3, pp.183-189, 1963.
DOI : 10.1002/bs.3830080302

R. Rosnow and R. Rosenthal, Statistical procedures and the justification of knowledge in psychological science., American Psychologist, vol.44, issue.10, p.1276, 1989.
DOI : 10.1037/10109-027

J. Rossi, Statistical power of psychological research: What have we gained in 20 years?, Journal of Consulting and Clinical Psychology, vol.58, issue.5, p.646, 1990.
DOI : 10.1037/0022-006X.58.5.646

J. Sauro and J. Lewis, Average task times in usability tests, Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pp.2347-2350, 2010.
DOI : 10.1145/1753326.1753679

F. Schmidt and J. Hunter, Eight common but false objections to the discontinuation of significance testing in the analysis of research data, What if there were no significance tests?, pp.37-64, 1997.

J. Simmons, L. Nelson, and U. Simonsohn, False-Positive Psychology, Psychological Science, vol.22, issue.11, pp.1359-1366, 2011.
DOI : 10.1177/0956797611417632

R. Smith, T. Levine, K. Lachlan, and T. Fediuk, The High Cost of Complexity in Experimental Design and Data Analysis: Type I and Type II Error Rates in Multiway ANOVA, Human Communication Research, vol.28, issue.4, pp.515-530, 2002.
DOI : 10.1111/j.1468-2958.2002.tb00821.x

A. Stewart-Oaten, Rules and Judgments in Statistics: Three Examples, Ecology, vol.76, issue.6, 1995.
DOI : 10.2307/1940736

B. Thompson, Statistical significance and effect size reporting: Portrait of a possible future, Research in the Schools, vol.5, issue.2, pp.33-38, 1998.

B. Thompson, Statistical Significance Tests, Effect Size Reporting and the Vain Pursuit of Pseudo-Objectivity, Theory & Psychology, vol.9, issue.2, pp.191-196, 1999.
DOI : 10.1177/095935439992007

D. Trafimow and M. Marks, Editorial, Basic and Applied Social Psychology, vol.37, issue.1, pp.1-2, 2015.
DOI : 10.1080/01973533.2014.865505

W. Tryon, Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests., Psychological Methods, vol.6, issue.4, p.371, 2001.
DOI : 10.1037/1082-989X.6.4.371

J. Tukey, We need both exploratory and confirmatory, The American Statistician, vol.34, issue.1, pp.23-25, 1980.
DOI : 10.1080/00031305.1980.10482706

R. Ulrich and J. Miller, Effects of truncation on reaction time analysis., Journal of Experimental Psychology: General, vol.123, issue.1, p.34, 1994.
DOI : 10.1037/0096-3445.123.1.34

P. Velleman and L. Wilkinson, Nominal, ordinal, interval, and ratio typologies are misleading, The American Statistician, vol.47, issue.1, pp.65-72, 1993.
DOI : 10.1080/00031305.1993.10475938

K. Vicente and G. Torenvliet, The Earth is spherical (p < 0.05): alternative methods of statistical inference, Theoretical Issues in Ergonomics Science, vol.1, issue.3, pp.248-271, 2000.
DOI : 10.1080/14639220110037065

B. Victor, Explorable explanations. Online, URL http, 2011.

H. Wainer, How to display data badly, The American Statistician, vol.38, issue.2, pp.137-147, 1984.
DOI : 10.1080/00031305.1984.10483186

H. Wickham and L. Stryjewski, 40 years of boxplots, The American Statistician, 2011.

A. Wierdsma, What is wrong with tests of normality?, Online, URL http, 2013.

R. Wilcox, How many discoveries have been lost by ignoring modern statistical methods?, American Psychologist, vol.53, issue.3, p.300, 1998.
DOI : 10.1037/0003-066X.53.3.300

L. Wilkinson, Statistical methods in psychology journals: Guidelines and explanations., American Psychologist, vol.54, issue.8, p.594, 1999.
DOI : 10.1037/0003-066X.54.8.594

W. Willett, B. Jenny, T. Isenberg, and P. Dragicevic, Lightweight Relief Shearing for Enhanced Terrain Perception on Interactive Maps, Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pp.3563-3572, 2015.
DOI : 10.1145/2702123.2702172

URL : https://hal.archives-ouvertes.fr/hal-01105179

W. Wilson, A note on the inconsistency inherent in the necessity to perform multiple comparisons., Psychological Bulletin, vol.59, issue.4, p.296, 1962.
DOI : 10.1037/h0040447

M. Wood, Statistical inference using bootstrap confidence intervals, Significance, vol.1, issue.4, pp.180-182, 2004.
DOI : 10.1111/j.1740-9713.2004.00067.x

M. Wood, Bootstrapped Confidence Intervals as an Approach to Statistical Inference, Organizational Research Methods, vol.8, issue.4, pp.454-470, 2005.
DOI : 10.1177/1094428105280059

J. Zacks and B. Tversky, Bars and lines: A study of graphic communication, Memory & Cognition, vol.27, issue.6, pp.1073-1079, 1999.
DOI : 10.3758/BF03201236

S. Ziliak and D. McCloskey, The cult of statistical significance, 2008.