K. Seymour, H. You, and J. Dongarra, A comparison of search heuristics for empirical code optimization, CLUSTER, pp.421-429, 2008.

P. M. Knijnenburg, T. Kisuki, and M. F. O'Boyle, Combined selection of tile sizes and unroll factors using iterative compilation, The Journal of Supercomputing, vol.24, issue.1, pp.43-67, 2003.

P. Balaprakash, S. M. Wild, and P. D. Hovland, Can search algorithms save large-scale automatic performance tuning?, ICCS, pp.2136-2145, 2011.

P. Balaprakash, S. M. Wild, and P. D. Hovland, An experimental study of global and local search algorithms in empirical performance tuning, International Conference on High Performance Computing for Computational Science, pp.261-269, 2012.

D. Beckingsale, O. Pearce, I. Laguna, and T. Gamblin, Apollo: Reusable models for fast, dynamic tuning of input-dependent code, The 31st IEEE International Parallel and Distributed Processing Symposium, 2017.

T. L. Falch and A. C. Elster, Machine learning-based auto-tuning for enhanced performance portability of OpenCL applications, Concurrency and Computation: Practice and Experience, vol.29, issue.8, 2017.

P. Balaprakash, A. Tiwari, S. M. Wild, and P. D. Hovland, AutoMOMML: Automatic Multi-objective Modeling with Machine Learning, High Performance Computing: 31st International Conference, ISC High Performance, pp.219-239, 2016.

P. Balaprakash, S. M. Wild, and B. Norris, SPAPT: Search problems in automatic performance tuning, Procedia Computer Science, vol.9, pp.1959-1968, 2012.

A. Hartono, B. Norris, and P. Sadayappan, Annotation-based empirical performance tuning using Orio, Parallel & Distributed Processing, pp.1-11, 2009.

B. Videau, K. Pouget, L. Genovese, T. Deutsch, D. Komatitsch et al., BOAST: A metaprogramming framework to produce portable and efficient computing kernels for HPC applications, The International Journal of High Performance Computing Applications, p.1094342017718068, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01620778

A. Tiwari, C. Chen, J. Chame, M. Hall, and J. K. Hollingsworth, A scalable auto-tuning framework for compiler optimization, Parallel & Distributed Processing, pp.1-12, 2009.

Q. Yi, K. Seymour, H. You, R. Vuduc, and D. Quinlan, POET: Parameterized optimizations for empirical tuning, Parallel and Distributed Processing Symposium, pp.1-8, 2007.

J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao et al., PetaBricks: a language and compiler for algorithmic choice, vol.44, 2009.

J. R. Rice, The algorithm selection problem, Advances in Computers 15, pp.65-118, 1976.

J. Bilmes, K. Asanovic, C. Chin, and J. Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of International Conference on Supercomputing, 1997.

J. J. Dongarra and C. R. Whaley, Automatically tuned linear algebra software (ATLAS), Proceedings of SC, vol.98, 1998.

R. Vuduc, J. W. Demmel, and K. A. Yelick, OSKI: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, vol.16, p.521, 2005.

M. Frigo and S. G. Johnson, FFTW: An adaptive software architecture for the FFT, Proceedings of the 1998 IEEE International Conference on, vol.3, pp.1381-1384, 1998.

M. Gerndt and M. Ott, Automatic performance analysis with Periscope, Concurrency and Computation: Practice and Experience, vol.22, issue.6, pp.736-748, 2010.

H. Jordan, P. Thoman, J. J. Durillo, S. Pellegrini, P. Gschwandtner et al., A multi-objective auto-tuning framework for parallel codes, High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for, pp.1-12, 2012.

F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle, ParamILS: an automatic algorithm configuration framework, Journal of Artificial Intelligence Research, vol.36, issue.1, pp.267-306, 2009.

J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom et al., OpenTuner: An extensible framework for program autotuning, Proceedings of the 23rd international conference on Parallel architectures and compilation, pp.303-316, 2014.

R. Carnell, lhs: Latin Hypercube Samples, 2018.

R. L. Plackett and J. P. Burman, The design of optimum multifactorial experiments, Biometrika, vol.33, issue.4, pp.305-325, 1946.

U. Grömping, R package FrF2 for creating and analyzing fractional factorial 2-level designs, Journal of Statistical Software, vol.56, issue.1, pp.1-56, 2014.

R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2018.

V. V. Fedorov, Theory of optimal experiments, 1972.

B. Wheeler, AlgDesign: Algorithmic Experimental Design, 2014.

S. Addelman and O. Kempthorne, Some main-effect plans and orthogonal arrays of strength two, The Annals of Mathematical Statistics, pp.1167-1176, 1961.

U. Grömping and R. Fontana, An algorithm for generating good mixed level factorial designs, Beuth University of Applied Sciences, 2018.

J. Fox and S. Weisberg, An R Companion to Applied Regression, Sage, 2011.

D. Balouek, A. Carpen-Amarie, G. Charrier, F. Desprez, E. Jeannot et al., Adding virtualization capabilities to the Grid'5000 testbed, Cloud Computing and Services Science, ser. Communications in Computer and Information Science, vol.367, pp.3-20, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00946971

P. Bruel, Git repository with all scripts and data

P. Bruel, A. Goldman, S. R. Chalamalasetti, and D. Milojicic, Autotuning high-level synthesis for FPGAs using OpenTuner and LegUp, International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2017.