K. Asanovic, The landscape of parallel computing research: a view from Berkeley, 2006.

J. Dongarra, The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011.
DOI : 10.1177/1094342010391989

R. Whaley and J. Dongarra, Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, 1998.
DOI : 10.1109/SC.1998.10004

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.3487

B. Aarts, OCEANS: Optimizing compilers for embedded applications, Proc. Euro-Par 97, pp.1351-1356, 1997.
DOI : 10.1007/BFb0002894

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.389

M. Frigo and S. G. Johnson, FFTW: An adaptive software architecture for the FFT, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1381-1384, 1998.

K. D. Cooper, P. J. Schielke, and D. Subramanian, Optimizing for reduced code space using genetic algorithms, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp.1-9, 1999.

M. J. Voss and R. Eigenmann, ADAPT: Automated De-coupled Adaptive Program Transformation, Proceedings 2000 International Conference on Parallel Processing, 2000.
DOI : 10.1109/ICPP.2000.876107

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.8914

T. Kisuki, P. M. Knijnenburg, and M. F. O'Boyle, Combined selection of tile sizes and unroll factors using iterative compilation, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques, pp.237-246, 2000.
DOI : 10.1109/PACT.2000.888348

G. G. Fursin, M. F. O'Boyle, and P. M. Knijnenburg, Evaluating Iterative Compilation, Proceedings of the Workshop on Languages and Compilers for Parallel Computers (LCPC), pp.305-315, 2002.
DOI : 10.1007/11596110_24

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.4652

C. Tapus, I. Chung, and J. K. Hollingsworth, Active harmony: towards automated performance tuning, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, Supercomputing '02, pp.1-11, 2002.

P. Kulkarni, W. Zhao, H. Moon, K. Cho, D. Whalley et al., Finding effective optimization phase sequences, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp.12-23, 2003.

B. Singer and M. Veloso, Learning to predict performance from formula modeling and training data, Proceedings of the Conference on Machine Learning, 2000.

J. Lu, H. Chen, P. Yew, and W. Hsu, Design and implementation of a lightweight dynamic optimization system, Journal of Instruction-Level Parallelism, pp.1-24, 2004.

C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization (CGO 2004), 2004.
DOI : 10.1109/CGO.2004.1281665

B. Franke, M. F. O'Boyle, J. Thomson, and G. Fursin, Probabilistic source-level optimisation of embedded programs, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005.

Z. Pan and R. Eigenmann, Fast and effective orchestration of compiler optimizations for automatic performance tuning, Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp.319-332, 2006.

K. Hoste and L. Eeckhout, Cole: compiler optimization level exploration, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, 2008.
DOI : 10.1145/1356058.1356080

D. H. Bailey, PERI auto-tuning, Journal of Physics: Conference Series, vol.125, pp.1-6, 2008.

G. Fursin and O. Temam, Collective optimization, ACM Transactions on Architecture and Code Optimization, vol.7, issue.4, pp.20:1-20:29, 2010.
DOI : 10.1145/1880043.1880047

URL : https://hal.archives-ouvertes.fr/inria-00445326

V. Jimenez, I. Gelado, L. Vilanova, M. Gil, G. Fursin et al., Predictive Runtime Code Scheduling for Heterogeneous Architectures, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers, 2009.
DOI : 10.1007/978-3-540-92990-1_4

URL : https://hal.archives-ouvertes.fr/inria-00445304

G. Fursin, J. Cavazos, M. O'Boyle, and O. Temam, MiDataSets: Creating the Conditions for a More Realistic Evaluation of Iterative Optimization, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers, 2007.
DOI : 10.1007/978-3-540-69338-3_17

L. Luo, Y. Chen, C. Wu, S. Long, and G. Fursin, Finding representative sets of optimizations for adaptive multiversioning applications, 3rd Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'09), 2009.
URL : https://hal.archives-ouvertes.fr/inria-00436034

S. Benedict, V. Petkov, and M. Gerndt, PERISCOPE: An Online-Based Distributed Performance Analysis Tool, Tools for High Performance Computing, pp.1-16, 2009.
DOI : 10.1007/978-3-642-11261-4_1

R. Miceli, AutoTune: A Plugin-Driven Approach to the Automatic Tuning of Parallel Applications, Proceedings of the 11th International Conference on Applied Parallel and Scientific Computing'12, pp.328-342, 2013.
DOI : 10.1007/978-3-642-36803-5_24

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010.
DOI : 10.1016/j.parco.2009.12.005

M. Baboulin, D. Becker, and J. Dongarra, A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.14-24, 2012.
DOI : 10.1109/IPDPS.2012.12

URL : https://hal.archives-ouvertes.fr/inria-00631361

A. Monsifrot, F. Bodin, and R. Quiniou, A Machine Learning Approach to Automatic Production of Compiler Heuristics, Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pp.41-50, 2002.
DOI : 10.1007/3-540-46148-5_5

M. Stephenson, S. Amarasinghe, M. Martin, and U. O'Reilly, Meta optimization: Improving compiler heuristics with machine learning, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'03), pp.77-90, 2003.

G. Marin and J. Mellor-Crummey, Cross-architecture performance predictions for scientific applications using parameterized models.

M. Stephenson and S. Amarasinghe, Predicting Unroll Factors Using Supervised Classification, International Symposium on Code Generation and Optimization, 2005.
DOI : 10.1109/CGO.2005.29

URL : http://cag.lcs.mit.edu/commit/papers/05/stephensonm_supervised.pdf

M. Zhao, B. R. Childers, and M. L. Soffa, A model-based framework: an approach for profit-driven optimization, Third Annual IEEE/ACM International Conference on Code Generation and Optimization, pp.317-327, 2005.

F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin et al., Using Machine Learning to Focus Iterative Optimization, International Symposium on Code Generation and Optimization (CGO'06), 2006.
DOI : 10.1109/CGO.2006.37

J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. O'Boyle et al., Rapidly Selecting Good Compiler Optimizations using Performance Counters, International Symposium on Code Generation and Optimization (CGO'07), 2007.
DOI : 10.1109/CGO.2007.32

C. Dubach, T. M. Jones, E. V. Bonilla, G. Fursin, and M. F. O'Boyle, Portable compiler optimization across embedded programs and microarchitectures using machine learning, Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), 2009.

J. Shen, A. L. Varbanescu, H. J. Sips, M. Arntzen, and D. G. Simons, Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms, Proceedings of the ACM International Conference on Computing Frontiers, CF '13, p.14, 2013.
DOI : 10.1145/2482767.2482785

Y. Huang, L. Peng, C. Wu, Y. Kashnikov, J. Renneke et al., Transforming GCC into a research-friendly environment: plugins for optimization tuning and reordering, function cloning and program instrumentation, 2nd International Workshop on GCC Research Opportunities (GROW), colocated with HiPEAC'10 conference, 2010.

MILEPOST project archive (MachIne Learning for Embedded PrOgramS opTimization).

G. Fursin et al., Milepost GCC: Machine Learning Enabled Self-tuning Compiler, International Journal of Parallel Programming, vol.39, issue.3, pp.296-327, 2011.
DOI : 10.1007/s10766-010-0161-2

URL : https://hal.archives-ouvertes.fr/hal-00685276

V. G. Cerf, Where is the science in computer science?, Communications of the ACM, vol.55, issue.10, pp.5-5, 2012.

G. Fursin, Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00850880

G. Fursin, Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, Proceedings of the GCC Developers' Summit, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00436029

G. Fursin, Iterative Compilation and Performance Prediction for Numerical Applications, 2004.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral program optimization system, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008.

T. W. Epps and L. B. Pulley, A test for normality based on the empirical characteristic function, Biometrika, vol.70, issue.3, pp.723-726, 1983.
DOI : 10.1093/biomet/70.3.723

H. T. Kung, F. Luccio, and F. P. Preparata, On Finding the Maxima of a Set of Vectors, Journal of the ACM, vol.22, issue.4, pp.469-476, 1975.
DOI : 10.1145/321906.321910

Y. Shoham and K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2008.

H. Roubos and M. Setnes, Compact and transparent fuzzy models and classifiers through iterative complexity reduction, IEEE Transactions on Fuzzy Systems, vol.9, issue.4, pp.516-524, 2001.
DOI : 10.1109/91.940965

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.395.8469

Y. Jin, Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement, IEEE Transactions on Fuzzy Systems, vol.8, issue.2, pp.212-221, 2000.
DOI : 10.1109/91.842154

G. Fursin, M. F. O'Boyle, O. Temam, and G. Watts, A fast and accurate method for determining a lower bound on execution time, Concurrency and Computation: Practice and Experience, vol.16, issue.2-3, pp.271-292, 2004.
DOI : 10.1002/cpe.774

P. Balaprakash, S. M. Wild, and P. D. Hovland, Can search algorithms save large-scale automatic performance tuning?, Procedia Computer Science, Proceedings of the International Conference on Computational Science, pp.2136-2145, 2011.

S. Williams, A. Waterman, and D. Patterson, Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009.
DOI : 10.1145/1498765.1498785

G. Fursin and A. Cohen, Building a practical iterative interactive compiler, 1st Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'07), 2007.
URL : https://hal.archives-ouvertes.fr/inria-00128507

G. Fursin, A. Cohen, M. O'Boyle, and O. Temam, A Practical Method for Quickly Evaluating Program Optimizations, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers, pp.29-46, 2005.
DOI : 10.1007/11587514_4

URL : https://hal.archives-ouvertes.fr/inria-00001054

J. Mars and R. Hundt, Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations, 2009 International Symposium on Code Generation and Optimization, pp.169-179, 2009.
DOI : 10.1109/CGO.2009.24

C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2007.

G. E. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen et al., Building high-level features using large scale unsupervised learning, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
DOI : 10.1109/ICASSP.2013.6639343

URL : http://arxiv.org/abs/1112.6209

D. Ferrucci, Building Watson: An Overview of the DeepQA Project, AI Magazine, vol.31, issue.3, pp.59-79, 2010.

Y. Chen, Y. Huang, L. Eeckhout, G. Fursin, L. Peng et al., Evaluating iterative optimization across 1000 data sets, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2010.
DOI : 10.1145/1806596.1806647

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.188.4481

Q. Yi, K. Seymour, H. You, R. Vuduc, and D. Quinlan, POET: Parameterized Optimizations for Empirical Tuning, 2007 IEEE International Parallel and Distributed Processing Symposium, 2007.
DOI : 10.1109/IPDPS.2007.370637

A. Hartono, B. Norris, and P. Sadayappan, Annotation-based empirical performance tuning using Orio, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-11, 2009.
DOI : 10.1109/IPDPS.2009.5161004

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.1902

J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao et al., PetaBricks: a language and compiler for algorithmic choice, Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, PLDI '09, pp.38-49, 2009.

M. Baboulin, S. Donfack, J. Dongarra, L. Grigori, A. Rémy et al., A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, ICCS, pp.17-26, 2012.
DOI : 10.1016/j.procs.2012.04.003

URL : https://hal.archives-ouvertes.fr/hal-00656457

M. Geimer, F. Wolf, B. J. N. Wylie, E. Ábrahám, D. Becker, and B. Mohr, The Scalasca performance toolset architecture, Concurrency and Computation: Practice and Experience, vol.22, issue.6, pp.702-719, 2010.

A. Calotoiu, T. Hoefler, M. Poke, and F. Wolf, Using automated performance modeling to find scalability bugs in complex codes, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, p.45, 2013.
DOI : 10.1145/2503210.2503277

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, 2010.
DOI : 10.1109/ICPPW.2010.38

P. R. Luszczek, D. H. Bailey, J. J. Dongarra et al., The HPC Challenge (HPCC) Benchmark Suite, Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, 2006.

M. A. Heroux and J. Dongarra, Toward a new metric for ranking high performance computing systems, 2013.
DOI : 10.2172/1089988

S. S. Shende and A. D. Malony, The Tau Parallel Performance System, International Journal of High Performance Computing Applications, vol.20, issue.2, pp.287-311, 2006.

C. Luk, S. Hong, and H. Kim, Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009.
DOI : 10.1145/1669112.1669121

S. Zuckerman, J. Suetterlein, R. Knauerhase, and G. R. Gao, Using a "codelet" program execution model for exascale machines, Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, EXADAPT '11, pp.64-69, 2011.
DOI : 10.1145/2000417.2000424

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631

URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Fursin and C. Dubach, Experience report: community-driven reviewing and validation of publications, Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01006563