The landscape of parallel computing research: a view from Berkeley, 2006. ,
The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011. ,
DOI : 10.1177/1094342010391989
Automatically Tuned Linear Algebra Software, Proceedings of the IEEE/ACM SC98 Conference, 1998. ,
DOI : 10.1109/SC.1998.10004
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.3487
OCEANS: Optimizing compilers for embedded applications, Proc. Euro-Par 97, pp.1351-1356, 1997. ,
DOI : 10.1007/BFb0002894
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.389
FFTW: An adaptive software architecture for the FFT, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1381-1384, 1998. ,
Optimizing for reduced code space using genetic algorithms, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp.1-9, 1999. ,
ADAPT: Automated De-coupled Adaptive Program Transformation, Proceedings 2000 International Conference on Parallel Processing, 2000. ,
DOI : 10.1109/ICPP.2000.876107
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.8914
Combined selection of tile sizes and unroll factors using iterative compilation, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques, pp.237-246, 2000. ,
DOI : 10.1109/PACT.2000.888348
Evaluating Iterative Compilation, Proceedings of the Workshop on Languages and Compilers for Parallel Computers (LCPC), pp.305-315, 2002. ,
DOI : 10.1007/11596110_24
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.4652
Active harmony: towards automated performance tuning, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, Supercomputing '02, pp.1-11, 2002. ,
Finding effective optimization phase sequences, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pp.12-23, 2003. ,
Learning to predict performance from formula modeling and training data, Proceedings of the Conference on Machine Learning, 2000. ,
Design and implementation of a lightweight dynamic optimization system, Journal of Instruction-Level Parallelism, pp.1-24, 2004. ,
LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization (CGO 2004), 2004. ,
DOI : 10.1109/CGO.2004.1281665
Probabilistic source-level optimisation of embedded programs, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005. ,
Fast and effective orchestration of compiler optimizations for automatic performance tuning, Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp.319-332, 2006. ,
COLE: Compiler Optimization Level Exploration, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, 2008. ,
DOI : 10.1145/1356058.1356080
Peri auto-tuning, Journal of Physics: Conference Series, vol.125, pp.1-6, 2008. ,
Collective optimization, ACM Transactions on Architecture and Code Optimization, vol.7, issue.4, pp.20:1-20:29, 2010. ,
DOI : 10.1145/1880043.1880047
URL : https://hal.archives-ouvertes.fr/inria-00445326
Predictive Runtime Code Scheduling for Heterogeneous Architectures, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers, 2009. ,
DOI : 10.1007/978-3-540-92990-1_4
URL : https://hal.archives-ouvertes.fr/inria-00445304
MiDataSets: Creating the Conditions for a More Realistic Evaluation of Iterative Optimization, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers, 2007. ,
DOI : 10.1007/978-3-540-69338-3_17
Finding representative sets of optimizations for adaptive multiversioning applications, 3rd Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'09), 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00436034
PERISCOPE: An Online-Based Distributed Performance Analysis Tool, Tools for High Performance Computing, pp.1-16, 2009. ,
DOI : 10.1007/978-3-642-11261-4_1
AutoTune: A Plugin-Driven Approach to the Automatic Tuning of Parallel Applications, Proceedings of the 11th International Conference on Applied Parallel and Scientific Computing'12, pp.328-342, 2013. ,
DOI : 10.1007/978-3-642-36803-5_24
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010. ,
DOI : 10.1016/j.parco.2009.12.005
A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.14-24, 2012. ,
DOI : 10.1109/IPDPS.2012.12
URL : https://hal.archives-ouvertes.fr/inria-00631361
A Machine Learning Approach to Automatic Production of Compiler Heuristics, Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pp.41-50, 2002. ,
DOI : 10.1007/3-540-46148-5_5
Meta optimization: Improving compiler heuristics with machine learning, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'03), pp.77-90, 2003. ,
Gabriel Marin and John Mellor-Crummey, Cross-architecture performance predictions for scientific applications using parameterized models. ,
Predicting Unroll Factors Using Supervised Classification, International Symposium on Code Generation and Optimization, 2005. ,
DOI : 10.1109/CGO.2005.29
URL : http://cag.lcs.mit.edu/commit/papers/05/stephensonm_supervised.pdf
A model-based framework: an approach for profit-driven optimization, Third Annual IEEE/ACM International Conference on Code Generation and Optimization, pp.317-327, 2005. ,
Using Machine Learning to Focus Iterative Optimization, International Symposium on Code Generation and Optimization (CGO'06), 2006. ,
DOI : 10.1109/CGO.2006.37
Rapidly Selecting Good Compiler Optimizations using Performance Counters, International Symposium on Code Generation and Optimization (CGO'07), 2007. ,
DOI : 10.1109/CGO.2007.32
Portable compiler optimization across embedded programs and microarchitectures using machine learning, Proceedings of the IEEE/ACM International Symposium on Microarchitecture (MICRO), 2009. ,
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms, Proceedings of the ACM International Conference on Computing Frontiers, CF '13, p.14, 2013. ,
DOI : 10.1145/2482767.2482785
Transforming GCC into a research-friendly environment: plugins for optimization tuning and reordering, function cloning and program instrumentation, 2nd International Workshop on GCC Research Opportunities (GROW), colocated with HiPEAC'10 conference, 2010. ,
MILEPOST project archive (MachIne Learning for Embedded PrOgramS opTimization). ,
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, International Journal of Parallel Programming, vol.39, issue.3, pp.296-327, 2011. ,
DOI : 10.1007/s10766-010-0161-2
URL : https://hal.archives-ouvertes.fr/hal-00685276
Where is the science in computer science?, Communications of the ACM, vol.55, issue.10, pp.5-5, 2012. ,
Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning, 1308. ,
URL : https://hal.archives-ouvertes.fr/hal-00850880
Collective Tuning Initiative: automating and accelerating development and optimization of computing systems, Proceedings of the GCC Developers' Summit, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00436029
Iterative Compilation and Performance Prediction for Numerical Applications, 2004. ,
A practical automatic polyhedral program optimization system, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008. ,
A test for normality based on the empirical characteristic function, Biometrika, vol.70, issue.3, pp.723-726, 1983. ,
DOI : 10.1093/biomet/70.3.723
On Finding the Maxima of a Set of Vectors, Journal of the ACM, vol.22, issue.4, pp.469-476, 1975. ,
DOI : 10.1145/321906.321910
Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2008. ,
Compact and transparent fuzzy models and classifiers through iterative complexity reduction, IEEE Transactions on Fuzzy Systems, vol.9, issue.4, pp.516-524, 2001. ,
DOI : 10.1109/91.940965
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.395.8469
Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement, IEEE Transactions on Fuzzy Systems, vol.8, issue.2, pp.212-221, 2000. ,
DOI : 10.1109/91.842154
A fast and accurate method for determining a lower bound on execution time, Concurrency and Computation: Practice and Experience, vol.16, issue.2-3, pp.271-292, 2004. ,
DOI : 10.1002/cpe.774
Can search algorithms save large-scale automatic performance tuning?, Procedia Computer Science, Proceedings of the International Conference on Computational Science, pp.2136-2145, 2011. ,
Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, pp.65-76, 2009. ,
DOI : 10.1145/1498765.1498785
Building a practical iterative interactive compiler, 1st Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'07), 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00128507
A Practical Method for Quickly Evaluating Program Optimizations, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers, pp.29-46, 2005. ,
DOI : 10.1007/11587514_4
URL : https://hal.archives-ouvertes.fr/inria-00001054
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations, 2009 International Symposium on Code Generation and Optimization, pp.169-179, 2009. ,
DOI : 10.1109/CGO.2009.24
Pattern Recognition and Machine Learning (Information Science and Statistics), 2007. ,
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/neco.2006.18.7.1527
Building high-level features using large scale unsupervised learning, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013. ,
DOI : 10.1109/ICASSP.2013.6639343
URL : http://arxiv.org/abs/1112.6209
Building Watson: An Overview of the DeepQA Project, AI Magazine, vol.31, issue.3, pp.59-79, 2010. ,
Evaluating iterative optimization across 1000 data sets, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2010. ,
DOI : 10.1145/1806596.1806647
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.188.4481
POET: Parameterized Optimizations for Empirical Tuning, 2007 IEEE International Parallel and Distributed Processing Symposium, 2007. ,
DOI : 10.1109/IPDPS.2007.370637
Annotation-based empirical performance tuning using Orio, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-11, 2009. ,
DOI : 10.1109/IPDPS.2009.5161004
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.1902
PetaBricks: a language and compiler for algorithmic choice, Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, PLDI '09, pp.38-49, 2009. ,
A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, ICCS, pp.17-26, 2012. ,
DOI : 10.1016/j.procs.2012.04.003
URL : https://hal.archives-ouvertes.fr/hal-00656457
The Scalasca performance toolset architecture, Concurrency and Computation: Practice and Experience, vol.22, issue.6, pp.702-719, 2010. ,
Using automated performance modeling to find scalability bugs in complex codes, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, p.45, 2013. ,
DOI : 10.1145/2503210.2503277
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, 2010. ,
DOI : 10.1109/ICPPW.2010.38
The HPC Challenge (HPCC) Benchmark Suite, Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, 2006. ,
Toward a new metric for ranking high performance computing systems, 2013. ,
DOI : 10.2172/1089988
The Tau Parallel Performance System, International Journal of High Performance Computing Applications, vol.20, issue.2, pp.287-311, 2006. ,
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009. ,
DOI : 10.1145/1669112.1669121
Using a "codelet" program execution model for exascale machines, Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, EXADAPT '11, pp.64-69, 2011. ,
DOI : 10.1145/2000417.2000424
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011. ,
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363
Experience report: community-driven reviewing and validation of publications, Proceedings of the 1st Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (ACM SIGPLAN TRUST'14), 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01006563