E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614

J. Ansel, C. P. Chan, Y. L. Wong, M. Olszewski, Q. Zhao et al., PetaBricks: A language and compiler for algorithmic choice, Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, pp.38-49, 2009.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

H. Chafi, A. K. Sujeeth, K. J. Brown, H. Lee, A. R. Atreya et al., A domain-specific approach to heterogeneous parallelism, 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp.35-46, 2011.

P. Cooper, U. Dolinsky, A. F. Donaldson, A. Richards, C. Riley et al., Offload - Automating Code Migration to Heterogeneous Multicore Systems, High Performance Embedded Architectures and Compilers, 5th International Conference, pp.337-352, 2010.
DOI : 10.1007/978-3-642-11515-8_25

U. Dastgeer, J. Enmyren, and C. Kessler, Auto-tuning SkePU, Proceedings of the 4th International Workshop on Multicore Software Engineering, IWMSE '11, 2011.
DOI : 10.1145/1984693.1984697

K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), p.83, 2006.
DOI : 10.1109/SC.2006.55

A. Gidenstam, H. Sundell, and P. Tsigas, Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency, Proceedings of the 14th International Conference on Principles of Distributed Systems, pp.302-317, 2010.
DOI : 10.1145/1556444.1556455

P. H. Ha, P. Tsigas, and O. J. Anshus, NB-FEB: A Universal Scalable Easy-to-Use Synchronization Primitive for Manycore Architectures, Principles of Distributed Systems, 13th International Conference, pp.189-203, 2009.
DOI : 10.1007/978-3-642-10877-8_16

M. W. Hall, Y. Gil, and R. F. Lucas, Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques, Proceedings of the IEEE, pp.849-862, 2008.
DOI : 10.1109/JPROC.2008.917733

C. W. Kessler and W. Löwe, A framework for performance-aware composition of explicitly parallel components, Parallel Computing: Architectures, Algorithms and Applications, Advances in Parallel Computing, pp.227-234, 2007.

N. Leischner, V. Osipov, and P. Sanders, GPU sample sort, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470444

M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng, Merge: A programming model for heterogeneous multi-core systems, Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.287-296, 2008.

M. Sandrieser, S. Benkner, and S. Pllana, Explicit Platform Descriptions for Heterogeneous Many-Core Architectures, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 2011.
DOI : 10.1109/IPDPS.2011.280

J. Singler, P. Sanders, and F. Putze, MCSTL: The Multi-core Standard Template Library, Euro-Par 2007, Parallel Processing, 13th International Euro-Par Conference, pp.682-694, 2007.
DOI : 10.1007/978-3-540-74466-5_72

N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato et al., A framework for adaptive algorithm selection in STAPL, Proceedings of the tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '05, pp.277-288, 2005.
DOI : 10.1145/1065944.1065981

S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp.1-8, 2010.
DOI : 10.1109/IPDPSW.2010.5470941

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

J. L. Träff and M. Wimmer, Work-stealing for mixed-mode parallelism by deterministic team-building, 23rd ACM Symposium on Parallelism in Algorithms and Architectures, pp.105-115, 2011.

J. R. Wernsing and G. Stitt, Elastic computing: A framework for transparent, portable, and adaptive multi-core heterogeneous computing, Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems (LCTES), pp.115-124, 2010.

Jesper Larsson Träff received an M.Sc. in 1989. He spent four years as a Research Associate in the Algorithms Group of the Max-Planck Institute for Computer Science in Saarbrücken, and the Efficient Algorithms Group at the Technical University of Munich. From 1998 until late 2009 he worked at NEC Laboratories Europe in Sankt Augustin, Germany, on efficient implementations of MPI for NEC vector supercomputers. This work led to a doctorate (Dr. Scient.) from the University of Copenhagen in 2009. Since 2010 he has been Professor for Scientific Computing at the University of Vienna. His research interests are broadly in parallel processing and include interfaces, algorithms, and architectures. He is currently scientific coordinator for the European FP7 project PEPPHER. With Martti Forsell he organizes the annual Euro-Par Workshop on Highly Parallel Processing on a Chip (HPPC).

David Moloney received a B.Eng. from Dublin City University in 1985 and a Ph.D. in Engineering from Trinity College Dublin in 2010. From 1985 he worked for Siemens Halbleiter AG (Infineon) in Munich and ST Microelectronics in Milan as a DSP IC designer, before returning to Ireland in 1994 to work with a series of start-up technology companies including Parthus (CEVA) and Silansys. He is currently co-founder (2005) and CTO of Movidius Ltd., a fabless semiconductor company headquartered in Dublin and focused on the design of software programmable multimedia accelerator SoCs. He holds 18 US patents with many others