The implementation of the cilk-5 multithreaded language, SIGPLAN Not, pp.212-223, 1998. ,
Intel Threading Building Blocks: Outfitting C++ for Multi- Core Processor Parallelism, 2007. ,
Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, pp.289-309, 2012. ,
DOI : 10.1016/j.parco.2012.03.005
Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011. ,
DOI : 10.1155/2011/525717
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par 2010 -Parallel Processing, pp.235-246, 2010. ,
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448
Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, p.66, 2012. ,
DOI : 10.1109/SC.2012.71
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013. ,
DOI : 10.1109/MCSE.2013.98
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011. ,
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
The Design and Implementation of FFTW3, Proceedings of the IEEE, pp.216-231, 2005. ,
DOI : 10.1109/JPROC.2004.840301
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3 ,
DOI : 10.1145/1527286.1527288
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1 ,
DOI : 10.1088/1742-6596/180/1/012037
Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015. ,
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359
Composing multiple starpu applications over heterogeneous machines: A supervised approach, IJHPCA, vol.28, issue.3, pp.285-300, 2014. ,
DOI : 10.1109/ipdpsw.2013.217
URL : https://hal.archives-ouvertes.fr/hal-00824514
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.365-376, 2012. ,
DOI : 10.1145/2304576.2304625
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.5355
Dense matrix computation on a heterogenous architecture: A block synchronous approach, 2012. ,
Composing parallel software efficiently with lithe, SIGPLAN Not, pp.376-387, 2010. ,
DOI : 10.1145/1806596.1806639
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172.2385
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
A hybridization methodology for high-performance linear algebra software for gpus, GPU Computing Gems, pp.473-484, 2011. ,
DOI : 10.1016/b978-0-12-385963-1.00034-4
Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 2016. ,
DOI : 10.1109/HiPC.2016.045
URL : https://hal.archives-ouvertes.fr/hal-01361992
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2 ,
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645
Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi, IEEE High Performance Extreme Computing Conference (HPEC'16, p.2016 ,
Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016. ,
DOI : 10.1109/IPDPS.2016.90
URL : https://hal.archives-ouvertes.fr/hal-01223573