A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
FLAME: Formal Linear Algebra Methods Environment, ACM Transactions on Mathematical Software, vol.27, issue.4, pp.422-455, 2001. ,
DOI : 10.1145/504210.504213
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, Jade Edition, pp.473-484, 2011. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014. ,
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, 2014. ,
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645
Task-based conjugate gradient: from multi-GPU towards heterogeneous architectures, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01334734
Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11, 1203. ,
DOI : 10.1002/cpe.3132
Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014. ,
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368
Résolution directe rapide pour les éléments finis de frontière en électromagnétisme et acoustique: H-matrices. Parallélisme et applications industrielles, 2014. ,
Application of the ParalleX execution model to stencil-based problems, International Supercomputing Conference, 2012. ,
DOI : 10.1007/s00450-012-0217-1
Task-based programming for Seismic Imaging: Preliminary Results, 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), pp.1259-1266, 2014. ,
LAPACK: a portable linear algebra library for high-performance computers, The 1990 ACM/IEEE conference on Supercomputing, 1990. ,
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, 2010. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Quark users' guide: Queueing and runtime for kernels, 2011. ,
Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010. ,
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937
Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995. ,
Distributed Dense Numerical Linear Algebra Algorithms on massively parallel architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and PhD Forum (IPDPSW'11), PDSEC 2011, pp.1432-1441, 2011. ,
Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002. ,
A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, pp.2421-2448, 2012. ,
DOI : 10.1002/cpe.2831
Dynamic task execution on shared and distributed memory architectures, 2012. ,
Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, 2013. ,
DOI : 10.1145/2464996.2465017
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, Swedish Workshop on Multi-Core Computing, 2013. ,
A. Zafari, M. Tillenius, and E. Larsson, Programming models based on data versioning for dependency-aware task-based parallelisation, International Conference on Computational Science and Engineering, 2012. ,
Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012. ,
DOI : 10.1109/SC.2012.71
Regent, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015. ,
DOI : 10.1145/2807591.2807629
A scalable framework for heterogeneous GPU-based clusters, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.91-100, 2012. ,
DOI : 10.1145/2312005.2312025
Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, pp.289-309, 2012. ,
DOI : 10.1016/j.parco.2012.03.005
Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, 2009. ,
DOI : 10.1145/1669112.1669121
Locality-aware work stealing on multi-CPU and multi-GPU architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00780890
Runtime support for object-based message-driven parallel applications on heterogeneous clusters, 2012. ,
Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, 2011. ,
DOI : 10.1155/2011/525717
Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing (SIAMPP), 2010. ,
An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters, HeteroPar, 2011. ,
DOI : 10.1007/978-3-642-29737-3_48
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the International, pp.56-65, 2009. ,
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
Hierarchical DAG Scheduling for Hybrid Distributed Systems, 29th IEEE International Parallel & Distributed Processing Symposium, 2015. ,
Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2016. ,