Solving dense linear systems on platforms with multiple hardware accelerators, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, ser. PPoPP '09, pp.121-130, 2009. ,
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010. ,
DOI : 10.1016/j.parco.2009.12.005
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, 2011 Symposium on Application Accelerators in High-Performance Computing, pp.150-158, 2011. ,
DOI : 10.1109/SAAHPC.2011.18
The implementation of the cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, ser. PLDI '98, pp.212-223, 1998. ,
Starpu: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007. ,
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474
Intel threading building blocks, 2007. ,
Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.2012-2012, 2012. ,
DOI : 10.1109/IPDPS.2012.58
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par 2009 Parallel Processing, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
Athapascan-1: On-line building data flow graph in a parallel language, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), p.88, 1998. ,
DOI : 10.1109/PACT.1998.727176
Multigpu and multi-cpu parallelization for interactive physics simulations, Euro-Par 2010, pp.235-246, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00502448
Quark users' guide: Queueing and runtime for kernels, 2011. ,
libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, IWOMP, pp.102-115, 2012. ,
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253
The X-Kaapi's Application Programming Interface. Part I: Data Flow Programming, 2011. ,