Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators, Symposium on Application Accelerators in High Performance Computing (SAAHPC), 2010.
URL: https://hal.archives-ouvertes.fr/inria-00547616
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface, ser. EuroMPI'12, pp.298-299, 2012.
DOI: 10.1007/978-3-642-33518-1_40
URL: https://hal.archives-ouvertes.fr/hal-00725477
Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.14, issue.7, pp.2438-2456, 2009.
DOI: 10.1002/cpe.1463
Space-Efficient Scheduling of Multithreaded Computations, SIAM Journal on Computing, vol.27, issue.1, pp.202-229, 1998.
DOI: 10.1137/S0097539793259471
libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, ser. IWOMP'12, pp.102-115, 2012.
DOI: 10.1007/978-3-642-30961-8_8
URL: https://hal.archives-ouvertes.fr/hal-00796253
Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, pp.359-368, 2013.
DOI: 10.1145/2464996.2465017
Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012.
DOI: 10.1109/IPDPS.2012.58
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI: 10.1016/j.parco.2008.10.002
Two-dimensional block partitionings for the parallel sparse Cholesky factorization, Numerical Algorithms, vol.16, issue.1, pp.17-38, 1997.
DOI: 10.1023/A:1019122726788
URL: https://hal.archives-ouvertes.fr/inria-00073533
A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, International Journal of Parallel Programming, vol.26, issue.6, pp.292-305, 2009.
DOI: 10.1007/s10766-009-0101-1
An Efficient OpenMP Loop Scheduler for Irregular Applications on Large-Scale NUMA Machines, IWOMP, 2013.
DOI: 10.1007/978-3-642-40698-0_11
URL: https://hal.archives-ouvertes.fr/hal-00867438
New bounds for parallel prefix circuits, Proceedings of the fifteenth annual ACM symposium on Theory of computing, STOC '83, pp.100-109, 1983.
DOI: 10.1145/800061.808738
The implementation of the Cilk-5 multithreaded language, ACM SIGPLAN Notices, vol.33, issue.5, pp.212-223, 1998.
DOI: 10.1145/277652.277725
Athapascan-1: On-line building data flow graph in a parallel language, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), p.88, 1998.
DOI: 10.1109/PACT.1998.727176
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007.
DOI: 10.1145/1278177.1278182
URL: https://hal.archives-ouvertes.fr/hal-00647474
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013.
DOI: 10.1109/IPDPS.2013.66
URL: https://hal.archives-ouvertes.fr/hal-00799904
Flat combining and the synchronization-parallelism tradeoff, Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, 2010.
DOI: 10.1145/1810479.1810540
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par 2010, 2010.
DOI: 10.1007/978-3-642-15291-7_23
URL: https://hal.archives-ouvertes.fr/inria-00502448
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2010.
DOI: 10.1145/1377612.1377615
The Problem with Threads, Computer, vol.39, issue.5, pp.33-42, 2006.
DOI: 10.1109/MC.2006.180
X-Kaapi C programming interface, INRIA, Tech. Rep. RT-0417, 2011.
The X-Kaapi's Application Programming Interface. Part I: Data Flow Programming, INRIA, 2011.
ROSE compiler project.
Intel Threading Building Blocks, 2007.
A Tighter Analysis of Work Stealing, The 21st International Symposium on Algorithms and Computation (ISAAC), 2010.
DOI: 10.1007/978-3-642-17514-5_25
URL: https://hal.archives-ouvertes.fr/hal-00788864
Deque-free work-optimal parallel STL algorithms, 2008.
QUARK users' guide: QUeueing And Runtime for Kernels, 2011.