E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators, Symposium on Application Accelerators in High Performance Computing (SAAHPC), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547616

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault, StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface, ser. EuroMPI'12, pp.298-299, 2012.
DOI : 10.1007/978-3-642-33518-1_40
URL : https://hal.archives-ouvertes.fr/hal-00725477

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-Ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.14, issue.7, pp.2438-2456, 2009.
DOI : 10.1002/cpe.1463

R. D. Blumofe and C. E. Leiserson, Space-Efficient Scheduling of Multithreaded Computations, SIAM Journal on Computing, vol.27, issue.1, pp.202-229, 1998.
DOI : 10.1137/S0097539793259471

F. Broquedis, T. Gautier, and V. Danjean, libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, ser. IWOMP'12, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253

J. Bueno, X. Martorell, R. M. Badia, E. Ayguadé, and J. Labarta, Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th ACM International Conference on Supercomputing, ICS '13, pp.359-368, 2013.
DOI : 10.1145/2464996.2465017

J. Bueno, J. Planas, A. Duran, R. M. Badia, X. Martorell et al., Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012.
DOI : 10.1109/IPDPS.2012.58

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

B. Dumitrescu, M. Doreille, J. Roch, and D. Trystram, Two-dimensional block partitionings for the parallel sparse Cholesky factorization, Numerical Algorithms, vol.16, issue.1, pp.17-38, 1997.
DOI : 10.1023/A:1019122726788
URL : https://hal.archives-ouvertes.fr/inria-00073533

A. Duran, R. Ferrer, E. Ayguadé, R. M. Badia, and J. Labarta, A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, International Journal of Parallel Programming, vol.26, issue.6, pp.292-305, 2009.
DOI : 10.1007/s10766-009-0101-1

M. Durand, F. Broquedis, T. Gautier, and B. Raffin, An Efficient OpenMP Loop Scheduler for Irregular Applications on Large-Scale NUMA Machines, IWOMP, 2013.
DOI : 10.1007/978-3-642-40698-0_11
URL : https://hal.archives-ouvertes.fr/hal-00867438

F. E. Fich, New bounds for parallel prefix circuits, Proceedings of the fifteenth annual ACM symposium on Theory of computing, STOC '83, pp.100-109, 1983.
DOI : 10.1145/800061.808738

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, ACM SIGPLAN Notices, vol.33, issue.5, pp.212-223, 1998.
DOI : 10.1145/277652.277725

F. Galilée, J. Roch, G. G. Cavalheiro, and M. Doreille, Athapascan-1: On-line building data flow graph in a parallel language, Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, p.88, 1998.
DOI : 10.1109/PACT.1998.727176

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007.
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474

T. Gautier, J. V. Ferreira-Lima, N. Maillard, and B. Raffin, XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013.
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904

D. Hendler, I. Incze, N. Shavit, and M. Tzafrir, Flat combining and the synchronization-parallelism tradeoff, Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, SPAA '10, 2010.
DOI : 10.1145/1810479.1810540

E. Hermann, B. Raffin, F. Faure, T. Gautier, and J. Allard, Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, EUROPAR 2010, 2010.
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448

J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2010.
DOI : 10.1145/1377612.1377615

E. A. Lee, The Problem with Threads, Computer, vol.39, issue.5, pp.33-42, 2006.
DOI : 10.1109/MC.2006.180

F. Lementec, V. Danjean, and T. Gautier, X-Kaapi C programming interface, INRIA, Tech. Rep. RT-0417, 2011.

F. Lementec, T. Gautier, and V. Danjean, The X-Kaapi's Application Programming Interface. Part I: Data Flow Programming, INRIA, 2011.

D. Quinlan, Rose compiler project

J. Reinders, Intel Threading Building Blocks, 2007.

M. Tchiboukdjian, N. Gast, D. Trystram, J. Roch, and J. Bernard, A Tighter Analysis of Work Stealing, The 21st International Symposium on Algorithms and Computation (ISAAC), 2010.
DOI : 10.1007/978-3-642-17514-5_25
URL : https://hal.archives-ouvertes.fr/hal-00788864

D. Traore, J. Roch, N. Maillard, T. Gautier, and J. Bernard, Deque-free work-optimal parallel STL algorithms, 2008.

A. Yarkhan, J. Kurzak, and J. Dongarra, Quark users' guide: Queueing and runtime for kernels, 2011.