A. Robison, M. Voss, and A. Kukanov, Optimization via Reflection on Work Stealing in TBB, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.
DOI : 10.1109/IPDPS.2008.4536188

F. Broquedis, T. Gautier, and V. Danjean, libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, IWOMP, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8

URL : https://hal.archives-ouvertes.fr/hal-00796253

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, ser. PLDI '98, pp.212-223, 1998.

R. Blumofe, C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall et al., Cilk: An Efficient Multithreaded Runtime System, Journal of Parallel and Distributed Computing, vol.37, issue.1, pp.55-69, 1996.
DOI : 10.1006/jpdc.1996.0107

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3175

N. S. Arora, R. D. Blumofe, and C. G. Plaxton, Thread Scheduling for Multiprogrammed Multiprocessors, Theory of Computing Systems, vol.34, issue.2, pp.115-144, 2001.
DOI : 10.1007/s00224-001-0004-z

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.3853

J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.22, issue.1, pp.15-44, 2010.
DOI : 10.1145/1377612.1377615

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009.
DOI : 10.1145/1594835.1504196

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

E. Hermann, B. Raffin, F. C. Faure, T. Gautier, and J. Allard, Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Proc. of Euro-Par, pp.235-246, 2010.
DOI : 10.1007/978-3-642-15291-7_23

URL : https://hal.archives-ouvertes.fr/inria-00502448

E. Ayguadé, R. Badia, F. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proc. of Euro-Par, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.

J. V. Lima et al., Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

F. Galilée, J. Roch, G. G. Cavalheiro, and M. Doreille, Athapascan-1: On-line building data flow graph in a parallel language, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), pp.88-95, 1998.
DOI : 10.1109/PACT.1998.727176

J. Bueno, J. Planas, A. Duran, R. M. Badia, X. Martorell et al., Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012.
DOI : 10.1109/IPDPS.2012.58

F. Song and J. Dongarra, A scalable framework for heterogeneous GPU-based clusters, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.91-100, 2012.
DOI : 10.1145/2312005.2312025

C. Boeres, G. Chochia, and P. Thanisch, On the scope of applicability of the ETF algorithm, Proc. of the 2nd International Workshop on Parallel Algorithms for Irregularly Structured Problems, ser. IRREGULAR '95, pp.159-164, 1995.
DOI : 10.1007/3-540-60321-2_13

Y. Guo, J. Zhao, V. Cave, and V. Sarkar, SLAW: A scalable locality-aware adaptive work-stealing scheduler, Proc. of IEEE IPDPS, pp.1-12, 2010.
DOI : 10.1109/ipdps.2010.5470425

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.459.8069

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-00647474

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-Ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.21, issue.18, pp.2438-2456, 2009.
DOI : 10.1002/cpe.1463

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.3457

M. Tchiboukdjian, N. Gast, and D. Trystram, Decentralized list scheduling, Annals of Operations Research, vol.18, issue.2, pp.1-23, 2012.
DOI : 10.1007/s10479-012-1149-7

URL : https://hal.archives-ouvertes.fr/hal-00796248

U. A. Acar, G. E. Blelloch, and R. D. Blumofe, The data locality of work stealing, Proc. of ACM SPAA, ser. SPAA '00, pp.1-12, 2000.

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010.
DOI : 10.1016/j.parco.2009.12.005

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

H. P. Huynh, A. Hagiescu, W. Wong, and R. S. Goh, Scalable framework for mapping streaming applications onto multi-GPU systems, Proc. of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '12, pp.1-10, 2012.