Optimization via Reflection on Work Stealing in TBB, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.
DOI : 10.1109/IPDPS.2008.4536188
libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, IWOMP, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253
The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, ser. PLDI '98, pp.212-223, 1998.
Cilk: An Efficient Multithreaded Runtime System, Journal of Parallel and Distributed Computing, vol.37, issue.1, pp.55-69, 1996.
DOI : 10.1006/jpdc.1996.0107
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3175
Thread Scheduling for Multiprogrammed Multiprocessors, Theory of Computing Systems, vol.34, issue.2, pp.115-144, 2001.
DOI : 10.1007/s00224-001-0004-z
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.3853
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2010.
DOI : 10.1145/1377612.1377615
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294
Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009.
DOI : 10.1145/1594835.1504196
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Proc. of Euro-Par, pp.235-246, 2010.
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proc. of Euro-Par, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
Athapascan-1: On-line building data flow graph in a parallel language, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), pp.88-95, 1998.
DOI : 10.1109/PACT.1998.727176
Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012.
DOI : 10.1109/IPDPS.2012.58
A scalable framework for heterogeneous GPU-based clusters, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.91-100, 2012.
DOI : 10.1145/2312005.2312025
On the scope of applicability of the ETF algorithm, Proc. of the 2nd International Workshop on Parallel Algorithms for Irregularly Structured Problems, ser. IRREGULAR '95, pp.159-164, 1995.
DOI : 10.1007/3-540-60321-2_13
SLAW: A scalable locality-aware adaptive work-stealing scheduler, Proc. of IEEE IPDPS, pp.1-12, 2010.
DOI : 10.1109/ipdps.2010.5470425
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.459.8069
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007.
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474
QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.14, issue.7, pp.2438-2456, 2009.
DOI : 10.1002/cpe.1463
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.3457
Decentralized list scheduling, Annals of Operations Research, vol.18, issue.2, pp.1-23, 2012.
DOI : 10.1007/s10479-012-1149-7
URL : https://hal.archives-ouvertes.fr/hal-00796248
The data locality of work stealing, Proc. of ACM SPAA, ser. SPAA '00, pp.1-12, 2000.
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010.
DOI : 10.1016/j.parco.2009.12.005
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003
Scalable framework for mapping streaming applications onto multi-GPU systems, Proc. of the 17th ACM PPoPP'12, ser. PPoPP '12, pp.1-10, 2012.