U. A. Acar, A. Charguéraud, and M. Rainey, Scheduling parallel programs by work stealing with private deques, Max Planck Institute for Software Systems, 2013.

N. S. Arora, R. D. Blumofe, and C. G. Plaxton, Thread scheduling for multiprogrammed multiprocessors, Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, SPAA '98, pp.119-129, 1998.

P. Berenbrink, T. Friedetzky, and L. A. Goldberg, The Natural Work-Stealing Algorithm is Stable, SIAM Journal on Computing, vol.32, issue.5, pp.1260-1279, 2003.
DOI : 10.1137/S0097539701399551

G. E. Blelloch, J. T. Fineman, P. B. Gibbons, and J. Shun, Internally deterministic parallel algorithms can be fast, Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.181-192, 2012.

G. E. Blelloch and J. Greiner, A provable time and space efficient implementation of NESL, Proceedings of the 1st ACM SIGPLAN International Conference on Functional Programming, pp.213-225, 1996.

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, Foundations of Computer Science IEEE Annual Symposium on, vol.0, pp.356-368, 1994.

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: an efficient multithreaded runtime system, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), pp.207-216, 1995.

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, vol.46, pp.720-748, 1999.

F. W. Burton and M. R. Sleep, Executing functional programs on a virtual tree of processors, Functional Programming Languages and Computer Architecture (FPCA '81), pp.187-194, 1981.

D. Chase and Y. Lev, Dynamic circular work-stealing deque, Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures, SPAA '05, pp.21-28, 2005.
DOI : 10.1145/1073970.1073974

G. Cong, S. B. Kodali, S. Krishnamoorthy, D. Lea, V. A. Saraswat et al., Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing, 2008 37th International Conference on Parallel Processing, pp.536-545, 2008.
DOI : 10.1109/ICPP.2008.88

S. P. Dandamudi, The effect of scheduling discipline on dynamic load sharing in heterogeneous distributed systems, Modeling, Analysis, and Simulation of Computer Systems, International Symposium on, p.17, 1997.

J. Dinan, S. Olivier, G. Sabin, J. Prins, P. Sadayappan et al., Dynamic Load Balancing of Unbalanced Computations Using Message Passing, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.
DOI : 10.1109/IPDPS.2007.370581

J. Dinan, D. B. Larkins, P. Sadayappan, S. Krishnamoorthy, and J. Nieplocha, Scalable work stealing, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.53:1-53:11, 2009.
DOI : 10.1145/1654059.1654113

D. L. Eager, E. D. Lazowska, and J. Zahorjan, A comparison of receiver-initiated and sender-initiated adaptive load sharing, Performance Evaluation, vol.6, issue.1, pp.53-68, 1986.
DOI : 10.1016/0166-5316(86)90008-8

M. Feeley, A message passing implementation of lazy task creation, Parallel Symbolic Computing, pp.94-107, 1992.
DOI : 10.1007/BFb0018649

M. Feeley, An efficient and general implementation of futures on large scale shared-memory multiprocessors, PhD thesis, Brandeis University, UMI Order No. GAX93-22348, 1993.

M. Feeley, Polling efficiently on stock hardware, Proceedings of the conference on Functional programming languages and computer architecture, FPCA '93, pp.179-187, 1993.
DOI : 10.1145/165180.165205

M. Fluet, M. Rainey, J. Reppy, and A. Shaw, Implicitly threaded parallelism in Manticore, Journal of Functional Programming, vol.20, pp.5-61, 2011.

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, PLDI, pp.212-223, 1998.

D. Grove, O. Tardieu, D. Cunningham, B. Herta, I. Peshansky et al., A performance model for X10 applications, Proceedings of the 2011 ACM SIGPLAN X10 Workshop, X10 '11, 2011.
DOI : 10.1145/2212736.2212737

R. H. Halstead Jr., Implementation of Multilisp, Proceedings of the 1984 ACM Symposium on LISP and functional programming, LFP '84, pp.9-17, 1984.
DOI : 10.1145/800055.802017

D. Hendler, Y. Lev, M. Moir, and N. Shavit, A dynamic-sized nonblocking work stealing deque, Distributed Computing, vol.43, issue.3, pp.189-207, 2006.
DOI : 10.1007/s00446-005-0144-5

D. Hendler and N. Shavit, Non-blocking steal-half work queues, Proceedings of the twenty-first annual symposium on Principles of distributed computing, PODC '02, pp.280-289, 2002.
DOI : 10.1145/571825.571876

D. Hendler and N. Shavit, Work dealing, Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '02, pp.164-172, 2002.
DOI : 10.1145/564870.564900

T. Hiraishi, M. Yasugi, S. Umatani, and T. Yuasa, Backtracking-based load balancing, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.55-64, 2009.

Intel, Cilk Plus, com/en-us/articles/intel-cilk-plus/.

Intel, Intel Xeon Processor X7550, Specifications at http://ark.intel.com/products, Processor-X7550-(18M-Cache-2_00-GHz-6_40-GTs-Intel-QPI).

G. Keller, M. M. Chakravarty, R. Leshchinskiy, S. P. Jones, and B. Lippmeier, Regular, shape-polymorphic, parallel arrays in Haskell, Proceedings of the 15th ACM SIGPLAN international conference on Functional programming, pp.261-272, 2010.