U. A. Acar, V. Aksenov, A. Charguéraud, and M. Rainey, Provably and Practically Efficient Granularity Control, ???v1 Long version, 2019.

U. A. Acar, G. E. Blelloch, and R. D. Blumofe, The Data Locality of Work Stealing, Theory of Computing Systems, vol.35, pp.321-347, 2002.

U. A. Acar, A. Charguéraud, A. Guatto, M. Rainey, and F. Sieczkowski, Heartbeat Scheduling: Provable Efficiency for Nested Parallelism, Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.769-782, 2018.

U. A. Acar, A. Charguéraud, and M. Rainey, Scheduling Parallel Programs by Work Stealing with Private Deques, Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13), 2013.

U. A. Acar, A. Charguéraud, and M. Rainey, A work-efficient algorithm for parallel unordered depth-first search, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, vol.67, p.12, 2015.

U. A. Acar, A. Charguéraud, and M. Rainey, Oracle-guided scheduling for controlling granularity in implicitly parallel languages, Journal of Functional Programming, vol.26, p.23, 2016.

Arvind and K. P. Gostelow, The Id Report: An Asynchronous Language and Computing Machine, 1978.

L. Bergstrom, M. Fluet, M. Rainey, J. Reppy, and A. Shaw, Lazy Tree Splitting. J. Funct. Program, vol.22, pp.382-438, 2012.

G. E. Blelloch, J. T. Fineman, P. B. Gibbons, and J. Shun, Internally deterministic parallel algorithms can be fast, PPoPP '12, pp.181-192, 2012.

G. E. Blelloch and J. Greiner, A provable time and space efficient implementation of NESL, Proceedings of the 1st ACM SIGPLAN International Conference on Functional Programming, pp.213-225, 1996.

G. E. Blelloch, J. C. Hardwick, J. Sipelstein, M. Zagha, and S. Chatterjee, Implementation of a Portable Nested Data-Parallel Language, J. Parallel Distrib. Comput, vol.21, pp.4-14, 1994.

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, vol.46, pp.720-748, 1999.

M. M. T. Chakravarty, R. Leshchinskiy, S. Peyton Jones, G. Keller, and S. Marlow, Data parallel Haskell: a status report, Proceedings of the POPL 2007 Workshop on Declarative Aspects of Multicore Programming, pp.10-18, 2007.

P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra et al., X10: an object-oriented approach to non-uniform cluster computing, Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA '05), pp.519-538, 2005.

A. Duran, J. Corbalan, and E. Ayguade, An adaptive cut-off for task parallelism, 2008 SC-International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2008.

M. Feeley, A Message Passing Implementation of Lazy Task Creation, Parallel Symbolic Computing, pp.94-107, 1992.

M. Feeley, An efficient and general implementation of futures on large scale shared-memory multiprocessors, Ph.D. Dissertation, Brandeis University, UMI Order No. GAX93-22348, 1993.

M. Feeley, Polling efficiently on stock hardware, Proceedings of the conference on Functional programming languages and computer architecture (FPCA '93), pp.179-187, 1993.

M. Fluet, M. Rainey, J. Reppy, and A. Shaw, Implicitly threaded parallelism in Manticore, Journal of Functional Programming, vol.20, pp.1-40, 2011.

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, PLDI, pp.212-223, 1998.

A. Guatto, S. Westrick, R. Raghunathan, U. Acar, and M. Fluet, Hierarchical Memory Management for Mutable State, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2018.

R. H. Halstead, MULTILISP: a language for concurrent symbolic computation, ACM Transactions on Programming Languages and Systems, vol.7, pp.501-538, 1985.

R. H. Halstead, Implementation of Multilisp: Lisp on a Multiprocessor, Proceedings of the 1984 ACM Symposium on LISP and Functional Programming, pp.9-17, 1984.

T. Hiraishi, M. Yasugi, S. Umatani, and T. Yuasa, Backtracking-based load balancing, PPoPP '09. ACM, pp.55-64, 2009.

L. Huelsbergen, J. R. Larus, and A. Aiken, Using the run-time sizes of data structures to guide parallel-thread creation, Proceedings of the 1994 ACM conference on LISP and functional programming (LFP '94), pp.79-90, 1994.

S. Imam and V. Sarkar, Habanero-Java library: a Java 8 framework for multicore programming, 2014 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages and Tools (PPPJ '14), pp.75-86, 2014.

Intel, Intel Threading Building Blocks, 2011.

S. Iwasaki and K. Taura, A static cut-off for task parallel programs, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp.139-150, 2016.

S. Jagannathan, A. Navabi, K. C. Sivaramakrishnan, and L. Ziarek, The Design Rationale for Multi-MLton, ML '10: Proceedings of the ACM SIGPLAN Workshop on ML, 2010.

G. Keller, M. M. T. Chakravarty, R. Leshchinskiy, S. Peyton Jones, and B. Lippmeier, Regular, shape-polymorphic, parallel arrays in Haskell, Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (ICFP '10), pp.261-272, 2010.

D. Lea, A Java fork/join framework, Proceedings of the ACM 2000 conference on Java Grande (JAVA '00), pp.36-43, 2000.

D. Leijen, W. Schulte, and S. Burckhardt, The design of a task parallel library, Proceedings of the 24th ACM SIGPLAN conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '09), pp.227-242, 2009.

P. Lopez, M. Hermenegildo, and S. Debray, A methodology for granularity-based control of parallelism in logic programs, Journal of Symbolic Computation, vol.21, pp.715-734, 1996.

E. Mohr, D. A. Kranz, and R. H. Halstead, Lazy task creation: a technique for increasing the granularity of parallel programs, IEEE Transactions on Parallel and Distributed Systems, vol.2, pp.264-280, 1991.

E. Mohr, D. A. Kranz, and R. H. Halstead, Lazy task creation: a technique for increasing the granularity of parallel programs, Conference record of the 1990 ACM Conference on Lisp and Functional Programming, pp.185-197, 1990.

OpenMP Architecture Review Board, OpenMP Application Program Interface, 2008.

J. Pehoushek and J. Weening, Low-cost process creation and dynamic partitioning in Qlisp, Parallel Lisp: Languages and Systems, vol.441, pp.182-199, 1990.

R. Raghunathan, S. K. Muller, U. A. Acar, and G. Blelloch, Hierarchical Memory Management for Parallel Programs, Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, pp.392-406, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01416237

M. Rainey, Effective Scheduling Techniques for High-Level Parallel Programming Languages, 2010.

D. Sanchez, R. M. Yoo, and C. Kozyrakis, Flexible architectural support for fine-grain scheduling, Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10), pp.311-322, 2010.

A. Tzannes, G. C. Caragea, U. Vishkin, and R. Barua, Lazy Scheduling: A Runtime Adaptive Scheduler for Declarative Parallelism, TOPLAS, vol.36, issue.10, 2014.

J. S. Weening, Parallel Execution of Lisp Programs. Ph.D. Dissertation, 1989.

R. Zadeh, Overview, Models of Computation, Brent's Theorem, 2017.