E. Agullo, O. Aumage, B. Bramas, O. Coulaud, and S. Pitoiset, Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.10, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01517153

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, vol.28, issue.9, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01359458

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Task-based multifrontal QR solver for GPU-accelerated multicore architectures, IEEE 22nd international conference on high performance computing (HiPC). Piscataway: IEEE, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01166312

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.2, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00550877

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken, Legion: expressing locality and independence with logical regions, International conference on high performance computing, networking, storage and analysis, p.66, 2012.

B. Bramas, Optimization and parallelization of the boundary element method for the wave equation in time domain, 2016.
URL : https://hal.archives-ouvertes.fr/tel-01306571

B. Bramas, Impact study of data locality on task-based applications through the Heteroprio scheduler, PeerJ Computer Science, vol.5, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02120736

B. Bramas, Increasing the degree of parallelism using speculative execution in task-based runtime systems, PeerJ Computer Science, vol.5, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02070576

J. Carpaye, J. Roman, and P. Brenner, Design and analysis of a task-based parallelization over a runtime system of an explicit finite-volume CFD code with adaptive time stepping, Journal of Computational Science, vol.28, pp.439-454, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01507613

J. Cong, Z. Li, and R. Bagrodia, Acyclic multi-way partitioning of boolean networks, design automation conference, p.31, 1994.

D. Coulette, E. Franck, P. Helluy, M. Mehrenberger, and L. Navoret, High-order implicit palindromic discontinuous Galerkin method for kinetic-relaxation approximation, Computers & Fluids, vol.190, pp.485-502, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01706614

A. Danalis, G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, PTG: an abstraction for unhindered parallelism, Proceedings of the fourth international workshop on domain-specific languages and high-level frameworks for high performance computing (WOLFHPC), pp.21-30, 2014.

C. M. Fiduccia and R. M. Mattheyses, A linear-time heuristic for improving network partitions, 19th design automation conference, 1982.

T. Gautier, J. Lima, N. Maillard, and B. Raffin, XKaapi: a runtime system for data-flow task programming on heterogeneous architectures, 2013 IEEE 27th international symposium on parallel & distributed processing (IPDPS), pp.1299-1308, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00799904

S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula, and J. Cavazos, Auto-tuning a high-level language targeted to GPU codes, 2012 innovative parallel computing (InPar), 2012.

B. Hendrickson and T. G. Kolda, Graph partitioning models for parallel computing, Parallel Computing, vol.26, issue.12, 2000.

B. Hendrickson and R. Leland, A multi-level algorithm for partitioning graphs, Supercomputing '95: proceedings of the 1995 ACM/IEEE conference on supercomputing. Piscataway: IEEE, 1995.

J. Herrmann, J. Kho, B. Uçar, K. Kaya, and U. Catalyurek, Acyclic partitioning of large directed acyclic graphs, 2017 17th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID). Piscataway: IEEE, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01672010

M. R. Garey and D. S. Johnson, Computers and intractability: a guide to the theory of NP-completeness, 1979.

G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, Multilevel hypergraph partitioning: applications in VLSI domain, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.7, issue.1, 1999.

G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM Journal on Scientific Computing, vol.20, issue.1, 1998.

B. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, The Bell System Technical Journal, vol.49, issue.2, 1970.

B. W. Kernighan, Optimal sequential partitions of graphs, Journal of the ACM, vol.18, issue.1, 1971.

G. Kestor, R. Gioiosa, and D. Chavarría-Miranda, Prometheus: scalable and accurate emulation of task-based applications on many-core systems, 2015 IEEE international symposium on performance analysis of systems and software (ISPASS). Piscataway: IEEE, 2015.

S. Moustafa, W. Kirschenmann, F. Dupros, and H. Aochi, Task-based programming on emerging parallel architectures for finite-differences seismic numerical kernel, Euro-Par, pp.764-777, 2018.

M. Myllykoski and C. Mikkelsen, Introduction to StarNEig: a task-based library for solving nonsymmetric eigenvalue problems, OpenMP Architecture Review Board, 2013.

J. M. Perez, R. M. Badia, and J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, 2008 IEEE international conference on cluster computing, pp.142-151, 2008.

A. Pothen and L. F. Alvarado, A fast reordering algorithm for parallel sparse triangular solution, SIAM Journal on Scientific and Statistical Computing, vol.13, issue.2, 1992.

K. Purna and D. Bhatia, Temporal partitioning and scheduling data flow graphs for reconfigurable computers, IEEE Transactions on Computers, vol.48, issue.6, 1999.

C. Rossignon, Un modèle de programmation à grain fin pour la parallélisation de solveurs linéaires creux, 2015.

C. Rossignon, P. Hénon, O. Aumage, and S. Thibault, A NUMA-aware fine grain parallelization framework for multi-core architecture, 2013 IEEE international symposium on parallel & distributed processing, workshops and PhD forum, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00858350

V. Sarkar, Partitioning and scheduling parallel programs for multiprocessors, 1989.

V. Sarkar and J. Hennessy, Partitioning parallel programs for macro-dataflow, 1986.

S. E. Schaeffer, Survey: graph clustering, Computer Science Review, vol.1, issue.1, 2007.

J. Shun, F. Roosta-khorasani, K. Fountoulakis, and M. W. Mahoney, Parallel local graph clustering, 2016.

D. Sukkari, H. Ltaief, M. Faverge, and D. Keyes, Asynchronous task-based polar decomposition on single node manycore architectures, IEEE Transactions on Parallel and Distributed Systems, vol.29, issue.2, pp.312-323, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01585079

F. Suter, DAGGEN: a synthetic task graph generator, 2013.