C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631

URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. Badia, F. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th Euro-Par Conference, 2009.

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A Generic Distributed DAG Engine for High Performance Computing, IEEE International Symposium on Parallel and Distributed Processing, pp.1151-1158, 2011.

M. Pérache, H. Jourdren, and R. Namyst, MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Lecture Notes in Computer Science, vol.5168, pp.78-88, 2008.
DOI : 10.1007/978-3-540-85451-7_9

F. Broquedis, N. Furmento, B. Goglin, R. Namyst, and P. Wacrenier, Dynamic task and data placement over NUMA architectures: An OpenMP runtime perspective, Evolving OpenMP in an Age of Extreme Parallelism, Lecture Notes in Computer Science, vol.5568
URL : https://hal.archives-ouvertes.fr/inria-00367570

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
DOI : 10.1016/j.jpdc.2014.06.008

URL : https://hal.archives-ouvertes.fr/hal-01017319

L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J. Méhaut, Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, Euro-Par, pp.50-62, 2014.
DOI : 10.1007/978-3-319-09873-9_5

URL : https://hal.archives-ouvertes.fr/hal-01011633

A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt, Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
DOI : 10.1109/ISPASS.2009.4919648

S. Collange, M. Daumas, D. Defour, and D. Parello, Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010.
DOI : 10.1109/MASCOTS.2010.43

R. Ubal, B. Jang, P. Mistry, D. Schaa, and D. Kaeli, Multi2Sim, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, PACT '12, pp.335-344, 2012.
DOI : 10.1145/2370816.2370865

A. Rodrigues, K. Hemmert, B. Barrett, C. Kersey, R. Oldfield et al., The structural simulation toolkit, ACM SIGMETRICS Performance Evaluation Review, vol.38, issue.4, pp.37-42, 2011.
DOI : 10.1145/1964218.1964225

A. Rico, F. Cabarcas, C. Villavieja, M. Pavlovic, A. Vega et al., On the simulation of large-scale architectures using multiple application abstraction levels, ACM Transactions on Architecture and Code Optimization, vol.8, issue.4, p.36, 2012.
DOI : 10.1145/2086696.2086715

P. Velho, L. Schnorr, H. Casanova, and A. Legrand, On the validity of flow-level TCP network models for grid and cloud simulations, ACM Transactions on Modeling and Computer Simulation, vol.23, issue.4, 2013.
DOI : 10.1145/2517448

URL : https://hal.archives-ouvertes.fr/hal-00872476

P. Bedaride, A. Degomme, S. Genaud, A. Legrand, G. Markomanolis et al., Toward Better Simulation of MPI Applications on Ethernet/TCP Networks, 4th International Workshop on Performance Modeling, Benchmarking and Simulation of HPC Systems (PMBS), 2013.
DOI : 10.1007/978-3-319-10214-6_8

URL : https://hal.archives-ouvertes.fr/hal-00919507

G. Zheng, G. Kakulapati, and L. Kalé, BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines, Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS), 2004.

R. Badia, J. Labarta, J. Giménez, and F. Escalé, Dimemas: Predicting MPI Applications Behaviour in Grid Environments, Proc. of the Workshop on Grid Applications and Programming Tools, 2003.

C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, 3rd Workshop on Highly Parallel Processing on a Chip, 2009.
DOI : 10.1007/978-3-642-14122-5_9

URL : https://hal.archives-ouvertes.fr/inria-00421333

L. Stanisic, A. Legrand, and V. Danjean, An Effective Git And Org-Mode Based Workflow For Reproducible Research, ACM SIGOPS Operating Systems Review, vol.49, issue.1, pp.61-70, 2015.
DOI : 10.1145/2723872.2723881

URL : https://hal.archives-ouvertes.fr/hal-01112795

B. van Werkhoven, J. Maassen, F. Seinstra, and H. Bal, Performance Models for CPU-GPU Data Transfers, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.11-20, 2014.
DOI : 10.1109/CCGrid.2014.16

L. Denby and C. Mallows, Variations on the Histogram, Journal of Computational and Graphical Statistics, vol.18, issue.1, pp.21-31, 2009.
DOI : 10.1198/jcgs.2009.0002

D. Clarke, Z. Zhong, V. Rychkov, and A. Lastovetsky, FuPerMod: a software tool for the optimization of data-parallel applications on heterogeneous platforms, The Journal of Supercomputing, vol.17, issue.5, pp.61-69, 2014.
DOI : 10.1007/s11227-014-1207-9

E. Agullo, O. Beaumont, L. Eyraud-Dubois, J. Herrmann, S. Kumar et al., Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015.
DOI : 10.1109/IPDPSW.2015.35

URL : https://hal.archives-ouvertes.fr/hal-01120507

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault, StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Proceedings of the 19th European Conference on Recent Advances in the Message Passing Interface (EuroMPI), pp.298-299, 2012.
DOI : 10.1007/978-3-642-33518-1_40

URL : https://hal.archives-ouvertes.fr/hal-00725477

A. Buttari, Fine Granularity Sparse QR Factorization for Multicore Based Systems, Lecture Notes in Computer Science, vol.1, issue.89, pp.226-236, 2012.