StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011. ,
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th Euro-Par Conference, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
DAGuE: A Generic Distributed DAG Engine for High Performance Computing, IEEE International Symposium on Parallel and Distributed Processing, pp.1151-1158, 2011. ,
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Lecture Notes in Computer Science, vol.5168, pp.78-88, 2008. ,
DOI : 10.1007/978-3-540-85451-7_9
Dynamic task and data placement over NUMA architectures: An openMP runtime perspective. Evolving OpenMP in an Age of Extreme Parallelism, Lecture Notes in Computer Science, vol.5568 ,
URL : https://hal.archives-ouvertes.fr/inria-00367570
Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014. ,
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319
Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, Euro-Par, pp.50-62, 2014. ,
DOI : 10.1007/978-3-319-09873-9_5
URL : https://hal.archives-ouvertes.fr/hal-01011633
Analyzing CUDA workloads using a detailed GPU simulator, 2009 IEEE International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009. ,
DOI : 10.1109/ISPASS.2009.4919648
Barra: A Parallel Functional Simulator for GPGPU, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010. ,
DOI : 10.1109/MASCOTS.2010.43
Multi2Sim, Proceedings of the 21st international conference on Parallel architectures and compilation techniques, PACT '12, pp.335-344, 2012. ,
DOI : 10.1145/2370816.2370865
The structural simulation toolkit, ACM SIGMETRICS Performance Evaluation Review, vol.38, issue.4, pp.37-42, 2011. ,
DOI : 10.1145/1964218.1964225
On the simulation of large-scale architectures using multiple application abstraction levels, ACM Transactions on Architecture and Code Optimization, vol.8, issue.4, p.36, 2012. ,
DOI : 10.1145/2086696.2086715
On the validity of flow-level tcp network models for grid and cloud simulations, ACM Transactions on Modeling and Computer Simulation, vol.23, issue.4, 2013. ,
DOI : 10.1145/2517448
URL : https://hal.archives-ouvertes.fr/hal-00872476
Toward Better Simulation of MPI Applications on Ethernet/TCP Networks, 4th International Workshop on Performance Modeling, Benchmarking and Simulation of HPC Systems (PMBS), 2013. ,
DOI : 10.1007/978-3-319-10214-6_8
URL : https://hal.archives-ouvertes.fr/hal-00919507
BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines, Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS), 2004. ,
Dimemas: Predicting MPI Applications Behaviour in Grid Environments, Proc. of the Workshop on Grid Applications and Programming Tools, 2003. ,
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, 3rd Workshop on Highly Parallel Processing on a Chip, 2009. ,
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333
An Effective Git And Org-Mode Based Workflow For Reproducible Research, ACM SIGOPS Operating Systems Review, vol.49, issue.1, pp.61-70, 2015. ,
DOI : 10.1145/2723872.2723881
URL : https://hal.archives-ouvertes.fr/hal-01112795
Performance Models for CPU-GPU Data Transfers, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.11-20, 2014. ,
DOI : 10.1109/CCGrid.2014.16
Variations on the Histogram, Journal of Computational and Graphical Statistics, vol.18, issue.1, pp.21-31, 2009. ,
DOI : 10.1198/jcgs.2009.0002
FuPerMod: a software tool for the optimization of data-parallel applications on heterogeneous platforms, The Journal of Supercomputing, vol.17, issue.5, pp.61-69, 2014. ,
DOI : 10.1007/s11227-014-1207-9
Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015. ,
DOI : 10.1109/IPDPSW.2015.35
URL : https://hal.archives-ouvertes.fr/hal-01120507
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, Proceedings of the 19th European Conference on Recent Advances in the Message Passing Interface (EuroMPI), pp.298-299, 2012. ,
DOI : 10.1007/978-3-642-33518-1_40
URL : https://hal.archives-ouvertes.fr/hal-00725477
Fine Granularity Sparse QR Factorization for Multicore Based Systems, Lecture Notes in Computer Science, vol.1, issue.89, pp.226-236, 2012. ,
DOI : 10.1137/0910005