Toward rapid understanding of production HPC applications and systems, 2015 IEEE International Conference on Cluster Computing, pp.464-473, 2015.
Reservation strategies for stochastic jobs, IEEE IPDPS, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01968419
Local adaptive mesh refinement for shock hydrodynamics, J. Comput. Phys., vol.82, issue.1, pp.64-84, 1989.
How to achieve minimax expected Kullback-Leibler distance from an unknown finite distribution, International Conference on Algorithmic Learning Theory, pp.380-394, 2002.
Bernstein polynomials and learning theory, Journal of Approximation Theory, vol.128, issue.2, pp.187-206, 2004.
A batch scheduler with high level components, CCGrid, pp.776-783, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00005106
High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62K processors, SC '08, 2008.
A performance prediction framework for scientific applications, Future Generation Computer Systems, vol.22, issue.3, pp.336-346, 2006.
Learning mixtures of structured distributions over discrete domains, ACM-SIAM SODA, pp.1380-1394, 2013.
ScheduleFlow: A simulator for HPC schedulers, vol.19, 2019.
Comprehensive resource use monitoring for HPC systems with TACC Stats, 2014 First International Workshop on HPC User Support Tools, pp.13-21, 2014.
Speculative scheduling for stochastic HPC applications, ICPP, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02158598
Improving backfilling by using machine learning to predict running times, SC '15, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01221186
Mesos: A platform for fine-grained resource sharing in the data center, the 8th USENIX Conference on Networked Systems Design and Implementation, pp.295-308, 2011.
On learning distributions from their samples, Conference on Learning Theory, pp.1066-1100, 2015.
Medical-image Analysis and Statistical Interpretation (MASI) Lab
A machine learning approach for predicting execution time of Spark jobs, Alexandria Engineering Journal, vol.57, issue.4, pp.3767-3778, 2018.
Torque resource manager, SC '06, 2006.
Backfilling using system-generated predictions rather than user runtime estimates, IEEE Transactions on Parallel and Distributed Systems, vol.18, issue.6, pp.789-803, 2007.
Diagnosing performance variations in HPC applications using machine learning, High Performance Computing, pp.355-373, 2017.
Large-scale cluster management at Google with Borg, Proceedings of the Tenth European Conference on Computer Systems, EuroSys '15, vol.18, pp.1-18, 2015.
Rethinking high performance computing platforms: Challenges, opportunities and recommendations, ACM DIDC '16, pp.19-26, 2016.
Cross-platform performance prediction of parallel applications using partial execution, SC '05, p.40, 2005.
SLURM: Simple Linux utility for resource management, JSSPP, pp.44-60, 2003.