Green500 list URL: https://www.green500.org/lists, 2017. ,
URL: http://www.aics.riken, 2017. ,
The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications, pp.154-165, 2014. ,
LU Factorization for Accelerator-based Systems, pp.217-224, 2011. ,
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, IPDPS. IEEE, vol.20, pp.932-943, 2011. ,
Characterizing Node Orderings for Improved Performance, pp.1-6, 2015. ,
Opportunities and Challenges of Exascale Computing URL: https://science.energy.gov, Tech. rep. U.S. Department of Energy, 2010. ,
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Euro-Par Workshops. Lecture Notes in Computer Science, vol.6043, pp.56-65, 2009. ,
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.2, pp.187-198, 2011. ,
Some models for scheduling parallel programs with communication delays, Discrete Applied Mathematics, vol.72, issue.1-2, pp.5-24, 1997. ,
There Goes the Neighborhood: Performance Degradation due to Nearby Jobs ,
Handbook on Scheduling: From Theory to Applications. International Handbooks on Information Systems, pp.1-41, 2007. ,
Scheduling Data Flow Program in XKaapi: A New Affinity Based Algorithm for Heterogeneous Architectures, Euro-Par ,
Scheduling independent tasks on multi-cores with GPU accelerators, Concurrency and Computation: Practice and Experience, vol.27, pp.1625-1638, 2015. ,
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.9, pp.2689-2702, 2017. ,
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1, pp.37-51, 2012. ,
A Fast 5/2-Approximation Algorithm for Hierarchical Scheduling, Euro-Par Lecture Notes in Computer Science, vol.6271, issue.1, pp.157-167, 2010. ,
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space, IEEE Transactions on Computers, vol.59, issue.6, pp.808-821, 2010. ,
Approximation Algorithms for Multiple Strip Packing and Scheduling Parallel Jobs in Platforms, Discrete Mathematics, Algorithms and Applications, pp.553-586, 2011. ,
The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM, vol.21, issue.2, pp.201-206, 1974. ,
Scheduling Algorithms. Fifth Edition, 2007. ,
Productive Programming of GPU Clusters with OmpSs, pp.557-568, 2012. ,
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
Scheduling Unrelated Machines of Few Different Types URL: https ,
Understanding and Improving Computational Science Storage Access through Continuous Characterization, In: ACM Transactions on Storage, vol.7, issue.81, p.77, 2011. ,
Considering Time in Designing Large-Scale Systems for Scientific Computing, pp.1533-1545, 2016. ,
Performance Bounds for Level-Oriented Two-Dimensional Packing Algorithms, In: SIAM Journal on Computing, vol.9, issue.4, pp.808-826, 1980. ,
LogP: Towards a Realistic Model of Parallel Computation, pp.1-12, 1993. ,
Online Scheduling on a CPU-GPU Cluster, In: TAMC. Lecture Notes in Computer Science, vol.7876, pp.1-9, 2013. ,
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers, pp.27-36, 2014. ,
Scheduling Parallel Tasks: Approximation Algorithms, In: Handbook of Scheduling: Algorithms, Models, and Performance Analysis, Computer & Information Science Series. Chapman and Hall/CRC, 2004. ,
The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011. ,
Using Formal Grammars to Predict I/O Behaviors in HPC: The Omnisc'IO Approach, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.8, pp.2435-2449, 2016. ,
Scheduling for Parallel Processing. Computer Communications and Networks, 2009. ,
Fast parallel sorting under LogP: experience with the CM-5, IEEE Transactions on Parallel and Distributed Systems, pp.791-805, 1996. ,
DOI : 10.1109/71.532111
Understanding Application and System Performance Through System-Wide Monitoring, IPDPS Workshops. IEEE, pp.1702-1710, 2016. ,
Topology-Aware Job Scheduling Strategies for Torus Networks, In: Cray User Group URL: https://cug.org/proceedings, pp.74-77, 2014. ,
Théorie et pratique de l'ordonnancement d'applications sur les systèmes distribués, 2006. ,
An effective approximation algorithm for the Malleable Parallel Task Scheduling problem, In: Journal of Parallel and Distributed Computing, vol.72, issue.5, pp.693-704, 2012. ,
Theory and Practice in Parallel Job Scheduling, In: JSSPP. Lecture Notes in Computer Science, vol.1291, pp.1-34, 1997. ,
Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, pp.75-82, 2012. ,
Tighter Bounds for LPT Scheduling on Uniform Processors, In: SIAM Journal on Computing, vol.16, issue.3, pp.554-560, 1987. ,
Parallelism in Random Access Machines, pp.114-118, 1978. ,
Scheduling the I/O of HPC Applications Under Congestion, IPDPS. IEEE, pp.1013-1022, 2015. ,
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, pp.1299-1308, 2013. ,
KAAPI: A thread scheduling runtime system for data flow computations on cluster of multiprocessors, In: PASCO. ACM, pp.15-23, 2007. ,
Topology-aware Resource Management for HPC Applications, pp.1-17, 2017. ,
Contributions for Resource and Job Management in High Performance Computing URL: https, 2010. ,
Algorithms for Compile-Time Memory Optimization URL: https, In: SODA. ACM/SIAM, pp.907-908, 1999. ,
Bounds for Multiprocessor Scheduling with Resource Constraints, In: SIAM Journal on Computing, vol.4, issue.2, 1975. ,
Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. ,
Optimization and Approximation in Deterministic Sequencing and Scheduling: a Survey, Annals of Discrete Mathematics, vol.5, pp.287-326, 1979. ,
Bounds on Multiprocessing Timing Anomalies, In: SIAM Journal on Applied Mathematics, vol.17, issue.2, pp.416-429, 1969. ,
Reproducible MPI Benchmarking is Still Not as Easy as You Think, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.12, pp.3617-3630, 2016. ,
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par Lecture Notes in Computer Science, vol.6272, issue.2, pp.235-246, 2010. ,
Ueber die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen, vol.38, issue.3, pp.459-460, 1891. ,
Using Dual Approximation Algorithms for Scheduling Problems: Theoretical and Practical Results, Journal of the ACM, vol.34, issue.1, pp.144-162, 1987. ,
A Polynomial Approximation Scheme for Scheduling on Uniform Processors: Using the Dual Approximation Approach, In: SIAM Journal on Computing, vol.17, issue.3, pp.539-551, 1988. ,
CLARISSE: A Middleware for Data-Staging Coordination and Control on Large-Scale HPC Platforms, pp.346-355, 2016. ,
Scheduling Problems on Two Sets of Identical Machines, pp.277-294, 2003. ,
Partitioning Low-diameter Networks to Eliminate Inter-job Interference, pp.439-448, 2017. ,
Linear-time Approximation Schemes for Scheduling Malleable Parallel Tasks URL: https, In: SODA. ACM/SIAM, pp.490-498, 1999. ,
Cost-Effective Diameter-Two Topologies: Analysis and Evaluation ,
Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU, pp.1-36, 2010. ,
Processor Allocation on Cplant: Achieving General Processor Locality Using One-Dimensional Allocation Strategies, pp.296-304, 2002. ,
Approximation Algorithms for Scheduling Unrelated Parallel Machines, In: Mathematical Programming, vol.46, issue.1, pp.259-271, 1990. ,
Scheduling Malleable and Nonmalleable Parallel Tasks URL: https, pp.167-176, 1994. ,
Contiguity and Locality in Backfilling Scheduling, pp.586-595, 2015. ,
Scheduling for new computing platforms with GPUs URL: https, 2014. ,
A computer Oriented Geodetic Data Base; and a New Technique in File Sequencing URL: https, Tech. rep. IBM Ltd, p.72, 1966. ,
A 3/2-Approximation Algorithm for Scheduling Independent Monotonic Malleable Tasks, pp.401-412, 2007. ,
Solving very large instances of the scheduling of independent tasks problem on the GPU ,
Application-aware metrics for partition selection in cube-shaped topologies, Parallel Computing, vol.40, issue.5, pp.129-139, 2014. ,
Adapting a message-driven parallel application to GPU-accelerated clusters, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, 2008. ,
DOI : 10.1109/SC.2008.5214716
URL : http://mc.stanford.edu/cgi-bin/images/8/8a/SC08_NAMD.pdf
A PTAS for Assigning Sporadic Tasks on Two-type Heterogeneous Multiprocessors, pp.117-126, 2012. ,
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, pp.1-11, 2010. ,
An approximation algorithm for the generalized assignment problem, In: Mathematical Programming, vol.62, issue.1, pp.461-474, 1993. ,
Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems, ICS. ACM, pp.365-376, 2012. ,
A Strip-Packing Algorithm with Absolute Performance Bound 2, In: SIAM Journal on Computing, vol.26, issue.2, pp.401-409, 1997. ,
An optimal rounding gives a better approximation for scheduling unrelated machines, Operations Research Letters, vol.33, issue.2, pp.127-133, 2005. ,
On the existence of schedules that are near-optimal for both makespan and total weighted completion time, Operations Research Letters, vol.21, issue.3, pp.115-122, 1997. ,
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010. ,
Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers, In: COMHPC@SC. IEEE ,
Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
PaCMap: Topology Mapping of Unstructured Communication Patterns onto Non-contiguous Allocations, ICS. ACM, pp.37-46, 2015. ,
Approximate Algorithms for Scheduling Parallelizable Tasks, pp.323-332, 1992. ,
A Bridging Model for Parallel Computation, Communications of the ACM, vol.33, issue.8, 1990. ,
Guide: QUeueing And Runtime for Kernels. Tech. rep. ICL-UT-11-02, p.30, 2011. ,
List of Tables
3.1 Parameter settings used to generate scheduling instances
3.2 HEFT-like heuristics used for comparison