Scheduling Parallel Programs by Work Stealing with Private Deques

Umut Acar 1 Arthur Charguéraud 2, 3 Mike Rainey 4
3 TOCCATA - Certified Programs, Certified Tools, Certified Floating-Point Computations
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract : Work stealing has proven to be an effective method for scheduling parallel programs on multicore computers. To achieve high performance, work stealing distributes tasks between concurrent queues, called deques, which are assigned to each processor. Each processor operates on its deque locally except when performing load balancing via steals. Unfortunately, concurrent deques suffer from two limitations: 1) local deque operations require expensive memory fences in modern weak-memory architectures, 2) they can be very difficult to extend to support various optimizations and flexible forms of task distribution strategies needed many applications, e.g., those that do not fit nicely into the divide-and-conquer, nested data parallel paradigm. For these reasons, there has been a lot recent interest in implementations of work stealing with non-concurrent deques, where deques remain entirely private to each processor and load balancing is performed via message passing. Private deques eliminate the need for memory fences from local operations and enable the design and implementation of efficient techniques for reducing task-creation overheads and improving task distribution. These advantages, however, come at the cost of communication. It is not known whether work stealing with private deques enjoys the theoretical guarantees of concurrent deques and whether they can be effective in practice. In this paper, we propose two work-stealing algorithms with private deques and prove that the algorithms guarantee similar theoretical bounds as work stealing with concurrent deques. For the analysis, we use a probabilistic model and consider a new parameter, the branching depth of the computation. We present an implementation of the algorithm as a C++ library and show that it compares well to Cilk on a range of benchmarks. Since our approach relies on private deques, it enables implementing flexible task creation and distribution strategies. As a specific example, we show how to implement task coalescing and steal-half strategies, which can be important in fine-grain, non-divide-and-conquer algorithms such as graph algorithms, and apply them to the depth-first-search problem.
Type de document :
Communication dans un congrès
PPOPP - 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, Feb 2013, Shenzhen, China. ACM New York, NY, USA, pp.219-228, 2013, Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Liste complète des métadonnées

Littérature citée [29 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-00863028
Contributeur : Arthur Charguéraud <>
Soumis le : mercredi 18 septembre 2013 - 10:25:12
Dernière modification le : vendredi 29 septembre 2017 - 11:26:09
Document(s) archivé(s) le : vendredi 20 décembre 2013 - 14:36:27

Fichier

full.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-00863028, version 1

Citation

Umut Acar, Arthur Charguéraud, Mike Rainey. Scheduling Parallel Programs by Work Stealing with Private Deques. PPOPP - 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, Feb 2013, Shenzhen, China. ACM New York, NY, USA, pp.219-228, 2013, Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming. 〈hal-00863028〉

Partager

Métriques

Consultations de la notice

801

Téléchargements de fichiers

475