Static strategies for worksharing with unrecoverable interruptions

Abstract: One has a large workload that is "divisible" (the granularity of its constituent work can be adjusted arbitrarily), and one has access to p remote worker computers that can assist in computing the workload. How can one best utilize the workers? Complicating this question is the fact that each worker is subject to interruptions (of known likelihood) that kill all work in progress on it. One wishes to share the workload with the workers in a way that maximizes the expected amount of work completed. Strategies are presented for achieving this goal by balancing the desire to checkpoint often, which decreases the amount of vulnerable work at any point, against the desire to avoid the context-switching that checkpointing requires. Schedules must also temper the desire to replicate work, because replication diminishes the effective remote workforce. The current study exhibits strategies that provably maximize the expected amount of work completed when there is only one worker (the case p=1) and, at least in an asymptotic sense, when there are two workers (the case p=2). However, the study strongly suggests that exact maximization is intractable for p≥2 computers, because work replication on multiple workers joins checkpointing as a vehicle for decreasing the impact of work-killing interruptions. We respond to that challenge by developing efficient heuristics that employ both checkpointing and work replication as mechanisms for decreasing the impact of work-killing interruptions. The quality of these heuristics, measured by the expected amount of work completed, is assessed through exhaustive simulations that use both idealized models and actual trace data.
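To make the single-worker (p=1) tradeoff between checkpointing often and paying context-switching overhead concrete, here is a minimal Python sketch. It is not the authors' method. It assumes a hypothetical linear risk model in which a worker survives to time t with probability 1 - κt, a fixed checkpoint overhead ε per chunk, and that a chunk counts only if its checkpoint completes before an interruption. The names `survival`, `expected_work`, and `best_equal_split` are illustrative, and the search only tries equal-size chunks rather than the general schedules studied in the paper.

```python
from typing import Sequence


def survival(t: float, kappa: float) -> float:
    """Assumed linear risk model: Pr(no interruption before time t) = 1 - kappa*t,
    clamped at 0 once the worker is certain to have been interrupted."""
    return max(0.0, 1.0 - kappa * t)


def expected_work(chunks: Sequence[float], eps: float, kappa: float) -> float:
    """Expected work completed by one worker that processes `chunks` in order,
    paying checkpoint overhead `eps` after each chunk. A chunk is counted only
    if the worker survives until that chunk's checkpoint finishes."""
    t = 0.0
    total = 0.0
    for w in chunks:
        t += w + eps                      # time at which this chunk's checkpoint completes
        total += w * survival(t, kappa)   # chunk is saved only if no interruption by then
    return total


def best_equal_split(total_work: float, eps: float, kappa: float,
                     max_chunks: int = 100) -> tuple[int, float]:
    """Try n = 1..max_chunks equal-size chunks; return (n, expected work) for the best n."""
    best_n, best_e = 1, expected_work([total_work], eps, kappa)
    for n in range(2, max_chunks + 1):
        e = expected_work([total_work / n] * n, eps, kappa)
        if e > best_e:
            best_n, best_e = n, e
    return best_n, best_e


if __name__ == "__main__":
    print(expected_work([1.0], eps=0.01, kappa=0.5))   # single chunk, no intermediate checkpoints
    print(best_equal_split(1.0, eps=0.01, kappa=0.5))  # best number of equal chunks
```

With total work 1, ε = 0.01, and κ = 0.5, a single chunk yields an expected 0.495 units of work, while the search settles on 10 chunks for about 0.6975. Under this assumed model, splitting into n equal chunks gives 1 - κ(1/n + ε)(n + 1)/2, so more checkpoints shrink the work exposed to an interruption until the accumulated overhead ε starts to dominate.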

Cited literature: 31 references
Contributor: Equipe Roma
Submitted on: Thursday, October 18, 2018 - 9:00:18 AM
Last modification on: Friday, September 30, 2022 - 4:12:20 AM
Long-term archiving on: Saturday, January 19, 2019 - 12:46:28 PM

Anne Benoit, Yves Robert, Arnold Rosenberg, Frédéric Vivien. Static strategies for worksharing with unrecoverable interruptions. Theory of Computing Systems, Springer Verlag, 2013, 53 (3), pp.386-423. ⟨10.1007/s00224-012-9426-z⟩. ⟨hal-00763321⟩