A New Framework for Evaluating Straggler Detection Mechanisms in MapReduce

Abstract: Big Data systems (e.g., Google MapReduce, Apache Hadoop, Apache Spark) increasingly rely on speculative execution to mask slow tasks, also known as stragglers, because a job’s execution time is dominated by its slowest task instance. These systems typically identify stragglers and speculatively run copies of those tasks, expecting that a copy may complete sooner and thereby shorten job execution time. There is a rich body of recent results on straggler mitigation in MapReduce. However, the majority of these do not consider the problem of accurately detecting stragglers. Instead, they adopt a particular straggler detection approach and then study its effectiveness in terms of performance (e.g., reduction in job completion time) or efficiency (e.g., high resource utilization). In this article, we consider a complete framework for straggler detection and mitigation. We start with a set of metrics that can be used to characterize and detect stragglers, including Precision, Recall, Detection Latency, Undetected Time, and Fake Positive. We then develop an architectural model by which these metrics can be linked to measures of performance, including execution time and system energy overheads. We further conduct a series of experiments to demonstrate which metrics and approaches are more effective in detecting stragglers and are also predictive of effectiveness in terms of performance and energy efficiency. For example, our results indicate that the default Hadoop straggler detector could be made more effective. In one case, its Precision is low, with only 55% of the detected tasks being actual stragglers, and its Recall, i.e., the percentage of actual stragglers that are detected, is also relatively low at 56%. For the same case, the hierarchical approach (i.e., a green-driven detector based on the default one) achieves a Precision of 99% and a Recall of 29%. This increase in Precision translates into lower execution time and energy consumption, and thus higher performance and energy efficiency; compared to the default Hadoop mechanism, energy consumption is reduced by almost 31%. These results demonstrate how our framework can offer useful insights and be applied in practical settings to characterize and design new straggler detection mechanisms for MapReduce systems.
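
As a rough illustration of the two headline metrics, the following Python sketch computes Precision and Recall for a straggler detector from two sets of task IDs. It is not taken from the paper: the function name, the task IDs, and the convention of returning 0.0 when a set is empty are all assumptions made for this example.

```python
def precision_recall(detected, actual):
    """Return (precision, recall) for a straggler detector.

    detected: task IDs the detector flagged as stragglers.
    actual:   task IDs that really were stragglers.
    Returning 0.0 for an empty set is an assumed convention, not the paper's.
    """
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    # Precision: share of flagged tasks that are actual stragglers.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of actual stragglers that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical task IDs, for illustration only.
actual = {"t1", "t2", "t3", "t4"}
detected = {"t1", "t2", "t9"}
p, r = precision_recall(detected, actual)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case, two of the three flagged tasks are real stragglers and two of the four real stragglers are flagged. A detector that flags fewer tasks can raise Precision while lowering Recall, which is the trade-off the abstract reports for the hierarchical approach.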

https://hal.inria.fr/hal-02172590
Contributor: Shadi Ibrahim
Submitted on: Thursday, August 1, 2019 - 8:05:44 AM
Last modification on: Thursday, November 14, 2019 - 10:10:49 AM

Citation

Tien-Dat Phan, Guillaume Pallez, Shadi Ibrahim, Padma Raghavan. A New Framework for Evaluating Straggler Detection Mechanisms in MapReduce. ACM Transactions on Modeling and Performance Evaluation of Computing Systems, ACM, 2019, X, pp.1-22. ⟨10.1145/3328740⟩. ⟨hal-02172590v2⟩