Conference papers

Incorporating Probabilistic Optimizations for Resource Provisioning of Data Processing Workflows

Abstract: Workflows are an important model for big data processing, and resource provisioning is crucial to their performance. Recently, system variations in the cloud and in large-scale clusters, such as variations in I/O and network performance, have been observed to greatly affect workflow performance. Traditional resource provisioning methods, which overlook these variations, can lead to suboptimal provisioning results. In this paper, we provide a general solution for workflow performance optimization that takes system variations into account. Specifically, we model system variations as time-dependent random variables and take their probability distributions as optimization input. Despite its effectiveness, this solution incurs heavy computation overhead. We therefore propose three pruning techniques that simplify the workflow structure and reduce the probability evaluation overhead. We implement our techniques in a runtime library, which allows users to incorporate efficient probabilistic optimization into existing resource provisioning methods. Experiments show that probabilistic solutions can improve performance by 51% compared to state-of-the-art static solutions while guaranteeing the budget constraint, and that our pruning techniques can greatly reduce the overhead of probabilistic optimization.
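To make "taking probability distributions as optimization input" concrete, here is a minimal sketch, not the authors' library or their pruning techniques. It assumes each task's duration is an independent discrete random variable (ignoring the time dependence the paper models) and uses generic series-parallel reduction: tasks in sequence combine by summing durations, and concurrent branches combine by taking the maximum. The function names `series`, `parallel`, and `prob_within` and all numbers in the example are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict

# Hypothetical representation: a discrete distribution maps a task
# duration (seconds) to its probability. Tasks are assumed independent.
Dist = Dict[float, float]


def series(a: Dist, b: Dist) -> Dist:
    """Duration of running a and then b: the sum of two independent variables."""
    out: Dict[float, float] = defaultdict(float)
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[x + y] += px * py
    return dict(out)


def parallel(a: Dist, b: Dist) -> Dist:
    """Duration of running a and b concurrently and waiting for both: the max."""
    out: Dict[float, float] = defaultdict(float)
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[max(x, y)] += px * py
    return dict(out)


def prob_within(d: Dist, deadline: float) -> float:
    """Probability that the duration does not exceed the deadline."""
    return sum(p for t, p in d.items() if t <= deadline)


if __name__ == "__main__":
    # Hypothetical workflow A -> (B || C); A's duration varies with I/O speed.
    a = {10: 0.8, 20: 0.2}
    b = {5: 0.5, 15: 0.5}
    c = {12: 1.0}
    makespan = series(a, parallel(b, c))
    print(makespan)                       # {22: 0.4, 25: 0.4, 32: 0.1, 35: 0.1}
    print(prob_within(makespan, 30))      # ≈ 0.8
```

Each reduction collapses two nodes into one, so repeatedly applying it to a series-parallel workflow leaves a single distribution for the makespan. A static method would instead use one fixed duration per task and cannot report how likely a deadline is to be met. The cost of enumerating value pairs grows quickly with distribution size, which suggests why reducing probability evaluation overhead matters.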

Cited literature: 32 references

Contributor: Shadi Ibrahim
Submitted on: Tuesday, December 10, 2019 - 11:10:12 AM
Last modification on: Wednesday, April 27, 2022 - 5:22:07 PM
Long-term archiving on: Wednesday, March 11, 2020 - 3:19:28 PM


Files produced by the author(s)



Amelie Chi Zhou, Yao Xiao, Bingsheng He, Shadi Ibrahim, Reynold Cheng. Incorporating Probabilistic Optimizations for Resource Provisioning of Data Processing Workflows. ICPP 2019 - 48th International Conference on Parallel Processing, Aug 2019, Kyoto, Japan. pp.1-10, ⟨10.1145/3337821.3337847⟩. ⟨hal-02389078⟩


