Scalable and Efficient Data Management in Distributed Clouds : Service Provisioning and Data Processing

Jad Darrous
Abstract : This thesis focuses on scalable data management solutions that accelerate service provisioning and enable efficient execution of data-intensive applications in large-scale distributed clouds. Data-intensive applications increasingly run on distributed infrastructures (multiple clusters). The two main reasons for this trend are that 1) moving computation to the data sources can eliminate the latency of data transmission, and 2) storing data on a single site may not be feasible given the continuous increase in data size.

On the one hand, most applications run on virtual clusters to provide isolated services, and require virtual machine images (VMIs) or container images to provision such services. Hence, it is important to enable fast provisioning of virtualization services to reduce the waiting time of newly launched services or applications. Unlike previous work, the first part of this thesis optimizes data retrieval and placement while accounting for challenging issues, including the continuous increase in the number and size of VMIs and container images, and the limited bandwidth and heterogeneity of wide area network (WAN) connections.

On the other hand, data-intensive applications rely on replication to provide dependable and fast services, but replication has become expensive and even infeasible with the unprecedented growth of data size. The second part of this thesis provides one of the first studies on understanding and improving the performance of data-intensive applications when replication is replaced with the storage-efficient erasure coding (EC) technique.
Cited literature [331 references]

https://tel.archives-ouvertes.fr/tel-02508592
Contributor : Abes Star
Submitted on : Sunday, March 15, 2020 - 1:01:51 AM
Last modification on : Saturday, November 7, 2020 - 3:07:22 AM
Long-term archiving on : Tuesday, June 16, 2020 - 6:25:42 PM

File

DARROUS_Jad_2019LYSEN077_These...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02508592, version 1

Citation

Jad Darrous. Scalable and Efficient Data Management in Distributed Clouds : Service Provisioning and Data Processing. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lyon, 2019. English. ⟨NNT : 2019LYSEN077⟩. ⟨tel-02508592⟩
