Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics

Bogdan Nicolae; Carlos Costa; Claudia Misale; Kostas Katrinis; Yoonho Park

doi:10.1109/CCGrid.2016.85

Conference paper, Year: 2016


Bogdan Nicolae

Role: Author

IBM Research - Ireland

Carlos Costa

Role: Author

IBM Watson Research Center

Claudia Misale

Role: Author

University of Torino

Kostas Katrinis

Role: Author

IBM Research - Ireland

Yoonho Park

Role: Author

IBM Watson Research Center

Abstract

Big data analytics is an indispensable tool for transforming science, engineering, medicine, healthcare, finance, and ultimately business itself. With the explosion of data sizes and the need for shorter time-to-solution, in-memory platforms such as Apache Spark are gaining popularity. However, in-memory processing introduces important challenges, among which data shuffling is particularly difficult. On the one hand, shuffling is a key part of the computation with a major impact on overall performance and scalability, so its efficiency is paramount. On the other hand, it must operate with scarce memory in order to leave as much memory as possible available for data caching. In this context, scheduling data transfers efficiently so that both dimensions of the problem are addressed simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield sub-optimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we briefly outline as a series of design principles.

Keywords

big data analytics; data shuffling; memory-efficient I/O; elastic buffering

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

short.pdf (89.16 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01355227

Submitted on: Monday, 22 August 2016, 16:21:55

Last modified on: Tuesday, 9 January 2024, 12:34:04

Long-term archiving on: Wednesday, 23 November 2016, 12:50:56

Dates and versions

hal-01355227 , version 1 (22-08-2016)

Identifiers

HAL Id: hal-01355227, version 1
DOI: 10.1109/CCGrid.2016.85

Cite

Bogdan Nicolae, Carlos Costa, Claudia Misale, Kostas Katrinis, Yoonho Park. Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics. CCGrid’16: 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2016, Cartagena, Colombia. pp.409-412, ⟨10.1109/CCGrid.2016.85⟩. ⟨hal-01355227⟩
