Active Data: A Programming Model to Manage Data Life Cycle Across Heterogeneous Systems and Infrastructures

Abstract: The Big Data challenge consists of managing, storing, analyzing and visualizing huge and ever-growing data sets to extract meaning and knowledge. As the volume of data grows exponentially, managing these data becomes proportionally more complex. A key point is to handle the complexity of the data life cycle, i.e. the various operations performed on data: transfer, archiving, replication, deletion, etc. Indeed, data-intensive applications span a large variety of devices and e-infrastructures, which implies that many systems are involved in data management and processing. We propose Active Data, a programming model to automate and improve the expressiveness of data management applications. We first define the concept of data life cycle and introduce a formal model that exposes the data life cycle across heterogeneous systems and infrastructures. The Active Data programming model allows code execution at each stage of the data life cycle: routines provided by programmers are executed when a set of events (creation, replication, transfer, deletion) happens to any data. We implement and evaluate the model with four use cases: a storage cache to Amazon S3, a cooperative sensor network, an incremental implementation of the MapReduce programming model, and automated data provenance tracking across heterogeneous systems. Altogether, these scenarios illustrate the adequacy of the model for programming applications that manage distributed and dynamic data sets. We also show that applications that do not leverage the data life cycle can still benefit from Active Data to improve their performance.
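
The idea of running programmer-supplied routines when a life cycle event happens to data can be illustrated with a minimal Python sketch. It assumes a simple in-process dispatcher; the names `Event`, `LifeCycleBus`, `subscribe` and `publish` are hypothetical and are not the Active Data API. The rule that a data item must be created before anything else happens to it, and that nothing may happen to it after deletion, is likewise an assumption made only for this illustration.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable, DefaultDict, List, Set


class Event(Enum):
    """Life cycle events named in the abstract."""
    CREATION = "creation"
    REPLICATION = "replication"
    TRANSFER = "transfer"
    DELETION = "deletion"


# A routine supplied by the programmer: receives the data identifier and the event.
Handler = Callable[[str, Event], None]


class LifeCycleBus:
    """Hypothetical in-process dispatcher, not the Active Data API."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[Event, List[Handler]] = defaultdict(list)
        self._alive: Set[str] = set()  # data items created and not yet deleted

    def subscribe(self, event: Event, handler: Handler) -> None:
        """Register a routine to run every time `event` happens to any data item."""
        self._handlers[event].append(handler)

    def publish(self, data_id: str, event: Event) -> None:
        """Record that `event` happened to `data_id`, then run the matching routines."""
        # Assumption for this sketch: a data item must be created before any other
        # event, and nothing may happen to it once it has been deleted.
        if event is Event.CREATION:
            self._alive.add(data_id)
        elif data_id not in self._alive:
            raise ValueError(f"{data_id!r} does not exist; cannot apply {event.value}")
        if event is Event.DELETION:
            self._alive.discard(data_id)
        for handler in self._handlers[event]:
            handler(data_id, event)


# Usage: log replications and deletions of a single (hypothetical) file.
log: List[str] = []
bus = LifeCycleBus()
bus.subscribe(Event.REPLICATION, lambda d, e: log.append(f"replicated {d}"))
bus.subscribe(Event.DELETION, lambda d, e: log.append(f"deleted {d}"))
bus.publish("file-42", Event.CREATION)
bus.publish("file-42", Event.REPLICATION)
bus.publish("file-42", Event.DELETION)
print(log)  # ['replicated file-42', 'deleted file-42']
```

Keeping the set of live items inside the dispatcher is one simple way to reject events that make no sense at the current stage of a data item's life cycle; a system spanning several heterogeneous infrastructures would need a more complete model of which transitions are valid.
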
Document type:
Journal article
Future Generation Computer Systems, Elsevier, 2015, 55, pp.17. 〈10.1016/j.future.2015.05.015〉

Cited literature: 48 references

https://hal.inria.fr/hal-01241491
Contributor: Gilles Fedak
Submitted on: Friday, December 11, 2015, 11:15:42
Last modified on: Friday, April 20, 2018, 15:44:26
Document(s) archived on: Saturday, April 29, 2017, 11:13:45

File

AD_FGCS_2015.pdf
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Citation

Anthony Simonet, Gilles Fedak, Matei Ripeanu. Active Data: A Programming Model to Manage Data Life Cycle Across Heterogeneous Systems and Infrastructures. Future Generation Computer Systems, Elsevier, 2015, 55, pp.17. 〈10.1016/j.future.2015.05.015〉. 〈hal-01241491〉
