SDAC: Porting Scientific Data to Spark RDDs

Tian Yang; Kenjiro Taura; Liu Chao

doi:10.1007/978-3-319-68210-5_13

Conference paper, Year: 2017



Tian Yang

Role: Author
PersonId : 1027965

Beihang University

The University of Tokyo

Kenjiro Taura

Role: Author

The University of Tokyo

Liu Chao

Role: Author

Beihang University

Abstract

Scientific data processing poses a range of technical problems in industrial exploration and domain-specific applications because of its huge input volumes and diverse data formats. Meanwhile, Big Data analytics frameworks such as Hadoop and Spark lack native support for efficiently processing increasingly heterogeneous scientific data. In this paper, we introduce SDAC (Scientific Data Auto Chunk), which ports various scientific data formats to RDDs to support parallel processing and analytics in the Apache Spark framework. By integrating an auto-chunk method for specifying task granularity, a better-planned theoretical pipeline can be derived to guide data partitioning and parallel I/O. We compare performance with H5Spark on 6 benchmarks in both standalone and distributed modes. Experimental results show that SDAC achieves an overall improvement of 2.1 times over H5Spark in standalone mode and 1.34 times in distributed mode.
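To illustrate the general idea of chunk-based partitioning mentioned in the abstract, here is a minimal sketch in plain Python. It is not SDAC's actual auto-chunk algorithm, which this record does not describe. The names `plan_chunks`, `read_chunk`, and the `target_bytes` parameter are hypothetical, and an in-memory list stands in for an HDF5 dataset.

```python
from typing import List, Sequence, Tuple


def plan_chunks(num_rows: int, row_bytes: int, target_bytes: int) -> List[Tuple[int, int]]:
    """Split rows [0, num_rows) into contiguous half-open ranges.

    Each range covers roughly target_bytes of data (assumption: every row
    occupies row_bytes bytes). This is an illustrative heuristic only.
    """
    if num_rows < 0 or row_bytes <= 0 or target_bytes <= 0:
        raise ValueError("num_rows must be >= 0; row_bytes and target_bytes must be > 0")
    # Always place at least one row in a chunk, even if a row exceeds the target.
    rows_per_chunk = max(1, target_bytes // row_bytes)
    return [
        (start, min(start + rows_per_chunk, num_rows))
        for start in range(0, num_rows, rows_per_chunk)
    ]


def read_chunk(dataset: Sequence, bounds: Tuple[int, int]) -> Sequence:
    """Read one planned range; a list slice stands in for a partial file read."""
    start, stop = bounds
    return dataset[start:stop]


# Stand-in data: 10 rows of 8 bytes each, with a 32-byte chunk target.
data = list(range(10))
chunks = plan_chunks(len(data), row_bytes=8, target_bytes=32)

# Reading every chunk in order reproduces the original rows.
rebuilt = [row for bounds in chunks for row in read_chunk(data, bounds)]
assert rebuilt == data
```

In a Spark setting, one would presumably distribute the planned ranges (for example with `SparkContext.parallelize`) and have each task read only its own range from the file, so that the chunk size controls both task granularity and the amount of I/O per task.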

Keywords

Scientific data Spark RDDs HDF5

Domains

Computer Science [cs]

Main file

457609_1_En_13_Chapter.pdf (436.43 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01705439

Submitted on: Friday, February 9, 2018, 14:26:03

Last modified on: Wednesday, November 3, 2021, 04:18:55

Long-term archiving on: Thursday, May 10, 2018, 12:41:00

Dates and versions

hal-01705439 , version 1 (09-02-2018)

Licence

Attribution

Identifiers

HAL Id : hal-01705439 , version 1
DOI : 10.1007/978-3-319-68210-5_13

Cite

Tian Yang, Kenjiro Taura, Liu Chao. SDAC: Porting Scientific Data to Spark RDDs. 14th IFIP International Conference on Network and Parallel Computing (NPC), Oct 2017, Hefei, China. pp.127-130, ⟨10.1007/978-3-319-68210-5_13⟩. ⟨hal-01705439⟩


Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC10 IFIP-NPC IFIP-WG10-3 IFIP-LNCS-10578

