In-Transit Molecular Dynamics Analysis with Apache Flink

Henrique Zanúz 1 Bruno Raffin 2 Omar Mures 3 Emilio Padrón 3
1 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
2 MOAIS - PrograMming and scheduling design fOr Applications in Interactive Simulation
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : This paper proposes an on-line parallel analytics framework that processes and stores in transit all the data generated by a Molecular Dynamics (MD) simulation, using staging nodes in the same cluster that runs the simulation. Implementing and deploying such a parallel workflow with standard HPC tools, while handling problems such as data partitioning and load balancing, can be a hard task for scientists. We propose to leverage Apache Flink, a scalable stream processing engine from the Big Data domain, in this HPC context. Flink makes it possible to program analyses within a simple window-based map/reduce model, while the runtime takes care of deployment, load balancing and fault tolerance. We build a complete in-transit analytics workflow that connects an MD simulation to Apache Flink and to a distributed database, Apache HBase, which persists all the desired data. To demonstrate the expressivity of this programming model and its suitability for HPC scientific environments, we implemented two analytics that are common in the MD field. We assessed the performance of the framework and conclude that it can handle simulations of the sizes used in the literature while giving scientists an effective and versatile tool for incorporating on-line parallel analytics into their current workflows.

https://hal.inria.fr/hal-01889939
Contributor : Bruno Raffin
Submitted on : Monday, October 8, 2018 - 10:59:34 AM
Last modification on : Tuesday, November 13, 2018 - 5:34:05 PM
Long-term archiving on : Wednesday, January 9, 2019 - 2:23:34 PM

Identifiers

  • HAL Id : hal-01889939, version 1

Citation

Henrique Zanúz, Bruno Raffin, Omar Mures, Emilio Padrón. In-Transit Molecular Dynamics Analysis with Apache Flink. ISAV 2018 - In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization, Nov 2018, Dallas, United States. pp.1-8. ⟨hal-01889939⟩
