Conference papers

In-Transit Molecular Dynamics Analysis with Apache Flink

Henrique C Zanúz 1 Bruno Raffin 1 Omar A Mures 2 Emilio J Padrón 2 
1 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : This paper proposes an on-line parallel analytics framework that processes and stores, in transit, all the data generated by a Molecular Dynamics (MD) simulation run, using staging nodes in the same cluster that executes the simulation. Implementing and deploying such a parallel workflow with standard HPC tools, while managing problems such as data partitioning and load balancing, can be a hard task for scientists. We propose to leverage Apache Flink, a scalable stream processing engine from the Big Data domain, in this HPC context. Flink makes it possible to program analyses within a simple window-based map/reduce model, while the runtime takes care of deployment, load balancing and fault tolerance. We build a complete in-transit analytics workflow, connecting an MD simulation to Apache Flink and to a distributed database, Apache HBase, to persist all the desired data. To demonstrate the expressivity of this programming model and its suitability for HPC scientific environments, two common analytics in the MD field have been implemented. We assessed the performance of this framework and conclude that it can handle simulations of sizes used in the literature while providing an effective and versatile tool for scientists to easily incorporate on-line parallel analytics into their current workflows.
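To make the window-based map/reduce model mentioned in the abstract more concrete, the following is a minimal single-process sketch in plain Python. It does not use the Flink API and is not the authors' implementation. The record layout `(timestep, atom_id, x, y, z)`, the function names, and the choice of a per-window centroid as the analysis are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (an assumption, not the paper's data format):
# (timestep, atom_id, x, y, z)
Record = Tuple[int, int, float, float, float]
Partial = Tuple[float, float, float, int]


def map_record(rec: Record, window_size: int) -> Tuple[int, Partial]:
    """Map step: key each atom position by the tumbling window of its timestep."""
    step, _atom_id, x, y, z = rec
    return step // window_size, (x, y, z, 1)


def reduce_pair(a: Partial, b: Partial) -> Partial:
    """Reduce step: associative and commutative sum of coordinates and counts."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3])


def windowed_centroid(
    stream: Iterable[Record], window_size: int
) -> Dict[int, Tuple[float, float, float]]:
    """Fold the stream into one partial sum per window, then emit centroids."""
    partial: Dict[int, Partial] = {}
    for rec in stream:
        key, value = map_record(rec, window_size)
        partial[key] = reduce_pair(partial[key], value) if key in partial else value
    return {k: (sx / n, sy / n, sz / n) for k, (sx, sy, sz, n) in partial.items()}


if __name__ == "__main__":
    # Four illustrative records spread over timesteps 0..2, windows of 2 timesteps.
    frames = [
        (0, 0, 0.0, 0.0, 0.0),
        (0, 1, 2.0, 0.0, 0.0),
        (1, 0, 1.0, 1.0, 1.0),
        (2, 0, 4.0, 0.0, 0.0),
    ]
    for window, centroid in sorted(windowed_centroid(frames, 2).items()):
        print(window, centroid)
```

Because the reduce function is associative and commutative, partial sums could be computed independently on different workers and merged afterwards. That property is what lets a stream processing runtime distribute such an analysis without the scientist writing the partitioning logic by hand.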
Contributor : Bruno Raffin
Submitted on : Monday, October 8, 2018 - 10:59:34 AM
Last modification on : Wednesday, July 6, 2022 - 4:14:29 AM
Long-term archiving on : Wednesday, January 9, 2019 - 2:23:34 PM


  • HAL Id : hal-01889939, version 1


Henrique C Zanúz, Bruno Raffin, Omar A Mures, Emilio J Padrón. In-Transit Molecular Dynamics Analysis with Apache Flink. ISAV 2018 - In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization, Nov 2018, Dallas, United States. pp.1-8. ⟨hal-01889939⟩


