Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity

Johannes Starlinger (1), Sarah Cohen-Boulakia (2, 3, 4, 5, 6), Sanjeev Khanna (7), Susan Davidson (7), Ulf Leser (1)
3 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France
5 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
6 VIRTUAL PLANTS - Modeling plant morphogenesis at different scales, from genes to phenotype
CRISAM - Inria Sophia Antipolis - Méditerranée, INRA - Institut National de la Recherche Agronomique, UMR AGAP - Amélioration génétique et adaptation des plantes méditerranéennes et tropicales
Abstract: Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories that facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular effective similarity search. Here, we present a novel and intuitive workflow similarity measure based on layer decomposition. Layer decomposition accounts for the directed dataflow underlying scientific workflows, a property that previous methods have not adequately considered. We comparatively evaluate our algorithm using a gold standard for 24 query workflows from a repository of almost 1500 scientific workflows, and show that it a) delivers the best results for similarity search, b) has a much lower runtime than other, often highly complex competitors in structure-aware workflow comparison, and c) can easily be stacked with even faster, structure-agnostic approaches to further reduce runtime while retaining result quality.
Cited literature: 20 references

https://hal.inria.fr/hal-01066076
Contributor: Sarah Cohen-Boulakia
Submitted on: Friday, September 19, 2014 - 10:34:09 AM
Last modification on: Tuesday, April 16, 2019 - 1:32:27 AM
Document(s) archived on: Saturday, December 20, 2014 - 11:01:27 AM

File

starlingerEscience.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01066076, version 1

Citation

Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan Davidson, Ulf Leser. Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity. IEEE e-Science conference, Oct 2014, Guarujá, Brazil. ⟨hal-01066076⟩
