Similarity Search for Scientific Workflows

Johannes Starlinger (1), Bryan Brancotte (2, 3), Sarah Cohen-Boulakia (2, 3, 4, 5, 6), Ulf Leser (1)
3 AMIB - Algorithms and Models for Integrative Biology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, X - École polytechnique, CNRS - Centre National de la Recherche Scientifique : UMR8623
5 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
6 VIRTUAL PLANTS - Modeling plant morphogenesis at different scales, from genes to phenotype
CRISAM - Inria Sophia Antipolis - Méditerranée, INRA - Institut National de la Recherche Agronomique, Centre de coopération internationale en recherche agronomique pour le développement [CIRAD] : UMR51
Abstract: With the increasing popularity of scientific workflows, public repositories are gaining importance as a means to share, find, and reuse such workflows. As these repositories grow, methods to compare the scientific workflows stored in them become a necessity, for instance to support duplicate detection or similarity search. Scientific workflows are complex objects, and comparing them involves a number of distinct steps, from comparing atomic elements to comparing the workflows as a whole. Various studies have implemented methods for scientific workflow comparison and reached often contradictory conclusions about which algorithms work best. Comparing these results is cumbersome, because the original studies mixed different approaches for different steps and used different evaluation data and metrics. We contribute to the field (i) by dissecting each previous approach into an explicitly defined and comparable set of subtasks, (ii) by comparing in isolation the different approaches taken at each step of scientific workflow comparison, reporting on a number of unexpected findings, (iii) by investigating how these approaches can best be combined into aggregated measures, and (iv) by making available a gold standard of over 2000 similarity ratings contributed by 15 workflow experts on a corpus of almost 1500 workflows, together with re-implementations of all methods we evaluated.
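The abstract describes comparison as a sequence of steps. Atomic elements are compared first, then whole workflows, and finally several measures are combined into an aggregated one. The sketch below illustrates that layering under stated assumptions and is not the authors' implementation. Workflows are assumed to be reduced to lists of module labels plus sets of annotation keywords. Labels are compared with Python's `difflib.SequenceMatcher`. Modules are paired greedily, which only approximates a maximum-weight matching. Measures are combined with a weighted mean. All function names are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import product


def module_similarity(a: str, b: str) -> float:
    """Atomic step (assumed): character-based similarity of two module labels in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def module_set_similarity(wf1: list[str], wf2: list[str]) -> float:
    """Workflow-level step (hypothetical): greedily pair modules by descending
    similarity, using each module at most once, then normalize the summed
    pair scores by the size of the larger workflow."""
    if not wf1 or not wf2:
        return 0.0
    pairs = sorted(
        ((module_similarity(a, b), i, j)
         for (i, a), (j, b) in product(enumerate(wf1), enumerate(wf2))),
        reverse=True,
    )
    used1: set[int] = set()
    used2: set[int] = set()
    total = 0.0
    for score, i, j in pairs:
        if i not in used1 and j not in used2:  # greedy approximation of a matching
            used1.add(i)
            used2.add(j)
            total += score
    return total / max(len(wf1), len(wf2))


def keyword_similarity(tags1: set[str], tags2: set[str]) -> float:
    """Annotation-based measure (assumed): Jaccard index over keyword sets."""
    if not tags1 and not tags2:
        return 0.0
    return len(tags1 & tags2) / len(tags1 | tags2)


def aggregated_similarity(scores: list[float], weights: list[float]) -> float:
    """Aggregation step (hypothetical): weighted mean of individual measures."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

In a usage example, two workflows with identical module lists but only partially shared keywords might be scored as `aggregated_similarity([module_set_similarity(m1, m2), keyword_similarity(k1, k2)], [1.0, 1.0])`. The weights are free parameters. How best to set or learn them is exactly the kind of question the paper's point (iii) addresses.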
Document type: Journal article
Proceedings of the VLDB Endowment (PVLDB), VLDB Endowment, 2014, 7 (12), pp.1143-1154. 〈10.14778/2732977.2732988〉

Cited literature: 35 references


https://hal.inria.fr/hal-01066046
Contributor: Christophe Godin
Submitted on: Friday, January 9, 2015 - 10:42:27
Last modified on: Wednesday, October 10, 2018 - 14:28:13
Document(s) archived on: Friday, April 10, 2015 - 10:17:15

Files

p1143-starlinger.pdf
Publisher files authorized for deposit in an open archive


Citation

Johannes Starlinger, Bryan Brancotte, Sarah Cohen-Boulakia, Ulf Leser. Similarity Search for Scientific Workflows. Proceedings of the VLDB Endowment (PVLDB), VLDB Endowment, 2014, 7 (12), pp.1143-1154. 〈10.14778/2732977.2732988〉. 〈hal-01066046〉
