Progressive Data Science: Potential and Challenges

Abstract : Data science requires time-consuming, iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining depend heavily on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner, returning a first approximation of results quickly and allowing iterative refinements until they converge to a final result. Enabling the user to interact with the intermediate results allows early detection of erroneous or suboptimal choices, guided definition of modifications to the pipeline, and quick assessment of those modifications. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact the subsequent steps and outline why progressive data science will help to make the process more effective. Computing progressive approximations of outcomes resulting from changes creates numerous research challenges, especially if the changes are made in the early steps of the pipeline. We discuss these challenges and outline first steps towards progressiveness, which, we argue, will ultimately help to significantly speed up the overall data science process.
Document type :
Preprints, Working Papers, ...

Cited literature [61 references]

https://hal.inria.fr/hal-01961871
Contributor : Jean-Daniel Fekete
Submitted on : Sunday, July 14, 2019 - 3:58:01 PM
Last modification on : Thursday, July 8, 2021 - 3:50:49 AM

File

1812.08032.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01961871, version 1
  • ARXIV : 1812.08032

Citation

Cagatay Turkay, Nicola Pezzotti, Carsten Binnig, Hendrik Strobelt, Barbara Hammer, et al. Progressive Data Science: Potential and Challenges. 2019. ⟨hal-01961871⟩
