Static Analysis of Data Transformations in Jupyter Notebooks - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2023

Static Analysis of Data Transformations in Jupyter Notebooks

Abstract

Jupyter notebooks used to pre-process and polish raw data for data science and machine learning processes are challenging to analyze. Their data-centric code manipulates dataframes through calls to library functions with complex semantics, and the properties to track over it vary widely depending on the verification task. This paper presents a novel abstract domain that simplifies writing analyses for such programs by extracting from the notebook a single control-flow graph (CFG) that contains all transformations applied to the data. Several properties can then be determined by analyzing this CFG, which is simpler than the original Python code. We present a first use case that exploits our analysis to infer the required shape of the dataframes manipulated by the notebook.

CCS Concepts: • Theory of computation → Program analysis; Abstraction; • Software and its engineering → Automated static analysis.
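To make the "required shape" use case concrete, the following is a minimal sketch of the general idea and not the abstract domain or CFG extraction described in the paper. It assumes Python 3.9+, straight-line notebook code, a single dataframe variable named `df`, and column accesses written as `df["name"]`; the function names `required_columns` and `_column` are hypothetical. It performs a purely syntactic pass that reports the columns read before the notebook itself creates them, which is one simple reading of the shape an input dataframe must have.

```python
import ast


def _column(node, frame):
    """Return the column name if `node` is `frame["name"]`, else None."""
    if (isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == frame
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)):
        return node.slice.value
    return None


def required_columns(source, frame="df"):
    """Columns of `frame` that are read before being assigned.

    Assumption: statements run top to bottom with no branches or loops,
    and `frame` always refers to the same dataframe.
    """
    required, defined = set(), set()
    for stmt in ast.parse(source).body:
        # First record the reads of this statement (Load context).
        for node in ast.walk(stmt):
            col = _column(node, frame)
            if col is not None and isinstance(node.ctx, ast.Load):
                if col not in defined:
                    required.add(col)
        # Then record the columns this statement creates (Store context).
        for node in ast.walk(stmt):
            col = _column(node, frame)
            if col is not None and isinstance(node.ctx, ast.Store):
                defined.add(col)
    return required


# Hypothetical notebook cells, concatenated into one source string.
notebook = '''
df["total"] = df["price"] * df["qty"]
df["share"] = df["total"] / df["total"].sum()
df = df[df["qty"] > 0]
'''

print(sorted(required_columns(notebook)))  # ['price', 'qty']
```

Here `total` is not reported because the notebook computes it before reading it, whereas `price` and `qty` must already exist in the input. A real analysis would also have to handle control flow, aliasing, and the semantics of library calls, which is where a dedicated abstract domain becomes useful.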
Main file
soap2023.pdf (656.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04249950, version 1 (19-10-2023)

License

Attribution

Cite

Luca Negrini, Guruprerana Shabadi, Caterina Urban. Static Analysis of Data Transformations in Jupyter Notebooks. 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2023), Jun 2023, Orlando, FL, United States. pp. 8-13, ⟨10.1145/3589250.3596145⟩. ⟨hal-04249950⟩