Tracing Data Pollution in Large Business Applications

Abstract: In large business applications, data processing activities can be performed locally or outsourced, split or combined, and the resulting data flows have to be exchanged, shared, or integrated across multiple data processing units. There are indeed many alternative paths for data processing and data consolidation. However, some data flows and data processing applications are more likely than others to generate and propagate data errors, and some of them are also more critical. In practice, we usually ignore the impact of data errors in large and complex business applications because: 1) it is often very difficult to systematically audit data and to detect and trace data errors in such large applications; 2) we usually do not have the complete picture of all the data processing units involved in every data processing path, so these units are viewed as black boxes; and 3) we usually ignore the total cost of detecting and eliminating data anomalies and, surprisingly, we also ignore the cost of "doing nothing" to resolve them. In this paper, we present the objectives of our ongoing research: to propose a probabilistic model of data error propagation in large business applications, to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality, to recommend adequate locations for data quality checkpoints, and to predict the cost of doing nothing versus the cost of data cleaning activities.
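The abstract does not specify the probabilistic model, so the following Python sketch only illustrates the general idea under a deliberately simple, hypothetical assumption: each processing unit on a path pollutes a clean record independently with a fixed probability, and no downstream unit repairs it. Under that assumption, a record leaving a path is polluted with probability 1 − ∏(1 − p_i), which can rank paths by criticality and give a rough expected cost of doing nothing. The function names, error rates, and the "perfect checkpoint" with a flat per-record cost are all illustrative assumptions, not the author's method.

```python
from math import prod

# Hypothetical model, NOT the paper's: each processing unit on a path is
# assumed to pollute a clean record independently with probability p, and
# no downstream unit ever repairs an error.


def pollution_probability(error_rates: list[float]) -> float:
    """Probability that a record is polluted after traversing every unit."""
    # A record stays clean only if no unit on the path corrupts it.
    return 1.0 - prod(1.0 - p for p in error_rates)


def most_critical_path(paths: dict[str, list[float]]) -> str:
    """Name of the path whose output is most likely to be polluted."""
    return max(paths, key=lambda name: pollution_probability(paths[name]))


def cost_comparison(n_records: int, error_rates: list[float],
                    cost_per_error: float,
                    cost_per_check: float) -> tuple[float, float]:
    """Expected cost of doing nothing vs. a checkpoint at the end of a path.

    Hypothetical assumption: the checkpoint inspects every record at a flat
    cost and catches every polluted record.
    """
    expected_errors = n_records * pollution_probability(error_rates)
    do_nothing = expected_errors * cost_per_error
    cleaning = n_records * cost_per_check
    return do_nothing, cleaning
```

For example, with hypothetical per-unit error rates `{"A": [0.01, 0.02], "B": [0.05, 0.001, 0.01]}`, path B (pollution probability about 0.060) would be flagged as more critical than path A (0.0298). For 10,000 records on path A, a cost of 50 per undetected error, and 0.1 per checked record, doing nothing is expected to cost about 14,900 versus 1,000 for the checkpoint.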
Document type:
Conference paper
Proceedings of the 13th International Conference on Information Quality (IQ’08), Nov 2008, Cambridge, MA, United States

https://hal.inria.fr/hal-01856123
Contributor: Laure Berti-Equille
Submitted on: Thursday, 9 August 2018, 17:04:32
Last modified on: Friday, 16 November 2018, 01:27:52
Archived on: Saturday, 10 November 2018, 13:19:10

File

tracing.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01856123, version 1

Citation

Laure Berti-Équille. Tracing Data Pollution in Large Business Applications. Proceedings of the 13th International Conference on Information Quality (IQ’08), Nov 2008, Cambridge, MA, United States. 〈hal-01856123〉
