Conference papers

Tracing Data Pollution in Large Business Applications

Abstract : In large business applications, data processing activities can be done locally or outsourced, split or combined, and the resulting data flows have to be exchanged, shared, or integrated across multiple data processing units. There are thus many alternative paths for data processing and data consolidation. However, some data flows and data processing applications are more likely than others to generate and propagate data errors, and some are also more critical. In practice, we usually ignore the impact of data errors in large and complex business applications because: 1) it is often very difficult to systematically audit data and to detect and trace data errors in such large applications; 2) we usually do not have the complete picture of all the data processing units involved in every data processing path, since they are viewed as black boxes; and 3) we usually ignore the total cost of detecting and eliminating data anomalies and, surprisingly, we also ignore the cost of "doing nothing" to resolve them. The objectives of the ongoing research presented in this paper are the following: to propose a probabilistic model reflecting data error propagation in large business applications; to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality; to advocate adequate locations for data quality checkpoints; and to predict the cost of doing nothing versus the cost of data cleaning activities.

Cited literature [13 references]
Contributor : Laure Berti-Equille
Submitted on : Thursday, August 9, 2018 - 5:04:32 PM
Last modification on : Friday, August 5, 2022 - 2:54:52 PM
Long-term archiving on : Saturday, November 10, 2018 - 1:19:10 PM




  • HAL Id : hal-01856123, version 1


Laure Berti-Équille. Tracing Data Pollution in Large Business Applications. Proceedings of the 13th International Conference on Information Quality (IQ’08), Nov 2008, Cambridge, MA, United States. ⟨hal-01856123⟩


