
New Directions for Data Quality Mining

Abstract : As data types and data structures change to keep up with evolving technologies and applications, data quality problems have also evolved and become more complex. Data streams, web logs, wikis, biomedical applications, video streams, and social networking websites generate a mind-boggling variety of data types. Data quality mining, the use of data mining to manage, measure, and improve data quality, has focused mostly on addressing each category of data glitch separately, as a static entity. In this tutorial we highlight new directions in data quality mining, particularly: (a) the applicability and effectiveness of the methodologies for various data types such as structured, semi-structured, and stream data; (b) the detection of concomitant data glitches, such as the occurrence of outliers in data with missing values and duplicates; and (c) the design of sequential approaches to data quality mining, such as workflows composed of a sequence of tasks for data quality exploration and analysis. We give a brief overview of past work, introduce current research in this area, and highlight new directions and open problems in data quality mining. The tutorial includes extensive case studies, applications, and practical examples.
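
As an illustration of point (b), the sketch below flags several glitch types on the same records so that co-occurring (concomitant) glitches become visible, rather than treating each glitch in isolation. It is a minimal sketch and not the method presented in the tutorial: the function name `flag_glitches`, the field names `id` and `x`, the sample records, and the choice of a median/MAD-based modified z-score with a 3.5 cutoff are all assumptions made for this example.

```python
from statistics import median


def flag_glitches(records, key_field, value_field, threshold=3.5):
    """Return one set of glitch labels per record (hypothetical helper)."""
    flags = [set() for _ in records]

    # Missing values: the measured field is absent or None.
    for i, rec in enumerate(records):
        if rec.get(value_field) is None:
            flags[i].add("missing")

    # Duplicates: the key was already seen earlier in the list.
    seen = set()
    for i, rec in enumerate(records):
        key = rec.get(key_field)
        if key in seen:
            flags[i].add("duplicate")
        seen.add(key)

    # Outliers: modified z-score from the median and the median absolute
    # deviation (MAD), computed over the observed (non-missing) values only.
    observed = [r[value_field] for r in records if r.get(value_field) is not None]
    if observed:
        med = median(observed)
        mad = median(abs(v - med) for v in observed)
        if mad > 0:
            for i, rec in enumerate(records):
                v = rec.get(value_field)
                if v is not None and 0.6745 * abs(v - med) / mad > threshold:
                    flags[i].add("outlier")
    return flags


# Made-up sample data, for illustration only.
records = [
    {"id": 1, "x": 10.0},
    {"id": 2, "x": 11.0},
    {"id": 3, "x": None},
    {"id": 4, "x": 9.0},
    {"id": 4, "x": 500.0},
    {"id": 5, "x": 10.5},
]
flags = flag_glitches(records, key_field="id", value_field="x")

# Records carrying more than one glitch type at once.
concomitant = [i for i, f in enumerate(flags) if len(f) > 1]
```

Here the fifth record is both a duplicate key and an outlier, which is the kind of interaction a glitch-by-glitch analysis would report as two unrelated findings. The order in which such checks run can also matter (for example, removing duplicates first changes the values the outlier test sees), which relates to the sequential workflows mentioned in point (c).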

Contributor : Laure Berti-Equille
Submitted on : Friday, August 10, 2018 - 3:47:40 PM
Last modification on : Wednesday, June 16, 2021 - 3:41:25 AM
Long-term archiving on : Sunday, November 11, 2018 - 1:21:15 PM


  • HAL Id : hal-01856320, version 1


Laure Berti-Équille, Tamraparni Dasu. New Directions for Data Quality Mining. International Conference on Knowledge Discovery and Data Mining (KDD 2009), Jun 2009, Paris, France. ⟨hal-01856320⟩


