The 11th International Workshop on Quality in DataBases in conjunction with VLDB 2016

Abstract: Data quality problems arise frequently when data is integrated from disparate sources. In the context of Big Data applications, data quality is becoming more important because of the unprecedented volume, variety, and velocity of the data. The challenges posed by the volume and velocity of Big Data have been addressed by many research projects and commercial solutions, and can be partially solved by modern, scalable data management systems. However, variety remains a daunting challenge for Big Data Integration and also requires special methods for data quality management. Variety (or heterogeneity) exists at several levels: at the instance level, the same entity might be described with different attributes; at the schema level, the data is structured with various schemas; and at the level of the modeling language, different data models can be used (e.g., relational, XML, or a document-oriented JSON representation). This can lead to data quality issues concerning consistency, understandability, or completeness. The heterogeneity of data sources in the Big Data Era requires new integration approaches that can handle the large volume and speed of the generated data as well as its variety and quality. Traditional ‘schema-first’ approaches, such as data warehouse systems and ETL (Extract-Transform-Load) processes in the relational world, are inappropriate for a flexible and dynamically changing data management landscape. The requirement for pre-defined, explicit schemas is a limitation that has drawn the interest of many developers and researchers to NoSQL data management systems, as these systems are intended to provide data management features for large amounts of schema-less data. Nevertheless, a one-size-fits-all Big Data system is unlikely to meet all the requirements placed on data management systems today.
Instead, multiple classes of systems, optimized for specific requirements or hardware platforms, will co-exist in the data management landscape. Thus, heterogeneity and data quality are challenges for many Big Data applications. In some applications, the limited quality of individual data items does not cause serious problems when large amounts of data are aggregated. However, data quality problems in data sources are often revealed only when these sources are integrated with other information. Data quality has been defined as ‘fitness for use’; thus, if data is used in a context other than the one originally planned, data quality might become an issue. Similar observations have also been made for data warehouses, which led to a separate research area on data warehouse quality. The QDB 2016 workshop aims to discuss recent advances and challenges in data quality management in database systems, and focuses especially on problems related to Big Data Integration and Big Data Quality. The workshop will provide a forum for the presentation of research results, a panel discussion, and an attractive keynote speaker.

https://hal.inria.fr/hal-01856096
Contributor: Laure Berti-Equille
Submitted on: Thursday, August 9, 2018 - 4:32:52 PM
Last modification on: Friday, May 17, 2019 - 1:20:16 AM

Identifiers

  • HAL Id: hal-01856096, version 1

Citation

Laure Berti-Equille, Christoph Quix, Venkat Gudivada, Rihan Hai, Hongzhi Wang. The 11th International Workshop on Quality in DataBases in conjunction with VLDB 2016. International Quality in Databases workshop (QDB 2016) in conjunction with VLDB 2016, Sep 2016, Delhi, India. 2016. ⟨hal-01856096⟩
