Conference papers

Toward Generic Abstractions for Data of Any Model

Nelly Barret 1, 2 Ioana Manolescu 1, 2 Prajna Upadhyay 2 
2 CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France
Abstract : Digital data sharing creates unprecedented opportunities to develop data-driven systems that support economic activities, social and political life, and science. Many open-access datasets are RDF graphs, but others are CSV files, Neo4J property graphs, JSON or XML documents, etc. Potential users need to understand a dataset in order to decide whether it is useful for their goal. While some datasets come with a schema and/or documentation, this is not always the case. Data summarization and schema inference tools have been proposed, each specializing in the XML, JSON, or RDF data model. In this work, we present a dataset abstraction approach, which (i) applies to relational, CSV, XML, JSON, RDF or Property Graph data; (ii) computes an abstraction meant for humans (as opposed to a schema meant for a parser); (iii) integrates Information Extraction data profiling, to also classify dataset content among a set of categories of interest to the user. Our abstractions are conceptually close to an Entity-Relationship diagram, if one allows nested and possibly heterogeneous structure within entities.
Document type : Conference papers
Contributor : Nelly Barret
Submitted on : Tuesday, September 14, 2021 - 6:02:53 PM
Last modification on : Friday, February 4, 2022 - 3:12:22 AM




  • HAL Id : hal-03344041, version 2


Nelly Barret, Ioana Manolescu, Prajna Upadhyay. Toward Generic Abstractions for Data of Any Model. BDA 2021 - Informal publication only, Oct 2021, Paris, France. ⟨hal-03344041v2⟩


