Graph-based keyword search in heterogeneous data sources

Angelos Christos Anadiotis 1, Mhd Yamen Haddad 1, Ioana Manolescu 1
1 CEDAR - Rich Data Analytics at Cloud Scale
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France
Abstract : Data journalism is the field of investigative journalism that treats digital data as first-class citizens. As human activity leaves ever stronger digital traces, data journalism becomes increasingly important. However, as the number and diversity of data sources increase, query answering must handle heterogeneous data models with different structures, or even no structure at all. Inspired by our collaboration with Le Monde, a leading French newspaper, we designed a novel query algorithm for exploiting such heterogeneous corpora through keyword search. We model the underlying data as graphs and, given a set of search terms, our algorithm finds links between them within and across the heterogeneous datasets included in the graph. We draw inspiration from prior work on keyword search in structured and unstructured data, and extend it with the data heterogeneity dimension, which makes the keyword search problem computationally harder. We implement our algorithm and evaluate its performance using synthetic and real-world datasets.
Document type : Conference papers
Contributor : Ioana Manolescu
Submitted on : Wednesday, September 9, 2020 - 10:33:55 AM
Last modification on : Saturday, June 25, 2022 - 8:29:24 PM
Long-term archiving on : Wednesday, December 2, 2020 - 11:42:57 PM


Files produced by the author(s)


  • HAL Id : hal-02934277, version 1
  • ARXIV : 2009.04283


Angelos Christos Anadiotis, Mhd Yamen Haddad, Ioana Manolescu. Graph-based keyword search in heterogeneous data sources. BDA 2020 - 36ème Conférence sur la Gestion de Données – Principes, Technologies et Applications, Oct 2020, Online, France. ⟨hal-02934277⟩


