Joining Distributed Database Summaries

Abstract : The database summarization system SaintEtiQ provides multi-level summaries of tabular data stored in a centralized database. Summaries are computed online with a conceptual hierarchical clustering algorithm. However, in many companies, data are distributed among several sites, either homogeneously (i.e., sites contain data for a common set of features) or heterogeneously (i.e., sites contain data for different features). Consequently, the current centralized version of SaintEtiQ is either infeasible or undesirable because of privacy or resource constraints. In this paper, we propose two new algorithms for summarizing heterogeneously distributed data without a prior "unification" of the data sources: the Subspace-Oriented Join Algorithm (SOJA) and the Tree Alignment-based Join Algorithm (TAJA). The main idea of both algorithms is to join two local models, computed over two disjoint sets of features, into a global summary over the full feature set without scanning the raw data. SOJA takes one of the two input trees as the base model and processes the other one to complete it, whereas TAJA rearranges summaries level by level in a top-down manner. We then propose a consistent quality measure to quantify how good the joined hierarchies are. Finally, an experimental study on synthetic data sets shows that both joining processes (SOJA and TAJA) produce high-quality clustering schemas of the entire distributed data and are very efficient in computational time compared with the centralized approach.
Document type :
Reports

https://hal.inria.fr/inria-00346528
Contributor : Guillaume Raschia
Submitted on : Thursday, December 11, 2008 - 5:01:57 PM
Last modification on : Wednesday, April 11, 2018 - 1:56:31 AM
Long-term archiving on : Tuesday, June 8, 2010 - 4:36:44 PM

File

RR-6768.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00346528, version 1

Citation

Mounir Bechchi, Guillaume Raschia, Noureddine Mouaddib. Joining Distributed Database Summaries. [Research Report] RR-6768, INRIA. 2008, pp.29. ⟨inria-00346528⟩
