Reports (Research report)

Joining Distributed Database Summaries

Abstract : The database summarization system SaintEtiQ provides multi-level summaries of tabular data stored in a centralized database. Summaries are computed online with a conceptual hierarchical clustering algorithm. However, in many companies, data are distributed among several sites, either homogeneously (i.e., sites contain data for a common set of features) or heterogeneously (i.e., sites contain data for different features). Consequently, the current centralized version of SaintEtiQ is either infeasible or undesirable because of privacy or resource issues. In this paper, we propose two new algorithms for summarizing heterogeneously distributed data without a prior "unification" of the data sources: the Subspace-Oriented Join Algorithm (SOJA) and the Tree Alignment-based Join Algorithm (TAJA). The main idea of both algorithms is to apply innovative joins to two local models, computed over two disjoint sets of features, to provide a global summary over the full feature set without scanning the raw data. SOJA takes one of the two input trees as the base model and processes the other to complete it, whereas TAJA rearranges summaries level by level in a top-down manner. We then propose a consistent quality measure to quantify how good the joined hierarchies are. Finally, an experimental study on synthetic data sets shows that both joining processes (SOJA and TAJA) yield high-quality clustering schemas of the entire distributed data and are very efficient in computational time compared with the centralized approach.
Contributor : Guillaume Raschia
Submitted on : Thursday, December 11, 2008 - 5:01:57 PM
Last modification on : Wednesday, October 26, 2022 - 8:16:10 AM
Long-term archiving on : Tuesday, June 8, 2010 - 4:36:44 PM




  • HAL Id : inria-00346528, version 1


Mounir Bechchi, Guillaume Raschia, Noureddine Mouaddib. Joining Distributed Database Summaries. [Research Report] RR-6768, INRIA. 2008, pp.29. ⟨inria-00346528⟩


