Joining Distributed Database Summaries - Inria - Institut national de recherche en sciences et technologies du numérique
Report (Research Report), Year: 2008

Joining Distributed Database Summaries

Abstract

The database summarization system called SaintEtiQ provides multi-level summaries of tabular data stored in a centralized database. Summaries are computed online with a conceptual hierarchical clustering algorithm. However, in many companies, data are distributed among several sites, either homogeneously (i.e., sites contain data for a common set of features) or heterogeneously (i.e., sites contain data for different features). Consequently, the current centralized version of SaintEtiQ is either infeasible or even undesirable because of privacy or resource constraints. In this paper, we propose two new algorithms for summarizing heterogeneously distributed data without first unifying the data sources: the Subspace-Oriented Join Algorithm (SOJA) and the Tree Alignment-based Join Algorithm (TAJA). The main idea of these algorithms is to apply novel joins to two local models, computed over two disjoint sets of features, to produce a global summary over the full feature set without scanning the raw data. SOJA takes one of the two input trees as the base model and processes the other one to complete it, whereas TAJA rearranges summaries level by level in a top-down manner. We then propose a consistent quality measure to quantify how good the joined hierarchies are. Finally, an experimental study on synthetic data sets shows that our joining processes (SOJA and TAJA) yield high-quality clustering schemas of the entire distributed data and are very efficient in computational time compared with the centralized approach.
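The abstract describes the joins only at a high level, so the following is a hypothetical sketch of the SOJA idea ("keep one tree as the base model and complete it with the other"), not the report's algorithm. It assumes that each summary node carries an intent (feature to set of linguistic labels) and an extent given as a set of tuple ids, and that both sites share those ids as a common key. Joining two nodes over disjoint feature sets then merges their intents and intersects their extents. All names (`Summary`, `join_pair`, `base_join`, the `age`/`salary` features) are invented for illustration. The sketch reads only summary nodes, never raw attribute values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Summary:
    # Hypothetical summary node: intent maps a feature to a set of linguistic
    # labels; extent holds the ids of covered tuples (ids assumed shared by sites).
    intent: dict[str, frozenset[str]]
    extent: frozenset[int]
    children: list[Summary] = field(default_factory=list)


def leaves(node: Summary) -> list[Summary]:
    """Return the leaf summaries of a hierarchy, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def merge_intents(nodes: list[Summary]) -> dict[str, frozenset[str]]:
    """Union the label sets of every feature across the given nodes."""
    merged: dict[str, frozenset[str]] = {}
    for node in nodes:
        for feature, labels in node.intent.items():
            merged[feature] = merged.get(feature, frozenset()) | labels
    return merged


def join_pair(a: Summary, b: Summary) -> Summary | None:
    """Join two summaries over disjoint feature sets; None if no tuple is shared."""
    common = a.extent & b.extent
    if not common:
        return None
    return Summary({**a.intent, **b.intent}, common)


def base_join(base: Summary, other: Summary) -> Summary | None:
    """Keep the shape of `base` and complete each node with `other`'s features."""
    if base.children:
        # Internal base node: recurse, dropping children that end up empty.
        parts = [j for c in base.children if (j := base_join(c, other)) is not None]
    else:
        # Base leaf: split it by the leaves of the other tree.
        parts = [j for lf in leaves(other) if (j := join_pair(base, lf)) is not None]
    if not parts:
        return None
    if len(parts) == 1 and not base.children:
        return parts[0]
    extent = frozenset().union(*(p.extent for p in parts))
    return Summary(merge_intents(parts), extent, parts)


# Usage: site A summarizes "age", site B summarizes "salary" for tuples 1..4.
young = Summary({"age": frozenset({"young"})}, frozenset({1, 2}))
old = Summary({"age": frozenset({"old"})}, frozenset({3, 4}))
site_a = Summary({"age": frozenset({"young", "old"})}, frozenset({1, 2, 3, 4}), [young, old])
low = Summary({"salary": frozenset({"low"})}, frozenset({1, 3}))
high = Summary({"salary": frozenset({"high"})}, frozenset({2, 4}))
site_b = Summary({"salary": frozenset({"low", "high"})}, frozenset({1, 2, 3, 4}), [low, high])
joined = base_join(site_a, site_b)
```

In this toy run, the joined tree keeps site A's "young"/"old" split and refines each leaf by site B's salary labels, so its four leaves cover tuples 1, 2, 3 and 4 individually. A top-down, level-by-level rearrangement in the spirit of TAJA would need a different procedure that this sketch does not attempt.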
Main file
RR-6768.pdf (595.57 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00346528, version 1 (11-12-2008)

Identifiers

  • HAL Id: inria-00346528, version 1

Cite

Mounir Bechchi, Guillaume Raschia, Noureddine Mouaddib. Joining Distributed Database Summaries. [Research Report] RR-6768, INRIA. 2008, pp.29. ⟨inria-00346528⟩