Optimal Grid Exploitation algorithms for Data Mining

Abstract: Although many Data Mining tasks have been parallelized and can thus be executed on dedicated clusters, few solutions currently exist for solving Data Mining problems on a grid or a non-specialized network of workstations. The current trend is to use grids and/or desktop grids in order to exploit any available workstations, regardless of their physical location. Although a grid-specific algorithm shares some characteristics with a dedicated-cluster algorithm, the use of a grid brings many inherent constraints. In particular, resource volatility and communication costs reduce the effectiveness of parallelism. The DisDaMin project (DIStributed DAta MINing) revisits data mining tasks and proposes new algorithms suited to grids. The DisDaMin mechanisms first fragment the data in a specific way using clustering methods, and then apply asynchronous collaborative techniques adapted to the specifics of execution on grids. This fragmentation method makes it possible to carry out optimal local processing on each node with a minimum of communication. Building on this, we introduce the distributed algorithm DICCoop, an adaptation of DIC (see [3]). Simulations were performed to demonstrate the efficiency of the proposed mechanisms and were hosted on the French national grid GRID5000 (part of the European CoreGrid). We analyse the impact of the numerous
Document type:
Conference papers

Contributor: Ist Rennes
Submitted on: Wednesday, April 18, 2012 - 3:47:56 PM
Last modification on: Thursday, February 21, 2019 - 10:52:46 AM




Valérie Fiolet, Richard Olejnik, Guillem Lefait, Bernard Toursel. Optimal Grid Exploitation algorithms for Data Mining. ISPDC'06, Jul 2006, Timisoara, Romania. pp.246-252, ⟨10.1109/ISPDC.2006.36⟩. ⟨hal-00688828⟩


