Optimal Grid Exploitation algorithms for Data Mining

Abstract: Although many Data Mining tasks have been parallelized and can thus be executed on dedicated clusters, few solutions currently exist for solving Data Mining problems on a grid or a non-specialized network of workstations. The current trend is to use grids and/or desktop grids in order to exploit any available workstation regardless of its physical location. Although a grid-specific algorithm shares some characteristics with a dedicated-cluster algorithm, the use of a grid brings many inherent constraints. In particular, resource volatility and communication costs reduce the effectiveness of parallelism. The DisDaMin project (DIStributed DAta MINing) revisits data mining tasks and proposes new algorithms suited to grids. The DisDaMin mechanisms first fragment the data in a specific way using clustering methods, and then apply asynchronous collaborative techniques adapted to the specifics of execution on grids. This fragmentation method makes it possible to carry out optimal local processing on each node with a minimum of communication. Building on this, we introduce the distributed algorithm DICCoop, an adaptation of DIC (see [3]). Simulations were performed to demonstrate the efficiency of the proposed mechanisms and are hosted on the French national grid GRID5000 (part of the European CoreGrid). We analyse the impact of the numerous
Document type:
Conference paper
ISPDC'06, Jul 2006, Timisoara, Romania. IEEE, pp.246-252, 2006, 〈10.1109/ISPDC.2006.36〉

Contributor: Ist Rennes
Submitted on: Wednesday, 18 April 2012 - 15:47:56
Last modified on: Tuesday, 24 April 2018 - 13:36:24




Valérie Fiolet, Richard Olejnik, Guillem Lefait, Bernard Toursel. Optimal Grid Exploitation algorithms for Data Mining. ISPDC'06, Jul 2006, Timisoara, Romania. IEEE, pp.246-252, 2006, 〈10.1109/ISPDC.2006.36〉. 〈hal-00688828〉


