Optimizing distributed data mining applications based on object clustering methods
Abstract
The exponential computational cost of traditional data mining methods motivates the search for new, less complex algorithms. Data mining on the Grid is particularly challenging because Grid computing lacks shared memory, which makes communication optimization a central concern. The aim of the DisDaMin (Distributed Data Mining) project, described in this paper, is to solve data mining problems with new distributed algorithms intended for execution in Grid environments. DisDaMin implements intelligent fragmentation of data by clustering methods, together with asynchronous collaborative processing adapted to Grid environments. The DG-ADAJ environment provides adaptive control of distributed applications written in Java for Desktop Grid. It is a component-based middleware that allows optimized distribution of applications on clusters of Java Virtual Machines, monitoring of application execution, and dynamic on-line balancing of processing and communication. DG-ADAJ thus provides a middleware platform for Desktop Grid that could be used as a deployment base for DisDaMin algorithms. In this paper, we propose static object placement optimization algorithms for the fragmentation of data in the DisDaMin project. The algorithms use DG-ADAJ's object clustering methods to provide optimized local processing on each node with minimized inter-node communication.
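To make the idea of static object placement concrete, the following is a minimal sketch in Java. It is not the DG-ADAJ clustering method or the DisDaMin algorithm; it swaps in a simple greedy heuristic for illustration. The class name `StaticPlacementSketch`, the methods `place` and `cutWeight`, and the communication matrix `comm` are all hypothetical. The sketch assumes that pairwise communication weights between objects are known in advance, symmetric, and non-negative, and that each node should hold at most ceil(n / nodes) objects.

```java
import java.util.Arrays;

/** Hypothetical illustration only: greedy static placement of objects on nodes. */
final class StaticPlacementSketch {

    /**
     * Places each object on the node where it communicates most with
     * objects that are already placed there, subject to a load cap.
     * comm[i][j] is an assumed symmetric, non-negative communication weight.
     * Returns nodeOf, where nodeOf[i] is the node index chosen for object i.
     */
    static int[] place(double[][] comm, int nodes) {
        int n = comm.length;
        int capacity = (n + nodes - 1) / nodes; // at most ceil(n / nodes) objects per node
        int[] nodeOf = new int[n];
        Arrays.fill(nodeOf, -1);
        int[] load = new int[nodes];

        for (int obj = 0; obj < n; obj++) {
            int best = -1;
            double bestAffinity = Double.NEGATIVE_INFINITY;
            for (int node = 0; node < nodes; node++) {
                if (load[node] >= capacity) {
                    continue; // node is full
                }
                // Affinity: total weight between obj and objects already on this node.
                double affinity = 0.0;
                for (int other = 0; other < obj; other++) {
                    if (nodeOf[other] == node) {
                        affinity += comm[obj][other];
                    }
                }
                // Prefer higher affinity; break ties in favor of the less loaded node.
                if (affinity > bestAffinity
                        || (affinity == bestAffinity && load[node] < load[best])) {
                    best = node;
                    bestAffinity = affinity;
                }
            }
            nodeOf[obj] = best;
            load[best]++;
        }
        return nodeOf;
    }

    /** Sums the weights of all object pairs that ended up on different nodes. */
    static double cutWeight(double[][] comm, int[] nodeOf) {
        double cut = 0.0;
        for (int i = 0; i < comm.length; i++) {
            for (int j = i + 1; j < comm.length; j++) {
                if (nodeOf[i] != nodeOf[j]) {
                    cut += comm[i][j];
                }
            }
        }
        return cut;
    }
}
```

Calling `StaticPlacementSketch.place(comm, 2)` on a matrix in which two pairs of objects exchange most of the traffic should keep each pair on one node, and `cutWeight` then reports the remaining inter-node communication. The heuristic depends on the order in which objects are visited and gives no optimality guarantee; it only illustrates the objective of grouping communicating objects while keeping node loads balanced.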