
IDDCA: A New Clustering Approach For Sampling

Daniel Gracia Pérez 1 Hugues Berry 1 Olivier Temam 1
1 ALCHEMY - Architectures, Languages and Compilers to Harness the End of Moore Years
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France
Abstract : Clustering methods are machine-learning algorithms that can be used to select the most representative samples within a very large program trace. k-means is a popular clustering method for sampling. While k-means performs well, it has several shortcomings: (1) it depends on a random initialization, so clustering results may vary across runs; (2) the maximum number of clusters is a user-selected parameter, but its optimal value can be benchmark- or trace-dependent; (3) k-means is a multi-pass algorithm, which may be less practical for a large number of intervals. To address these issues, we adapted an alternative clustering method, called DCA, to sampling. Unlike k-means, DCA and its sampling-specific adaptation, IDDCA, do not expose the user to internal clustering parameters: the number of clusters is determined dynamically for each target program, and the method parameters adapt dynamically to that program. For an ordered input (e.g., a trace of intervals), the method is deterministic. Finally, it is an online and thus single-pass algorithm, which yields a significant execution-time gain over an existing and popular k-means implementation. Within the context of a variable-size sampling approach, we show that IDDCA can achieve an average CPI error of 1.62% over the 26 SPEC benchmarks, with a maximum error of 5.72% and an average of 403 million instructions.
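Illustrative sketch (not from the paper): the contrast the abstract draws with k-means, namely a single pass, no preset cluster count, and deterministic output for an ordered input, can be seen in a generic leader-style online clustering loop. This is not DCA or IDDCA. The function name `online_cluster` and the fixed distance `threshold` are assumptions made for illustration only; the abstract states that IDDCA adapts its parameters dynamically rather than relying on a user-chosen value.

```python
import math


def online_cluster(points, threshold):
    """Single-pass, leader-style clustering (hypothetical helper, NOT IDDCA).

    Each point is visited once, in input order. It joins the nearest existing
    cluster if that cluster's centroid is within `threshold` (an assumed,
    fixed parameter); otherwise it starts a new cluster.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points absorbed by each cluster
    labels = []     # cluster index assigned to each point, in input order

    for p in points:
        # Find the nearest existing centroid, if any.
        best, best_dist = None, math.inf
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d

        if best is None or best_dist > threshold:
            # No cluster is close enough: the cluster count grows on demand.
            centroids.append(list(p))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Incrementally update the running mean of the chosen cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], p)]
            labels.append(best)

    return labels, centroids


# Toy "intervals" described by 2-D feature vectors (made-up values).
trace = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, centroids = online_cluster(trace, threshold=1.0)
print(labels, len(centroids))
```

Because the loop involves no random initialization, feeding it the same ordered trace gives the same labels on every run, and the number of clusters falls out of the data rather than being fixed in advance.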
Document type : Conference papers

Contributor : Hugues Berry
Submitted on : Sunday, January 29, 2006 - 7:13:03 PM
Last modification on : Friday, February 4, 2022 - 3:31:00 AM
Long-term archiving on : Monday, September 17, 2012 - 11:21:24 AM


  • HAL Id : inria-00001062, version 1


Daniel Gracia Pérez, Hugues Berry, Olivier Temam. IDDCA: A New Clustering Approach For Sampling. MoBS: Workshop on Modeling, Benchmarking, and Simulation, Jun 2005, Madison, Wisconsin. ⟨inria-00001062⟩


