IDDCA: A New Clustering Approach For Sampling

Daniel Gracia Pérez; Hugues Berry; Olivier Temam

Conference paper. Year: 2005


Daniel Gracia Pérez

Role: Author
PersonId : 830143

Architectures, Languages and Compilers to Harness the End of Moore Years

Hugues Berry

Role: Author
PersonId : 7219
IdHAL : huguesberry
ORCID : 0000-0003-3470-683X
IdRef : 14027278X

Architectures, Languages and Compilers to Harness the End of Moore Years

Olivier Temam

Role: Author
PersonId : 830062

Architectures, Languages and Compilers to Harness the End of Moore Years

Abstract

Clustering methods are machine-learning algorithms that can be used to select the most representative samples within a very large program trace. k-means is a popular clustering method for sampling. While k-means performs well, it has several shortcomings: (1) it depends on a random initialization, so clustering results may vary across runs; (2) the maximum number of clusters is a user-selected parameter, but its optimal value can depend on the benchmark or trace; (3) k-means is a multi-pass algorithm, which may be less practical for a large number of intervals. To address these issues, we adapted an alternative clustering method, called DCA, to the sampling problem. Unlike k-means, DCA and its sampling-specific adaptation, IDDCA, do not expose the user to internal clustering parameters: the number of clusters is defined dynamically for each target program, and the method parameters adapt dynamically to that program. For an ordered input (e.g., a trace of intervals), the method is deterministic. Finally, it is an online and thus single-pass algorithm, resulting in a significant execution time gain over an existing and popular k-means implementation. Within the context of a variable-size sampling approach, we show that IDDCA can achieve an average CPI error of 1.62% over the 26 SPEC benchmarks, with a maximum error of 5.72% and an average of 403 million instructions.
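The properties claimed in the abstract (single pass, determinism for an ordered input, no preset number of clusters) can be illustrated with a much simpler stand-in. The sketch below is a generic leader-style online clustering, not the DCA or IDDCA algorithm, whose details are not given on this page. The function name `online_threshold_clustering` and the fixed `threshold` parameter are assumptions made for illustration; according to the abstract, IDDCA adapts its parameters dynamically rather than relying on a user-chosen value like this one.

```python
import math


def online_threshold_clustering(intervals, threshold):
    """Single-pass leader-style clustering (illustrative stand-in, not IDDCA).

    intervals: iterable of equal-length numeric vectors, in trace order.
    threshold: hypothetical fixed distance beyond which a new cluster opens.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of intervals absorbed by each cluster
    labels = []     # cluster index of each interval, in trace order
    for vec in intervals:
        # Find the nearest existing centroid, if any.
        best, best_dist = None, math.inf
        for idx, centroid in enumerate(centroids):
            d = math.dist(vec, centroid)
            if d < best_dist:
                best, best_dist = idx, d
        if best is None or best_dist > threshold:
            # Too far from every cluster: open a new one.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Absorb the interval and update the running mean incrementally.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n
                               for c, x in zip(centroids[best], vec)]
            labels.append(best)
    return labels, centroids


# Toy trace of 2-D "interval signatures" (made-up values).
trace = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.1)]
labels, centroids = online_threshold_clustering(trace, threshold=1.0)
print(labels)          # [0, 0, 1, 1, 0]
print(len(centroids))  # 2
```

Each interval is visited exactly once and there is no random initialization, so the same ordered trace always yields the same labels, and the number of clusters emerges from the data instead of being fixed in advance.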

Domains

Hardware Architecture [cs.AR]

Main file

article_clustering.pdf (1.66 MB)


https://inria.hal.science/inria-00001062

Submitted on: Sunday, January 29, 2006, 19:13:03

Last modified on: Monday, February 12, 2024, 10:38:04

Long-term archiving on: Monday, September 17, 2012, 11:21:24

Dates and versions

inria-00001062, version 1 (29-01-2006)

Identifiers

HAL Id: inria-00001062, version 1

Cite

Daniel Gracia Pérez, Hugues Berry, Olivier Temam. IDDCA: A New Clustering Approach For Sampling. MoBS: Workshop on Modeling, Benchmarking, and Simulation, Jun 2005, Madison, Wisconsin. ⟨inria-00001062⟩


Collections

EC-PARIS CNRS INRIA UMR8623 INRIA2 UNIV-PARIS-SACLAY

313 views

135 downloads
