A Band-pass Prefetching : An Effective Prefetch Management Mechanism using Prefetch-fraction Metric in Multi-core Systems

Abstract : In multi-core systems, an application's prefetcher can interfere with the memory requests of other applications that use shared resources such as the last-level cache and memory bandwidth. To minimize prefetcher-caused interference, prior mechanisms dynamically control prefetcher aggressiveness at run time. These mechanisms use several parameters to capture prefetch usefulness and prefetcher-caused interference when making aggressiveness control decisions. However, they do not capture the actual interference at the shared resources and most often make incorrect aggressiveness control decisions, which leaves scope for performance improvement. Towards this end, we propose a solution to manage prefetching in multi-core systems. In particular, we make two fundamental observations. First, a strong positive correlation exists between the accuracy of a prefetcher and the number of prefetch requests it generates relative to an application's total (demand and prefetch) requests. Second, a strong positive correlation exists between the ratio of total prefetch to demand requests and the ratio of the average last-level cache miss service times of demand to prefetch requests. Building on these two observations, we propose Band-pass prefetching, a simple and low-overhead mechanism to effectively manage prefetchers in multi-core systems. Our solution consists of local and global prefetcher aggressiveness control components which, together, confine the flow of prefetch requests to a range of prefetch-to-demand request ratios.
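The two ratios named in the observations above can be written down directly from request counts. The following is a minimal Python sketch, not the paper's implementation: the function names, the zero-count handling, and the `low`/`high` band limits are hypothetical, since the abstract gives neither threshold values nor the control policy.

```python
def prefetch_fraction(prefetch_requests, demand_requests):
    """Prefetch requests relative to total (demand + prefetch) requests."""
    total = prefetch_requests + demand_requests
    if total == 0:
        return 0.0  # assumption: no traffic counts as a fraction of zero
    return prefetch_requests / total


def prefetch_to_demand_ratio(prefetch_requests, demand_requests):
    """Ratio of prefetch requests to demand requests."""
    if demand_requests == 0:
        # assumption: treat "prefetches but no demands" as an unbounded ratio
        return float("inf") if prefetch_requests else 0.0
    return prefetch_requests / demand_requests


def within_band(ratio, low, high):
    """True when the ratio lies inside the band [low, high].

    low and high are hypothetical limits chosen by the caller.
    """
    return low <= ratio <= high
```

For example, an application that issues 25 prefetch requests and 75 demand requests has a prefetch fraction of 0.25 and a prefetch-to-demand ratio of one third.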
From our experiments on 16-core multi-programmed workloads on systems using stream prefetching, we observe that Band-pass prefetching achieves a 12.4% (geometric-mean) improvement in harmonic speedup over a baseline that implements no prefetching, while aggressive prefetching without prefetcher aggressiveness control and the state-of-the-art HPAC, P-FST, and CAFFEINE achieve 8.2%, 8.4%, 1.4%, and 9.7%, respectively. Further evaluation of Band-pass prefetching on systems using the AMPM prefetcher shows similar performance trends. For a 16-core system, Band-pass prefetching requires only a modest hardware cost of 239 bytes.
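The reported figures are geometric means of improvements in harmonic speedup. The sketch below shows how such numbers are commonly computed. It assumes the usual definition of harmonic speedup for multi-programmed workloads (the number of applications divided by the sum of per-application slowdowns) and one plausible reading of "improvement over the baseline"; the abstract states neither formula, and all function and parameter names are hypothetical.

```python
import math


def harmonic_speedup(ipc_alone, ipc_shared):
    """N / sum(IPC_alone_i / IPC_shared_i) over the N co-running applications.

    Assumed definition; the abstract does not spell it out.
    """
    n = len(ipc_alone)
    return n / sum(alone / shared for alone, shared in zip(ipc_alone, ipc_shared))


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_improvement(hs_mechanism, hs_baseline):
    """Geometric-mean improvement across workloads, as a fraction.

    Each list holds one harmonic speedup per workload; the result is the
    geometric mean of the per-workload ratios minus one (assumed reading).
    """
    ratios = [m / b for m, b in zip(hs_mechanism, hs_baseline)]
    return geometric_mean(ratios) - 1.0
```

Under this reading, a result of 0.124 from `mean_improvement` would correspond to the 12.4% figure quoted above.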
Document type :
Journal articles

https://hal.inria.fr/hal-01519648
Contributor : André Seznec
Submitted on : Tuesday, May 9, 2017 - 8:46:19 AM
Last modification on : Thursday, March 14, 2019 - 9:44:05 AM
Long-term archiving on : Thursday, August 10, 2017 - 12:22:23 PM

Identifiers

  • HAL Id : hal-01519648, version 1

Citation

Aswinkumar Sridharan, Biswabandan Panda, André Seznec. A Band-pass Prefetching : An Effective Prefetch Management Mechanism using Prefetch-fraction Metric in Multi-core Systems. ACM Transactions on Architecture and Code Optimization, Association for Computing Machinery, 2017. ⟨hal-01519648⟩
