Prediction-based superpage-friendly TLB designs

Abstract—This work demonstrates that a set of commercial and scale-out applications exhibit significant use of superpages and thus suffer from the fixed and small superpage TLB structures of some modern core designs. Other processors better cope with superpages at the expense of using power-hungry and slow fully-associative TLBs. We consider alternate designs that allow all pages to freely share a single, power-efficient and fast set-associative TLB. We propose a prediction-guided multi-grain TLB design that uses a superpage prediction mechanism to avoid multiple lookups in the common case. In addition, we evaluate the previously proposed skewed TLB [1], which builds on principles similar to those used in skewed-associative caches [2]. We enhance the original skewed TLB design by using page size prediction to increase its effective associativity. Our prediction-based multi-grain TLB design delivers more hits and is more power-efficient than existing alternatives. The predictor uses a 32-byte prediction table indexed by base register values.

I. INTRODUCTION

Over the last 50 years, virtual memory has been an intrinsic facility of computer systems, providing each process the illusion of an equally large and contiguous address space while enforcing isolation and access control. Page tables act as gatekeepers; they maintain the mappings of virtual pages to physical frames (i.e., translations) along with additional information (e.g., access privileges). Except for a reserved part of memory, any code or data structure that currently resides in the computer's physical memory has such a translation. Page tables are usually organized as multi-level, hierarchical tables, with four levels being common for 64-bit systems. Therefore, multiple sequential memory references are necessary to retrieve the translation of the smallest supported page size. Hardware Translation Lookaside Buffers (TLBs) cache translations, which are the result of accessing the page table (a page walk).
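The cost of such a multi-level walk can be sketched as follows. This is an illustrative simplification, not a detail from the paper: it assumes an x86-64-like four-level radix layout with 9 virtual-address bits per level, 4 KiB base pages, and dictionary-based page tables. The key point is that each level's lookup depends on the previous one, so a full walk costs four sequential memory references.

```python
# Illustrative four-level page walk (assumed x86-64-like layout; hypothetical
# helper, not from the paper). Each table has 512 entries (9 index bits), and
# each level is one dependent memory reference.

PAGE_SHIFT = 12          # 4 KiB base pages
BITS_PER_LEVEL = 9       # 512 entries per table
LEVELS = 4

def walk(page_table, vaddr):
    """Translate vaddr by indexing one table per level; each level's
    lookup depends on the previous one (four sequential references)."""
    node = page_table
    for level in reversed(range(LEVELS)):          # levels 3 down to 0
        shift = PAGE_SHIFT + level * BITS_PER_LEVEL
        index = (vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1)
        node = node.get(index)
        if node is None:
            return None                            # unmapped: page fault
    frame = node                                   # leaf entry holds the frame number
    return (frame << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))
```

A TLB short-circuits exactly this loop: on a hit, the frame number is produced in a single lookup instead of four dependent accesses.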
A TLB access is on the critical path of each instruction fetch and memory reference; the translation is needed to complete the tag comparison in physically-tagged L1 caches. Thus, a short TLB latency is crucial. Several technology trends compound to make TLB performance and energy critical in today's systems. Physical memory sizes and application footprints have been increasing without a commensurate increase in TLB size, and thus coverage. As a result, while TLBs still reap the benefits of spatial and temporal locality due to their entries' coarse tracking granularity, they now fall short of growing workload footprints. The use of superpages (i.e., large contiguous virtual memory regions which map to contiguous physical frames) can extend
Document type: Conference paper
21st IEEE Symposium on High Performance Computer Architecture, 2015, San Francisco, United States. Proceedings of the 21st IEEE Symposium on High Performance Computer Architecture, 2015, ⟨10.1109/HPCA.2015.7056034⟩

https://hal.inria.fr/hal-01193176
Contributor: André Seznec
Submitted on: Friday, September 4, 2015 - 16:39:33
Last modified on: Wednesday, October 10, 2018 - 21:38:02

Identifiers

Citation

Misel-Myrto Papadopoulou, Xin Tong, André Seznec, Andreas Moshovos. Prediction-based superpage-friendly TLB designs. 21st IEEE Symposium on High Performance Computer Architecture, 2015, San Francisco, United States. Proceedings of the 21st IEEE Symposium on High Performance Computer Architecture, 2015, ⟨10.1109/HPCA.2015.7056034⟩. ⟨hal-01193176⟩
