Reducing GPU Address Translation Overhead with Virtual Caching

Yoon, Hongil; Lowe-Power, Jason; Sohi, Gurindar S.

Files

TR1842.pdf (1.3 MB)

Date

2016-12-05T15:39:34Z

Authors

Yoon, Hongil

Lowe-Power, Jason

Sohi, Gurindar S.

Type

Technical Report

Abstract

Heterogeneous computing on tightly integrated CPU-GPU systems is ubiquitous, and to increase programmability, many of these systems support virtual address accesses from GPU hardware. However, there is no free lunch. Supporting virtual memory entails an address translation on every memory access, which greatly impacts performance (about 77% performance degradation on average). To mitigate this overhead, we propose a software-transparent, practical GPU virtual cache hierarchy. We show that a virtual cache hierarchy is an effective GPU address translation bandwidth filter. We make several empirical observations advocating for GPU virtual caches: (1) Mirroring CPU-style memory management units in GPUs is not effective, because GPU workloads show very high Translation Lookaside Buffer (TLB) miss ratios and high miss bandwidth. (2) Many requests that miss in TLBs find corresponding valid data in the GPU cache hierarchy. (3) The GPU's accelerator nature simplifies implementing a deep virtual cache hierarchy (i.e., there are fewer virtual address synonyms and homonyms). We evaluate both L1-only virtual cache designs and an entire virtual cache hierarchy (private L1s and a shared L2 cache). Our experimental evaluation shows that the proposed entire GPU virtual cache design significantly reduces the overhead of virtual address translation, providing an average speedup of 1.77x over a baseline physically cached system. L1-only virtual cache designs show more modest performance benefits (1.35x speedup), so extending virtual caching to the whole GPU cache hierarchy yields additional performance benefits.
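
The "translation bandwidth filter" idea in the abstract can be illustrated with a toy model. The sketch below is not the report's simulator or design; it is a minimal illustration under stated assumptions: a single fully associative LRU cache, a fully associative LRU TLB, 64-byte lines, 4 KB pages, and no synonyms, homonyms, or permission checks. All names (`LRUSet`, `count_translations`, the synthetic `trace`) are hypothetical. With a physical cache, every access consults the TLB; with a virtual cache, only cache misses do.

```python
from collections import OrderedDict

LINE_BYTES = 64    # assumed cache line size
PAGE_BYTES = 4096  # assumed page size


class LRUSet:
    """Fully associative LRU structure holding up to `capacity` keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and evict the LRU entry."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False


def count_translations(trace, cache_lines, tlb_entries, virtual_cache):
    """Return (TLB lookups, TLB misses) for a trace of virtual byte addresses.

    Physical cache (virtual_cache=False): translate before every access.
    Virtual cache (virtual_cache=True): translate only when the cache misses.
    """
    cache = LRUSet(cache_lines)
    tlb = LRUSet(tlb_entries)
    lookups = misses = 0
    for vaddr in trace:
        cache_hit = cache.access(vaddr // LINE_BYTES)
        if virtual_cache and cache_hit:
            continue  # data found under its virtual address, no translation needed
        lookups += 1
        if not tlb.access(vaddr // PAGE_BYTES):
            misses += 1
    return lookups, misses


# Synthetic trace (an assumption, not a real GPU workload): four passes over
# 64 pages, touching two lines per page. The 128 lines fit in a 256-line
# cache, but the 64 pages overflow a 16-entry TLB.
trace = [page * PAGE_BYTES + line * LINE_BYTES
         for _ in range(4) for page in range(64) for line in range(2)]

physical = count_translations(trace, 256, 16, virtual_cache=False)
virtual = count_translations(trace, 256, 16, virtual_cache=True)
print(physical)  # (512, 256)
print(virtual)   # (128, 64)
```

In this toy trace the virtual cache removes three quarters of the TLB lookups and misses, because data that stays resident in the cache is reached without translation even after its page has been evicted from the TLB. The numbers are properties of the synthetic trace only and say nothing about the speedups reported in the abstract.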

Keywords

Virtual Caching, TLBs, Virtually indexed virtually tagged caches, Synonyms, GPU, Address Translation, GPU Virtual Cache Hierarchy

Citation

TR1842

URI

http://digital.library.wisc.edu/1793/75577

Collections

CS Technical Reports