Title
Adaptive GPU Cache BypassingConference
Published in the Proceedings of the 8th Workshop on General Purpose Processing on GPUs (GPGPU-8), June, 2014 (acceptance rate: 11/17 ≈ 65%)Authors
Yingying Tian, Sooraj Puthoor, Joseph L. Greathouse, Bradford M. Beckmann, Daniel JiménezAbstract
Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are inefficient for general purpose GPU (GPGPU) computing. GPGPU workloads tend to include data structures that would not fit in any reasonably sized caches, leading to very low cache hit rates. This problem is exacerbated by the design of current GPUs, which share small caches between many threads. Caching these streaming data structures needlessly burns power while evicting data that may otherwise fit into the cache.
We propose a GPU cache management technique to improve the efficiency of small GPU caches while further reducing their power consumption. It adaptively bypasses the GPU cache for blocks that are unlikely to be referenced again before being evicted. This technique saves energy by avoiding needless insertions and evictions while avoiding cache pollution, resulting in better performance. We show that, with a 16KB L1 data cache, dynamic bypassing achieves similar performance to a double-sized L1 cache while reducing energy consumption by 25% and power by 18%.
The technique is especially interesting for programs that do not use programmer-managed scratchpad memories. We give a case study to demonstrate the inefficiency of current GPU caches compared to programmer-managed scratchpad memories and show the extent to which cache bypassing can make up for the potential performance loss where the effort to program scratchpad memories is impractical.