Accelerating Matrix Processing with GPUs


Published in the Proceedings of the 24th IEEE Symposium on Computer Arithmetic (ARITH 24), July 2017 (acceptance rate: 22/50 = 44%)


Nicholas Malaya, Shuai Che, Joseph L. Greathouse, René van Oostrum, Michael J. Schulte


Matrix operations are common and expensive computations in a variety of applications. They occur frequently in high-performance computing, graphics, graph processing, and machine learning applications.

This paper discusses how to map a variety of important matrix computations, including sparse matrix-vector multiplication (SpMV), sparse triangular solve (SpTS), graph processing, and dense matrix-matrix multiplication, to GPUs. Since many emerging systems will use heterogeneous architectures (e.g., CPUs and GPUs) to attain the desired performance targets under strict power constraints, this paper discusses implications and future research directions for matrix processing on heterogeneous designs.
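To make the SpMV kernel concrete, the sketch below shows the operation over the widely used compressed sparse row (CSR) format. This is an illustrative serial version, not the paper's GPU implementation; the function name and data layout are assumptions for exposition.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A*x for a CSR-format sparse matrix A.

    row_ptr delimits each row's slice of col_idx/vals; on a GPU,
    each row (or group of rows) is typically assigned to a thread
    or wavefront, whereas here the rows are processed sequentially.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += vals[k] * x[col_idx[k]]
    return y

# Example: the 2x2 matrix [[4, 0], [1, 3]] stored in CSR form.
y = spmv_csr([0, 1, 3], [0, 0, 1], [4.0, 1.0, 3.0], [1.0, 2.0])
# y == [4.0, 7.0]
```

The irregular, row-dependent inner-loop length is exactly what makes load balancing SpMV across GPU threads challenging.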

Conclusions common to the matrix operations discussed in this paper are: (1) future algorithms should ensure that the essential computations fit into local memory, which may require direct programmer management; (2) algorithms are needed that expose high levels of parallelism; (3) while the scale of computation is often sufficient to support algorithms with superior asymptotic complexity, additional considerations, such as memory capacity and bandwidth, must also be carefully managed; and (4) libraries should be used to provide portable performance.




PDF Copyright © 2017 IEEE. Hosted on this personal website as per this IEEE policy.