Curriculum Vitae
Updated February 23, 2023
Contact Information
Address: Joseph Lee Greathouse, Austin, TX USA
E-mail: joseph.l.greathouse@gmail.com
WWW: http://www.computermachines.org/
Research Interests
My research and development work focuses on the hardware/software interface. Put succinctly, I strive to make hardware do new and potentially unintended things. This includes creating new software-visible hardware features in heterogeneous systems and optimizing software for novel hardware platforms.
Hardware performance monitors and on-chip firmware can yield intriguing data with a few well-placed tweaks. Power, energy, and thermal optimizations present a plethora of unsolved problems, and heterogeneous processors add an interesting dimension, as different styles of cores must share thermal and power infrastructure.
My Ph.D. dissertation focused on hardware methods for accelerating software analyses such as data race detectors and memory checkers. I used existing hardware, like performance counters, for this in addition to designing new hardware mechanisms.
I have applied these interests and this background to AMD's Instinct accelerators and ROCm GPGPU software stack. I am a software architect for these products and have helped design multiple hardware and software features for them. Hardware examples include the cache coherence protocols in our heterogeneous systems, optimizations in our virtual memory system, new GPU performance monitoring infrastructure, multi-device power control algorithms, and RAS features. In the software domain, I have designed multiple user-visible software APIs that surface new hardware features, hardware-optimized algorithms for math libraries, and numerous workarounds for unexpected hardware behavior.
Professional Experience
- Advanced Micro Devices, Inc., Austin, TX USA
Fellow
July 2022 - Present
- I am a software architect in the performance engineering team in AMD's Radeon Technologies Group. My work focuses on hardware/software interface topics for AMD's ROCm platform for general-purpose GPU computing.
- My work covers a wide variety of topics. I create optimized sparse linear algebra algorithms for AMD GPUs, optimize our machine learning and high-performance computing codes for power and energy usage, architect optimized GPU software solutions, define new hardware mechanisms for power and performance analysis and optimization, and write user training materials.
Principal Member of Technical Staff
July 2019 - June 2022
- In the performance engineering team in AMD's Radeon Technologies Group, my work focused on software, firmware, and hardware optimizations for AMD's ROCm software platform and Instinct accelerator hardware.
- Software architect for the AMD Instinct MI200 and MI300 programs. This included hardware feature design in areas such as cache coherence, virtual memory, performance monitoring, power control, and RAS. I also developed software interfaces for multiple new hardware features, created new user-level APIs, and delivered software and firmware workarounds for unexpected hardware behavior.
- I wrote hundreds of pages of documentation about AMD's hardware and software, created hundreds of slides of training material, and presented dozens of training sessions to large internal and external audiences.
- Beyond the HW/SW architecture role, I continued to create optimized sparse linear algebra algorithms for AMD GPUs and to optimize our machine learning and high-performance computing codes for power and energy usage.
Senior Member of Technical Staff
July 2016 - June 2019
- I started this position in AMD Research, leading a team of 10 engineers studying performance and power monitoring, estimation, and management mechanisms for CPUs and GPUs as part of AMD's PathForward exascale research program.
- We published research focusing on power and thermal management at HPCA 2017, ARCS 2017, ITherm 2018, ICCAD 2018, and MICRO 2019. The simulation tools we built were also used in a major HPCA 2017 industry track publication.
- This group also researched system and software topics for heterogeneous systems such as GPGPU buffer overflow protection (published at CGO 2017 and IWOCL 2018) and GPGPU system call overheads (published at IISWC 2018).
- I then moved to be a performance engineer in AMD's ROCm GPGPU software engineering team, where I worked on optimizing our GPU software, firmware, and hardware in order to meet the demands of our GPU compute customers.
- My work in this group included user support of our software stack, new hardware bring-up and software optimization for the Radeon Instinct MI60 and Instinct MI100 accelerators, and software feature development such as HIP Cooperative Groups and rocSPARSE algorithmic development.
Member of Technical Staff
July 2014 - June 2016
- For AMD's FastForward 2 research, a major focus was extending the high-level simulation work from the FastForward program. Details of some of our extended power and performance models can be found in our papers at HPCA 2015, IISWC 2015, and IISWC 2016.
- Beyond modeling and simulation work, I focused on additional GPGPU software, including more sparse linear algebra work as described at HiPC 2015 and IWOCL 2015 and hash table algorithms published at USENIX ATC 2016.
- I also performed system administration tasks during this period, setting up clusters of computers running new HSA software on AMD heterogeneous processors for multiple government labs to use in their own research.
August 2012 - June 2014
- As part of AMD Research's FastForward research on exascale computing, my research broadly centered on creating a high-level performance and power simulator based on analytic scaling of real hardware measurements.
- This simulator was described in a ModSim 2013 paper. We invented multiple new CPU and GPU performance and power estimation algorithms for it, including one described at USENIX ATC 2014.
- This simulator was used as a major part of AMD Research's studies of heterogeneous CPU-GPU PIM systems, as described in MSPC 2013 and HPDC 2014.
- During this time, I also worked with exascale proxy applications to formulate new GPGPU algorithms for AMD GPUs and APUs, such as the GPU-based sparse matrix-vector multiplication algorithm published at SC14.
- University of Michigan, Ann Arbor, MI, USA
Research Assistant
May 2007 - August 2012
- Identified methods of distributing software analyses across many users to reduce slowdowns.
- Managed graduate and undergraduate students through development of prototype systems.
- Kelly Services / Intel Corp., Champaign, IL, USA
Research Contractor
May 2010 - October 2010
- Researched approaches for improving speed and accuracy of Intel Inspector XE data race detector.
- Utilized unique features of Intel processors to yield orders-of-magnitude performance gains.
- International Business Machines Corp., Rochester, MN, USA
Speed Team Intern
May 2008 - August 2008
- Designed and constructed an InfiniBand compliance verification suite.
- Integrated the suite into the IBM PowerVM I/O firmware development process, where it caught numerous bugs.
Education
- University of Michigan, Ann Arbor
Ph.D., Computer Science and Engineering
May 2012
Thesis Topic: Hardware Mechanisms for Distributed Dynamic Software Analysis
Advisor: Prof. Todd Austin
- University of Michigan, Ann Arbor
M.S.E., Computer Science and Engineering
May 2008
Concentration: Hardware Systems
GPA: 7.73/9.0 (3.79/4.0)
- University of Illinois at Urbana-Champaign
B.S., Computer Engineering with Honors
May 2006
Minor: International Engineering – Japanese
GPA: 3.71/4.0
Conference Publications
- Raghavendra Pradyumna Pothukuchi, Joseph L. Greathouse, Karthik Rao, Christopher Erb, Leonardo Piga, Petros Voulgaris, Josep Torrellas, "Tangram: Integrated Control of Heterogeneous Computers," Published in the Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO-52), October, 2019
- Joseph L. Greathouse, Gabriel H. Loh, "Machine Learning for Performance and Power Modeling of Heterogeneous Systems," Published in the Proceedings of the 2018 International Conference on Computer Aided Design (ICCAD 2018), November, 2018
- Arkaprava Basu, Joseph L. Greathouse, Guru Venkataramani, Ján Veselý, "Interference from GPU System Service Requests," Published in the Proceedings of the 2018 IEEE International Symposium on Workload Characterization (IISWC 2018), September, 2018 (Nominated for Best Paper)
- Xudong An, Manish Arora, Wei Huang, William C. Brantley, Joseph L. Greathouse, "3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components," Published in the Proceedings of the 17th IEEE Intersociety Conference on Thermomechanical Phenomena in Electronic Systems (ITherm 2018), May, 2018
- Nicholas Malaya, Shuai Che, Joseph L. Greathouse, René van Oostrum, Michael J. Schulte, "Accelerating Matrix Processing with GPUs," Published in the Proceedings of the 24th IEEE Symposium on Computer Arithmetic (ARITH 24), July, 2017
- Marko Ščrbak, Joseph L. Greathouse, Nuwan Jayasena, Krishna Kavi, "DVFS Space Exploration in Power Constrained Processing-in-Memory Systems," Published in the Proceedings of the 30th International Conference on Architecture of Computing Systems (ARCS 2017), April, 2017
- Abhinandan Majumdar, Leonardo Piga, Indrani Paul, Joseph L. Greathouse, Wei Huang, David H. Albonesi, "Dynamic GPGPU Power Management using Adaptive Model Predictive Control," Published in the Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture (HPCA 2017), February, 2017
- Thiruvengadam Vijayaraghavan, Yasuko Eckert, Gabriel H. Loh, Michael J. Schulte, Mike Ignatowski, Indrani Paul, Bradford M. Beckmann, Steven K. Reinhardt, William C. Brantley, Joseph L. Greathouse, Onur Kayiran, Matthew Poremba, Wei Huang, Arun Karunanithi, Greg Sadowski, Vilas Sridharan, Steven E. Raasch, Mitesh Meswani, "Design and Analysis of an APU for Exascale Computing," Published in the Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture (HPCA 2017 Industry Track), February, 2017
- Christopher Erb, Mike Collins, Joseph L. Greathouse, "Dynamic Buffer Overflow Detection for GPGPUs," Published in the Proceedings of the 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2017), February, 2017
- Vignesh Adhinarayanan, Indrani Paul, Joseph L. Greathouse, Wei Huang, Ashutosh Pattnaik, Wu-chun Feng, "Measuring and Modeling On-Chip Interconnect Power on Real Hardware," Published in the Proceedings of the 2016 IEEE International Symposium on Workload Characterization (IISWC 2016), September, 2016 (Awarded Best Paper)
- Alex D. Breslow, Dong Ping Zhang, Joseph L. Greathouse, Nuwan Jayasena, Dean M. Tullsen, "Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing," Published in the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC), June, 2016
- Mayank Daga, Joseph L. Greathouse, "Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices," Published in the Proceedings of the 2015 IEEE International Conference on High Performance Computing (HiPC 2015), December, 2015
- Abhinandan Majumdar, Gene Wu, Kapil Dev, Joseph L. Greathouse, Indrani Paul, Wei Huang, Arjun Karthik Venugopal, Leonardo Piga, Chip Freitag, Sooraj Puthoor, "A Taxonomy of GPGPU Performance Scaling," Published in the Proceedings of the 2015 IEEE International Symposium on Workload Characterization (IISWC 2015), October, 2015
- Gene Wu, Joseph L. Greathouse, Alexander Lyashevsky, Nuwan Jayasena, Derek Chiou, "GPGPU Performance and Power Estimation Using Machine Learning," Published in the Proceedings of the 21st IEEE Symposium on High Performance Computer Architecture (HPCA 2015), February, 2015
- Bo Su, Junli Gu, Li Shen, Wei Huang, Joseph L. Greathouse, Zhiying Wang, "PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration," Published in the Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47), December, 2014
- Joseph L. Greathouse, Mayank Daga, "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format," Published in the Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14), November, 2014
- Dong Ping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L. Greathouse, Lifan Xu, Michael Ignatowski, "TOP-PIM: Throughput-Oriented Programmable Processing in Memory," Published in the Proceedings of the 23rd International Symposium on High Performance Parallel and Distributed Computing (HPDC '14), June, 2014 (Nominated for Best Paper)
- Bo Su, Joseph L. Greathouse, Junli Gu, Michael Boyer, Li Shen, Zhiying Wang, "Implementing a Leading Loads Performance Predictor on Commodity Processors," Published in the Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC '14), June, 2014
- Andrea Pellegrini, Joseph L. Greathouse, Valeria Bertacco, "Viper: Virtual Pipelines for Enhanced Reliability," Published in the Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA 2012), June, 2012
- Joseph L. Greathouse, Hongyi Xin, Yixin Luo, Todd Austin, "A Case for Unlimited Watchpoints," Published in the Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012), March, 2012
- Joseph L. Greathouse, Zhiqiang Ma, Matthew I. Frank, Ramesh Peri, Todd Austin, "Demand-Driven Software Race Detection using Hardware Performance Counters," Published in the Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA 2011), June, 2011
- Joseph L. Greathouse, Chelsea LeBlanc, Todd Austin, Valeria Bertacco, "Highly Scalable Distributed Dataflow Analysis," Published in the Proceedings of the 2011 International Symposium on Code Generation and Optimization (CGO 2011), April, 2011 (Awarded Best Student Presentation at CGO 2011)
- Joseph L. Greathouse, Ilya Wagner, David A. Ramos, Gautam Bhatnagar, Todd Austin, Valeria Bertacco, Seth Pettie, "Testudo: Heavyweight Security Analysis via Statistical Sampling," Published in the Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41), November, 2008
Workshop Publications
- Christopher Erb, Joseph L. Greathouse, "clARMOR: A Dynamic Buffer Overflow Detector for OpenCL Kernels," Published in the Proceedings of the International Workshop on OpenCL (IWOCL 2018), May, 2018
- Joseph L. Greathouse, Kent Knox, Jakub Poła, Kiran Varaganti, Mayank Daga, "clSPARSE: A Vendor-Optimized Open-Source Sparse BLAS Library," Published in the Proceedings of the International Workshop on OpenCL (IWOCL 2016), April, 2016
- Yingying Tian, Sooraj Puthoor, Joseph L. Greathouse, Bradford M. Beckmann, Daniel Jiménez, "Adaptive GPU Cache Bypassing," Published in the Proceedings of the 8th Workshop on General Purpose Processing on GPUs (GPGPU-8), February, 2015
- Adam McLaughlin, Indrani Paul, Joseph L. Greathouse, Srilatha Manne, Sudhakar Yalamanchili, "A Power Characterization and Management of GPU Graph Traversal," Published at the Fourth Workshop on Architectures and Systems for Big Data (ASBD 2014), June, 2014
- Joseph L. Greathouse, Alexander Lyashevsky, Mitesh Meswani, Nuwan Jayasena, Michael Ignatowski, "Simulation of Exascale Nodes through Runtime Hardware Monitoring," Published at the Workshop on Modeling & Simulation of Exascale Systems & Applications (ModSim 2013), September, 2013
- Dong Ping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L. Greathouse, Mitesh Meswani, Mark Nutter, Michael Ignatowski, "A New Perspective on Processing-in-memory Architecture Design," Published at the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC 2013), June, 2013
- Joseph L. Greathouse, Todd Austin, "Position Paper: The Potential of Sampling for Dynamic Analysis," Published in the Proceedings of the 6th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS 2011), June, 2011
Software Projects
- AMD Research Instruction Based Sampling (IBS) Toolkit
https://github.com/jlgreathouse/AMD_IBS_Toolkit
- clSPARSE - A Vendor-Optimized Sparse BLAS Library for GPUs Using OpenCL
https://github.com/clMathLibraries/clSPARSE
- clARMOR - A Buffer Overflow Detector for OpenCL GPU Kernels
https://github.com/ROCm-Developer-Tools/clARMOR
Patents
- Shijia Wei, Joseph L. Greathouse, John Kalamatianos, "Per-instruction Energy Debugging Using Instruction Sampling Hardware". U.S. Patent Number 11,556,162, Granted January 17, 2023.
- Gregory P. Rodgers, Joseph L. Greathouse, "Compiler-initiated Tile Replacement to Enable Hardware Acceleration Resources". U.S. Patent Number 11,347,486, Granted May 31, 2022.
- Arkaprava Basu, Joseph L. Greathouse, "Enforcing Central Processing Unit Quality of Service Guarantees When Servicing Accelerator Requests". U.S. Patent Number 11,275,613, Granted March 15, 2022.
- Karthik Rao, Wei Huang, Xudong An, Manish Arora, Joseph L. Greathouse, "Runtime Localized Cooling of High-Performance Processors". U.S. Patent Number 11,137,809, Granted October 5, 2021.
- Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse, "Optimized and Scalable Sparse Triangular Linear Systems on Networks of Accelerators". U.S. Patent Number 10,936,697, Granted March 2, 2021.
- Raghavendra Pradyumna Pothukuchi, Joseph L. Greathouse, Leonardo de Paula Rosa Piga, "Distributed Multi-Input Multi-Output Control Theoretic Method to Manage Heterogeneous Systems". U.S. Patent Number 10,928,789, Granted February 23, 2021.
- Jagadish B. Kotra, Karthik Rao, Joseph L. Greathouse, "Method and Apparatus for Temperature-Gradient Aware Data-Placement for 3D Stacked DRAMs". U.S. Patent Number 10,725,670, Granted July 28, 2020.
- Joseph L. Greathouse, Mitesh R. Meswani, Sooraj Puthoor, Dmitri Yudanov, James M. O'Connor, "Heterogeneous Graphics Processing Unit for Scheduling Thread Groups for Execution on Variable Width SIMD Units". U.S. Patent Number 10,713,059, Granted July 14, 2020.
- Joseph L. Greathouse, "High-Performance Sparse Triangular Solve on Graphics Processing Units". U.S. Patent Number 10,691,772, Granted June 23, 2020.
- Arkaprava Basu, Joseph L. Greathouse, "Dynamically Adapting Mechanism for Translation Lookaside Buffer Shootdowns". U.S. Patent Number 10,552,339, Granted February 4, 2020.
- Joseph L. Greathouse, Christopher D. Erb, Michael G. Collins, "Detecting Buffer Overflows in General-Purpose GPU Applications". U.S. Patent Number 10,067,710, Granted September 4, 2018.
- Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse, "Predicting a Context Portion to Move Between a Context Buffer and Registers Based on Context Portions Previously Used by at least One Other Thread". U.S. Patent Number 10,019,283, Granted July 10, 2018.
- Leonardo de Paula Rosa Piga, Abhinandan Majumdar, Indrani Paul, Wei Huang, Manish Arora, Joseph L. Greathouse, "Hardware Accuracy Counters for Application Precision and Quality Feedback". U.S. Patent Number 9,990,203, Granted June 5, 2018.
- Mayank Daga, Joseph L. Greathouse, "Efficient Sparse Matrix-Vector Multiplication on Parallel Processors". U.S. Patent Number 9,697,176, Granted July 4, 2017.
- Joseph L. Greathouse, David S. Christie, "Randomly Branching Using Hardware Watchpoints". U.S. Patent Number 9,483,379, Granted November 1, 2016.
- Joseph L. Greathouse, David S. Christie, "Randomly Branching Using Performance Counters". U.S. Patent Number 9,448,909, Granted September 20, 2016.
- Joseph L. Greathouse, Anton Chernoff, "User-level Hardware Branch Records". U.S. Patent Number 9,372,733, Granted June 21, 2016.
Presentations
- "Accelerating Dynamic Software Analyses," Microsoft Research, Feb. 23, 2012
- "On-Demand Dynamic Software Analysis," AMD Tech Topic Series, Dec. 12, 2011
- "Hardware Support for On-Demand Software Analysis," University of Michigan CSE Graduate Student Honors Competition, Dec. 8, 2011
- "Accelerating Dynamic Software Analyses," Microsoft Research Silicon Valley, Dec. 2, 2011
- "Accelerating Dynamic Software Analyses," VMware, Dec. 1, 2011
- "On-Demand Dynamic Software Analysis," Intel Labs, Nov. 29, 2011
- "Sampling Dynamic Dataflow Analyses," University of British Columbia, Jun. 10, 2011
Posters
- Scalable Security Vulnerability Analysis via Sampling, 2011 GSRC Annual Symposium, Nov. 16, 2011
- Testudo: Heavyweight Security Analysis via Statistical Sampling, 2008 Engineering Graduate Symposium, University of Michigan, Nov. 7, 2008
Teaching Experience
- University of Michigan – Graduate Student Instructor
January 2012 - April 2012
EECS 570 - Parallel Computer Architecture
Responsible for guiding multiple graduate student research projects related to parallel computing.
Set up software infrastructure for assignments on parallel programming and cache coherence protocols.
- University of Illinois – Undergraduate Teaching Assistant
January 2005 - August 2006
ECE 290 - Computer Engineering I
Graded homework assignments and tests for four semesters.
Taught a discussion section for this undergraduate digital logic course during the summer of 2006.
- University of Illinois – Grader
August 2005 - December 2005
CS 433 - Computer System Organization
Graded homework assignments for this undergraduate computer architecture course.
Professional Activities
- Program committee member for MICRO (2022), IISWC (2020), ICPP (2020), ISPASS (2015), HPPAC (2015–2018)
- External reviewer for MICRO (2009, 2013, 2014, 2017, 2020), HPCA (2013, 2014), IEEE CAL (2015–2017), IEEE TPDS (2017), IEEE TCAD (2017, 2018), IEEE TMSCS (2018), SC (2017), SRCS (2013), FMCAD (2010), and MDPI Computation (2018, 2020)
- External reviewer (through Todd Austin) for ASPLOS (2012, 2013), CODES (2011), DATE (2008–2012), FMCAD (2010), HPCA (2009, 2010, 2012), ISCA (2009, 2010, 2012), MICRO (2008, 2011, 2012), and PACT (2012)
- Judge for SRC TechCon (2015)
- Association for Computing Machinery, Senior Member
- Institute of Electrical and Electronics Engineers, Senior Member
- U of M Advanced Computer Architecture Laboratory: Reading Group organizer (2009–2010), compute cluster administrator (2008–2011)
Awards and Honors
- Awards at Advanced Micro Devices, Inc.
- AMD Q3 2022 Next 5% Award for work on breaking the exaflop barrier
- AMD Q1 2020 Next 5% Award for work on AMD's Frontier supercomputer design win
- AMD Executive Spotlight Award: Q4 2019, Q2 2021, Q4 2022
- AMD DCGPU Recognition Award: Q1 2021, Q2 2022, Q4 2022
- AMD Spotlight Award: Q2 2017
- Academic Awards and Honors
- IISWC 2016 Best Paper Award
- CGO 2011 Best Student Presentation Award
- Nomination for Best Paper at IISWC 2018
- Nomination for Best Paper at HPDC 2014
- Awards and Honors at the University of Michigan
- 2011 University of Michigan CSE Graduate Student Honors Competition 1st Place
- University of Michigan EECS Departmental Fellowship, 2006-2007
- Honors at the University of Illinois
- Eta Kappa Nu Electrical and Computer Engineering Honor Society
- Tau Beta Pi Engineering Honor Society
- Illinois Chancellor's Scholar
- Illinois Engineering James Scholar
Skills
- Programming Languages
C, C++, HIP, CUDA, OpenCL, Python, x86 assembly, and AMD GCN, CDNA, and RDNA assembly
- Software Systems
Linux kernel, multiple AMD-internal simulation, firmware, and analysis tools