Curriculum Vitae

Updated February 14, 2024

Contact Information
Address: Joseph Lee Greathouse
              Austin, TX USA
E-mail: joseph.l.greathouse@gmail.com
WWW: https://www.computermachines.org/
Research Interests

My work sits at the interface between computer hardware and software. This includes creating new software-visible hardware features in heterogeneous systems and optimizing software for novel hardware platforms.

My Ph.D. dissertation focused on hardware methods for accelerating software analyses such as data race detectors and memory checkers. This work used existing hardware, such as performance counters, as well as new hardware mechanisms that I designed.

I have applied this background to AMD's Instinct accelerators and ROCm GPGPU software. I am a software architect and have helped design multiple hardware and software features for these products. Hardware examples include heterogeneous cache coherence protocols, virtual memory optimizations, new GPU performance monitoring infrastructure, multi-device power control algorithms, and RAS features. In the software domain, I have designed multiple user-visible software APIs that surface new hardware features, hardware-optimized algorithms for math libraries, and numerous workarounds for unexpected hardware behavior.

Professional Experience
  • Advanced Micro Devices, Inc., Austin, TX USA
    Fellow
    July 2022 - Present
    • I am a software architect in the performance engineering team in AMD's Adaptive, Embedded and AI Group. My work focuses on hardware/software interface topics for AMD's ROCm platform for general-purpose GPU computing.
    • I architect software, firmware, and hardware solutions for our AMD Instinct products, including MI200, MI300, and future designs. I also create high-performance sparse linear algebra algorithms for AMD GPUs, optimize our machine learning and high-performance computing software for power and performance, write training materials, provide internal and external support on advanced technical topics, and gather customer technical requirements.
    • My responsibilities cover nearly every stage in the lifetime of our products. I interface with customers, internal developers, and research teams to set advanced product development plans. I collaborate with hardware, firmware, and software architects to define these products. I work with multiple development and verification teams to define how these products will be built and tested. I provide deep technical support and debugging expertise during both pre- and post-silicon bring-up; this includes designing software workarounds for hardware issues, as well as root-causing hardware bugs and providing fixes in RTL. After our hardware is in production, I design and optimize our software, provide customer training and support, and gather feedback to feed into the next generation of our products.

    Principal Member of Technical Staff
    July 2019 - June 2022
    • As part of the performance engineering team in AMD's Radeon Technologies Group, I worked on software, firmware, and hardware optimizations for AMD's ROCm software platform and Instinct accelerator hardware.
    • Software architect for the AMD Instinct MI200 and MI300 programs. This included hardware feature design in areas such as cache coherence, virtual memory, performance monitoring, power control, and RAS. I also developed software interfaces for multiple new hardware features, created new user-level APIs, and delivered software and firmware workarounds for unexpected hardware behavior.
    • I wrote hundreds of pages of documentation about AMD's hardware and software, created hundreds of slides of training material, and presented dozens of training sessions to large internal and external audiences.
    • Beyond the HW/SW architecture role, I continued to create optimized sparse linear algebra algorithms for AMD GPUs and to optimize our machine learning and high-performance computing codes for power and energy usage.

    Senior Member of Technical Staff
    July 2016 - June 2019
    • I started this position in AMD Research, leading a team of 10 engineers studying performance and power monitoring, estimation, and management mechanisms for CPUs and GPUs as part of AMD's PathForward exascale research program.
    • We published research focusing on power and thermal management at HPCA 2017, ARCS 2017, ITherm 2018, ICCAD 2018, and MICRO 2019. The simulation tools we built were also used in a major HPCA 2017 industry-track publication.
    • This group also researched system and software topics for heterogeneous systems such as GPGPU buffer overflow protection (published at CGO 2017 and IWOCL 2018) and GPGPU system call overheads (published at IISWC 2018).
    • I then moved to be a performance engineer in AMD's ROCm GPGPU software engineering team, where I worked on optimizing our GPU software, firmware, and hardware in order to meet the demands of our GPU compute customers.
    • My work in this group included user support of our software stack, new hardware bring-up and software optimization for the Radeon Instinct MI60 and Instinct MI100 accelerators, and software feature development such as HIP Cooperative Groups and rocSPARSE algorithmic development.

    Member of Technical Staff
    July 2014 - June 2016
    • For AMD's FastForward 2 research, a major focus was extending the high-level simulation work from the FastForward program. Details of some of our extended power and performance models can be found in our papers at HPCA 2015, IISWC 2015, and IISWC 2016.
    • Beyond modeling and simulation, I continued developing GPGPU software, including additional sparse linear algebra work described at HiPC 2015 and IWOCL 2015 and hash table algorithms published at USENIX ATC 2016.
    • I also performed system administration tasks during this period, setting up clusters of computers running new HSA software on AMD heterogeneous processors for multiple government labs to use in their own research.

    Senior Design Engineer
    August 2012 - June 2014
    • As part of AMD Research's FastForward research on exascale computing, my research broadly centered on creating a high-level performance and power simulator based on analytic scaling of real hardware measurements.
    • This simulator was described in a ModSim 2013 paper. We invented multiple new CPU and GPU performance and power estimation algorithms for it, including one described at USENIX ATC 2014.
    • This simulator was used as a major part of AMD Research's studies of heterogeneous CPU-GPU PIM systems, as described in MSPC 2013 and HPDC 2014.
    • During this time, I also worked with exascale proxy applications to formulate new GPGPU algorithms for AMD GPUs and APUs, such as the GPU-based sparse matrix-vector multiplication algorithm published at SC14.
  • University of Michigan, Ann Arbor, MI, USA
    Research Assistant
    May 2007 - August 2012
    • Identified methods of distributing software analyses across many users to reduce slowdowns.
    • Managed graduate and undergraduate students through development of prototype systems.
  • Kelly Services / Intel Corp., Champaign, IL, USA
    Research Contractor
    May 2010 - October 2010
    • Researched approaches for improving speed and accuracy of Intel Inspector XE data race detector.
    • Utilized unique features of Intel processors to yield orders-of-magnitude performance gains for this tool; details can be found in our ISCA 2011 publication.
  • International Business Machines Corp., Rochester, MN, USA
    Speed Team Intern
    May 2008 - August 2008
    • Designed and constructed an InfiniBand compliance verification suite.
    • Integrated the suite into the IBM PowerVM I/O firmware development process, where it caught numerous bugs.
Education
Conference Publications
  • Gabriel H. Loh, Michael J. Schulte, Mike Ignatowski, Vignesh Adhinarayanan, Shaizeen Aga, Derrick Aguren, Varun Agrawal, Ashwin M. Aji, John Alsop, Paul Bauman, Bradford M. Beckmann, Majed Valad Beigi, Sergey Blagodurov, Travis Boraten, Michael Boyer, William Brantley, Noel Chalmers, Shaoming Chen, Kevin Cheng, Michael L. Chu, David Cownie, Nicholas Curtis, Joris del Pino, Nam Duong, Alexandru Dutu, Yasuko Eckert, Christopher Erb, Chip Freitag, Joseph L. Greathouse, Sudhanva Gurumurthi, Anthony Gutierrez, Khaled Hamidouche, Sachin Hossamani, Wei Huang, Mahzabeen Islam, Nuwan Jayasena, John Kalamatianos, Onur Kayiran, Jagadish Kotra, Alan Lee, Daniel Lowell, Niti Madan, Abhinandan Majumdar, Nicholas Malaya, Srilatha Manne, Susumu Mashimo, Damon McDougall, Elliott Mednick, Michael Mishkin, Mark Nutter, Indrani Paul, Matthew Poremba, Brandon Potter, Kishore Punniyamurthy, Sooraj Puthoor, Steven E. Raasch, Karthik Rao, Greg Rodgers, Marko Scrbak, Mohammad Seyedzadeh, John Slice, Vilas Sridharan, Rene van Oostrum, Eric van Tassell, Abhinav Vishnu, Samuel Wasmundt, Mark Wilkening, Noah Wolfe, Mark Wyse, Adithya Yalavarti, Dmitri Yudanov, "A Research Retrospective on AMD's Exascale Computing Journey," Published in the Proceedings of the 50th International Symposium on Computer Architecture (ISCA 2023), June, 2023
  • Raghavendra Pradyumna Pothukuchi, Joseph L. Greathouse, Karthik Rao, Christopher Erb, Leonardo Piga, Petros Voulgaris, Josep Torrellas, "Tangram: Integrated Control of Heterogeneous Computers," Published in the Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO-52), October, 2019
  • Joseph L. Greathouse, Gabriel H. Loh, "Machine Learning for Performance and Power Modeling of Heterogeneous Systems," Published in the Proceedings of the 2018 International Conference on Computer Aided Design (ICCAD 2018), November, 2018
  • Arkaprava Basu, Joseph L. Greathouse, Guru Venkataramani, Ján Veselý, "Interference from GPU System Service Requests," Published in the Proceedings of the 2018 IEEE International Symposium on Workload Characterization (IISWC 2018), September, 2018 (Nominated for Best Paper)
  • Xudong An, Manish Arora, Wei Huang, William C. Brantley, Joseph L. Greathouse, "3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components," Published in the Proceedings of the 17th IEEE Intersociety Conference on Thermomechanical Phenomena in Electronic Systems (ITherm 2018), May, 2018
  • Nicholas Malaya, Shuai Che, Joseph L. Greathouse, René van Oostrum, Michael J. Schulte, "Accelerating Matrix Processing with GPUs," Published in the Proceedings of the 24th IEEE Symposium on Computer Arithmetic (ARITH 24), July, 2017
  • Marko Ščrbak, Joseph L. Greathouse, Nuwan Jayasena, Krishna Kavi, "DVFS Space Exploration in Power Constrained Processing-in-Memory Systems," Published in the Proceedings of the 30th International Conference on Architecture of Computing Systems (ARCS 2017), April, 2017
  • Abhinandan Majumdar, Leonardo Piga, Indrani Paul, Joseph L. Greathouse, Wei Huang, David H. Albonesi, "Dynamic GPGPU Power Management using Adaptive Model Predictive Control," Published in the Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture (HPCA 2017), February, 2017
  • Thiruvengadam Vijayaraghavan, Yasuko Eckert, Gabriel H. Loh, Michael J. Schulte, Mike Ignatowski, Indrani Paul, Bradford M. Beckmann, Steven K. Reinhardt, William C. Brantley, Joseph L. Greathouse, Onur Kayiran, Matthew Poremba, Wei Huang, Arun Karunanithi, Greg Sadowski, Vilas Sridharan, Steven E. Raasch, Mitesh Meswani, "Design and Analysis of an APU for Exascale Computing," Published in the Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture (HPCA 2017 Industry Track), February, 2017
  • Christopher Erb, Mike Collins, Joseph L. Greathouse, "Dynamic Buffer Overflow Detection for GPGPUs," Published in the Proceedings of the 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2017), February, 2017
  • Vignesh Adhinarayanan, Indrani Paul, Joseph L. Greathouse, Wei Huang, Ashutosh Pattnaik, Wu-chun Feng, "Measuring and Modeling On-Chip Interconnect Power on Real Hardware," Published in the Proceedings of the 2016 IEEE International Symposium on Workload Characterization (IISWC 2016), September, 2016 (Awarded Best Paper)
  • Alex D. Breslow, Dong Ping Zhang, Joseph L. Greathouse, Nuwan Jayasena, Dean M. Tullsen, "Horton Tables: Fast Hash Tables for In-Memory Data-Intensive Computing," Published in the Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC), June, 2016
  • Mayank Daga, Joseph L. Greathouse, "Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices," Published in the Proceedings of the 2015 IEEE International Conference on High Performance Computing (HiPC 2015), December, 2015
  • Abhinandan Majumdar, Gene Wu, Kapil Dev, Joseph L. Greathouse, Indrani Paul, Wei Huang, Arjun Karthik Venugopal, Leonardo Piga, Chip Freitag, Sooraj Puthoor, "A Taxonomy of GPGPU Performance Scaling," Published in the Proceedings of the 2015 IEEE International Symposium on Workload Characterization (IISWC 2015), October, 2015
  • Gene Wu, Joseph L. Greathouse, Alexander Lyashevsky, Nuwan Jayasena, Derek Chiou, "GPGPU Performance and Power Estimation Using Machine Learning," Published in the Proceedings of the 21st IEEE Symposium on High Performance Computer Architecture (HPCA 2015), February, 2015
  • Bo Su, Junli Gu, Li Shen, Wei Huang, Joseph L. Greathouse, Zhiying Wang, "PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration," Published in the Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47), December, 2014
  • Joseph L. Greathouse, Mayank Daga, "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format," Published in the Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14), November, 2014
  • Dong Ping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L. Greathouse, Lifan Xu, Michael Ignatowski, "TOP-PIM: Throughput-Oriented Programmable Processing in Memory," Published in the Proceedings of the 23rd International Symposium on High Performance Parallel and Distributed Computing (HPDC '14), June, 2014 (Nominated for Best Paper)
  • Bo Su, Joseph L. Greathouse, Junli Gu, Michael Boyer, Li Shen, Zhiying Wang, "Implementing a Leading Loads Performance Predictor on Commodity Processors," Published in the Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC '14), June, 2014
  • Andrea Pellegrini, Joseph L. Greathouse, Valeria Bertacco, "Viper: Virtual Pipelines for Enhanced Reliability," Published in the Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA 2012), June, 2012
  • Joseph L. Greathouse, Hongyi Xin, Yixin Luo, Todd Austin, "A Case for Unlimited Watchpoints," Published in the Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012), March, 2012
  • Joseph L. Greathouse, Zhiqiang Ma, Matthew I. Frank, Ramesh Peri, Todd Austin, "Demand-Driven Software Race Detection using Hardware Performance Counters," Published in the Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA 2011), June, 2011
  • Joseph L. Greathouse, Chelsea LeBlanc, Todd Austin, Valeria Bertacco, "Highly Scalable Distributed Dataflow Analysis," Published in the Proceedings of the 2011 International Symposium on Code Generation and Optimization (CGO 2011), April, 2011 (Awarded Best Student Presentation at CGO 2011)
  • Joseph L. Greathouse, Ilya Wagner, David A. Ramos, Gautam Bhatnagar, Todd Austin, Valeria Bertacco, Seth Pettie, "Testudo: Heavyweight Security Analysis via Statistical Sampling," Published in the Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41), November, 2008
Workshop Publications
Software Projects
Patents
Presentations
  • "Accelerating Dynamic Software Analyses," Microsoft Research, Feb. 23, 2012
  • "On-Demand Dynamic Software Analysis," AMD Tech Topic Series, Dec. 12, 2011
  • "Hardware Support for On-Demand Software Analysis," University of Michigan CSE Graduate Student Honors Competition, Dec. 8, 2011
  • "Accelerating Dynamic Software Analyses," Microsoft Research Silicon Valley, Dec. 2, 2011
  • "Accelerating Dynamic Software Analyses," VMware, Dec. 1, 2011
  • "On-Demand Dynamic Software Analysis," Intel Labs, Nov. 29, 2011
  • "Sampling Dynamic Dataflow Analyses," University of British Columbia, Jun. 10, 2011
Videos
Posters
  • Scalable Security Vulnerability Analysis via Sampling, 2011 GSRC Annual Symposium, Nov. 16, 2011
  • Testudo: Heavyweight Security Analysis via Statistical Sampling, 2008 Engineering Graduate Symposium, University of Michigan, Nov. 7, 2008
Teaching Experience
  • University of Michigan – Graduate Student Instructor
    January 2012 - April 2012
    EECS 570 - Parallel Computer Architecture
    Responsible for guiding multiple graduate student research projects related to parallel computing.
    Set up software infrastructure for assignments on parallel programming and cache coherency protocols.
  • University of Illinois – Undergraduate Teaching Assistant
    January 2005 - August 2006
    ECE 290 - Computer Engineering I
    Graded homework assignments and tests for four semesters.
    Taught discussion section for this undergraduate digital logic course during the summer of 2006.
  • University of Illinois – Grader
    August 2005 - December 2005
    CS 433 - Computer System Organization
    Graded homework assignments for this undergraduate computer architecture course.
Professional Activities
  • Program committee member for MICRO (2022), IISWC (2020), ICPP (2020), ISPASS (2015), HPPAC (2015–2018)
  • External reviewer for MICRO (2009, 2013, 2014, 2017, 2020), HPCA (2013, 2014), IEEE CAL (2015–2017), IEEE TPDS (2017), IEEE TCAD (2017, 2018), IEEE TMSCS (2018), SC (2017), SRCS (2013), FMCAD (2010), and MDPI Computation (2018, 2020)
  • External reviewer (through Todd Austin) for ASPLOS (2012, 2013), CODES (2011), DATE (2008–2012), FMCAD (2010), HPCA (2009, 2010, 2012), ISCA (2009, 2010, 2012), MICRO (2008, 2011, 2012), and PACT (2012)
  • Judge for SRC TechCon (2015)
  • Association for Computing Machinery, Senior Member
  • Institute of Electrical and Electronics Engineers, Senior Member
  • U of M Advanced Computer Architecture Laboratory Reading Group organizer (2009–2010) and compute cluster administrator (2008–2011)
Awards and Honors
  • Awards at Advanced Micro Devices, Inc.
    • AMD Q1 2024 Next 5% Award for work on AMD Instinct MI300 execution
    • AMD Q3 2022 Next 5% Award for work on breaking the exaflop barrier
    • AMD Q1 2020 Next 5% Award for work on AMD's Frontier supercomputer design win
    • AMD Executive Spotlight Award: Q4 2019, Q4 2020, Q2 2021, Q2 2023 (2x)
    • AMD DCGPU Spotlight Award: Q2 2020, Q1 2021, Q3 2021, Q2 2022, Q4 2022, Q2 2023, Q4 2023 (2x)
    • AMD Research Spotlight Award: Q2 2017
  • Academic Awards and Honors
    • IISWC 2016 Best Paper Award
    • CGO 2011 Best Student Presentation Award
    • Nomination for Best Paper at IISWC 2018
    • Nomination for Best Paper at HPDC 2014
  • Awards and Honors at the University of Michigan
    • 2011 University of Michigan CSE Graduate Student Honors Competition 1st Place
    • University of Michigan EECS Departmental Fellowship, 2006-2007
  • Honors at the University of Illinois
    • Eta Kappa Nu Electrical and Computer Engineering Honor Society
    • Tau Beta Pi Engineering Honor Society
    • Illinois Chancellor's Scholar
    • Illinois Engineering James Scholar
Skills
  • Programming Languages
    C, C++, HIP, CUDA, OpenCL, Python, x86 assembly, and AMD GCN, CDNA, and RDNA assembly
  • Software Systems
    Linux kernel, multiple AMD-internal simulation, firmware, and analysis tools