Curriculum Vitae

Updated February 23, 2023

Contact Information
Address: Joseph Lee Greathouse
              Austin, TX USA
E-mail: joseph.l.greathouse@gmail.com
WWW: http://www.computermachines.org/
Research Interests

My research and development work focuses on the hardware/software interface. Put succinctly, I strive to make hardware do new and potentially unintended things. This includes creating new software-visible hardware features in heterogeneous systems and optimizing software for novel hardware platforms.

Hardware performance monitors and on-chip firmware can yield intriguing data with a few well-placed tweaks. Power, energy, and thermal optimizations present a plethora of unsolved problems, and heterogeneous processors add an interesting dimension, as different styles of cores must share thermal and power infrastructure.

My Ph.D. dissertation focused on hardware methods for accelerating software analyses such as data race detectors and memory checkers. I used existing hardware, like performance counters, for this in addition to designing new hardware mechanisms.

I have applied these interests and background to AMD's Instinct accelerators and ROCm GPGPU software stack. I am a software architect for these products and have helped design multiple hardware and software features for them. Hardware examples include the cache coherence protocols in our heterogeneous systems, optimizations in our virtual memory system, new GPU performance monitoring infrastructure, multi-device power control algorithms, and RAS features. In the software domain, I have designed multiple user-visible software APIs that surface new hardware features, hardware-optimized algorithms for math libraries, and numerous workarounds for unexpected hardware behavior.

Professional Experience
  • Advanced Micro Devices, Inc., Austin, TX USA
    Fellow
    July 2022 - Present
    • I am a software architect in the performance engineering team in AMD's Radeon Technologies Group. My work focuses on hardware/software interface topics for AMD's ROCm platform for general-purpose GPU computing.
    • My work covers a wide variety of topics. I create optimized sparse linear algebra algorithms for AMD GPUs, optimize our machine learning and high-performance computing codes for power and energy usage, architect optimized GPU software solutions, define new hardware mechanisms for power and performance analysis and optimization, and write user training materials.

    Principal Member of Technical Staff
    July 2019 - June 2022
    • As part of the performance engineering team in AMD's Radeon Technologies Group, I focused on software, firmware, and hardware optimizations for AMD's ROCm software platform and Instinct accelerator hardware.
    • Software architect for the AMD Instinct MI200 and MI300 programs. This included hardware feature design in areas such as cache coherence, virtual memory, performance monitoring, power control, and RAS. I also developed software interfaces for multiple new hardware features, created new user-level APIs, and delivered software and firmware workarounds for unexpected hardware behavior.
    • I wrote hundreds of pages of documentation about AMD's hardware and software, created hundreds of slides of training material, and presented dozens of training sessions to large internal and external audiences.
    • Beyond the HW/SW architecture role, I continued to create optimized sparse linear algebra algorithms for AMD GPUs and to optimize our machine learning and high-performance computing codes for power and energy usage.

    Senior Member of Technical Staff
    July 2016 - June 2019
    • I started this position in AMD Research, leading a team of 10 engineers studying performance and power monitoring, estimation, and management mechanisms for CPUs and GPUs as part of AMD's PathForward exascale research program.
    • We published research focusing on power and thermal management at HPCA 2017, ARCS 2017, ITherm 2018, ICCAD 2018, and MICRO 2019. The simulation tools we built were also used in a major HPCA 2017 industry track publication.
    • This group also researched system and software topics for heterogeneous systems such as GPGPU buffer overflow protection (published at CGO 2017 and IWOCL 2018) and GPGPU system call overheads (published at IISWC 2018).
    • I then moved to be a performance engineer in AMD's ROCm GPGPU software engineering team, where I worked on optimizing our GPU software, firmware, and hardware in order to meet the demands of our GPU compute customers.
    • My work in this group included user support of our software stack, new hardware bring-up and software optimization for the Radeon Instinct MI60 and Instinct MI100 accelerators, and software feature development such as HIP Cooperative Groups and rocSPARSE algorithmic development.

    Member of Technical Staff
    July 2014 - June 2016
    • For AMD's FastForward 2 research, a major focus was extending the high-level simulation work from the FastForward program. Details of some of our extended power and performance models can be found in our papers at HPCA 2015, IISWC 2015, and IISWC 2016.
    • Beyond modeling and simulation work, I focused on additional GPGPU software, including further sparse linear algebra work described at HiPC 2015 and IWOCL 2015 and hash table algorithms published at USENIX ATC 2016.
    • I also performed system administration tasks during this period, setting up clusters of computers running new HSA software on AMD heterogeneous processors for multiple government labs to use in their own research.

    Senior Design Engineer
    August 2012 - June 2014
    • As part of AMD Research's FastForward research on exascale computing, my research broadly centered on creating a high-level performance and power simulator based on analytic scaling of real hardware measurements.
    • This simulator was described in a ModSim 2013 paper. We invented multiple new CPU and GPU performance and power estimation algorithms for it, including one described at USENIX ATC 2014.
    • This simulator was used as a major part of AMD Research's studies of heterogeneous CPU-GPU PIM systems, as described in MSPC 2013 and HPDC 2014.
    • During this time, I also worked with exascale proxy applications to formulate new GPGPU algorithms for AMD GPUs and APUs, such as the GPU-based sparse matrix-vector multiplication algorithm published at SC14.
  • University of Michigan, Ann Arbor, MI, USA
    Research Assistant
    May 2007 - August 2012
    • Identified methods of distributing software analyses across many users to reduce slowdowns.
    • Managed graduate and undergraduate students through development of prototype systems.
  • Kelly Services / Intel Corp., Champaign, IL, USA
    Research Contractor
    May 2010 - October 2010
    • Researched approaches for improving speed and accuracy of Intel Inspector XE data race detector.
    • Utilized unique features of Intel processors to yield orders-of-magnitude performance gains.
  • International Business Machines Corp., Rochester, MN, USA
    Speed Team Intern
    May 2008 - August 2008
    • Designed and constructed an InfiniBand compliance verification suite that caught numerous bugs.
    • Added the suite into the IBM PowerVM I/O firmware development process and found multiple bugs.
Education
Conference Publications
Workshop Publications
Software Projects
Patents
Presentations
  • "Accelerating Dynamic Software Analyses," Microsoft Research, Feb. 23, 2012
  • "On-Demand Dynamic Software Analysis," AMD Tech Topic Series, Dec. 12, 2011
  • "Hardware Support for On-Demand Software Analysis," University of Michigan CSE Graduate Student Honors Competition, Dec. 8, 2011
  • "Accelerating Dynamic Software Analyses," Microsoft Research Silicon Valley, Dec. 2, 2011
  • "Accelerating Dynamic Software Analyses," VMware, Dec. 1, 2011
  • "On-Demand Dynamic Software Analysis," Intel Labs, Nov. 29, 2011
  • "Sampling Dynamic Dataflow Analyses," University of British Columbia, Jun. 10, 2011
Posters
  • Scalable Security Vulnerability Analysis via Sampling, 2011 GSRC Annual Symposium, Nov. 16, 2011
  • Testudo: Heavyweight Security Analysis via Statistical Sampling, 2008 Engineering Graduate Symposium, University of Michigan, Nov. 7, 2008
Teaching Experience
  • University of Michigan – Graduate Student Instructor
    January 2012 - April 2012
    EECS 570 - Parallel Computer Architecture
    Responsible for guiding multiple graduate student research projects related to parallel computing.
    Set up software infrastructure for assignments on parallel programming and cache coherency protocols.
  • University of Illinois – Undergraduate Teaching Assistant
    January 2005 - August 2006
    ECE 290 - Computer Engineering I
    Graded homework assignments and tests for four semesters.
    Taught discussion section for this undergraduate digital logic course during the summer of 2006.
  • University of Illinois – Grader
    August 2005 - December 2005
    CS 433 - Computer System Organization
    Graded homework assignments for this undergraduate computer architecture course.
Professional Activities
  • Program committee member for MICRO (2022), IISWC (2020), ICPP (2020), ISPASS (2015), HPPAC (2015–2018)
  • External reviewer for MICRO (2009, 2013, 2014, 2017, 2020), HPCA (2013, 2014), IEEE CAL (2015–2017), IEEE TPDS (2017), IEEE TCAD (2017, 2018), IEEE TMSCS (2018), SC (2017), SRCS (2013), FMCAD (2010), and MDPI Computation (2018, 2020)
  • External reviewer (through Todd Austin) for ASPLOS (2012, 2013), CODES (2011), DATE (2008–2012), FMCAD (2010), HPCA (2009, 2010, 2012), ISCA (2009, 2010, 2012), MICRO (2008, 2011, 2012), and PACT (2012)
  • Judge for SRC TechCon (2015)
  • Association for Computing Machinery, Senior Member
  • Institute of Electrical and Electronics Engineers, Senior Member
  • U of M Advanced Computer Architecture Laboratory Reading Group organizer (2009–2010) and compute cluster administrator (2008–2011)
Awards and Honors
  • Awards at Advanced Micro Devices, Inc.
    • AMD Q3 2022 Next 5% Award for work on breaking the exaflop barrier
    • AMD Q1 2020 Next 5% Award for work on AMD's Frontier supercomputer design win
    • AMD Executive Spotlight Award: Q4 2019, Q4 2022, Q2 2021
    • AMD DCGPU Recognition Award: Q1 2021, Q2 2022, Q4 2022
    • AMD Spotlight Award: Q2 2017
  • Academic Awards and Honors
    • IISWC 2016 Best Paper Award
    • CGO 2011 Best Student Presentation Award
    • Nomination for Best Paper at IISWC 2018
    • Nomination for Best Paper at HPDC 2014
  • Awards and Honors at the University of Michigan
    • 2011 University of Michigan CSE Graduate Student Honors Competition 1st Place
    • University of Michigan EECS Departmental Fellowship, 2006-2007
  • Honors at the University of Illinois
    • Eta Kappa Nu Electrical and Computer Engineering Honor Society
    • Tau Beta Pi Engineering Honor Society
    • Illinois Chancellor's Scholar
    • Illinois Engineering James Scholar
Skills
  • Programming Languages
    C, C++, HIP, CUDA, OpenCL, Python, x86 assembly, and AMD GCN, CDNA, and RDNA assembly
  • Software Systems
    Linux kernel, multiple AMD-internal simulation, firmware, and analysis tools