Chapter 1. CUDA Introduction

  • Book Excerpt from "CUDA C++ Debugging: Safer GPU Kernel Programming"
  • by David Spuler

What is CUDA?

CUDA is officially an acronym for Compute Unified Device Architecture, which is just so excruciatingly boring. Instead, I prefer to think of CUDA as a barracuda, sleek and fast, shredding GPU chips with its gnashing teeth by sending them too much work.

CUDA is a platform to program NVIDIA GPUs, consisting of many tools and libraries. As a C++ programmer myself, I’m going to focus on CUDA C++ capabilities, but there is support for other programming languages, such as Fortran and Python.

CUDA is owned and maintained by NVIDIA, and is used to write code for NVIDIA GPUs. The basic idea is to write C++ for both the CPU and the GPU, and the CUDA C++ compiler allows you to do both from the same source file. The CUDA platform is not open-sourced, but it is free to use (although the chips are not!).
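
To make that dual CPU/GPU idea concrete, here’s a minimal sketch of a single .cu source file containing both host (CPU) code and a GPU kernel, compiled with nvcc. The kernel name add_one and the array size are just illustrative:

    #include <cstdio>

    // GPU kernel: each thread adds 1.0 to one element of the array.
    __global__ void add_one(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;
    }

    int main() {
        const int n = 256;
        float host_data[n];
        for (int i = 0; i < n; ++i) host_data[i] = (float)i;

        float* dev_data = nullptr;
        cudaMalloc(&dev_data, n * sizeof(float));    // allocate GPU memory
        cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);

        add_one<<<1, n>>>(dev_data, n);    // launch 1 block of n GPU threads
        cudaDeviceSynchronize();           // wait for the GPU to finish

        cudaMemcpy(host_data, dev_data, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev_data);
        printf("host_data[0] = %f\n", host_data[0]);  // prints 1.000000
        return 0;
    }

The __global__ keyword and the <<<...>>> launch syntax are the CUDA extensions; everything else is ordinary C++.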

Why Use CUDA?

Oh, come on! You know the answer to this one: AI. NVIDIA GPUs are for AI, and CUDA is NVIDIA’s answer to how you program a GPU. As for the C++ part, well, CUDA and C++ go together like AI and cat videos.

This book mostly assumes that you’re using CUDA for generative AI, but there are many other use cases. They’re not as well-known as AI, but they’re making a lot more money for companies than AI ever will.

Generally, any algorithm that needs to do a lot of number crunching can be parallelized within an inch of its life with CUDA. Some of the use cases of CUDA include:

  • Generative AI training and inference
  • Physics computations
  • Drug discovery algorithms
  • Cryptography (Bitcoin mining!)
  • Linear algebra operations
  • Optimization and search space computations

There are many more types of parallelizable algorithms. Feel free to add your own to the list.

Main Features of CUDA

The CUDA environment is not just a C++ platform, but also an entire ecosystem. This includes:

  • Documentation ranging from introductory to reference manuals.
  • Articles and technical blogs on the NVIDIA website.
  • Multiple language support (e.g., C++, Fortran).
  • Example code of various ilks on NVIDIA’s GitHub repo.
  • Full implementations of AI backends.
  • Forums and support platforms for questions.
  • The annual NVIDIA GTC conference in California.

Generally, I have to say that everything I see on the NVIDIA website is underpinned by a high level of technical competence. NVIDIA is an impressive company.

Oh, yeah, I almost forgot. There is also some C++ stuff in CUDA:

  • CUDA C++ compilers
  • Debugging tools
  • Memory checkers
  • Synchronization checkers
  • Performance profiler tools

Many of the tools have both graphical and command-line interfaces. Personally, I’m old-school and prefer the CLI versions, but many programmers prefer a nice GUI for increased productivity.

Advantages of CUDA C++

CUDA is popular and the market-leading interface for programming NVIDIA GPUs. In fact, CUDA is widely regarded as a second layer of value, on top of the hardware itself, that helps NVIDIA stay on top in the GPU arms race. Here are some thoughts on why.

C++ Syntax. Writing a CUDA program is just like writing a C++ program, and everybody loves doing that. Personally, I’ve been doing that for 30 years, so we can just stop here at this point.

Fast! The programs that you write in CUDA run fast. It’s probably not really about the software at this point, but we programmers like to think it is. Let’s give CUDA the credit!

Dual coding model. You write both the CPU code and the GPU code in the same C++ program. This is very convenient, and keeps everything somewhat more orderly than otherwise.

Capable. You can do all of the basic C++ stuff, such as all the arithmetic and logical operators. There are also CUDA APIs for just about anything you could think of, such as memory management and thread synchronization.
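
As a small sketch of the synchronization side, here’s a kernel that uses shared memory and the __syncthreads() barrier to reverse the elements within each block. It assumes it is launched with 256-thread blocks and a data size that is an exact multiple of 256:

    // Reverse the elements within each 256-thread block.
    __global__ void reverse_block(float* data) {
        __shared__ float tile[256];           // fast on-chip memory shared by the block
        int t = threadIdx.x;
        int i = blockIdx.x * blockDim.x + t;

        tile[t] = data[i];                    // each thread loads one element
        __syncthreads();                      // wait until the whole block has loaded

        data[i] = tile[blockDim.x - 1 - t];   // write back in reversed order
    }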

Libraries. At a higher level than the CUDA APIs, there are also CUDA libraries for a lot of the common coding tasks, such as vector and matrix computations. These have been coded by professionals, and it’s hard to write faster code than you’ll find in these libraries, although many people keep trying.
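
For example, rather than hand-writing your own vector kernels, you can call the cuBLAS library. This minimal sketch computes y = alpha*x + y on the GPU with a single SAXPY call; it assumes the device arrays have already been allocated and filled:

    #include <cublas_v2.h>

    void scaled_add(float* d_x, float* d_y, int n) {      // d_x, d_y are device pointers
        cublasHandle_t handle;
        cublasCreate(&handle);                             // initialize the cuBLAS library

        const float alpha = 2.0f;
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);    // y = alpha*x + y, entirely on the GPU

        cublasDestroy(handle);
    }

Link with -lcublas when compiling with nvcc.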

Overall Limitations of CUDA

CUDA has been very successful and is often mentioned as a massive competitive “moat” for NVIDIA. Nevertheless, there are areas where CUDA can be problematic.

Steep Learning Curve. CUDA is not the easiest language to master. For starters, you need to know C++, and then there are a number of CUDA-specific syntax constructs and a lot of CUDA APIs and libraries to learn.

Even worse, the need to learn these new sets of C++ keywords and libraries is compounded by the utterly brain-bending nature of SIMD parallel programming. Hence, not all of the difficulty can be blamed on CUDA itself.

GPU-specific. The use of CUDA C++ is limited to NVIDIA GPUs. Although some attempts have been made to layer CUDA over non-NVIDIA GPUs, this does not work well, and is not supported by NVIDIA.

Proprietary License. The CUDA platform is not open-sourced by NVIDIA, although it is free. This compares with other similar GPU platforms, such as AMD’s ROCm, which has an open-source license. However, many of NVIDIA’s code examples are open-sourced, under various licenses, so this limitation only applies to the core compiler and runtime platform.

Non-SIMD algorithms. Massive SIMD parallelism in GPUs only optimizes certain kinds of algorithms. This is not a limitation of the CUDA platform itself, but more so of GPUs in general. Fortunately, there is plenty of demand for recurring vector calculations in applications such as AI inference and training, not to mention Bitcoin mining and video codec processing.

Other GPU Accelerator Platforms

CUDA is not the only way to program NVIDIA GPU chips, but it’s the best. CUDA is widely regarded as a superior platform that helps sustain NVIDIA’s dominance of GPU chips for AI applications, with a “software moat” that adds another layer of advantage on top of the hardware.

However, there is plenty of ongoing activity from competitors and also in open source communities. Some of the other upcoming GPU software platforms are discussed below.

Triton (OpenAI). The Triton platform was initially created by OpenAI, and has been open-sourced as its own project. The goal of Triton is to make GPU programming simpler, letting developers write GPU kernels in a Python-like language. The idea is to hide a lot of the low-level issues, such as memory transfers, in a way that does not impact performance.

ROCm (AMD). The ROCm software platform is for AMD GPU programming. Unlike CUDA, the underlying code for ROCm has been open-sourced, and is available for review. This is a fully-capable platform and has a long history of development.

Intel oneAPI. The oneAPI platform was created by Intel, and initially focused on their GPU chips. It has since become an open standard and its own project, allowing oneAPI to be used to target other vendors’ GPU hardware.

Apple hardware. Apple makes its own M-series chips, based on the Arm architecture, for its PCs, tablets, and phones. To support developers of applications for these devices, Apple has developed its own software acceleration platforms, including CoreML, Apple Accelerate, Apple Metal, and the new “Apple Intelligence” platform. Apple’s chips are not as capable as high-end GPUs, and the focus of these software platforms is more on execution on AI PCs (macOS) and AI phones (iPhone/iOS).

Vulkan. The Vulkan API is a portable layer that operates across multiple types of GPUs. A lot of its historical functionality is related to gaming and similar GPU applications, but it has become more focused on AI lately. Vulkan is an open standard, managed by the Khronos Group, and is supported by various corporate entities in this space.

SYCL (pronounced “sickle”). The SYCL platform is also an open standard for a multi-vendor GPU abstraction layer, backed by the Khronos Group. It allows the development of GPU-based applications at a higher level, allowing deployment to different hardware stacks.

I’m not sure why you needed to know about all those platforms, because you really only need one: CUDA. And most of the CUDA backends were written in C once upon a time, but are now usually written in C++, so the best way to program GPUs is CUDA C++.

 
