Aussie AI

Debugging AI Models and Frameworks

  • Last Updated 14 August, 2025
  • by David Spuler, Ph.D.

I heard a rumor that AI frameworks are just code, and AI models are just data. So there must be bugs! This article is about real, hard-core coding bugs, the nasty kind that sneak in with all of the performance tuning going on, not the higher-level AI problems of safety and accuracy.

The reality is that an AI engine is some of the most difficult code you'll ever see. Parallelized code of any kind (e.g., low-level hardware acceleration, multi-threading, multi-GPU) multiplies that complexity by another order of magnitude. Hence, the basics of high-quality coding practice are more important than ever, such as:

  • Unit tests
  • Assertions and self-testing code
  • Debug tracing code
  • Automated system tests (regression testing)
  • Error handling (e.g. starting with checking error return codes)
  • Exception handling (wrapping code in a full exception handling stack)

All of these techniques involve a significant chunk of extra coding work. The classic estimate is that full error and exception handling can be 80% of a finalized software product, which makes it four times the size of the core logic! Maybe that estimate is a little outdated, given improvements in modern tech stacks, but it still contains a grain of truth.

There are many programming tools to help the debugging cycle:

  • C++ memory debugging tools (e.g. Valgrind on Linux)
  • Performance profiling tools (for "de-slugging")
  • Memory usage tracking (i.e., allocated memory measurement)
  • Interactive debugging tools (e.g., in the IDE, GNU gdb, etc.)

Random Number Seeds

Neural network code often uses random numbers: to improve accuracy, for stochastic algorithms, or just for randomized testing. A random number generator needs a "seed" to get started, which is set via the "srand" function in C++ (declared in <cstdlib>). The typical way to initialize the generator, so that it's truly random, is to use the current time:

    srand((unsigned)time(NULL));

But that's not good for debugging! We don't want randomness when we're trying to reproduce a bug!

A generalized plan is to have a debugging or regression testing mode where the seed is fixed.

    if (g_yapi_debug_srand_seed != 0) {
        srand((unsigned)g_yapi_debug_srand_seed);   // Non-random randomness!
    }
    else {  // Normal run
        srand((unsigned)time(NULL));
    }

The test harness has to set the global debug variable when it's doing a regression test. For example, it can be hard-coded into a testing function, or it can be set via a command-line argument to your test harness executable.

This is better, but if a bug occurs in production, we won't know what seed was used. So better code also prints out the seed number, in case you need it later to reproduce a bug that occurred live.

    if (g_yapi_debug_srand_seed != 0) {
        srand((unsigned)g_yapi_debug_srand_seed);   // Non-random randomness!
    }
    else {  // Normal run
        long iseed = (long)time(NULL);
        fprintf(stderr, "INFO: Random number seed: %ld 0x%lx\n", iseed, iseed);
        srand((unsigned)iseed);
    }

Research on Debugging AI Framework Code

Papers on the issues of debugging the actual code that runs AI models, including the code inside the frameworks and ML compilers, include:

General Debugging Techniques Research

Research on general program debugging methods:

Testing AI Applications

Research on testing of AI apps in general (not just model evaluation of LLMs):

GPU Testing

GPU testing covers a variety of techniques for detecting errors in GPU hardware, which can arise from aging, overheating, or transient soft errors. GPU failures are a common problem in large-scale AI training jobs: a modern cluster can contain 100,000+ GPU chips, each of which has a small chance of failure. There are various tools, both commercial and open source, for running stress tests on GPUs.

Papers on GPU testing:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: