Aussie AI Blog

by David Spuler, Ph.D.

Latest Blog Articles

LLM Inference Optimization: State-of-the-Art Research, new book by David Spuler, table of contents, buy on Amazon.
Early Exit as a Training Optimization
Structured Decoding
Training versus Inference Optimization
Why is Training Loss Calculated Per Batch?
LLM Training Optimization
What is Prefill?
Top-K Vector Theory
Layerwise Pipelined Overlapping of Prefill and Decode
FFN Fusion with Tiled Pipelined RELU
FFN Fusion Optimizations with Piecewise Linear Approximations
LLM Attention and FFN Optimization are Opposites
Scaling your API Wrapper Application
Vector dot product add-as-integer optimization
AGI might require higher precision FP32 or FP64
Free PDF download versions of several AI and C++ books now released
Promising LLM Inference Optimization Research
CUDA C++ Job Interview Questions
The Sweetest Lesson: Your Brain Versus AI — AI intelligence, AGI, and why human brains are 50 times bigger than frontier LLMs.
RAG Optimization — LLM inference optimization for RAG architectures.
Vector Dot Product Optimization in C++ with Instruction-Level Parallelism
C++ Low Latency Book

List of Lists

Most Popular

Low Latency C++ Blog Articles

CUDA C++ Efficiency Articles

CUDA C++ Safety Articles

C++ Safety and Debugging Articles

Aussie AI Book Releases

March 2025 Blog Articles

February 2025 Blog Articles

January 2025 Blog Articles

December 2024 Blog Articles

November 2024 Blog Articles

October 2024 Blog Articles

September 2024 Blog Articles

August 2024 Blog Articles

AI Books from Aussie AI

LLM Inference Optimization Book:

• Online: Table of Contents

• Buy: LLM Inference Optimization

LLM Inference Optimization

New book: LLM Inference Optimization:

50+ research breakthroughs: current, new and emerging
100+ chapters on inference efficiency techniques
500+ LLM inference optimization techniques
State-of-the-art research & literature review

Get your copy from Amazon: LLM Inference Optimization

The Sweetest Lesson: Your Brain Versus AI

The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:

Your brain is 50 times bigger than the best AI engines.
Truly intelligent AI will require more compute!
Another case of the bitter lesson?
Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson

RAG Optimization

RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:

Smarter RAG
Faster RAG
Cheaper RAG
Agentic RAG
RAG reasoning

Get your copy from Amazon: RAG Optimization

Generative AI Applications

Generative AI Applications book:

Deciding on your AI project
Planning for success and safety
Designs and LLM architectures
Expediting development
Implementation and deployment

Get your copy from Amazon: Generative AI Applications

Generative AI in C++

Generative AI programming book:

Generative AI coding in C++
Transformer engine speedups
LLM models
Phone and desktop AI
Code examples
Research citations

Get your copy from Amazon: Generative AI in C++

CUDA C++ Optimization

CUDA C++ Optimization book:

Faster CUDA C++ kernels
Optimization tools & techniques
Compute optimization
Memory optimization

Get your copy from Amazon: CUDA C++ Optimization

CUDA C++ Debugging

CUDA C++ Debugging book:

Debugging CUDA C++ kernels
Tools & techniques
Self-testing & reliability
Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

Free AI and C++ Books

Generative AI programming books:

The Sweetest Lesson: Your Brain Versus AI, November 2025: full text online, free PDF available
RAG Optimization: Accurate and Efficient LLM Applications, June 2025: full text online, free PDF available
Generative AI Applications: Planning, Design and Implementation, November 2024: full text online, free PDF available
Generative AI in C++, March 2024: full text online, free PDF available, table of contents, bonus materials, reference lists, source code

CUDA C++ GPU Programming Books:

CUDA C++ Optimization: Coding Faster GPU Kernels, July 2024: full text online, bonus materials, free PDF available
CUDA C++ Debugging: Safer GPU Kernel Programming, July 2024: full text online, free PDF available

Modern C++ Programming Books:

C++ AVX Optimization: CPU SIMD Vectorization, 2025: full text online, free PDF available
C++ Ultra-Low Latency: Multithreading and Low-Level Optimizations, 2025: full text online, free PDF available
Advanced C++ Memory Techniques: Efficiency and Safety, 2025: full text online, free PDF available
Efficient C++ Multithreading: Modern Concurrency Optimization, 2025: free PDF available
Efficient Modern C++ Data Structures: Container and Algorithm Optimizations, 2025: free PDF available
C++ Low Latency: Multithreading and Hotpath Optimizations, 2025: free PDF available
Safe C++: Fixing Memory Safety Issues, Oct 2024: full text online, free PDF available

More AI Research Topics

Read more about: