Aussie AI Blog
What's New in Speculative Decoding?
March 3rd, 2025
by David Spuler, Ph.D.
Speculative decoding is one of the earliest LLM efficiency techniques to parallelize multiple decoding steps: a small draft model proposes several tokens cheaply, and the large target model verifies them all in a single pass. And yet, there seems to be a never-ending supply of research papers on the topic of speculative decoding.
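The core draft-and-verify loop is simple enough to sketch in a few lines. This is a minimal toy illustration, not any particular paper's implementation: the `draft_model` and `target_model` functions below are hypothetical stand-ins that predict integer tokens greedily, so the accept/reject logic can be shown deterministically.

```python
# Toy sketch of one speculative decoding step (hypothetical stand-in models).

def draft_model(tokens):
    # Cheap approximate model: next token is (last + 1) mod 10,
    # but it makes a mistake whenever the answer would be 5.
    t = (tokens[-1] + 1) % 10
    return 0 if t == 5 else t

def target_model(tokens):
    # Expensive exact model: next token is (last + 1) mod 10.
    return (tokens[-1] + 1) % 10

def speculative_step(tokens, k=4):
    # 1. The draft model proposes k tokens sequentially (cheap).
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. The target model verifies all k positions in one parallel pass
    #    (simulated here with a loop over the k contexts).
    accepted, ctx = [], list(tokens)
    for t in draft:
        correct = target_model(ctx)
        if t == correct:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(correct)  # fix the first mismatch, then stop
            break
    else:
        # All k draft tokens accepted: the verify pass also yields
        # one extra "bonus" token for free.
        accepted.append(target_model(ctx))
    return tokens + accepted
```

Each step emits at least one token (so it is never slower in token count than plain decoding) and up to k+1 tokens when the draft model guesses well.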
So, what's new? Here are some of the more recent research areas:
- Draft model accuracy — more papers on this, as always; forgive me if I yawn!
- Multiple parallel draft models — ongoing improvements to this idea.
- Multi-query prompt lookup decoding — this generalizes prompt lookup decoding to scour not only the current prompt context, but also any previous queries in the history.
- Distributed speculative decoding — optimal use of speculative decoding when inference processing is distributed over multiple GPUs or multiple servers.
- Long context speculative decoding — examination of particular optimizations when applying speculative decoding to long or ultra-long contexts.
- Vision and multimodal speculative decoding — visual tokenization is very different.
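The prompt lookup idea underlying the multi-query variant above is easy to sketch: match the last n-gram of the generated text against earlier text, and copy whatever followed it as the draft sequence. Here is a minimal sketch (the function name and integer token IDs are my own, for illustration):

```python
def prompt_lookup_draft(prompt_tokens, generated, ngram=2, k=4):
    """Propose up to k draft tokens by matching the last n-gram of the
    generated text against earlier context and copying what follows.
    Minimal sketch of prompt lookup decoding; tokens are integer IDs."""
    context = prompt_tokens + generated
    key = tuple(context[-ngram:])
    # Scan backwards for the most recent earlier occurrence of the n-gram
    # (starting before the trivial match at the very end).
    for i in range(len(context) - ngram - 1, -1, -1):
        if tuple(context[i:i + ngram]) == key:
            return context[i + ngram : i + ngram + k]
    return []  # no match: fall back to normal decoding
```

The multi-query variant simply widens the searched text to include previous queries from the history (e.g., by concatenating them into `prompt_tokens`), which raises the chance of a useful n-gram match.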
Read more about types of speculative decoding.
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI, a new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications, a new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging