Yoryck AI

Lookahead Decoding

  • Last Updated 11 June, 2025
  • by David Spuler, Ph.D.

What is Lookahead Decoding?

Lookahead decoding is a parallel decoding method that looks ahead in the sequence at the upcoming tokens. The idea is to "guess" or "draft" the most likely next token, and usually several tokens, and then verify those guesses in parallel for a speedup. Like speculative decoding, it has both a drafting and a verification phase, but lookahead decoding does both inside the same model rather than relying on a separate draft model.
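The verification half can be sketched in a few lines. This is a minimal sketch of greedy verification, assuming a hypothetical `model` callable that returns the model's most likely next token for a prefix of integer token IDs. A real engine would score all draft positions in a single batched forward pass rather than with the separate calls used here for clarity, and the helper name `verify_draft` is illustrative, not from any library.

```python
from typing import Callable, List

Token = int
# Hypothetical model interface: given a token prefix, return the greedy next token.
Model = Callable[[List[Token]], Token]


def verify_draft(model: Model, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Accept the longest prefix of `draft` that matches the model's own greedy choices.

    Returns the accepted draft tokens plus one extra token that the model
    produced at the first mismatch (or just past the end of the draft).
    """
    # One prediction per draft position, plus one after the whole draft.
    # A real engine computes all of these in ONE parallel forward pass.
    predictions = [model(prefix + draft[:i]) for i in range(len(draft) + 1)]

    accepted: List[Token] = []
    for guess, predicted in zip(draft, predictions):
        if guess != predicted:
            break  # every later prediction was conditioned on this wrong guess
        accepted.append(guess)

    # The prediction at the first mismatch depends only on accepted tokens,
    # so it is always valid output: each pass yields at least one token.
    accepted.append(predictions[len(accepted)])
    return accepted


if __name__ == "__main__":
    # Toy stand-in for an LLM: the next token is the previous token plus one (mod 10).
    def toy_model(seq: List[Token]) -> Token:
        return (seq[-1] + 1) % 10 if seq else 0

    print(verify_draft(toy_model, [1, 2], [3, 4, 9]))  # [3, 4, 5]
    print(verify_draft(toy_model, [1, 2], [3, 4, 5]))  # [3, 4, 5, 6]
```

Because only guesses that match the model's own choices are kept, the output is the same as ordinary greedy decoding; the speedup comes from accepting several tokens per pass when the draft is good.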

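Because it relies on parallel computation, this method is effective for GPU implementations with spare parallel capacity, but not for low-resource platforms such as AI PCs or AI Phones, which have little spare compute to spend on guesses that may be rejected.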
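The trade-off between sequential steps and total work shows up clearly in plain Jacobi-iteration decoding, the fixed-point scheme that lookahead decoding builds on. The published method also collects n-grams from these iterations into a pool and verifies candidates drawn from it, which this sketch omits. In this minimal sketch, assuming the same kind of hypothetical greedy `model` callable, each step refines a window of guessed future tokens in one (simulated) parallel pass and keeps the ones that have converged. The function name `jacobi_decode` and its `window` and `pad` parameters are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical model interface: given a token prefix, return the greedy next token.
Model = Callable[[List[Token]], Token]


def jacobi_decode(model: Model, prompt: List[Token], num_tokens: int,
                  window: int = 4, pad: Token = 0) -> Tuple[List[Token], int, int]:
    """Greedy decoding via Jacobi iteration over a window of guessed tokens.

    Returns (generated tokens, sequential steps, total token positions computed).
    The tokens are identical to ordinary one-token-at-a-time greedy decoding.
    """
    output = list(prompt)
    guess = [pad] * window  # initial guesses for the next `window` tokens
    steps = work = 0
    while len(output) - len(prompt) < num_tokens:
        # One parallel step: predict every window position from the current guesses.
        preds = [model(output + guess[:i]) for i in range(window)]
        steps += 1
        work += window
        # preds[0] is always correct; preds[i] is correct if guesses 0..i-1 were right.
        k = 1
        while k < window and guess[k - 1] == preds[k - 1]:
            k += 1
        output.extend(preds[:k])
        # Unaccepted predictions become the next guesses; refill the window with padding.
        guess = preds[k:] + [pad] * k
    generated = output[len(prompt):len(prompt) + num_tokens]
    return generated, steps, work
```

With a contrived toy model whose next token depends only on position, `jacobi_decode(lambda seq: len(seq), [0], 10)` returns tokens 1 through 10 in 4 sequential steps while computing 16 token positions, versus 10 steps and 10 positions for ordinary decoding. Real models converge far less neatly, but the shape of the trade is the same: fewer sequential steps in exchange for extra, partly wasted, parallel work, which only pays off when that parallel capacity would otherwise sit idle.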

Research on Lookahead Decoding

Research papers include:

More Research on Decoding Algorithms

AI Books from Aussie AI

The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson

RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization

Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications

Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++

CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization

CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about:

(For feedback, suggestions or corrections, please email research@yoryck.com.)