Yoryck AI

Lookahead Decoding

  • Last Updated 11 June, 2025
  • by David Spuler, Ph.D.

What is Lookahead Decoding?

Lookahead decoding is a parallel decoding method that looks ahead in the sequence to predict the upcoming tokens. The idea is to "guess" or "draft" the most likely next token, and usually several tokens, which can then be verified in parallel for a speedup. This is similar to speculative decoding in that there are both drafting and verification phases, but in lookahead decoding both are done inside the same model rather than with a separate draft model.

This method verifies the drafted tokens in parallel, making it effective for parallel GPU implementations, but not effective for low-resource platforms such as AI PCs or AI phones, where there is little spare compute for the extra parallel work.
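
The draft-and-verify loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_next_token` is a hypothetical stand-in for one greedy step of a real model, and the drafts come from a simple n-gram pool harvested from the tokens already generated, which is a simplification and not the drafting mechanism of any particular published implementation. In a real system the verification of all drafted tokens would be a single batched forward pass on the GPU; here it is simulated with a sequential loop, and each round is counted as one "step".

```python
from typing import Callable, Dict, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def toy_next_token(prefix: List[Token]) -> Token:
    """Hypothetical stand-in for one greedy model step (a repeating pattern)."""
    pattern = [3, 1, 4, 1, 5, 9, 2, 6]
    return pattern[len(prefix) % len(pattern)]


def greedy_decode(next_token: NextTokenFn, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model step per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq


def lookahead_style_decode(
    next_token: NextTokenFn, prompt: List[Token], n_new: int, draft_len: int = 3
) -> Tuple[List[Token], int]:
    """Draft several tokens, verify them, and keep the longest correct prefix.

    Returns the decoded sequence and the number of verification rounds.
    Assumes a non-empty prompt.
    """
    seq = list(prompt)
    pool: Dict[Token, List[Token]] = {}  # last token -> tokens seen after it
    target = len(prompt) + n_new
    steps = 0
    while len(seq) < target:
        draft = pool.get(seq[-1], [])
        steps += 1  # one round; a real model would verify the draft in one pass
        accepted: List[Token] = []
        for tok in draft:
            # Keep a drafted token only if the model itself would emit it.
            if next_token(seq + accepted) != tok:
                break
            accepted.append(tok)
        # The model's own token at the first unverified position comes for free.
        accepted.append(next_token(seq + accepted))
        start = len(seq)
        seq.extend(accepted)
        # Refresh the n-gram pool around the newly generated region.
        for i in range(max(start - draft_len, 0), len(seq) - 1):
            pool[seq[i]] = seq[i + 1 : i + 1 + draft_len]
    del seq[target:]  # the last round may overshoot the requested length
    return seq, steps


if __name__ == "__main__":
    out, steps = lookahead_style_decode(toy_next_token, [3], 24)
    print("same as greedy:", out == greedy_decode(toy_next_token, [3], 24))
    print("rounds used:", steps, "for 24 new tokens")
```

Because every accepted token is checked against the model's own greedy choice, the output matches plain greedy decoding; the only thing that changes is how many rounds are needed. The saving depends on how often the drafts are right, so highly repetitive text benefits most in this toy setting.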

Research on Lookahead Decoding

Research papers include:

(For feedback, suggestions or corrections, please email research@yoryck.com.)