Aussie AI

Edit Decoding

  • Last Updated 31 May, 2026
  • by David Spuler, Ph.D.

What is Edit Decoding?

Edit decoding is a decoding algorithm specialized for "editing" tasks such as Grammatical Error Correction (GEC). It uses the input prompt as a template to speed up the output decoding. It is similar to aggressive decoding and speculative decoding, which are types of parallel decoding. Edit decoding can be used sequentially or in parallel.

The main advantage of edit decoding over standard autoregressive decoding is that it can start at the beginning of the context, and is specialized for editing. Hence, there is no need to prepend an instruction to the prompt, because the engine knows to do editing automatically. This allows fewer tokens in the input prompt, and also improves inference efficiency by not needing to encode a large input prompt at the start.

Edit Decoding Algorithm

Transformers have been used for editing since shortly after they were introduced in 2017. The general field is called "Grammatical Error Correction" or GEC. The inference algorithm needs to be modified to perform editing.

Non-editing decoding tasks tend to extend an input prompt, known as a "completion." However, editing tasks need to modify the input prompt, rather than extend it. Hence, the decoding algorithm needs to be changed. Note that the basic "edit decoding" algorithm is not the same as aggressive decoding; aggressive decoding is a parallelization optimization of this simpler algorithm.

The most basic difference between edit decoding and standard autoregressive decoding is a simple observation: the prompt is changed. Non-edit inference will typically emit the input prompt tokens unchanged, and then add the "completion" tokens afterwards. Edit decoding differs in that it can modify the initial input tokens, rather than always emit them verbatim.
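To make the contrast concrete, here is a minimal sketch using made-up token lists rather than output from any real model. The helper names `completion_output` and `edit_output`, and the idea of passing corrections as a position-to-token dictionary, are assumptions for illustration only. The sketch also covers only in-place substitutions; real edits may insert or delete tokens.

```python
from typing import Dict, List


def completion_output(prompt: List[str], new_tokens: List[str]) -> List[str]:
    """Completion style: the prompt is emitted verbatim, then extended."""
    return prompt + new_tokens


def edit_output(prompt: List[str], corrections: Dict[int, str]) -> List[str]:
    """Edit style: prompt tokens may be replaced in place; nothing is appended."""
    return [corrections.get(i, tok) for i, tok in enumerate(prompt)]


prompt = ["She", "go", "to", "school"]
completed = completion_output(prompt, ["every", "day"])
edited = edit_output(prompt, {1: "goes"})  # position 1: "go" becomes "goes"
```

Here `completed` still begins with the uncorrected prompt, while `edited` has the same length as the prompt with one token changed.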

It is worth considering a simplistic attempt to do edit decoding like this pseudocode:

    tok = predict(1, n);  // Predict token n+1 from tokens 1..n
    if (tok == input[n+1]) {
        // Correct prediction matching input
        emit input[n+1];
    }
    else {
        // Incorrect input text, override input[n+1]
        emit tok;
    }

But this is silly coding, because both branches emit the same token: when the prediction matches the input, emitting input[n+1] is the same as emitting tok. Hence, it is identical to this simpler pseudocode:

    tok = predict(1, n);  // Predict token n+1 from tokens 1..n
    emit tok;

This method is not really using the input text in the prediction. It will also struggle to get started, because the initial prediction after the first token is likely to diverge, as there will be many possibilities. The output likely won't be similar to the input. Instead, edit decoding needs to prefer the input tokens unless the model has high confidence in a correction.
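One way to express "prefer the input tokens unless the model has high confidence" is a probability threshold. The following is a minimal sequential sketch under stated assumptions, not a definitive implementation: the `predict_probs` interface, the `toy_predict_probs` lookup table, and the 0.9 threshold are all hypothetical, and the loop handles only one-for-one token substitutions (no insertions or deletions).

```python
from typing import Callable, Dict, List

# Hypothetical model interface (an assumption, not a real library API):
# given the tokens emitted so far, return a probability per candidate next token.
Predictor = Callable[[List[str]], Dict[str, float]]


def edit_decode(input_tokens: List[str], predict_probs: Predictor,
                threshold: float = 0.9) -> List[str]:
    """Keep each input token unless the model confidently prefers another."""
    output: List[str] = []
    for input_tok in input_tokens:
        probs = predict_probs(output)
        # With no prediction available, fall back to the input token.
        best_tok = max(probs, key=probs.get) if probs else input_tok
        if best_tok != input_tok and probs[best_tok] >= threshold:
            output.append(best_tok)   # confident correction overrides the input
        else:
            output.append(input_tok)  # otherwise prefer the input token
    return output


def toy_predict_probs(prefix: List[str]) -> Dict[str, float]:
    """Made-up stand-in for a model, keyed on the last emitted token only."""
    if not prefix:
        return {"She": 0.2, "He": 0.2, "The": 0.2}  # many plausible openings
    table = {
        "She": {"goes": 0.95, "go": 0.05},
        "goes": {"to": 0.6, "home": 0.4},
    }
    return table.get(prefix[-1], {})


corrected = edit_decode(["She", "go", "to", "school"], toy_predict_probs)
unchanged = edit_decode(["Dogs", "bark"], toy_predict_probs)
```

With this toy table, `corrected` replaces "go" with "goes" because that prediction clears the threshold, while `unchanged` keeps "Dogs" as the first token even though the toy model's top guess is a different word, since no opening token is predicted with high confidence. The threshold trades off missed corrections against unwanted rewrites, and a suitable value would have to be tuned for a real model.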



