Aussie AI

Length Pruning

  • Last Updated 25 April, 2026
  • by David Spuler, Ph.D.

What is Length Pruning?

Length pruning is pruning along one of the three axes of model pruning, namely the "lengthwise" dimension of the inputs. The other two axes are width pruning (e.g. attention head pruning) and depth pruning (e.g. layer pruning and early exit). The three types of pruning are mostly orthogonal to each other, so they can be combined into triple pruning.
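As a rough illustration of why the three axes are described as mostly orthogonal, the following Python sketch shrinks each axis independently on a toy model description. The names `ModelShape`, `work_units`, and `triple_prune` are hypothetical, and the cost proxy (one unit per layer, head, and token) is an assumption made for illustration only; it ignores effects such as the quadratic cost of attention in sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical toy description of the three prunable axes."""
    layers: int  # depth axis
    heads: int   # width axis (attention heads per layer)
    tokens: int  # length axis (tokens in the input sequence)

    def work_units(self) -> int:
        # Crude proxy (an assumption): one unit per (layer, head, token).
        return self.layers * self.heads * self.tokens


def triple_prune(shape: ModelShape, keep_layers: int, keep_heads: int,
                 keep_tokens: int) -> ModelShape:
    """Shrink each axis independently; no axis depends on the others."""
    return ModelShape(
        layers=min(shape.layers, keep_layers),
        heads=min(shape.heads, keep_heads),
        tokens=min(shape.tokens, keep_tokens),
    )


original = ModelShape(layers=12, heads=12, tokens=512)
pruned = triple_prune(original, keep_layers=9, keep_heads=8, keep_tokens=256)

# Fraction of the proxy work that remains after pruning all three axes.
remaining = pruned.work_units() / original.work_units()
print(f"{remaining:.2f}")  # prints 0.25
```

Under this simplified proxy, the savings from the three axes multiply: keeping 3/4 of the layers, 2/3 of the heads, and 1/2 of the tokens leaves 1/4 of the work.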

The main types of length pruning along the "lengthwise" dimension of the inputs are:

Other non-pruning AI model techniques that operate on the same "lengthwise" dimension include:

Length Pruning: Book Excerpts and Blog Articles

Free online book excerpts with full text chapters online and free PDF downloads, and the Aussie AI blog, including related articles:

Length Pruning Research

The term "length pruning" is used in several different ways in the literature. It can mean avoiding redundant computation on the padding in the input vector, as in Zhai et al. (2023), or cutting tokens out of the input stream (see token pruning). It can also mean reducing the size of the embeddings to shrink the memory footprint of the embedding matrix (see embeddings pruning). It may refer to "length prediction" for the decoder output. Finally, it can refer to managing the size of the inputs to reduce the autoregression bottleneck (see non-autoregressive algorithms).
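Two of these meanings, skipping padding and pruning tokens, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of Zhai et al. (2023) or of any other cited paper: the padding id `PAD_ID`, the function names `strip_padding` and `prune_tokens`, and the per-token importance scores are all hypothetical. In a real engine the scores would typically come from the model itself (for example, from attention weights).

```python
from typing import List, Sequence

PAD_ID = 0  # hypothetical padding token id (an assumption)


def strip_padding(batch: Sequence[Sequence[int]],
                  pad_id: int = PAD_ID) -> List[List[int]]:
    """Drop padding tokens so that later steps do no work on them."""
    return [[tok for tok in seq if tok != pad_id] for seq in batch]


def prune_tokens(tokens: Sequence[int], scores: Sequence[float],
                 keep: int) -> List[int]:
    """Keep the `keep` highest-scoring tokens, in their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if keep >= len(tokens):
        return list(tokens)
    # Rank positions by score (highest first) and take the top `keep`.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore sequence order so the shortened input is still a subsequence.
    kept_positions = sorted(ranked[:max(keep, 0)])
    return [tokens[i] for i in kept_positions]


# Padding removal: two padded sequences of length 5.
padded = [[5, 6, 7, 0, 0], [8, 9, 0, 0, 0]]
unpadded = strip_padding(padded)

# Token pruning: keep the 2 most important of 4 tokens.
shortened = prune_tokens([11, 12, 13, 14], [0.1, 0.9, 0.3, 0.8], keep=2)
```

Here `unpadded` is `[[5, 6, 7], [8, 9]]` and `shortened` is `[12, 14]`. Both functions shorten the sequence that later layers must process, which is the sense in which they act on the "lengthwise" dimension.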

Research papers directly related to "length pruning" include:

More Research on Pruning Types

More AI Pruning Research

Read more about other types of pruning:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging