Aussie AI
Layer Reordering
Last Updated 21 August, 2025
by David Spuler, Ph.D.
Research on Layer Reordering
Research papers include:
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations, September 2019. https://openreview.net/forum?id=H1eA7AEtvS
- Ofir Press, Noah A. Smith, Omer Levy, Apr 2020, Improving Transformer Models by Reordering their Sublayers, https://arxiv.org/abs/1911.03864
- David Spuler, March 2024, Chapter 47. Early Exit and Layer Pruning, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro, 19 Jun 2024 (v2), TroL: Traversal of Layers for Large Language and Vision Models, https://arxiv.org/abs/2406.12246 https://arxiv.org/pdf/2406.12246 (Re-traverses some of the layers during inference, achieving higher accuracy from the same size model without additional memory; see the re-traversal sketch after this list.)
- Vedang Lad, Wes Gurnee, Max Tegmark, 27 Jun 2024, The Remarkable Robustness of LLMs: Stages of Inference, https://arxiv.org/abs/2406.19384 (Examines deleting and swapping adjacent model layers. Hypothesizes that the first layer effectively performs detokenization, the early layers build "features", the middle layers form "ensemble predictions", and the later layers "sharpen" or finalize the output, with substantial suppression near the end.)
- Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan, 9 Jul 2024, Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules, https://arxiv.org/abs/2407.06677
- Matthias Freiberger, Peter Kun, Anders Sundnes Løvlie, Sebastian Risi, 5 Jul 2024, LayerShuffle: Enhancing Robustness in Vision Transformers by Randomizing Layer Execution Order, https://arxiv.org/abs/2407.04513
- Rohit Kumar Thakur, July 2025, Google DeepMind Just Dropped a ‘Transformers Killer’ Architecture, https://ninza7.medium.com/google-deepmind-just-dropped-a-transformers-killer-architecture-c6c1d9288922 (Covers the Mixture-of-Recursions algorithm.)
- Ben Dickson, July 22, 2025, Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it, https://venturebeat.com/ai/mixture-of-recursions-delivers-2x-faster-inference-heres-how-to-implement-it/
- Sangmin Bae, Yujin Kim, Reza Bayat, Sungnyun Kim, Jiyoun Ha, Tal Schuster, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun, 21 Jul 2025 (v2), Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation, https://www.arxiv.org/abs/2507.10524 (MoR is a layer reuse method that recursively re-applies a shared stack of layers with adaptive per-token recursion depths, combined with related optimizations to KV cache management; see the recursion-depth sketch after this list.)
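To make these ideas concrete, here is a minimal PyTorch-style sketch of layer reordering and re-traversal, assuming a toy layer stack; the class names and the example orderings are illustrative, not taken from the LayerShuffle or TroL code:

```python
# Minimal sketch of layer reordering / re-traversal. Class and variable names are
# illustrative, not taken from the LayerShuffle or TroL papers.
import torch
import torch.nn as nn

class ToyLayer(nn.Module):
    """Simplified stand-in for one transformer layer (FFN sub-layer only, no attention)."""
    def __init__(self, dim):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return x + self.ffn(self.norm(x))  # residual connection

class ReorderedStack(nn.Module):
    """Runs a fixed set of layers in an arbitrary order, possibly revisiting some."""
    def __init__(self, dim, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(ToyLayer(dim) for _ in range(num_layers))

    def forward(self, x, order=None):
        # 'order' is a list of layer indices; default is the usual 0..N-1 sequence.
        order = order if order is not None else range(len(self.layers))
        for idx in order:
            x = self.layers[idx](x)  # indices may repeat (layer re-traversal)
        return x

# Usage: the same weights run under three different execution orders.
stack = ReorderedStack(dim=64, num_layers=4)
x = torch.randn(2, 10, 64)                         # (batch, tokens, dim)
y_normal     = stack(x)                            # layers 0,1,2,3
y_shuffled   = stack(x, order=[2, 0, 3, 1])        # permuted order (LayerShuffle-style)
y_retraverse = stack(x, order=[0, 1, 2, 3, 2, 3])  # later layers run twice (TroL-style)
```

The point of the sketch is that reordering and re-traversal change only the execution schedule, not the parameter count, which is why these techniques can trade extra compute for accuracy without extra memory.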
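Similarly, here is a hedged sketch of Mixture-of-Recursions-style adaptive recursion depth: a single shared block is re-applied, and a simple per-token router (an assumption for illustration, not the paper's routing scheme) decides when each token stops recursing:

```python
# Minimal sketch of token-level adaptive recursion depth in the spirit of
# Mixture-of-Recursions. The shared block, router, and stopping rule are all
# simplified assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    def __init__(self, dim, max_depth=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.LayerNorm(dim))
        self.router = nn.Linear(dim, 1)  # scores whether a token needs more depth
        self.max_depth = max_depth

    def forward(self, x):
        # active[b, t] stays True while token t should keep recursing.
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_depth):
            if not active.any():
                break
            updated = x + self.shared(x)                       # one more pass through the shared block
            x = torch.where(active.unsqueeze(-1), updated, x)  # only active tokens are updated
            keep = torch.sigmoid(self.router(x)).squeeze(-1) > 0.5
            active = active & keep                             # router retires finished tokens
        return x

# Usage: each token receives between 1 and max_depth passes through the shared block.
block = RecursiveBlock(dim=64, max_depth=3)
x = torch.randn(2, 10, 64)  # (batch, tokens, dim)
y = block(x)
```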
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home