Aussie AI
Partial RoPE Positional Encoding
Last Updated 31 May, 2026
by David Spuler, Ph.D.
What is Partial RoPE?
Partial RoPE is an optimization of Rotary Position Embedding (RoPE) in which only a subset of the dimensions of each query and key vector is rotated. RoPE is a type of relative positional encoding applied inside the attention kernel. The idea behind partial RoPE is that rotation primarily helps attention focus on nearby tokens, so the contribution of tokens that are far away is diminished. To increase the effect of distant but important tokens or facts in long contexts, some dimensions are left unrotated, so that part of each attention score does not depend on position.
This method is used primarily to improve the accuracy of the attention module in long contexts, but it also gives a minor reduction in compute cost, because the RoPE computation is skipped for the unrotated dimensions. Partial RoPE gives no memory benefit for processing or KV caching, so it is usually combined with other attention optimizations and KV management approaches. An example of partial RoPE in a production model is the Google Gemma model versions for edge devices, where only 25% of the dimensions are rotated.
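As a rough illustration of the idea, here is a minimal sketch in plain Python, assuming the common formulation in which rotation is applied to a leading fraction of the dimensions of each query or key vector and the remaining dimensions pass through unchanged. The function name `partial_rope`, the parameter `rotary_fraction`, the adjacent-pair layout, and the frequency base of 10000 are illustrative assumptions, not the implementation of any particular model.

```python
import math

def partial_rope(vec, position, rotary_fraction=0.25, base=10000.0):
    """Rotate only the leading fraction of a query/key vector (hypothetical helper)."""
    head_dim = len(vec)
    # Number of rotated dimensions, rounded down to an even count (rotation acts on pairs).
    rotary_dim = int(head_dim * rotary_fraction) // 2 * 2
    out = list(vec)  # dimensions at index >= rotary_dim are copied through unrotated
    # Assumption: adjacent dimensions (i, i+1) form a pair; some libraries pair
    # dimension i with i + rotary_dim/2 instead.
    for i in range(0, rotary_dim, 2):
        # Per-pair angle: position times a frequency that decays with the pair index.
        theta = position * base ** (-i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))

# Example: an 8-dimensional head with 25% rotation rotates only dimensions 0 and 1.
q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
k = [0.5, -1.0, 2.0, 0.0, 1.0, 1.5, -2.0, 3.0]
q_rot = partial_rope(q, position=7)
assert q_rot[2:] == q[2:]  # the other six dimensions are untouched

# The score depends only on the relative offset between the two positions.
near = dot(partial_rope(q, 3), partial_rope(k, 1))
far = dot(partial_rope(q, 13), partial_rope(k, 11))
assert math.isclose(near, far)
```

In this sketch the unrotated dimensions contribute the same amount to the score at every distance, which is the position-independent channel that partial RoPE keeps available for distant tokens. The compute saving is limited to the skipped sine and cosine work, and the vector length is unchanged, which is consistent with there being no KV cache memory benefit.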
Research on Partial RoPE Positional Encoding
Research papers include:
- Devansh, Apr 2026, Google’s Gemma 4 is Weirder than you Realize: The architecture matters more than the numbers. Here’s what Google actually built, https://machine-learning-made-simple.medium.com/googles-gemma-4-is-weirder-than-you-realize-17d00d95b0d5
- Sesame Disk, Apr 2026, LLM Architecture Gallery 2026: Top Model Designs Explained, https://sesamedisk.com/llm-architecture-gallery-2026/
- Sebastian Raschka, PhD, Apr 2, 2026 (updated), The Big LLM Architecture Comparison: From DeepSeek V3 to GLM-5: A Look At Modern LLM Architecture Design, https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
- Mohammad Aflah Khan, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander, 12 Mar 2026, Fractional Rotation, Full Potential? Investigating Performance and Convergence of Partial RoPE, https://arxiv.org/abs/2603.11611
- Ye Qiao, Haocheng Xu, Xiaofan Zhang, Sitao Huang, 26 Sep 2025, Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling, https://arxiv.org/abs/2510.00028
- Junu Kim, Xiao Liu, Zhenghao Lin, Lei Ji, Yeyun Gong, Edward Choi, 14 Nov 2025 (updated), Behind RoPE: How Does Causal Mask Encode Positional Information? ICLR 2026 Conference Withdrawn Submission, https://openreview.net/forum?id=IAXBLI2vo5 https://openreview.net/pdf?id=IAXBLI2vo5
- Bowen Yang, Bharat Venkitesh, Dwarak Talupuru, Hangyu Lin, David Cairuz, Phil Blunsom, Acyr Locatelli, 22 Oct 2025 (v2), Rope to Nope and Back Again: A New Hybrid Attention Strategy, https://arxiv.org/abs/2501.18795
- Y. Qiao, S. Huang, 2026, Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs (Student Abstract), Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41359-41361, https://doi.org/10.1609/aaai.v40i48.42269 https://ojs.aaai.org/index.php/AAAI/article/view/42269
- Zichong Li, Chen Liang, Liliang Ren, Tuo Zhao, Yelong Shen, Weizhu Chen, 15 Apr 2026, Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation, https://arxiv.org/abs/2604.14339
- Arjun Kocher, Apr 2026, Compressed Sparse Attention, https://www.k-a.in/CSA.html
- David Spuler, May 31st, 2026, Chapter 32. Positional Encoding, in book LLM Inference Optimization: State-of-the-Art Research, Table of Contents: https://www.aussieai.com/book/llm-inference-optimization https://www.amazon.com/dp/B0H3FKR39T