Aussie AI

Partial RoPE Positional Encoding

Last Updated 23 April, 2026

by David Spuler, Ph.D.

What is Partial RoPE?

Partial RoPE is a variant of Rotary Positional Embedding (RoPE) in which only a subset of the dimensions of each query and key vector is rotated, and the remaining dimensions are left unrotated. RoPE is a type of relative positional encoding applied inside the attention kernel: query and key vectors are rotated by position-dependent angles, so that attention scores depend on the relative distance between tokens. The idea behind partial RoPE is that rotation primarily helps attention focus on nearby tokens, while the contribution of tokens that are far away is diminished. Leaving some dimensions unrotated gives the model position-independent channels, which can increase the effect of distant but important tokens or facts in long contexts.

This method is used primarily to improve the accuracy of the attention module in long contexts, but it also gives a minor reduction in compute cost, because the RoPE computation is skipped for the unrotated dimensions. Partial RoPE gives no memory benefit for processing or KV caching, so it is usually combined with other attention optimizations and KV cache management approaches. An example of partial RoPE in a production model is the Google Gemma model versions for edge devices, where only 25% of the dimensions are rotated.
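The mechanism can be illustrated with a minimal NumPy sketch. This is a sketch under stated assumptions, not the implementation used by Gemma or any other production model: the function names `rope_rotate` and `partial_rope`, the `rotary_fraction` parameter, the frequency base of 10000, and the "rotate-half" pairing of dimensions are all illustrative choices. It rotates the leading slice of each vector's dimensions and passes the remaining dimensions through unchanged.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Standard RoPE on x of shape (seq_len, d), with d even.

    Uses the "rotate-half" pairing (dimension i is paired with i + d/2),
    which is one common convention; others interleave adjacent dimensions.
    """
    half = x.shape[-1] // 2
    inv_freq = base ** (-np.arange(half) / half)        # (half,)
    angles = positions[:, None] * inv_freq[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def partial_rope(x, positions, rotary_fraction=0.25, base=10000.0):
    """Rotate only the first `rotary_fraction` of the dimensions of x.

    `rotary_fraction` is a hypothetical parameter name for this sketch.
    """
    d = x.shape[-1]
    rot_dim = int(d * rotary_fraction)
    rot_dim -= rot_dim % 2                 # rotation works on pairs of dimensions
    if rot_dim == 0:
        return x.copy()                    # nothing to rotate
    rotated = rope_rotate(x[:, :rot_dim], positions, base)
    # The remaining dimensions carry no positional signal.
    return np.concatenate([rotated, x[:, rot_dim:]], axis=-1)

# Usage: 4 token positions, head dimension 8, 25% of dimensions rotated.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
q_rot = partial_rope(q, np.arange(4), rotary_fraction=0.25)
```

In this sketch the same function would be applied to both the query and key vectors before the attention dot product. The rotated slice contributes a term to the score that depends on relative position, while the unrotated slice contributes a term that is independent of position.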

Research on Partial RoPE Positional Encoding

Articles and research papers include:

Devansh, Apr 2026, Google’s Gemma 4 is Weirder than you Realize: The architecture matters more than the numbers. Here’s what Google actually built, https://machine-learning-made-simple.medium.com/googles-gemma-4-is-weirder-than-you-realize-17d00d95b0d5
Sesame Disk, Apr 2026, LLM Architecture Gallery 2026: Top Model Designs Explained, https://sesamedisk.com/llm-architecture-gallery-2026/
Sebastian Raschka, PhD, Apr 2, 2026 (updated), The Big LLM Architecture Comparison: From DeepSeek V3 to GLM-5: A Look At Modern LLM Architecture Design, https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison