Aussie AI
Ring Attention
-
Last Updated 8 August, 2025
-
by David Spuler, Ph.D.
Ring attention is an LLM attention optimization that distributes blockwise attention computation across multiple devices arranged in a ring. Each device keeps a block of queries and computes attention against key/value blocks that are passed around the ring, overlapping communication with computation so that the self-attention step scales to very long contexts in both training and inference. Ring attention is orthogonal to, and can be combined with, other memory-efficient attention algorithms such as Flash attention.
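To make the blockwise mechanics concrete, below is a minimal single-process sketch in NumPy; it is not the authors' implementation, and the `ring_attention` function, its block layout, and the toy dimensions are illustrative assumptions. The ring of devices is simulated by a loop, and per-block partial results are merged with the numerically stable online-softmax update used in blockwise/Flash attention.

```python
# Hypothetical single-process sketch of (non-causal) ring attention using NumPy.
# The sequence is split into blocks; each simulated "device" keeps its query
# block fixed while key/value blocks rotate around the ring. Partial results
# are merged with an online softmax, as in blockwise/Flash attention.
import numpy as np

def ring_attention(Q, K, V, num_devices):
    seq_len, d = Q.shape
    block = seq_len // num_devices  # assume the sequence splits evenly
    Qs = [Q[i*block:(i+1)*block] for i in range(num_devices)]
    Ks = [K[i*block:(i+1)*block] for i in range(num_devices)]
    Vs = [V[i*block:(i+1)*block] for i in range(num_devices)]

    # Per-device running accumulators for the online-softmax merge.
    out = [np.zeros((block, d)) for _ in range(num_devices)]
    row_max = [np.full((block, 1), -np.inf) for _ in range(num_devices)]
    row_sum = [np.zeros((block, 1)) for _ in range(num_devices)]

    for step in range(num_devices):
        for dev in range(num_devices):
            # K/V block held by this device after 'step' rotations of the ring.
            src = (dev + step) % num_devices
            scores = Qs[dev] @ Ks[src].T / np.sqrt(d)
            new_max = np.maximum(row_max[dev],
                                 scores.max(axis=1, keepdims=True))
            scale = np.exp(row_max[dev] - new_max)   # rescale old accumulators
            p = np.exp(scores - new_max)
            row_sum[dev] = row_sum[dev] * scale + p.sum(axis=1, keepdims=True)
            out[dev] = out[dev] * scale + p @ Vs[src]
            row_max[dev] = new_max
        # A real multi-device implementation would send each K/V block to the
        # next device here, overlapping the transfer with the next block's
        # computation.

    return np.concatenate([out[d] / row_sum[d] for d in range(num_devices)])

# Usage: on a toy example the result matches ordinary full attention.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 4))
K = rng.standard_normal((8, 4))
V = rng.standard_normal((8, 4))
scores = Q @ K.T / np.sqrt(4)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
full = weights / weights.sum(axis=1, keepdims=True) @ V
assert np.allclose(ring_attention(Q, K, V, num_devices=4), full)
```

Because each device only ever holds one query block and one key/value block at a time, memory per device stays constant as the context length grows with the number of devices, which is the source of the "near-infinite context" claim in the original paper.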
Research on Ring Attention
Research papers on ring attention include:
- Hao Liu, Matei Zaharia, Pieter Abbeel, 27 Nov 2023 (v4), Ring Attention with Blockwise Transformers for Near-Infinite Context, https://arxiv.org/abs/2310.01889 https://github.com/lhao499/llm_large_context (Original paper for ring attention.)
- William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley, 15 Nov 2023, Striped Attention: Faster Ring Attention for Causal Transformers, https://arxiv.org/abs/2311.09431
- Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jongsoo Park, Jianyu Huang, 4 Nov 2024, Context Parallelism for Scalable Million-Token Inference, https://arxiv.org/abs/2411.01783
- Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang, 29 Dec 2024, TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication, https://arxiv.org/abs/2412.20501 https://github.com/ACA-Lab-SJTU/token-ring (Ring attention with inter-GPU network transmission optimizations.)
- Seongho Hong, Yong-Hoon Choi, 2 Jan 2025, RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer, https://arxiv.org/abs/2501.01182
- zhuzilin, Jan 2025, ring-flash-attention: Ring attention implementation with flash attention, https://github.com/zhuzilin/ring-flash-attention
- Kilian Haefeli, Simon Zirui Guo, Bonnie Li, 10 Apr 2024, Ring Attention Explained, https://coconut-mode.com/posts/ring-attention/
- Tanuj Sharma, Feb 23, 2024, Breaking the Boundaries: Understanding Context Window Limitations and the idea of Ring Attention, https://medium.com/@iamtanujsharma/breaking-the-boundaries-understanding-context-window-limitations-and-the-idea-of-ring-attention-170e522d44b2
- Nivas Jayaseelan, November 1, 2023, Understanding Ring Attention: Building Transformers With Near-Infinite Context, https://www.e2enetworks.com/blog/understanding-ring-attention-building-transformers-with-near-infinite-context
- Peter Chng, August 19, 2024, Ring Attention - scaling attention across multiple devices, https://peterchng.com/blog/2024/08/19/ring-attention-scaling-attention-across-multiple-devices/
- Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf, Feb 19, 2025, The Ultra-Scale Playbook: Training LLMs on GPU Clusters, Hugging Face, https://huggingface.co/spaces/nanotron/ultrascale-playbook https://huggingface.co/spaces/nanotron/ultrascale-playbook/resolve/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf
- Stephen Diehl, 2025, Attention Wasn't All We Needed, https://www.stephendiehl.com/posts/post_transformers/
More Attention Research Topics
Related LLM research areas for long-context optimization of attention methods include:
- Attention optimization (main page)
- Local attention
- Linear attention
- Sparse attention
- Multi-Head Attention (MHA)
- Multi-Query Attention (MQA)
- Grouped-Query Attention (GQA)
- Flash attention
- Paged attention
Other topics in attention research:
- Low-rank matrix attention
- Medusa attention
- Block attention
- Cross attention
- Fused head attention
- Hybrid local-global attention
- FFT attention
- QKV computation optimizations
- Additive attention
- Multiplicative attention
- Graph attention
- Chunked attention
- Attention sink
- Attention steering
- Bilinear attention
- Attention-free methods
- Mixture-of-Heads (MoH) Attention (MoE + MHA)
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: