Aussie AI
Ring Attention
-
Last Updated 8 August, 2025
-
by David Spuler, Ph.D.
Ring attention is an LLM attention optimization that distributes blockwise attention computation across multiple devices arranged in a ring. Each device keeps a block of queries and computes attention against key/value blocks that are passed around the ring, overlapping communication with computation so that the self-attention step scales to very long contexts in both training and inference. Ring attention is orthogonal to, and can be combined with, other memory-efficient attention algorithms such as Flash attention.
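To make the blockwise mechanics concrete, below is a minimal single-process sketch in NumPy; it is not the authors' implementation, and the `ring_attention` function, its block layout, and the toy dimensions are illustrative assumptions. The ring of devices is simulated by a loop, and per-block partial results are merged with the numerically stable online-softmax update used in blockwise/Flash attention.

```python
# Hypothetical single-process sketch of (non-causal) ring attention using NumPy.
# The sequence is split into blocks; each simulated "device" keeps its query
# block fixed while key/value blocks rotate around the ring. Partial results
# are merged with an online softmax, as in blockwise/Flash attention.
import numpy as np

def ring_attention(Q, K, V, num_devices):
    seq_len, d = Q.shape
    block = seq_len // num_devices  # assume the sequence splits evenly
    Qs = [Q[i*block:(i+1)*block] for i in range(num_devices)]
    Ks = [K[i*block:(i+1)*block] for i in range(num_devices)]
    Vs = [V[i*block:(i+1)*block] for i in range(num_devices)]

    # Per-device running accumulators for the online-softmax merge.
    out = [np.zeros((block, d)) for _ in range(num_devices)]
    row_max = [np.full((block, 1), -np.inf) for _ in range(num_devices)]
    row_sum = [np.zeros((block, 1)) for _ in range(num_devices)]

    for step in range(num_devices):
        for dev in range(num_devices):
            # K/V block held by this device after 'step' rotations of the ring.
            src = (dev + step) % num_devices
            scores = Qs[dev] @ Ks[src].T / np.sqrt(d)
            new_max = np.maximum(row_max[dev],
                                 scores.max(axis=1, keepdims=True))
            scale = np.exp(row_max[dev] - new_max)   # rescale old accumulators
            p = np.exp(scores - new_max)
            row_sum[dev] = row_sum[dev] * scale + p.sum(axis=1, keepdims=True)
            out[dev] = out[dev] * scale + p @ Vs[src]
            row_max[dev] = new_max
        # A real multi-device implementation would send each K/V block to the
        # next device here, overlapping the transfer with the next block's
        # computation.

    return np.concatenate([out[d] / row_sum[d] for d in range(num_devices)])

# Usage: on a toy example the result matches ordinary full attention.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 4))
K = rng.standard_normal((8, 4))
V = rng.standard_normal((8, 4))
scores = Q @ K.T / np.sqrt(4)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
full = weights / weights.sum(axis=1, keepdims=True) @ V
assert np.allclose(ring_attention(Q, K, V, num_devices=4), full)
```

Because each device only ever holds one query block and one key/value block at a time, memory per device stays constant as the context length grows with the number of devices, which is the source of the "near-infinite context" claim in the original paper.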
Research on Ring Attention
Research papers on ring attention include:
- Hao Liu, Matei Zaharia, Pieter Abbeel, 27 Nov 2023 (v4), Ring Attention with Blockwise Transformers for Near-Infinite Context, https://arxiv.org/abs/2310.01889 https://github.com/lhao499/llm_large_context (Original paper for ring attention.)
- William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley, 15 Nov 2023, Striped Attention: Faster Ring Attention for Causal Transformers, https://arxiv.org/abs/2311.09431
- Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jongsoo Park, Jianyu Huang, 4 Nov 2024, Context Parallelism for Scalable Million-Token Inference, https://arxiv.org/abs/2411.01783
- Zongwu Wang, Fangxin Liu, Mingshuai Li, Li Jiang, 29 Dec 2024, TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication, https://arxiv.org/abs/2412.20501 https://github.com/ACA-Lab-SJTU/token-ring (Ring attention with inter-GPU network transmission optimizations.)
- Seongho Hong, Yong-Hoon Choi, 2 Jan 2025, RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer, https://arxiv.org/abs/2501.01182
- zhuzilin, Jan 2025, ring-flash-attention: Ring attention implementation with flash attention, https://github.com/zhuzilin/ring-flash-attention
- Kilian Haefeli, Simon Zirui Guo, Bonnie Li, 10 Apr 2024, Ring Attention Explained, https://coconut-mode.com/posts/ring-attention/
- Tanuj Sharma, Feb 23, 2024, Breaking the Boundaries: Understanding Context Window Limitations and the idea of Ring Attention, https://medium.com/@iamtanujsharma/breaking-the-boundaries-understanding-context-window-limitations-and-the-idea-of-ring-attention-170e522d44b2
- Nivas Jayaseelan, November 1, 2023, Understanding Ring Attention: Building Transformers With Near-Infinite Context, https://www.e2enetworks.com/blog/understanding-ring-attention-building-transformers-with-near-infinite-context
- Peter Chng, August 19, 2024, Ring Attention - scaling attention across multiple devices, https://peterchng.com/blog/2024/08/19/ring-attention-scaling-attention-across-multiple-devices/
- Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf, Feb 19, 2025, The Ultra-Scale Playbook: Training LLMs on GPU Clusters, Hugging Face, https://huggingface.co/spaces/nanotron/ultrascale-playbook https://huggingface.co/spaces/nanotron/ultrascale-playbook/resolve/main/The_Ultra-Scale_Playbook_Training_LLMs_on_GPU_Clusters.pdf
- Stephen Diehl, 2025, Attention Wasn't All We Needed, https://www.stephendiehl.com/posts/post_transformers/
More Attention Research Topics
Related LLM research areas for long-context optimization of attention methods include:
- Attention optimization (main page)
- Local attention
- Linear attention
- Sparse attention
- Multi-Head Attention (MHA)
- Multi-Query Attention (MQA)
- Grouped-Query Attention (GQA)
- Flash attention
- Paged attention
Other topics in attention research:
- Low-rank matrix attention
- Medusa attention
- Block attention
- Cross attention
- Fused head attention
- Hybrid local-global attention
- FFT attention
- QKV computation optimizations
- Additive attention
- Multiplicative attention
- Graph attention
- Chunked attention
- Attention sink
- Attention steering
- Bilinear attention
- Attention-free methods
- Mixture-of-Heads (MoH) Attention (MoE + MHA)
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: