
Lightning Attention

  • Last Updated 26 August, 2025
  • by David Spuler, Ph.D.

Lightning attention is an LLM efficiency optimization that implements attention as a faster GPU kernel. It is a type of linear attention, scaling linearly rather than quadratically with sequence length, which makes it usable in long-context and even ultra-long-context models beyond 1M tokens.
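
To see why linear attention scales this way: standard attention materializes an n-by-n score matrix, whereas linear attention reassociates the matrix product (QK^T)V as Q(K^TV), so cost grows linearly with sequence length n. Below is a minimal NumPy sketch of that reassociation, plus the causal recurrent form that lets a decoder stream over long contexts with constant memory. It is illustrative only: real lightning attention kernels also apply an activation (feature map) to Q and K, handle normalization and decay, and run as tiled, fused GPU kernels, all of which this sketch omits.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds an (n, n) score matrix, so O(n^2)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    """Linear attention: reassociate (Q K^T) V as Q (K^T V).
    K^T V is (d, d)-sized regardless of sequence length n, so the
    cost is O(n d^2) instead of O(n^2 d)."""
    return Q @ (K.T @ V)

def causal_linear_attention(Q, K, V):
    """Causal (decoder) form as a running KV state, one token at a
    time: constant memory per step, which is what enables streaming
    over very long contexts."""
    n, d = Q.shape
    kv_state = np.zeros((d, V.shape[-1]))
    out = np.zeros_like(V)
    for t in range(n):
        kv_state += np.outer(K[t], V[t])  # accumulate K^T V up to token t
        out[t] = Q[t] @ kv_state
    return out
```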

Research on Lightning Attention

  • MiniMax: Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, (and many more authors), 14 Jan 2025, MiniMax-01: Scaling Foundation Models with Lightning Attention, https://arxiv.org/abs/2501.08313 https://github.com/MiniMax-AI (Context window over 1 million tokens.)
  • Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong, 15 Jan 2024 (v2), Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models, https://arxiv.org/abs/2401.04658 https://github.com/OpenNLPLab/lightning-attention (Splits causal attention into intra-block and inter-block computation; see the sketch after this list.)
  • Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong, 19 Jan 2024 (v2), TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer, https://arxiv.org/abs/2307.14995 https://github.com/OpenNLPLab/TransnormerLLM (Introduced the first version of lightning attention.)
  • MiniMax, Jan 2025, MiniMax-01: Scaling Foundation Models with Lightning Attention, https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf
  • MiniMax, Jan 2025, MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era, https://www.minimaxi.com/en/news/minimax-01-series-2
  • MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, (and many more authors), 16 Jun 2025, MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention, https://arxiv.org/abs/2506.13585 https://github.com/MiniMax-AI/MiniMax-M1 (A 456B-parameter MoE reasoning model trained with RL, with optimizations in training efficiency and the attention kernel.)
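
The tiling trick from Lightning Attention-2 (second entry above) is what makes the causal form fast on GPUs: the sequence is split into blocks, attention within a block is computed conventionally, and attention to all earlier blocks comes from a running KV state. Here is a minimal NumPy sketch of that decomposition, again omitting the decay factor, normalization, and kernel fusion of the real method.

```python
import numpy as np

def lightning_attention2_sketch(Q, K, V, block=64):
    """Block-wise causal linear attention in the spirit of Lightning
    Attention-2 (illustrative only; omits decay and normalization).
    Intra-block: conventional causal scores within the block.
    Inter-block: contribution of all past blocks via the KV state."""
    n, d = Q.shape
    out = np.zeros_like(V)
    kv_state = np.zeros((d, V.shape[-1]))  # accumulated K^T V of past blocks
    for s in range(0, n, block):
        e = min(s + block, n)
        Qb, Kb, Vb = Q[s:e], K[s:e], V[s:e]
        inter = Qb @ kv_state            # attend to all earlier blocks
        scores = np.tril(Qb @ Kb.T)      # causal mask inside the block
        intra = scores @ Vb
        out[s:e] = inter + intra
        kv_state += Kb.T @ Vb            # fold this block into the state
    return out
```

The intra-block part is quadratic only in the small block size b, so the total cost is roughly O(n·b·d) for the intra-block work plus O(n·d²) for the inter-block work, both linear in sequence length; the block size trades off the two terms and is tuned to the GPU's tile sizes.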
