Aussie AI
Reasoning Model Caching
Last Updated 23 April, 2026
by David Spuler, Ph.D.
Research on Reasoning Model Caching
Research papers include:
- Akshat Ramachandran, Marina Neseem, Charbel Sakr, Rangharajan Venkatesan, Brucek Khailany, Tushar Krishna, 1 Oct 2025, ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models, https://arxiv.org/abs/2510.01290
- Anna Kuzina, Maciej Pioro, Paul N. Whatmough, Babak Ehteshami Bejnordi, 2 Oct 2025, KaVa: Latent Reasoning via Compressed KV-Cache Distillation, https://arxiv.org/abs/2510.02312
- Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo, 14 Oct 2025, LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning, https://arxiv.org/abs/2506.15969
- Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao, 23 Oct 2025, Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning, https://arxiv.org/abs/2410.19258
- Adnan Oomerjee, Zafeirios Fountas, Haitham Bou-Ammar, Jun Wang, 26 Sep 2025, Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning, https://arxiv.org/abs/2505.16950
- Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu, 29 Jul 2025, MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse, https://arxiv.org/abs/2507.21433
- Mingyuan Wu, Jize Jiang, Haozhen Zheng, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen, Yongjoo Park, Minjia Zhang, Chengxiang Zhai, Klara Nahrstedt, 19 Sep 2025, Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning, https://arxiv.org/abs/2502.20587