Aussie AI
Outliers in Quantization
Last Updated 7 February, 2026
by David Spuler, Ph.D.
Research on Outliers in Quantization
Research papers include:
- Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu, 4 Apr 2024, Outlier-Efficient Hopfield Layers for Large Transformer-Based Models, https://arxiv.org/abs/2404.03828 Code: https://github.com/MAGICS-LAB/OutEffHop (Addresses outliers in quantization with a modified Softmax and an advanced Hopfield memory model.)
- Xing Hu, Yuan Chen, Dawei Yang, Sifan Zhou, Zhihang Yuan, Jiangyong Yu, Chen Xu, 28 May 2024, I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models, https://arxiv.org/abs/2405.17849 Code: https://anonymous.4open.science/r/I-LLM-F242/
- Wanyun Cui, Qianle Wang, 3 Apr 2024, Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models, https://arxiv.org/abs/2404.02837 (Examines which weights most affect inference, including outlier values.)
- Nikita Trukhanov, Ilya Soloveychik, 29 Mar 2024, Accurate Block Quantization in LLMs with Outliers, https://arxiv.org/abs/2403.20137 (Analyzes block floating point number formats in block quantization with a focus on the KV cache memory reduction, including the use of permutations to reorder tensor weight rows.)
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Daliang Xu, Hao Zhang, Liming Yang, Ruiqi Liu, Mengwei Xu, and Xuanzhe Liu, 11 June 2024, WiP: Efficient LLM Prefilling with Mobile NPU, EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models, June 2024, Pages 33 - 35, https://doi.org/10.1145/3662006.3662066 https://dl.acm.org/doi/abs/10.1145/3662006.3662066 PDF: https://dl.acm.org/doi/pdf/10.1145/3662006.3662066 (Faster NPU prefill via chunked prefilling using sequences of tokens, along with INT8 NPU quantization that is aware of outliers and offloads FP32 calculations from NPU back to CPU.)
- Franklin Huang, May 17, 2024, Machine Learning Systems with Reduced Memory Requirements, Masters of Science, Electrical Engineering and Computer Sciences, University of California, Berkeley, Technical Report No. UCB/EECS-2024-120 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-120.html https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-120.pdf Code: https://github.com/hongyihuang/spec-mcts/blob/main/triton (Broad paper that examines a lot of different optimizations that reduce memory costs, including quantization, kernel fusion, sparsity, MatMul optimizations, KV cache compression, and various other methods.)
- Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao, 27 Jun 2024, OutlierTune: Efficient Channel-Wise Quantization for Large Language Models, https://arxiv.org/abs/2406.18832
- Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim, 28 Jun 2024, InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management, https://arxiv.org/abs/2406.19707 (KV caching optimization using salient token pruning for the attention layer.)
- Lianwei Yang, Haisong Gong, 6 Aug 2024, DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers, https://arxiv.org/abs/2408.03291
- Guangxuan Xiao, May 2024, Efficient Deployment Algorithms for Large Language Models, Masters Thesis, MIT, https://dspace.mit.edu/bitstream/handle/1721.1/156332/xiao-xgx-sm-eecs-2024-thesis.pdf
- Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang, 15 Apr 2024 (v4), CBQ: Cross-Block Quantization for Large Language Models, https://arxiv.org/abs/2312.07950
- Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao, 4 Jun 2024 (v2), APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference, ICML 2024 Oral, https://arxiv.org/abs/2401.12200 https://github.com/ROIM1998/APT
- Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu, 20 Aug 2024, LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models, https://arxiv.org/abs/2408.10631 https://github.com/YupengSu/LLM-Barber
- Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung, 6 Sep 2024, OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models, https://arxiv.org/abs/2409.05902
- Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao, 18 Sep 2024, Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview, https://arxiv.org/abs/2409.11650 (Extensive survey of quantization from the basics to SOTA approaches, with also some coverage of knowledge distillation and KV cache compression.)
- Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu, 25 Sep 2024, A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms, https://arxiv.org/abs/2409.16694
- Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang, 25 Sep 2024, VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models, https://arxiv.org/abs/2409.17066 https://arxiv.org/pdf/2409.17066
- Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei, 7 Oct 2024, Differential Transformer, https://arxiv.org/abs/2410.05258
- Ke Yi, Zengke Liu, Jianwei Zhang, Chengyuan Li, Tong Zhang, Junyang Lin, Jingren Zhou, 30 Sep 2024, Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference, https://arxiv.org/abs/2409.20361 (Handling of outliers in INT4 quantization.)
- Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo, 7 Oct 2024, PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs, https://arxiv.org/abs/2410.05265 https://github.com/ChenMnZ/PrefixQuant (Puts outliers into the KV cache as a prefix.)
- Akshat Ramachandran, Souvik Kundu, Tushar Krishna, 12 Nov 2024 (v2), MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization, https://arxiv.org/abs/2411.05282
- Dongwei Wang, Huanrui Yang, 8 Dec 2024, Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization, https://arxiv.org/abs/2412.06858
- H Kang, Q Zhang, S Kundu, G Jeong, Z Liu, T Krishna, Dec 2024, GEAR: An Efficient Error Reduction Framework for KV Cache Compression in LLM Inference, https://neurips2024-enlsp.github.io/papers/paper_3.pdf (Use extra information in low-rank and sparse matrices to efficiently alleviate lossy KV cache quantization issues such as outliers.)
- Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin, 11 Mar 2024, QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning, https://arxiv.org/abs/2403.06497 (Outlier-correcting fine-tuning and quantization method.)
- Kyle Wiggers, December 23, 2024, A popular technique to make AI more efficient has drawbacks, https://techcrunch.com/2024/12/23/a-popular-technique-to-make-ai-more-efficient-has-drawbacks/
- Haoyang Li, Yiming Li, Anxin Tian, Tianhao Tang, Zhanchao Xu, Xuejia Chen, Nicole Hu, Wei Dong, Qing Li, Lei Chen, 27 Dec 2024, A Survey on Large Language Model Acceleration based on KV Cache Management, https://arxiv.org/abs/2412.19442 (Huge survey of all KV cache optimization methods.)
- S. Kim, H. Lee, S. Kim, C. Kim and W. W. Ro, "AirGun: Adaptive Granularity Quantization for Accelerating Large Language Models," 2024 IEEE 42nd International Conference on Computer Design (ICCD), Milan, Italy, 2024, pp. 645-652, doi: 10.1109/ICCD63220.2024.00103. https://ieeexplore.ieee.org/abstract/document/10818069
- Xuerui Qiu, Jieyuan Zhang, Wenjie Wei, Honglin Cao, Junsheng Guo, Rui-Jie Zhu, Yimeng Shan, Yang Yang, Malu Zhang, Haizhou Li, 23 Jan 2025, Quantized Spike-driven Transformer, https://arxiv.org/abs/2501.13492 https://github.com/bollossom/QSD-Transformer
- Minhajul Hoque, Jan 4, 2025, DeepSeek V3: How They Achieved Big Results with Small Compute, https://ai.plainenglish.io/deepseek-v3-how-they-achieved-big-results-with-small-compute-fb694606d59a (DeepSeek optimizations included FP8 quantization with outlier handling, attention and KV cache optimization via Multi-head Latent Attention (MLA), and multi-token prediction.)
- Zunhai Su, Zhe Chen, Wang Shen, Hanyu Wei, Linge Li, Huangqi Yu, Kehong Yuan, 25 Jan 2025, RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations, https://arxiv.org/abs/2501.16383 (INT2 KV caching with special handling of outliers, RoPE, and attention sinks, and the resulting architecture works in Chain-of-Thought.)
- Mingyu Jin, Kai Mei, Wujiang Xu, Mingjie Sun, Ruixiang Tang, Mengnan Du, Zirui Liu, Yongfeng Zhang, 3 Feb 2025, Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding, https://arxiv.org/abs/2502.01563 https://github.com/MingyuJ666/Rope_with_LLM (Finds that outliers in attention are important, and arise by being generated by RoPE.)
- Songhao Wu, Ang Lv, Xiao Feng, Yufei Zhang, Xun Zhang, Guojun Yin, Wei Lin, Rui Yan, 1 Feb 2025, PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration, https://arxiv.org/abs/2502.00527
- G. Wang, S. Cai, W. Li, D. Lyu and G. He, "OFQ-LLM: Outlier-Flexing Quantization for Efficient Low-Bit Large Language Model Acceleration," in IEEE Transactions on Circuits and Systems I: Regular Papers, doi: 10.1109/TCSI.2025.3547732. https://ieeexplore.ieee.org/abstract/document/10924797
- Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang, 16 May 2025, Accurate KV Cache Quantization with Outlier Tokens Tracing, https://arxiv.org/abs/2505.10938
- P Czakó, G Kertész, S Szénási, 2025, Addressing Activation Outliers in LLMs: A Systematic Review of Post-Training Quantization Techniques, IEEE Access, 2025, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10994764
- M. Seo, J. Hyun, S. Jeong, X. T. Nguyen, H. -J. Lee and H. Lee, "OASIS: Outlier-Aware KV Cache Clustering for Scaling LLM Inference in CXL Memory Systems," in IEEE Computer Architecture Letters, doi: 10.1109/LCA.2025.3567844, https://ieeexplore.ieee.org/abstract/document/10990150
- Yutong Liu, Cairong Zhao, Guosheng Hu, 23 Jul 2025, A Comprehensive Evaluation on Quantization Techniques for Large Language Models, https://arxiv.org/pdf/2507.17417
- Dongyeun Lee, Jiwan Hur, Hyounguk Shon, Jae Young Lee, Junmo Kim, 17 Jul 2025, DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization, https://arxiv.org/abs/2507.12933
- Joonsung Kang, 23 Jul 2025, Doubly robust outlier resistant inference on causal treatment effect, https://arxiv.org/abs/2507.17439
- Ivan Letteri, 20 Jul 2025, A Comparative Analysis of Statistical and Machine Learning Models for Outlier Detection in Bitcoin Limit Order Books, https://arxiv.org/abs/2507.14960
- Arend Hintze and Clifford Bohm, 11 Aug 2025, Rethinking Self-Replication: Detecting Distributed Selfhood in the Outlier Cellular Automaton, https://arxiv.org/abs/2508.08047
- Tanvir Islam, 26 Jul 2025, Extended Histogram-based Outlier Score (EHBOS), https://arxiv.org/abs/2502.05719
- Marcello D'Orazio, 28 Jul 2025, An empirical comparison of some outlier detection methods with longitudinal data, https://arxiv.org/abs/2507.21203
- Katharine M. Clark and Paul D. McNicholas, 31 Jul 2025, funOCLUST: Clustering Functional Data with Outliers, https://arxiv.org/abs/2508.00110
- Jiaxi Li, Lu Yin, Xilu Wang, 4 Aug 2025, OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework, https://arxiv.org/abs/2411.07711
- Muhammad Rajabinasab, Farhad Pakdaman, Moncef Gabbouj, Peter Schneider-Kamp, Arthur Zimek, 18 Aug 2025, Randomized PCA Forest for Outlier Detection, https://arxiv.org/abs/2508.12776
- Mingyu Kim, Daniel Stilwell, Jorge Jimenez, 18 Aug 2025, Outlier Detection of Poisson-Distributed Targets Using a Seabed Sensor Network, https://arxiv.org/abs/2508.13099
- Kévin Ducharlet, Louise Travé-Massuyès, Jean-Bernard Lasserre, Marie-Véronique Le Lann, Youssef Miloudi, 13 Aug 2025, Leveraging the Christoffel Function for Outlier Detection in Data Streams, https://arxiv.org/abs/2508.16617
- Sunwoo Kim, 17 Aug 2025, Deep Learning and Matrix Completion-aided IoT Network Localization in the Outlier Scenarios, https://arxiv.org/abs/2508.18225
- Ryan Faulkner, Ian Reid, Simon Ratcliffe, Tat-Jun Chin, 25 Aug 2025, Finding Outliers in a Haystack: Anomaly Detection for Large Pointcloud Scenes, https://arxiv.org/abs/2508.17634
- Paul Fogel, Christophe Geissler, George Luta, 22 Aug 2025, The Target Polish: A New Approach to Outlier-Resistant Non-Negative Matrix Factorization, https://arxiv.org/abs/2507.10484
- Simon Klüttermann and Emmanuel Müller, 1 Sep 2025, Deep Transductive Outlier Detection, https://arxiv.org/abs/2404.03495
- Dietmar Saupe, Tim Bleile, 8 Sep 2025, Robustness and accuracy of mean opinion scores with hard and soft outlier detection, https://arxiv.org/abs/2509.06554
- Bingxin Xu, Zhen Dong, Oussama Elachqar, Yuzhang Shang, 11 Sep 2025, ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms, https://arxiv.org/abs/2509.09679
- Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer, 5 Jun 2024 (v4), SqueezeLLM: Dense-and-Sparse Quantization, https://arxiv.org/abs/2306.07629 https://github.com/SqueezeAILab/SqueezeLLM (Separating outliers into a separately-stored small sparse data structure with full precision values, while "squeezing" the main LLM with low-bit quantization.)
- Mengxia Yu, De Wang, Qi Shan, Colorado J Reed, Alvin Wan, 7 Jul 2025 (v2), The Super Weight in Large Language Models, https://arxiv.org/pdf/2411.07191
- Arslan Majal, Aamir Hussain Chughtai and Muhammad Tahir, 9 Sep 2025, EMORF-II: Adaptive EM-based Outlier-Robust Filtering with Correlated Measurement Noise, https://arxiv.org/abs/2509.07415
- Waqar Ahmad, Evan Murphy, Vladimir A. Krylov, 12 Sep 2025, Similarity-based Outlier Detection for Noisy Object Re-Identification Using Beta Mixtures, https://arxiv.org/abs/2509.08926
- Avinash Patil, 19 Sep 2025, When Bugs Linger: A Study of Anomalous Resolution Time Outliers and Their Themes, https://arxiv.org/abs/2509.16140
- Ye Qiao, Sitao Huang, 17 Sep 2025, Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs, https://arxiv.org/abs/2509.14391
- Yigit E. Yildirim, Samet Demir, Zafer Dogan, 18 Sep 2025, Benefits of Online Tilted Empirical Risk Minimization: A Case Study of Outlier Detection and Robust Regression, https://arxiv.org/abs/2509.15141
- Georgios Vlassis, Saleh Ashkboos, Alexandra Volkova, Torsten Hoefler, Dan Alistarh, 2 Oct 2025, Beyond Outliers: A Study of Optimizers Under Quantization, https://arxiv.org/abs/2509.23500
- Ye Qiao, Haocheng Xu, Xiaofan Zhang, Sitao Huang, 26 Sep 2025, Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling, https://arxiv.org/abs/2510.00028
- Hongkang Li, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Meng Wang, 1 Oct 2025, Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis, https://arxiv.org/abs/2510.00399
- Buhe Li, Berkay Kaplan, Maksym Lazirko, Aleksandr Kogan, 19 Sep 2025, Unsupervised Outlier Detection in Audit Analytics: A Case Study Using USA Spending Data, https://arxiv.org/abs/2509.19366
- Trung Nguyen Thanh, Huyen Giang Thi Thu, Tai Le Quy, Ha-Bang Ban, 23 Sep 2025, Constraint-Reduced MILP with Local Outlier Factor Modeling for Plausible Counterfactual Explanations in Credit Approval, https://arxiv.org/abs/2509.19504
- Akira Tamamori, 28 Oct 2025, Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection, https://arxiv.org/abs/2510.24043
- Ziyi Fang, Lingxiao Huang, Runkai Yang, 28 Oct 2025, Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers, https://arxiv.org/abs/2510.24621
- Jonah Ekelund, Savvas Raptis, Vicki Toy-Edens, Wenli Mo, Drew L. Turner, Ian J. Cohen, Stefano Markidis, 23 Oct 2025, Adaptive PCA-Based Outlier Detection for Multi-Feature Time Series in Space Missions, https://arxiv.org/abs/2504.15846
- Juan A. Lara, David Lizcano, Víctor Rampérez, Javier Soriano, 27 Oct 2025, A method for outlier detection based on cluster analysis and visual expert criteria, https://arxiv.org/abs/2510.23136
- William Roy Orchard, Nastaran Okati, Sergio Hernan Garrido Mejia, Patrick Blöbaum, Dominik Janzing, 25 Oct 2025, Root Cause Analysis of Outliers with Missing Structural Knowledge, https://arxiv.org/abs/2406.05014
- Carlo Dindorf, Jonas Dully, Steven Simon, Dennis Perchthaler, Stephan Becker, Hannah Ehmann, Kjell Heitmann, Bernd Stetter, Christian Diers, Michael Fröhlich, 26 Sep 2025, Outlier Detection in Plantar Pressure: Human-Centered Comparison of Statistical Parametric Mapping and Explainable Machine Learning, https://arxiv.org/abs/2509.21943
- Daniela Schkoda, Dominik Janzing, 8 Oct 2025, Root Cause Analysis of Outliers in Unknown Cyclic Graphs, https://arxiv.org/abs/2510.06995
- Iuri Macocco, Nora Graichen, Gemma Boleda, Marco Baroni, 3 Oct 2025, Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models, https://arxiv.org/abs/2503.21718
- Yuchen Shen, Haomin Wen and Leman Akoglu, 24 Sep 2025, FoMo-0D: A Foundation Model for Zero-shot Tabular Outlier Detection, https://arxiv.org/abs/2409.05672
- Zifan Wang, Xinlei Yi, Xenia Konti, Michael M. Zavlanos, and Karl H. Johansson, 29 Sep 2025, Distributionally Robust Federated Learning with Outlier Resilience, https://arxiv.org/abs/2509.24462
- Sajjad Hashemian, Mohammad Saeed Arvenaghi, Ebrahim Ardeshir-Larijani, 6 Oct 2025, Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams, https://arxiv.org/abs/2408.06867
- Yihao Ang, Peicheng Yao, Yifan Bao, Yushuo Feng, Qiang Huang, Anthony K. H. Tung, Zhiyong Huang, 9 Oct 2025, RFOD: Random Forest-based Outlier Detection for Tabular Data, https://arxiv.org/abs/2510.08747
- Ilyas Varshavskiy, Bonu Boboeva, Shuhrat Khalilbekov, Azizjon Azimi, Sergey Shulgin, Akhlitdin Nizamitdinov, Haitz Saez de Ocariz Borde, 10 Oct 2025, Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers, https://arxiv.org/abs/2510.09294
- Cong Zeng, Shengkun Tang, Yuanzhou Chen, Zhiqiang Shen, Wenchao Yu, Xujiang Zhao, Haifeng Chen, Wei Cheng, Zhiqiang Xu, 7 Oct 2025, Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection, https://arxiv.org/abs/2510.08602
- Shivam Patel, Neharika Jali, Ankur Mallick, Gauri Joshi, 10 Oct 2025, ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers, https://arxiv.org/abs/2510.09852
- Haozheng Luo, Zhuolin Jiang, Md Zahid Hasan, Yan Chen, Soumalya Sarkar, 26 Jan 2026, FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning, https://arxiv.org/abs/2601.19001 https://github.com/robinzixuan/FROST
- James Pan, Guoliang Li, 27 Jun 2025, A Survey of LLM Inference Systems, https://arxiv.org/abs/2506.21901
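Several of the papers above (e.g., SqueezeLLM's dense-and-sparse decomposition) handle outliers by storing the few extreme weights at full precision in a small sparse structure, and quantizing the remaining dense weights at low bit-width with a scale that the outliers can no longer inflate. The sketch below illustrates that general idea with symmetric INT8 quantization; the percentile threshold and all function names are illustrative assumptions, not any single paper's exact method.

```python
# Minimal sketch of dense-and-sparse outlier-aware quantization.
# Names and the 99.5th-percentile threshold are illustrative choices.
import numpy as np

def split_and_quantize(w: np.ndarray, outlier_pct: float = 99.5):
    """Separate outlier weights, then symmetric INT8 quantization of the rest."""
    threshold = np.percentile(np.abs(w), outlier_pct)
    outlier_mask = np.abs(w) > threshold
    # Sparse, full-precision storage for the few outlier weights.
    outlier_idx = np.nonzero(outlier_mask)
    outlier_vals = w[outlier_mask]
    # Dense part: outliers zeroed, so the INT8 scale is not inflated by them.
    dense = np.where(outlier_mask, 0.0, w)
    max_abs = np.abs(dense).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(dense / scale), -127, 127).astype(np.int8)
    return q, scale, outlier_idx, outlier_vals

def dequantize(q, scale, outlier_idx, outlier_vals):
    w = q.astype(np.float32) * scale
    w[outlier_idx] = outlier_vals  # restore outliers at full precision
    return w

# Reconstruction error on outlier weights is zero by construction, and the
# dense scale is much tighter than a naive max-based scale would be.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w[0, 0] = 50.0  # inject one large outlier
q, s, idx, vals = split_and_quantize(w)
w_hat = dequantize(q, s, idx, vals)
assert w_hat[0, 0] == np.float32(50.0)
```

Variants in the literature differ mainly in how the outliers are chosen (per-channel statistics, Hessian sensitivity, prefix tokens) and in how the sparse part is stored and fused back at inference time.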
AI Books from Aussie AI
The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
Get your copy from Amazon: The Sweetest Lesson
RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
Get your copy from Amazon: RAG Optimization
Generative AI Applications book:
Get your copy from Amazon: Generative AI Applications
Generative AI programming book:
Get your copy from Amazon: Generative AI in C++
CUDA C++ Optimization book:
Get your copy from Amazon: CUDA C++ Optimization
CUDA C++ Debugging book:
Get your copy from Amazon: CUDA C++ Debugging
More AI Research Topics
Read more about:
- 500+ LLM Inference Optimization Techniques
- What's Hot in LLM Inference Optimization in 2025?
- Inference Optimization Research
- « Research Home