Aussie AI

Pyramid Inference

Last Updated 17 November, 2025

by David Spuler, Ph.D.

What is Pyramid Inference?

Pyramid inference is an LLM efficiency optimization based on adaptive inference, in which the amount of computation shrinks dynamically along two dimensions as inference proceeds, ending in a "peak" where only a small, focused area is computed. One way to implement pyramid inference is dual pruning, which applies adaptive pruning on two dimensions at once (e.g., combining layer-based depth pruning with attention head width pruning). Computation begins, as usual for LLM inference, with the full data across the three tensor computation dimensions (length, depth, and width), but two of those dimensions are reduced as inference progresses (e.g., through the layers), so that the final steps consider only a small subset of the original area. The result is a pyramid-shaped computation, with a broad base at the start of inference and a narrow, sharp peak at the end.
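The pyramid shape can be illustrated with a small sketch of a per-layer pruning schedule. This is only a toy model under stated assumptions, not an implementation from any of the papers listed below: the function names (pyramid_schedule, relative_cost), the linear decay, and the choice of tokens (length) and attention heads (width) as the two shrinking dimensions are all hypothetical, and the cost estimate is a crude proxy that counts tokens times heads per layer.

```python
def pyramid_schedule(num_layers, num_tokens, num_heads, min_tokens=1, min_heads=1):
    """Return one (tokens_kept, heads_kept) pair per layer.

    Hypothetical linear schedule: the first layer keeps everything, and the
    last layer keeps only min_tokens tokens and min_heads heads.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    steps = max(num_layers - 1, 1)  # avoid division by zero for a single layer
    schedule = []
    for layer in range(num_layers):
        # Integer arithmetic keeps the schedule exact and reproducible.
        tokens = num_tokens - (layer * (num_tokens - min_tokens)) // steps
        heads = num_heads - (layer * (num_heads - min_heads)) // steps
        schedule.append((tokens, heads))
    return schedule


def relative_cost(schedule, num_tokens, num_heads):
    """Crude cost proxy relative to unpruned inference (tokens * heads per layer)."""
    full = len(schedule) * num_tokens * num_heads
    pruned = sum(tokens * heads for tokens, heads in schedule)
    return pruned / full


schedule = pyramid_schedule(num_layers=4, num_tokens=16, num_heads=8,
                            min_tokens=4, min_heads=2)
for layer, (tokens, heads) in enumerate(schedule):
    print(f"layer {layer}: {tokens} tokens x {heads} heads")
print(f"relative cost: {relative_cost(schedule, 16, 8):.3f}")
```

In this toy setting the schedule narrows from 16 tokens and 8 heads at the first layer to 4 tokens and 2 heads at the last, which is the broad base and narrow peak described above. A real system would decide what to prune adaptively (for example, from attention scores or confidence estimates) rather than from a fixed linear schedule.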

Research on Pyramid Inference

Research papers on pyramid LLM inference optimizations:

K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (9) (2015) 1904–1916. doi: 10.1109/TPAMI.2015.2389824. http://dx.doi.org/10.1109/TPAMI.2015.2389824
Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang, Feng Wu, Dahua Lin, 22 Oct 2024, PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction, https://arxiv.org/abs/2410.17247
Yipeng Zhang, Yifan Liu, Zonghao Guo, Yidan Zhang, Xuesong Yang, Chi Chen, Jun Song, Bo Zheng, Yuan Yao, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun, 18 Dec 2024, LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer, https://arxiv.org/abs/2412.13871
Xuanli He, Iman Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi, 30 Oct 2021, Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning, https://arxiv.org/abs/2111.00230
Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai, 14 Jan 2025, Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding, https://arxiv.org/abs/2501.07783
Xiaojiao Xiao, Qinmin Vivian Hu, Guanghui Wang, 22 Jul 2025, Pyramid Hierarchical Masked Diffusion Model for Imaging Synthesis, https://arxiv.org/abs/2507.16579
Max Hahn-Klimroth, João Pedro Meireles, Laurie Bingaman Lackey, Nick van Eeuwijk, Mads F. Bertelsen, Paul W. Dierkes, Marcus Clauss, 5 Aug 2025, A semi-automatic approach to study population dynamics based on population pyramids, https://arxiv.org/abs/2508.03788
Qianyang Li, Xingjun Zhang, Shaoxun Wang, Jia Wei, 19 Sep 2025, DPANet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting, https://arxiv.org/abs/2509.14868
Chenglin Yu, Yang Yu, Songmiao Wang, Yucheng Wang, Yifan Yang, Jinjia Li, Ming Li, Hongxia Yang, 26 Sep 2025, InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios, https://arxiv.org/abs/2509.22502
Arshia Yousefi Nezhad, Helia Aghaei, Hedieh Sajedi, 28 Sep 2025, PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block, https://arxiv.org/abs/2509.23751
Dayu Tan, Cheng Kong, Yansen Su, Hai Chen, Dongliang Yang, Junfeng Xia, Chunhou Zheng, 29 Sep 2025, An Enhanced Pyramid Feature Network Based on Long-Range Dependencies for Multi-Organ Medical Image Segmentation, https://arxiv.org/abs/2509.24358
Jiahui Hong, Siqing Li, Muqing Jian, Luming Yang, 11 Oct 2025, Bidirectional Time-Frequency Pyramid Network for Enhanced Robust EEG Classification, https://arxiv.org/abs/2510.10004
Heming Wu, Di Wang, Tai Ma, Peng Zhao, Yubin Xiao, Zhongke Wu, Xing-Ce Wang, Chuang Li, Xuan Wu, You Zhou, 9 Oct 2025, TCIP: Threshold-Controlled Iterative Pyramid Network for Deformable Medical Image Registration, https://arxiv.org/abs/2510.07666