Aussie AI
Constrained Decoding
by David Spuler, Ph.D.
Last Updated 26 April, 2026
Research on Constrained Decoding

Constrained decoding restricts which tokens an LLM may emit at each decoding step, typically by masking the logits of any token that would violate a target format such as a JSON schema, regular expression, or context-free grammar. Research papers include:
- Theia Vogel, December 18, 2023, How to make LLMs go fast, https://vgel.me/posts/faster-inference/
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 6 Jun 2024 (v2), SGLang: Efficient Execution of Structured Language Model Programs, https://arxiv.org/abs/2312.07104 https://github.com/sgl-project/sglang
- K Ahmed, KW Chang, G Van den Broeck, Oct 2024, Controllable Generation via Locally Constrained Resampling, Neurips Safe Generative AI Workshop 2024, https://openreview.net/pdf?id=v091fzXTu0
- Gaya Mehenni, Amal Zouaq, 23 Nov 2024, Ontology-Constrained Generation of Domain-Specific Clinical Summaries, https://arxiv.org/abs/2411.15666
- Will Kurt, Nov 2024, Say What You Mean: A Response to 'Let Me Speak Freely', https://blog.dottxt.co/say-what-you-mean.html
- Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
- Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
- Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
- Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
- Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West, 18 Jan 2024 (v6), Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning, https://arxiv.org/abs/2305.13971 https://github.com/epfl-dlab/GCD
- Yanjun Fu, Ethan Baker, Yu Ding, Yizheng Chen, 20 Jul 2024 (v3), Constrained Decoding for Secure Code Generation, https://arxiv.org/abs/2405.00218 https://codeguardplus.github.io/
- Zekun Hao, David W. Romero, Tsung-Yi Lin, Ming-Yu Liu, 12 Dec 2024, Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale, https://arxiv.org/abs/2412.09548 https://research.nvidia.com/labs/dir/meshtron/ (Optimizations to avoid the quadratic Transformer cost, in both training and inference, include "hourglass neural architecture" analogous to widthwise pruning or slimming, sliding window attention, rolling KV cache, truncated sequence training, and a "robust sampling strategy" that is effectively a type of constrained decoding based on mesh layouts.)
- Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou, 16 Dec 2024, RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation, https://arxiv.org/abs/2412.11919 https://github.com/sunnynexus/RetroLLM
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- D Banerjee, T Suresh, S Ugare, S Misailovic, G Singh, Mar 2025, Preserving Reasoning Capabilities Under Constrained LLM Generation, https://openreview.net/pdf?id=RX3GIOkGHr
- Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu, 18 Mar 2025, Speculative Decoding for Verilog: Speed and Quality, All in One, https://arxiv.org/abs/2503.14153
- Niels Mündler, Jasper Dekoninck, Martin Vechev, 13 Aug 2025, Constrained Decoding of Diffusion LLMs with Context-Free Grammars, https://arxiv.org/abs/2508.10111
- Lingxiao Li, Salar Rahili, Yiwei Zhao, 20 Aug 2025, Correctness-Guaranteed Code Generation via Constrained Decoding, https://arxiv.org/abs/2508.15866
- Parv Kapoor, Akila Ganlath, Changliu Liu, Sebastian Scherer, Eunsuk Kang, 1 Sep 2025, Constrained Decoding for Robotics Foundation Models, https://arxiv.org/abs/2509.01728
- Devansh, Sep 2025, The Chocolate Milk Cult’s Guide to Inference Scaling for AI Models: How to Reduce the costs of Running LLMs, https://machine-learning-made-simple.medium.com/the-chocolate-milk-cults-guide-to-inference-scaling-for-ai-models-50aa2290eb50 (Deep analysis of using many progressive optimizations to real-life LLM inference.)
- Rajaa El Hamdani, Samy Haffoudhi, Nils Holzenberger, Fabian Suchanek, Thomas Bonald, and Fragkiskos D. Malliaros, 27 Sep 2025, Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models, https://arxiv.org/abs/2509.23417
- Donghoon Kim, Minji Bae, Kyuhong Shim, Byonghyo Shim, 21 Jul 2025, Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models, https://arxiv.org/abs/2505.08622
- Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal, 15 Aug 2025, Controlling Multimodal LLMs via Reward-guided Decoding, https://arxiv.org/abs/2508.11616
- Guofu Xie, Chen Zhang, Xiao Zhang, Yunsheng Shi, Ting Yao and Jun Xu, 4 Oct 2025, Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation, https://arxiv.org/abs/2510.03782
- Zhenhua Liu, Lijun Li, Ruizhe Chen, Yuxian Jiang, Tong Zhu, Zhaochen Su, Wenliang Chen, Jing Shao, 4 Oct 2025, Evolutionary Guided Decoding: Iterative Value Refinement for LLMs, https://arxiv.org/abs/2503.02368
- Piotr Komorowski, Elena Golimblevskaia, Reduan Achtibat, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek, 30 Sep 2025, Attribution-Guided Decoding, https://arxiv.org/abs/2509.26307
- Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun, 22 Jul 2025, WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding, https://arxiv.org/abs/2507.16768
- Julian Oestreich and Lydia Müller, 21 Aug 2025, Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets, https://arxiv.org/abs/2508.15910
- Nan Xu, Shiheng Li, Shengchao Hou, 23 Apr 2026 (v2), From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR, https://arxiv.org/abs/2604.20522
- Yifan Le, 16 Apr 2026, Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding, https://arxiv.org/abs/2604.14862
- Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen, 12 May 2025 (v3), XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models, https://arxiv.org/abs/2411.15100 (Speeding up CFG-based structured decoding with precomputed token masks.)
- Terry Koo, Frederick Liu, Luheng He, 5 Aug 2024 (v3), Automata-based constraints for language model decoding, https://arxiv.org/abs/2407.08103
- Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim, 3 Nov 2023 (v3), Grammar Prompting for Domain-Specific Language Generation with Large Language Models, https://arxiv.org/abs/2305.19234
- Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve, 31 Jan 2024 (v3), Code Llama: Open Foundation Models for Code, https://arxiv.org/abs/2308.12950
- Chaudhary, S., 2023, Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions, https://github.com/sahil280114/codealpaca
- Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, et al., 13 Dec 2023 (v2), StarCoder: may the source be with you!, https://arxiv.org/abs/2305.06161 https://openreview.net/forum?id=KoFOg41haE
- Frederikke I. Marin, Dennis Pultz, Wouter Boomsma, 6 May 2025, Gene finding revisited: improved robustness through structured decoding from learned embeddings, https://arxiv.org/abs/2505.03377
- Zhimin Qiu, Di Wu, Feng Liu, Yuxiao Wang, 28 Jan 2026 (v2), Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models, https://arxiv.org/abs/2512.13980
- Avinash Reddy, Thayne T. Walker, James S. Ide, Amrit Singh Bedi, 8 Feb 2026, Draft-Conditioned Constrained Decoding for Structured Generation in LLMs, https://arxiv.org/abs/2603.03305
- Let's Data Science, February 11, 2026, Structured Outputs: Making LLMs Return Reliable JSON, https://letsdatascience.com/blog/structured-outputs-making-llms-return-reliable-json
- Hongxu Zhou, 7 Apr 2026, From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection, https://arxiv.org/abs/2604.06066 https://github.com/hongxuzhou/agentic_llm_structured_self_critique
- Brandon T. Willard, Rémi Louf, 19 Aug 2023 (v4), Efficient Guided Generation for Large Language Models, https://arxiv.org/abs/2307.09702
- Zhengyang Su, Isay Katsman, Yueqi Wang, Ruining He, Lukasz Heldt, Raghunandan Keshavan, Shao-Chuan Wang, Xinyang Yi, Mingyan Gao, Onkar Dalal, Lichan Hong, Ed Chi, Ningren Han, 26 Feb 2026, Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators, https://arxiv.org/abs/2602.22647 https://github.com/youtube/static-constraint-decoding
- Aaron Pham, January 15, 2025, Structured Decoding in vLLM: A Gentle Introduction: Understand structure decoding and vLLM and how recent XGrammar integration can contribute to 5x improvement in TPOT, https://www.bentoml.com/blog/structured-decoding-in-vllm-a-gentle-introduction
- Kanghee Park, Timothy Zhou, Loris D'Antoni, 15 Jul 2025 (v2), Flexible and Efficient Grammar-Constrained Decoding, https://arxiv.org/abs/2502.05111
- Liangsheng Yin, Ying Sheng, Lianmin Zheng Feb 5, 2024, Fast JSON Decoding for Local LLMs with Compressed Finite State Machine, https://www.lmsys.org/blog/2024-02-05-compressed-fsm/
- Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Bing Li, Ulf Schlichtmann, 17 Apr 2026 (v2), KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs, https://arxiv.org/abs/2604.13226
- Minghao Yan, Saurabh Agarwal, and Shivaram Venkataraman. 2025. Decoding Speculative Decoding. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6460–6473, Albuquerque, New Mexico. Association for Computational Linguistics, https://aclanthology.org/2025.naacl-long.328/ https://aclanthology.org/2025.naacl-long.328.pdf
- Ziyang Liu, 20 Apr 2026, Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing, https://arxiv.org/abs/2604.18170
- Nishanth Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah, 10 Feb 2025 (v2), Constrained Decoding with Speculative Lookaheads, https://arxiv.org/abs/2412.10418
- Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan, 23 Jul 2024, Graph-Structured Speculative Decoding, https://arxiv.org/abs/2407.16207
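The core mechanism shared by most of the papers above is simple: at each decoding step, mask the logits of any token that would violate the constraint, so only valid continuations can be sampled. The sketch below illustrates this with a toy vocabulary, random stand-in logits, and a hand-written constraint (output must be a decimal number); it is a minimal illustration of the idea, not any particular library's API.

```python
import math
import random

# Toy vocabulary: the "model" can emit digits, a decimal point, or end-of-sequence.
VOCAB = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ".", "<eos>"]

def fake_logits():
    """Stand-in for a model's next-token logits (random here)."""
    return [random.uniform(-1.0, 1.0) for _ in VOCAB]

def allowed_tokens(generated):
    """Constraint: output must match the pattern digit+ ('.' digit+)?
    Returns the set of vocabulary tokens permitted next."""
    digits = set("0123456789")
    if not generated:                      # must start with a digit
        return digits
    if "." in generated:
        if generated.endswith("."):        # need a digit after the point
            return digits
        return digits | {"<eos>"}          # fraction may continue or stop
    return digits | {".", "<eos>"}         # integer part: digit, point, or stop

def constrained_greedy_decode(max_len=8):
    out = []
    for _ in range(max_len):
        logits = fake_logits()
        allowed = allowed_tokens("".join(out))
        # Mask disallowed tokens to -inf so they can never be selected.
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, logits)]
        tok = VOCAB[masked.index(max(masked))]
        if tok == "<eos>":
            break
        out.append(tok)
    if out and out[-1] == ".":
        out.pop()                          # avoid a dangling point at max_len
    return "".join(out)

random.seed(0)
print(constrained_greedy_decode())         # always a valid decimal number
```

Real systems apply the same masking idea over the model's full tokenizer vocabulary, with the constraint expressed as a regular expression, JSON schema, or context-free grammar rather than a hand-written predicate.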
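Computing the allowed-token set at every step can be slow over a large vocabulary, which is why several systems above (e.g., the compressed FSM in SGLang and the precomputed token masks in XGrammar) compile the constraint into a finite-state machine and precompute one token mask per state, so the per-step cost is a table lookup. The sketch below is a toy illustration of that precomputation under assumed state names and a hypothetical five-token language; it is not the actual implementation of either system.

```python
# Toy vocabulary and an FSM accepting exactly the token sequence: { "k" : 1 }
VOCAB = ["{", "}", '"k"', ":", "1", ","]

TRANSITIONS = {        # state -> {allowed token -> next state} (states are made up)
    0: {"{": 1},
    1: {'"k"': 2},
    2: {":": 3},
    3: {"1": 4},
    4: {"}": 5},       # state 5 is accepting
}

# Precompute one boolean mask per FSM state, once, before decoding starts.
MASKS = {
    state: [tok in edges for tok in VOCAB]
    for state, edges in TRANSITIONS.items()
}

def step(state, logits):
    """Look up the precomputed mask for this state and pick the best allowed token."""
    mask = MASKS[state]
    best, best_score = None, float("-inf")
    for tok, ok, score in zip(VOCAB, mask, logits):
        if ok and score > best_score:
            best, best_score = tok, score
    return best, TRANSITIONS[state][best]

state, out = 0, []
while state != 5:
    logits = [0.0] * len(VOCAB)            # stand-in for model logits
    tok, state = step(state, logits)
    out.append(tok)
print(" ".join(out))                       # { "k" : 1 }
```

Because the masks depend only on the FSM state, not on the generated text, they can be shared across requests and applied on accelerators as a cheap elementwise operation over the logits.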