Aussie AI
Constrained Decoding
by David Spuler, Ph.D.
Last Updated 26 April, 2026
Research on Constrained Decoding

Constrained decoding restricts which tokens an LLM may emit at each decoding step, typically by masking the logits of any token that would violate a target format such as a JSON schema, regular expression, or context-free grammar. Research papers include:
- Theia Vogel, December 18, 2023, How to make LLMs go fast, https://vgel.me/posts/faster-inference/
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 6 Jun 2024 (v2), SGLang: Efficient Execution of Structured Language Model Programs, https://arxiv.org/abs/2312.07104 https://github.com/sgl-project/sglang
- K Ahmed, KW Chang, G Van den Broeck, Oct 2024, Controllable Generation via Locally Constrained Resampling, Neurips Safe Generative AI Workshop 2024, https://openreview.net/pdf?id=v091fzXTu0
- Gaya Mehenni, Amal Zouaq, 23 Nov 2024, Ontology-Constrained Generation of Domain-Specific Clinical Summaries, https://arxiv.org/abs/2411.15666
- Will Kurt, Nov 2024, Say What You Mean: A Response to 'Let Me Speak Freely', https://blog.dottxt.co/say-what-you-mean.html
- Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
- Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
- Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
- Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
- Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West, 18 Jan 2024 (v6), Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning, https://arxiv.org/abs/2305.13971 https://github.com/epfl-dlab/GCD
- Yanjun Fu, Ethan Baker, Yu Ding, Yizheng Chen, 20 Jul 2024 (v3), Constrained Decoding for Secure Code Generation, https://arxiv.org/abs/2405.00218 https://codeguardplus.github.io/
- Zekun Hao, David W. Romero, Tsung-Yi Lin, Ming-Yu Liu, 12 Dec 2024, Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale, https://arxiv.org/abs/2412.09548 https://research.nvidia.com/labs/dir/meshtron/ (Optimizations to avoid the quadratic Transformer cost, in both training and inference, include "hourglass neural architecture" analogous to widthwise pruning or slimming, sliding window attention, rolling KV cache, truncated sequence training, and a "robust sampling strategy" that is effectively a type of constrained decoding based on mesh layouts.)
- Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou, 16 Dec 2024, RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation, https://arxiv.org/abs/2412.11919 https://github.com/sunnynexus/RetroLLM
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- D Banerjee, T Suresh, S Ugare, S Misailovic, G Singh, Mar 2025, Preserving Reasoning Capabilities Under Constrained LLM Generation, https://openreview.net/pdf?id=RX3GIOkGHr
- Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu, 18 Mar 2025, Speculative Decoding for Verilog: Speed and Quality, All in One, https://arxiv.org/abs/2503.14153
- Niels Mündler, Jasper Dekoninck, Martin Vechev, 13 Aug 2025, Constrained Decoding of Diffusion LLMs with Context-Free Grammars, https://arxiv.org/abs/2508.10111
- Lingxiao Li, Salar Rahili, Yiwei Zhao, 20 Aug 2025, Correctness-Guaranteed Code Generation via Constrained Decoding, https://arxiv.org/abs/2508.15866
- Parv Kapoor, Akila Ganlath, Changliu Liu, Sebastian Scherer, Eunsuk Kang, 1 Sep 2025, Constrained Decoding for Robotics Foundation Models, https://arxiv.org/abs/2509.01728
- Devansh, Sep 2025, The Chocolate Milk Cult’s Guide to Inference Scaling for AI Models: How to Reduce the costs of Running LLMs, https://machine-learning-made-simple.medium.com/the-chocolate-milk-cults-guide-to-inference-scaling-for-ai-models-50aa2290eb50 (Deep analysis of using many progressive optimizations to real-life LLM inference.)
- Rajaa El Hamdani, Samy Haffoudhi, Nils Holzenberger, Fabian Suchanek, Thomas Bonald, and Fragkiskos D. Malliaros, 27 Sep 2025, Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models, https://arxiv.org/abs/2509.23417
- Donghoon Kim, Minji Bae, Kyuhong Shim, Byonghyo Shim, 21 Jul 2025, Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models, https://arxiv.org/abs/2505.08622
- Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal, 15 Aug 2025, Controlling Multimodal LLMs via Reward-guided Decoding, https://arxiv.org/abs/2508.11616
- Guofu Xie, Chen Zhang, Xiao Zhang, Yunsheng Shi, Ting Yao and Jun Xu, 4 Oct 2025, Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation, https://arxiv.org/abs/2510.03782
- Zhenhua Liu, Lijun Li, Ruizhe Chen, Yuxian Jiang, Tong Zhu, Zhaochen Su, Wenliang Chen, Jing Shao, 4 Oct 2025, Evolutionary Guided Decoding: Iterative Value Refinement for LLMs, https://arxiv.org/abs/2503.02368
- Piotr Komorowski, Elena Golimblevskaia, Reduan Achtibat, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek, 30 Sep 2025, Attribution-Guided Decoding, https://arxiv.org/abs/2509.26307
- Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun, 22 Jul 2025, WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding, https://arxiv.org/abs/2507.16768
- Julian Oestreich and Lydia Müller, 21 Aug 2025, Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets, https://arxiv.org/abs/2508.15910
- Nan Xu, Shiheng Li, Shengchao Hou, 23 Apr 2026 (v2), From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR, https://arxiv.org/abs/2604.20522
- Yifan Le, 16 Apr 2026, Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding, https://arxiv.org/abs/2604.14862
- Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen, 12 May 2025 (v3), XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models, https://arxiv.org/abs/2411.15100 (Speeding up CFG-based structured decoding with precomputed token masks.)
- Terry Koo, Frederick Liu, Luheng He, 5 Aug 2024 (v3), Automata-based constraints for language model decoding, https://arxiv.org/abs/2407.08103
- Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim, 3 Nov 2023 (v3), Grammar Prompting for Domain-Specific Language Generation with Large Language Models, https://arxiv.org/abs/2305.19234
- Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve, 31 Jan 2024 (v3), Code Llama: Open Foundation Models for Code, https://arxiv.org/abs/2308.12950
- Chaudhary, S., 2023, Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions, https://github.com/sahil280114/codealpaca
- Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, et al., 13 Dec 2023 (v2), StarCoder: may the source be with you!, https://arxiv.org/abs/2305.06161 https://openreview.net/forum?id=KoFOg41haE
- Frederikke I. Marin, Dennis Pultz, Wouter Boomsma, 6 May 2025, Gene finding revisited: improved robustness through structured decoding from learned embeddings, https://arxiv.org/abs/2505.03377
- Zhimin Qiu, Di Wu, Feng Liu, Yuxiao Wang, 28 Jan 2026 (v2), Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models, https://arxiv.org/abs/2512.13980
- Avinash Reddy, Thayne T. Walker, James S. Ide, Amrit Singh Bedi, 8 Feb 2026, Draft-Conditioned Constrained Decoding for Structured Generation in LLMs, https://arxiv.org/abs/2603.03305
- Let's Data Science, February 11, 2026, Structured Outputs: Making LLMs Return Reliable JSON, https://letsdatascience.com/blog/structured-outputs-making-llms-return-reliable-json
- Hongxu Zhou, 7 Apr 2026, From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection, https://arxiv.org/abs/2604.06066 https://github.com/hongxuzhou/agentic_llm_structured_self_critique
- Brandon T. Willard, Rémi Louf, 19 Aug 2023 (v4), Efficient Guided Generation for Large Language Models, https://arxiv.org/abs/2307.09702
- Zhengyang Su, Isay Katsman, Yueqi Wang, Ruining He, Lukasz Heldt, Raghunandan Keshavan, Shao-Chuan Wang, Xinyang Yi, Mingyan Gao, Onkar Dalal, Lichan Hong, Ed Chi, Ningren Han, 26 Feb 2026, Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators, https://arxiv.org/abs/2602.22647 https://github.com/youtube/static-constraint-decoding
- Aaron Pham, January 15, 2025, Structured Decoding in vLLM: A Gentle Introduction: Understand structure decoding and vLLM and how recent XGrammar integration can contribute to 5x improvement in TPOT, https://www.bentoml.com/blog/structured-decoding-in-vllm-a-gentle-introduction
- Kanghee Park, Timothy Zhou, Loris D'Antoni, 15 Jul 2025 (v2), Flexible and Efficient Grammar-Constrained Decoding, https://arxiv.org/abs/2502.05111
- Liangsheng Yin, Ying Sheng, Lianmin Zheng Feb 5, 2024, Fast JSON Decoding for Local LLMs with Compressed Finite State Machine, https://www.lmsys.org/blog/2024-02-05-compressed-fsm/
- Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Bing Li, Ulf Schlichtmann, 17 Apr 2026 (v2), KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs, https://arxiv.org/abs/2604.13226
- Minghao Yan, Saurabh Agarwal, and Shivaram Venkataraman. 2025. Decoding Speculative Decoding. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6460–6473, Albuquerque, New Mexico. Association for Computational Linguistics, https://aclanthology.org/2025.naacl-long.328/ https://aclanthology.org/2025.naacl-long.328.pdf
- Ziyang Liu, 20 Apr 2026, Copy-as-Decode: Grammar-Constrained Parallel Prefill for LLM Editing, https://arxiv.org/abs/2604.18170
- Nishanth Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah, 10 Feb 2025 (v2), Constrained Decoding with Speculative Lookaheads, https://arxiv.org/abs/2412.10418
- Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan, 23 Jul 2024, Graph-Structured Speculative Decoding, https://arxiv.org/abs/2407.16207
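The core mechanism shared by most of the papers above is simple: at each decoding step, mask the logits of any token that would violate the constraint, so only valid continuations can be sampled. The sketch below illustrates this with a toy vocabulary, random stand-in logits, and a hand-written constraint (output must be a decimal number); it is a minimal illustration of the idea, not any particular library's API.

```python
import math
import random

# Toy vocabulary: the "model" can emit digits, a decimal point, or end-of-sequence.
VOCAB = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ".", "<eos>"]

def fake_logits():
    """Stand-in for a model's next-token logits (random here)."""
    return [random.uniform(-1.0, 1.0) for _ in VOCAB]

def allowed_tokens(generated):
    """Constraint: output must match the pattern digit+ ('.' digit+)?
    Returns the set of vocabulary tokens permitted next."""
    digits = set("0123456789")
    if not generated:                      # must start with a digit
        return digits
    if "." in generated:
        if generated.endswith("."):        # need a digit after the point
            return digits
        return digits | {"<eos>"}          # fraction may continue or stop
    return digits | {".", "<eos>"}         # integer part: digit, point, or stop

def constrained_greedy_decode(max_len=8):
    out = []
    for _ in range(max_len):
        logits = fake_logits()
        allowed = allowed_tokens("".join(out))
        # Mask disallowed tokens to -inf so they can never be selected.
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, logits)]
        tok = VOCAB[masked.index(max(masked))]
        if tok == "<eos>":
            break
        out.append(tok)
    if out and out[-1] == ".":
        out.pop()                          # avoid a dangling point at max_len
    return "".join(out)

random.seed(0)
print(constrained_greedy_decode())         # always a valid decimal number
```

Real systems apply the same masking idea over the model's full tokenizer vocabulary, with the constraint expressed as a regular expression, JSON schema, or context-free grammar rather than a hand-written predicate.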
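Computing the allowed-token set at every step can be slow over a large vocabulary, which is why several systems above (e.g., the compressed FSM in SGLang and the precomputed token masks in XGrammar) compile the constraint into a finite-state machine and precompute one token mask per state, so the per-step cost is a table lookup. The sketch below is a toy illustration of that precomputation under assumed state names and a hypothetical five-token language; it is not the actual implementation of either system.

```python
# Toy vocabulary and an FSM accepting exactly the token sequence: { "k" : 1 }
VOCAB = ["{", "}", '"k"', ":", "1", ","]

TRANSITIONS = {        # state -> {allowed token -> next state} (states are made up)
    0: {"{": 1},
    1: {'"k"': 2},
    2: {":": 3},
    3: {"1": 4},
    4: {"}": 5},       # state 5 is accepting
}

# Precompute one boolean mask per FSM state, once, before decoding starts.
MASKS = {
    state: [tok in edges for tok in VOCAB]
    for state, edges in TRANSITIONS.items()
}

def step(state, logits):
    """Look up the precomputed mask for this state and pick the best allowed token."""
    mask = MASKS[state]
    best, best_score = None, float("-inf")
    for tok, ok, score in zip(VOCAB, mask, logits):
        if ok and score > best_score:
            best, best_score = tok, score
    return best, TRANSITIONS[state][best]

state, out = 0, []
while state != 5:
    logits = [0.0] * len(VOCAB)            # stand-in for model logits
    tok, state = step(state, logits)
    out.append(tok)
print(" ".join(out))                       # { "k" : 1 }
```

Because the masks depend only on the FSM state, not on the generated text, they can be shared across requests and applied on accelerators as a cheap elementwise operation over the logits.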