Aussie AI

Decoding Algorithms

Last Updated 22 October, 2025

by David Spuler, Ph.D.

What are Decoding Algorithms?

The decoding algorithm in Transformer AI engines is the method whereby the decoder emits tokens for the output message. At the end of each decoder sequence, the output is a list of "logits" with probabilities for the predictions of the next best token. The algorithm by which the decoder decides to output one token, or multiple tokens, and which ones, is called the decoding algorithm.

Logits vs Activations

Each decoding step is aimed at producing a single token (i.e., the next word to output). The output of a decoding phase for one token is actually a two-step process:

The "activation vector" or "activations" are computed (numbers representing "embeddings"), and then
The "logits" are computed from the activation vector (called "unembedding").

These two vectors are not usually the same size:

Activations vector — size is the "hidden" model dimension (e.g., 4096)
Logits vector — size is the model vocabulary size (e.g., 50,000 unique tokens).

Logits are very closely related to tokens, and there is one logit value per token. Each logit value represents the LLM's prediction of how likely it would be to output this token as the next one. Strictly speaking, logits are unnormalized scores that Softmax converts into probabilities, but the ordering is the same either way. We could simply take the token with the highest logit, which is the most likely token according to the LLM, and output that token. This is called "greedy decoding."
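As a rough illustration of greedy decoding, the Python sketch below picks the highest logit from a made-up five-token vocabulary. The vocabulary, the logit values, and the function names are assumptions for illustration only, not taken from any particular engine. Softmax is applied as well, to show that the highest logit and the highest probability point at the same token.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode_step(scores):
    """Return the index of the highest-scoring token."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical vocabulary and logits, invented for this example.
vocab = ["the", "cat", "sat", "mat", "."]
logits = [1.2, 3.4, 0.3, 2.9, -1.0]

probs = softmax(logits)
token_id = greedy_decode_step(logits)
print(vocab[token_id], round(probs[token_id], 3))
```

Because Softmax preserves ordering, a greedy decoder can skip the Softmax computation entirely and take the maximum of the raw logits.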

Note that most of the LLM's processing is not using logits, but uses activation vectors. The logits only appear at the very end of a decoding phase. In all the interim steps, which are usually multiple layers of computations inside the model, we use an "embedding space" representation called an activation vector. We don't actually work on "tokens" or their probabilities, but we work on the probabilities of what I call the "signals" in the embedding space, as stored in the activation vector.

The activations are a vector of numbers representing the likelihood of each "signal" in the embeddings. For example, a signal might be something like "noun" or "adjective" signals, but there are literally thousands of them, and not everyone understands what every value in an embedding actually represents in reality. But the LLM sure knows!

This vector of numbers is the "activation vector" and is usually shortened to "activations." These values represent the extent to which each signal has been "activated" in each neuron. Each layer of computation modifies the activations, and we get the final activations after the final model layer.

At this very final phase, we need to take these "activations," which are based on the model internal dimension (e.g., 4096) that represents how many signals it's tracking. We need to convert that to logit probabilities, one per token, and there are perhaps 50,000 tokens (depends on the "vocabulary size" but 50,000 or 100,000 is common). Hence, we need to convert a 4096-length vector of numbers ("activations") into a 50,000-length vector of numbers ("logits").

The "unembedding matrix" is what we use. Multiplying the activation vector by this matrix, which is large and rectangular (e.g., 4096x50,000), is how this is done. This converts the 4096-vector into a 50,000-vector. The unembedding matrix is large, and expensive to use, which is why we only do this once per decoding phase, rather than once per layer.
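The shape change can be seen in a small sketch. The Python below uses a toy hidden dimension of 4 and a toy vocabulary of 6 in place of 4096 and 50,000; the numbers and the function name `unembed` are invented for illustration. A real engine would use an optimized matrix-vector kernel rather than nested Python loops.

```python
def unembed(activations, unembedding):
    """Multiply a length-d activation vector by a d x V matrix, giving V logits."""
    d = len(activations)
    if len(unembedding) != d:
        raise ValueError("matrix must have one row per activation value")
    vocab_size = len(unembedding[0])
    return [
        sum(activations[i] * unembedding[i][j] for i in range(d))
        for j in range(vocab_size)
    ]

# Toy sizes: hidden dimension d = 4, vocabulary size V = 6 (made-up values).
activations = [0.5, -1.0, 2.0, 0.0]
unembedding = [
    [1.0, 0.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
]

logits = unembed(activations, unembedding)
print(len(activations), "->", len(logits))  # 4 -> 6
print(logits)  # [1.5, 0.0, 1.0, -1.0, 2.5, 1.0]
```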

Anyway, the computation of activations is not the decoding algorithm. Nor is the multiplication by the unembedding matrix to get the logits vector. Rather, the decoding algorithm is the final phase, which operates on the logits vector of probabilities for each of the 50,000 tokens, and thereby chooses the next token to output.

Types of Decoding Algorithms

There are several possible decoding algorithms for the basic situation of choosing one token to output from a vector of probabilities for each token:

Greedy decoding — always choose the highest-probability token.
Top-k sampling (random sampling) — choose from k most likely tokens.
Top-p sampling (nucleus sampling) — a finesse on the top-k decoding algorithm.
Beam search decoding — a more complex "tree" search of multiple token sequences.
Edit decoding — using the input context to help decode the output (e.g., grammar checking).

The above are all variations on a theme: take a vector of token probabilities as the input, and analyze these probabilities to choose exactly one of the tokens as the output.
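The two sampling variants in the list above can be sketched as follows. This is a minimal Python sketch that assumes the logits have already been converted to probabilities. The function names and the example distribution are hypothetical, and production decoders usually combine these filters with temperature and other settings.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def sample(dist, rng):
    """Randomly choose one token id according to the filtered distribution."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

# Made-up next-token probabilities for a five-token vocabulary.
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
rng = random.Random(0)  # fixed seed so the example is repeatable

top_k = top_k_filter(probs, k=2)     # keeps tokens 0 and 1
top_p = top_p_filter(probs, p=0.85)  # keeps tokens 0, 1 and 2
print(sorted(top_k), sorted(top_p))
print(sample(top_k, rng), sample(top_p, rng))
```

The difference is that top-k always keeps a fixed number of candidates, whereas top-p keeps more candidates when the distribution is flat and fewer when one token dominates.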

At a higher level, there are more advanced options, and the main classes of decoding algorithms are:

Autoregressive decoding
Non-Autoregressive (NAR) decoding
Parallel decoding
Multi-token output

Other issues for decoding algorithms include:

Prefill phase (runs before decoding)
Temperature (scaling hyper-parameter that affects decoding)
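Temperature is commonly applied by dividing every logit by the temperature value before Softmax. The sketch below assumes that convention; the function name and the logits are made up for illustration. Values below 1.0 sharpen the distribution (closer to greedy decoding), and values above 1.0 flatten it (more random).

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply a numerically stable Softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical logits for three tokens

cold = softmax_with_temperature(logits, 0.5)  # sharper
base = softmax_with_temperature(logits, 1.0)  # unchanged
hot = softmax_with_temperature(logits, 2.0)   # flatter

for name, dist in (("T=0.5", cold), ("T=1.0", base), ("T=2.0", hot)):
    print(name, [round(p, 3) for p in dist])
```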

Parallel Decoding Algorithms

There are several types of parallel optimizations for decoding:

Speculative decoding
Generalized speculative decoding
Lookahead decoding
Lookup decoding (including "prompt lookup decoding" and "retrieval lookup decoding")
Parallel decoding (generally)

Multi-model decoding algorithms have also been examined:

Supervised decoding (see big-little architectures)
Ensemble decoding (see ensemble architectures)
Collaborative decoding
Consensus decoding

Hybrid Decoding Optimizations

The decoding algorithm may also be combined with other optimizations that improve the decoding process, such as:

Non-autoregressive decoding
Token pruning
Prompt compression (input compression)

Beam Search Decoding

Beam search decoding is an advanced type of decoding that works on a tree of potential output sequences. This is a complex search space that keeps multiple candidate token sequences in reserve, until it chooses the best one. Beam search can look ahead a few tokens, and then backtrack to choose a different final output.
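A minimal beam search can be sketched in Python as below. The "model" here is a hypothetical lookup table keyed on the previous token, invented so that the example is small enough to trace by hand; a real decoder would call the LLM for next-token log-probabilities. The table is arranged so that the greedy choice ("a") leads to a weaker overall sequence than the second choice ("b").

```python
import math

# Hypothetical toy model: next-token probabilities depend only on the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
    "x": {"</s>": 1.0}, "y": {"</s>": 1.0},
    "z": {"</s>": 1.0}, "w": {"</s>": 1.0},
}

def next_log_probs(seq):
    """Return log-probabilities of each possible next token."""
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}

def beam_search(start, beam_width, max_len, eos="</s>"):
    """Keep the beam_width best partial sequences at every step."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:
                candidates.append((score, seq))  # finished sequences carry over
                continue
            for tok, lp in next_log_probs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == eos for _, seq in beams):
            break
    return beams[0]

greedy_score, greedy_seq = beam_search("<s>", beam_width=1, max_len=5)
beam_score, beam_seq = beam_search("<s>", beam_width=2, max_len=5)
print(greedy_seq, round(math.exp(greedy_score), 2))
print(beam_seq, round(math.exp(beam_score), 2))
```

With a beam width of 1 this reduces to greedy decoding. With a width of 2, the lower-ranked first token is kept in reserve and ends up winning, which is the "look ahead, then choose differently" behavior described above. Real implementations also add refinements such as length normalization, which are omitted here.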

Research papers on beam search:

Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia, 2024. SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 932–949, https://doi.org/10.1145/3620666.3651335 https://dl.acm.org/doi/abs/10.1145/3620666.3651335 Code: https://github.com/flexflow/FlexFlow/
Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, and Niki Parmar. 2018. Weakly supervised grammatical error correction using iterative decoding. CoRR, abs/1811.01710. https://arxiv.org/abs/1811.01710 (Beam search decoding with a high threshold to emit corrections.)
Jindrich Libovicky, Jindrich Helcl, Marek Tlusty, Ondrej Bojar, and Pavel Pecina. 2016. CUNI system for WMT16 automatic post-editing and multimodal translation tasks. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 646–654, Berlin, Germany. https://arxiv.org/abs/1606.07481 (Post-editing of machine translation.)
Daniel Dahlmeier, Hwee Tou Ng, 2012, A Beam-Search Decoder for Grammatical Error Correction, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 568–578, Jeju Island, Korea, 12–14 July 2012, https://aclanthology.org/D12-1052.pdf
Xiaoming (Jason) Cui, Ashraf Bhuiyan, 2023, Optimizing Transformer Model Inference on Intel® Processors, https://www.intel.com/content/www/us/en/developer/articles/technical/optimize-transformer-model-inference-processors.html
Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, and Dhruv Batra. 2018. Diverse beam search for improved description of complex scenes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 7371–7379. AAAI Press. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17329
Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li, Apr 2021, LightSeq: A High Performance Inference Library for Transformers, https://arxiv.org/pdf/2010.13887.pdf
Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
GC Garbacea, 2023, Neural Language Generation for Content Adaptation: Explainable, Efficient Low-Resource Text Simplification and Evaluation, Ph.D. thesis, Computer Science and Engineering, University of Michigan, https://deepblue.lib.umich.edu/bitstream/handle/2027.42/178028/garbacea_1.pdf?sequence=1 (Broad thesis with sections on beam search decoding optimizations and AI safety issues such as bias.)
Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica, Oct 2023, Efficient Memory Management for Large Language Model Serving with PagedAttention, SOSP ’23, October 23–26, 2023, Koblenz, Germany, https://dl.acm.org/doi/pdf/10.1145/3600006.3613165 (The original Paged Attention and vLLM paper, focusing on optimizing memory size of the KV cache using methods similar to operating-system memory paging.)
Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou, July 2024, HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:7824-7846, 2024, https://proceedings.mlr.press/v235/chen24bi.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bi/chen24bi.pdf https://github.com/BillChan226/HALC
Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su, 4 Feb 2024 (v2), Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning, https://arxiv.org/abs/2401.17686
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, and Noah A. Smith. 2024. A Call for Clarity in Beam Search: How It Works and When It Stops. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 77–90, Torino, Italia. ELRA and ICCL. https://aclanthology.org/2024.lrec-main.7/ https://aclanthology.org/2024.lrec-main.7.pdf
Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun, 25 Sep 2024, Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference, https://arxiv.org/abs/2409.16560
Shixiaowei02, Oct 2024, TensorRT-LLM 0.13.0 Release, https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.13.0
Yejin Lee, Anna Sun, Basil Hosmer, Bilge Acun, Can Balioglu, Changhan Wang, Charles David Hernandez, Christian Puhrsch, Daniel Haziza, Driss Guessous, Francisco Massa, Jacob Kahn, Jeffrey Wan, Jeremy Reizenstein, Jiaqi Zhai, Joe Isaacson, Joel Schlosser, Juan Pino, Kaushik Ram Sadagopan, Leonid Shamis, Linjian Ma, Min-Jae Hwang, Mingda Chen, Mostafa Elhoushi, Pedro Rodriguez, Ram Pasunuru, Scott Yih, Sravya Popuri, Xing Liu, Carole-Jean Wu, 30 Sep 2024, Characterizing and Efficiently Accelerating Multimodal Generation Model Inference, https://arxiv.org/abs/2410.00215 (Analyzes the bottlenecks in inference, finding the usual problems of autoregression, but also more interesting issues such as that linear kernels can be expensive, and KV cache reordering is a bottleneck in beam search, and layer skipping is analyzed.)
Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua, 8 Oct 2024 (v2), Efficient Inference for Large Language Model-based Generative Recommendation, https://arxiv.org/abs/2410.05165
Rongxiang Wang and Felix Xiaozhu Lin. 2024. Turbocharge Speech Understanding with Pilot Inference. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '24). Association for Computing Machinery, New York, NY, USA, 1299–1313. https://doi.org/10.1145/3636534.3690694 https://dl.acm.org/doi/abs/10.1145/3636534.3690694 https://dl.acm.org/doi/pdf/10.1145/3636534.3690694 ("Pilot inference" is a specialized mix of caching, computation reuse, and backtracking in beam search for speech understanding, and is somewhat related to speculative decoding, and similar to continual inference for processing a stream.)
NVIDIA, Dec 2024, Multi-Head, Multi-Query, and Group-Query Attention, https://nvidia.github.io/TensorRT-LLM/advanced/gpt-attention.html#kv-cache
Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
Edward Beeching, Lewis Tunstall, Sasha Rush, Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, 26 Feb 2025, Automatic Prompt Optimization via Heuristic Search: A Survey, https://arxiv.org/abs/2502.18746 (Survey of auto prompting, from basic LLM enhancements to some methods quite similar to RALM and TALM.)
Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
Yangchao Wu, Zongyue Qin, Alex Wong, Stefano Soatto, 20 May 2025, STree: Speculative Tree Decoding for Hybrid State-Space Models, https://arxiv.org/abs/2505.14969
Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert, 2 Aug 2025, Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search, https://arxiv.org/abs/2508.01459
Harold Silvère Kiossou and Siegfried Nijssen and Pierre Schaus, 8 Aug 2025, A Generic Complete Anytime Beam Search for Optimal Decision Tree, https://arxiv.org/abs/2508.06064

Phrase Banning

Phrase banning is a feature extension for LLM decoding to disallow selected words or phrases, rather than a speed optimization. The idea is to block the LLM from outputting certain words or phrases, rather than post-processing LLM output to remove words. Words or phrases can be "banned" at the decoder level, forcing the LLM decoding phase to backtrack whenever it tries to emit a disallowed word or phrase. If a model has whole-word tokenization, then individual words can be banned at the current decoding step, by modifying simple decoding algorithms like greedy or top-k/top-p decoding. However, banning multi-word phrases or other multi-token sequences requires backtracking similar to beam search decoding. In fact, it makes sense to merge the phrase banning algorithm into beam search or other tree decoding methods. Banning phrases is usually efficient, because it has only a small token search cost to detect the phrases, and although backtracking is expensive, hopefully it is a relatively rare condition.
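The backtracking idea can be sketched as below. This is a simplified Python illustration, not the algorithm used by any specific tool. The toy lookup-table model, the function names, and the design choice of blocking the first token of the banned phrase at the position where the phrase began are all assumptions made for the example. Decoding is greedy; whenever the output ends with a banned phrase, the decoder rewinds to where the phrase started and picks a different token there.

```python
# Hypothetical toy model: next-token scores depend only on the last token.
TABLE = {
    "<s>": {"a": 2.0},
    "a": {"rich": 2.0, "vivid": 1.0},
    "rich": {"tapestry": 2.0},
    "vivid": {"story": 1.0},
    "tapestry": {"</s>": 1.0},
    "story": {"</s>": 1.0},
}

def next_logits(seq):
    return TABLE[seq[-1]]

def decode_with_phrase_ban(prompt, banned_phrases, max_new_tokens=20, eos="</s>"):
    """Greedy decoding that backtracks whenever a banned token sequence appears."""
    out = list(prompt)
    blocked = {}  # position -> set of tokens disallowed at that position
    while len(out) - len(prompt) < max_new_tokens:
        pos = len(out)
        scores = next_logits(out)
        allowed = {t: s for t, s in scores.items() if t not in blocked.get(pos, set())}
        if not allowed:
            break  # nothing left to try at this position
        tok = max(allowed, key=allowed.get)
        out.append(tok)
        # Check whether the generated text now ends with a banned phrase.
        hit = next(
            (p for p in banned_phrases
             if len(out) - len(p) >= len(prompt) and out[-len(p):] == p),
            None,
        )
        if hit is not None:
            start = len(out) - len(hit)
            blocked.setdefault(start, set()).add(hit[0])
            # Blocks recorded for later positions belong to the discarded path.
            for later in [q for q in blocked if q > start]:
                del blocked[later]
            del out[start:]  # backtrack to where the phrase began
            continue
        if tok == eos:
            break
    return out

plain = decode_with_phrase_ban(["<s>"], banned_phrases=[])
clean = decode_with_phrase_ban(["<s>"], banned_phrases=[["rich", "tapestry"]])
print(plain)
print(clean)
```

In this toy run only one backtrack occurs, which matches the expectation above that phrase detection is cheap and rewinding is the rare, costly case.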

Research papers on phrase banning:

Lost Ruins, Oct 11, 2024, koboldcpp-1.76, https://github.com/LostRuins/koboldcpp/releases/tag/v1.76 (Release includes "anti-slop" using "phrase banning" decoding algorithm.)
Sam Paech, 2024, antislop-sampler, https://github.com/sam-paech/antislop-sampler?tab=readme-ov-file (Decoding algorithm for "phrase banning" with backtracking.)
Bilgehan Sel, Dingcheng Li, Phillip Wallis, Vaishakh Keshava, Ming Jin, Siddhartha Reddy Jonnalagadda, 11 Mar 2025, Backtracking for Safety, https://arxiv.org/abs/2503.08919

Tree Decoding

Tree decoding is the use of alternative pathways in decoding, in the form of a hierarchical tree. This idea is a generalization of beam search decoding. One of the applications of tree decoding is in the attempt to mimic Chain-of-Thought reasoning in a single inference step using a tree of pathways in CoT decoding.

Research papers on tree decoding:

Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus Mcaleer, Ying Wen, Weinan Zhang, Jun Wang, July 2024, AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49890-49920, 2024, https://proceedings.mlr.press/v235/wan24c.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/wan24c/wan24c.pdf
Xiangxiang Gao, Weisheng Xie, Yiwei Xiang, Feng Ji, 17 Dec 2024, Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree, https://arxiv.org/abs/2412.12639
Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Xidong Feng, Ziyu Wan, Muning Wen, Ying Wen, Weinan Zhang, and Jun Wang. 2023. Alphazero-like tree-search can guide large language model decoding and training. In NeurIPS 2023 Foundation Models for Decision Making Workshop. https://arxiv.org/abs/2309.17179
Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun, 25 Sep 2024, Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference, https://arxiv.org/abs/2409.16560
Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang, Chao Du, Bo An, 24 Feb 2025, LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification, https://arxiv.org/abs/2502.17421 https://github.com/sail-sg/LongSpec
Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao, 27 Feb 2025 (v2), Dynamic Parallel Tree Search for Efficient LLM Reasoning, https://arxiv.org/abs/2502.16235
Yangchao Wu, Zongyue Qin, Alex Wong, Stefano Soatto, 20 May 2025, STree: Speculative Tree Decoding for Hybrid State-Space Models, https://arxiv.org/abs/2505.14969
Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang, 16 May 2025, Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism, https://arxiv.org/abs/2506.01979

Contrastive Decoding

Contrastive decoding is a method whereby the probabilities of two or more outputs are "contrasted" to choose the best token to output. This can be done by examining prior layers during inference, or it can be done with multiple models.
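One two-model formulation, in the style of the Li et al. paper listed below, scores each token by the difference between a stronger "expert" model's log-probability and a weaker "amateur" model's log-probability, restricted to tokens that the expert already finds plausible. The Python sketch below is a simplified illustration under that assumption; the logits, the `alpha` threshold value, and the function names are invented for the example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_step(expert_logits, amateur_logits, alpha=0.1):
    """Pick the plausible token with the largest expert-minus-amateur log-probability."""
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    # Plausibility cutoff: keep tokens with p_expert >= alpha * max(p_expert).
    cutoff = math.log(alpha) + max(expert_lp)
    best, best_score = None, -math.inf
    for i, (e, a) in enumerate(zip(expert_lp, amateur_lp)):
        if e < cutoff:
            continue  # the expert considers this token implausible
        score = e - a
        if score > best_score:
            best, best_score = i, score
    return best

# Made-up logits for a three-token vocabulary.
expert = [2.0, 1.8, -3.0]
amateur = [2.5, 0.5, -9.0]

greedy_choice = max(range(len(expert)), key=lambda i: expert[i])
contrastive_choice = contrastive_step(expert, amateur, alpha=0.1)
unconstrained_choice = contrastive_step(expert, amateur, alpha=1e-6)
print(greedy_choice, contrastive_choice, unconstrained_choice)  # 0 1 2
```

Token 0 is favored by both models, so the contrast prefers token 1, which the expert likes and the amateur does not. Token 2 has the largest raw contrast only because the amateur rates it extremely low, and the plausibility cutoff is what stops it from being chosen.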

Research papers on contrastive decoding:

Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou, 18 Jun 2024, Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding, https://arxiv.org/abs/2406.12295 Code: https://github.com/TsinghuaC3I/FS-GEN
Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis, 10 Jul 2023 (v2), Contrastive Decoding: Open-ended Text Generation as Optimization, https://arxiv.org/abs/2210.15097
Hyunjong Ok, Jegwang Ryu, Jaeho Lee, 26 Jun 2024, Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher, https://arxiv.org/abs/2406.18002 (Examines the idea of not using the larger model to always verify, and when to trust either the smaller or larger models, which is an idea that generalized beyond speculative decoding.)
Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou, 13 Mar 2024 (v2), Speculative Contrastive Decoding, https://arxiv.org/abs/2311.08981
Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou, July 2024, HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:7824-7846, 2024, https://proceedings.mlr.press/v235/chen24bi.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bi/chen24bi.pdf https://github.com/BillChan226/HALC
F. Li, X. zhang and P. Zhang, 2024, Mitigating Hallucination Issues in Small-Parameter LLMs through Inter-Layer Contrastive Decoding, 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/IJCNN60899.2024.10650644, https://ieeexplore.ieee.org/abstract/document/10650644
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
Phuc Phan, Hieu Tran, Long Phan, 23 Aug 2024 (v2), Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation, https://arxiv.org/abs/2402.14874
Nikhil Anand, Nov 14, 2024, Making LLMs more Truthful with DoLa: A Contrastive Decoding Approach (Part I), https://ai.gopubby.com/making-llms-more-truthful-with-dola-a-contrastive-decoding-approach-part-i-1c2f90c91996 (Decoding by examining probabilities across layers.)
Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang, 2 Jun 2025 (v2), Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation, https://arxiv.org/abs/2505.23657
Che-Yu Chou, Hung-Hsuan Chen, 14 Aug 2025, Contrastive ECOC: Learning Output Codes for Adversarial Defense, https://arxiv.org/abs/2508.10491
Shan Shen, Shenglu Hua, Jiajun Zou, Jiawei Liu, Jianwang Zhai, Chuan Shi, Wenjian Yu, 14 Aug 2025, Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits, https://arxiv.org/abs/2507.06535
Lei Tian, Xiaomin Li, Liqian Ma, Hao Yin, Zirui Zheng, Hefei Huang, Taiqing Li, Huchuan Lu, Xu Jia, 14 Aug 2025, CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting, https://arxiv.org/abs/2505.20469
Amr Mousa, Neil Karavis, Michele Caprio, Wei Pan and Richard Allmendinger, 14 Aug 2025, TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion, https://arxiv.org/abs/2503.20839
Weijia Yang, Tian Lan, Leyuan Liu, Wei Chen, Tianqing Zhu, Sheng Wen, Xiaosong Zhang, 19 Jul 2025, CASPER: Contrastive Approach for Smart Ponzi Scheme Detecter with More Negative Samples, https://arxiv.org/abs/2507.16840
Xiaoqiang He, 21 Jul 2025, CLAMP: Contrastive Learning with Adaptive Multi-loss and Progressive Fusion for Multimodal Aspect-Based Sentiment Analysis, https://arxiv.org/abs/2507.16854
Piotr Masztalski, Michał Romaniuk, Jakub Żak, Mateusz Matuszewski, Konrad Kowalczyk, 23 Jul 2025, Clustering-based hard negative sampling for supervised contrastive speaker verification, https://arxiv.org/abs/2507.17540
Arsh Tangri, Nichols Crawford Taylor, Haojie Huang, Robert Platt, 22 Jul 2025, Equivariant Goal Conditioned Contrastive Reinforcement Learning, https://arxiv.org/abs/2507.16139
Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li and Chris Shum, 22 Jul 2025, CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning, https://arxiv.org/abs/2507.14111
Zhijie Wang, Zixin Xu, Zhiyuan Pan, 24 Jul 2025, GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks, https://arxiv.org/abs/2507.14679
Yajiao Dai, Jun Li, Zhen Mei, Yiyang Ni, Shi Jin, Zengxiang Li, Sheng Guo, Wei Xiang, 12 Jul 2025, Semi-Supervised Federated Learning via Dual Contrastive Learning and Soft Labeling for Intelligent Fault Diagnosis, https://arxiv.org/abs/2507.14181
Xiaotong Luo, Shengda Zhuo, Min Chen, Lichun Li, Ruizhao Lu, Wenqi Fan, Shuqiang Huang and Yin Tang, 12 Jul 2025, From Bias to Behavior: Learning Bull-Bear Market Dynamics with Contrastive Modeling, https://arxiv.org/abs/2507.14182
Yiming Xu, Zhen Peng, Bin Shi, Xu Hua, Bo Dong, Song Wang, Chen Chen, 19 Jul 2025, Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective, https://arxiv.org/abs/2507.14677
Abdul-Kazeem Shamba, Kerstin Bach and Gavin Taylor, 20 Jul 2025, eMargin: Revisiting Contrastive Learning with Margin-Based Separation, https://arxiv.org/abs/2507.14828
Jinzhi Wang, Bin Li, Qingke Peng, Haozhou Li, Zeyuan Zeng, Ruimeng Li, Kaixuan Yang, Jiangbo Zhang, Biyi Zhou, Yaoying Wang, 20 Jul 2025, LumiCRS: Asymmetric Contrastive Prototype Learning for Long-Tail Conversational Recommender Systems, https://arxiv.org/abs/2507.04722
Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima, Yasushi Okuno, 19 Jul 2025, Supervised Graph Contrastive Learning for Gene Regulatory Network, https://arxiv.org/abs/2505.17786
Chaoqun Cui, Caiyan Jia, 10 Aug 2025, Propagation Tree Is Not Deep: Adaptive Graph Contrastive Learning Approach for Rumor Detection, https://arxiv.org/abs/2508.07201
WonJun Moon, Hyun Seok Seong, Jae-Pil Heo, 11 Aug 2025, Selective Contrastive Learning for Weakly Supervised Affordance Grounding, https://arxiv.org/abs/2508.07877
Mohammad Zia Ur Rehman, Anukriti Bhatnagar, Omkar Kabde, Shubhi Bansal, Nagendra Kumar, 7 Aug 2025, ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos, https://arxiv.org/abs/2508.06570
Mengting Pan, Fan Li, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin, 10 Aug 2025, HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation, https://arxiv.org/abs/2508.03104
Binxiong Li, Yuefei Wang, Binyu Zhao, Heyang Gao, Benhan Yang, Quanzhou Luo, Xue Li, Xu Xiang, Yujie Liu, Huijie Tang, 28 Jul 2025, Attributed Graph Clustering with Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning, https://arxiv.org/abs/2507.20505
Chengkai Wang, Di Wu, Yunsheng Liao, Wenyao Zheng, Ziyi Zeng, Xurong Gao, Hemmings Wu, Zhoule Zhu, Jie Yang, Lihua Zhong, Weiwei Cheng, Yun-Hsuan Chen and Mohamad Sawan, 27 Jul 2025, NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis, https://arxiv.org/abs/2507.20189
Wenhao Ma, Yu-Cheng Chang, Jie Yang, Yu-Kai Wang, Chin-Teng Lin, 28 Jul 2025, Contrastive learning-based agent modeling for deep reinforcement learning, https://arxiv.org/abs/2401.00132
Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, Dacheng Tao, Changjun Jiang, 26 Jul 2025, GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning, https://arxiv.org/abs/2403.14410
Maximillian Chen and Ruoxi Sun and Tomas Pfister and Sercan Ö. Arık, 27 Jul 2025, Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training, https://arxiv.org/abs/2406.00222
Yu Tai, Xinglong Wu, Hongwei Yang, Hui He, Duanjing Chen, Yuanming Shao and Weizhe Zhang, 28 Jul 2025, How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method, https://arxiv.org/abs/2411.00612
Kristin Qi, Jiali Cheng, Youxiang Zhu, Hadi Amiri, Xiaohui Liang, 28 Jul 2025, Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning, https://arxiv.org/abs/2505.17067
Fabrizio Lo Scudo, Alessio De Rango, Luca Furnari, Alfonso Senatore, Donato D'Ambrosio, Giuseppe Mendicino and Gianluigi Greco, 23 Jul 2025, Advancing Wildfire Risk Prediction via Morphology-Aware Curriculum Contrastive Learning, https://arxiv.org/abs/2507.21147
Yaoyu Zhang and Chi-Guhn Lee, 28 Jul 2025, A Contrastive Diffusion-based Network (CDNet) for Time Series Classification, https://arxiv.org/abs/2507.21357
David A Kelly and Hana Chockler, 31 Jul 2025, Causal Identification of Sufficient, Contrastive and Complete Feature Sets in Image Classification, https://arxiv.org/abs/2507.23497
Binxiong Li, Xu Xiang, Xue Li, Quanzhou Lou, Binyu Zhao, Yujie Liu, Huijie Tang, Benhan Yang, 31 Jul 2025, GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network, https://arxiv.org/abs/2507.19095
Qile Liu, Weishan Ye, Lingli Zhang, Zhen Liang, 31 Jul 2025, EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition, https://arxiv.org/abs/2408.09186
Ziwei Wang, Siyang Li, Xiaoqing Chen, and Dongrui Wu, 31 Jul 2025, MVCNet: Multi-View Contrastive Network for Motor Imagery Classification, https://arxiv.org/abs/2502.17482
Gianluca Carloni, Biagio Brattoli, Seongho Keum, Jongchan Park, Taebum Lee, Chang Ho Ahn, Sergio Pereira, 29 Jul 2025, Pathology Foundation Models are Scanner Sensitive: Benchmark and Mitigation with Contrastive ScanGen Loss, https://arxiv.org/abs/2507.22092
Sara Sarto, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara, 29 Jul 2025, Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training, https://arxiv.org/abs/2410.07336
Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag, 29 Jul 2025, Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation, https://arxiv.org/abs/2403.19776
Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han, 1 Aug 2025, Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement, https://arxiv.org/abs/2508.00410
Yiming Xu, Xu Hua, Zhen Peng, Bin Shi, Jiarun Chen, Xingbo Fu, Song Wang, Bo Dong, 1 Aug 2025, Text-Attributed Graph Anomaly Detection via Multi-Scale Cross- and Uni-Modal Contrastive Learning, https://arxiv.org/abs/2508.00513
Shiyi Liu, Buwen Liang, Yuetong Fang, Zixuan Jiang and Renjing Xu, 1 Aug 2025, Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms, https://arxiv.org/abs/2507.02724
Amrit Rajeev, Udayaadithya Avadhanam, Harshula Tulapurkar, SaiBarath Sundar, 1 Aug 2025, Small sample-based adaptive text classification through iterative and contrastive description refinement, https://arxiv.org/abs/2508.00957
Xiaoya Li, Xiaofei Sun, Albert Wang, Chris Shum and Jiwei Li, 4 Aug 2025, CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search, https://arxiv.org/abs/2508.02091
Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Ng Nga Chun, Gerald W.Y. Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo, 3 Aug 2025, Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery, https://arxiv.org/abs/2508.01799
Yujia Tong, Tian Zhang, Jingling Yuan, Yuze Wang, Chuang Hu, 3 Aug 2025, LetheViT: Selective Machine Unlearning for Vision Transformers via Attention-Guided Contrastive Learning, https://arxiv.org/abs/2508.01569
Kosmas Pinitas and Konstantinos Makantasis and Georgios N. Yannakakis, 30 Jul 2025, Privileged Contrastive Pretraining for Multimodal Affect Modelling, https://arxiv.org/abs/2508.03729
Hyungbin Kim, Incheol Baek, Yon Dohn Chung, 6 Aug 2025, Decoupled Contrastive Learning for Federated Learning, https://arxiv.org/abs/2508.04005
Thang Duc Tran, Thai Hoang Le, 6 Aug 2025, WSS-CL: Weight Saliency Soft-Guided Contrastive Learning for Efficient Machine Unlearning Image Classification, https://arxiv.org/abs/2508.04308
Rui Zuo, Simon Khan, Zifan Wang, Garrett Ethan Katz, Qinru Qiu, 6 Aug 2025, Why the Agent Made that Decision: Contrastive Explanation Learning for Reinforcement Learning, https://arxiv.org/abs/2411.16120
Sahil Sethi, David Chen, Thomas Statchen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones, 6 Aug 2025, ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning, https://arxiv.org/abs/2504.08713
Tianchen Fang, Guiru Liu, 7 Aug 2025, RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding, https://arxiv.org/abs/2508.05244
Kang Liu and Zhuoqi Ma and Zikang Fang and Yunan Li and Kun Xie and Qiguang Miao, 7 Aug 2025, PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation, https://arxiv.org/abs/2508.05353
Wonjun Kang, Byeongkeun Ahn, Minjae Lee, Kevin Galim, Seunghyuk Oh, Hyung Il Koo, Nam Ik Cho, 7 Aug 2025, UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation, https://arxiv.org/abs/2508.05399
Willian T. Lunardi, Abdulrahman Banabila, Dania Herzalla, and Martin Andreoni, 7 Aug 2025, Contrastive Representation Modeling for Anomaly Detection, https://arxiv.org/abs/2501.05130
Shengzhu Yang, Jiawei Du, Shuai Lu, Weihang Zhang, Ningli Wang, Huiqi Li, 8 Aug 2025, CLIPin: A Non-contrastive Plug-in to CLIP for Multimodal Semantic Alignment, https://arxiv.org/abs/2508.06434
Zihu Wang, Boxun Xu, Hejia Geng, Peng Li, 8 Aug 2025, Khan-GCL: Kolmogorov-Arnold Network Based Graph Contrastive Learning with Hard Negatives, https://arxiv.org/abs/2505.15103
Huifa Li, Jie Fu, Xinlin Zhuang, Haolin Yang, Xinpeng Ling, Tong Cheng, Haochen xue, Imran Razzak, Zhili Chen, 7 Aug 2025, scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering, https://arxiv.org/abs/2508.09180
Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang, 13 Aug 2025, A Unified Contrastive-Generative Framework for Time Series Classification, https://arxiv.org/abs/2508.09451
Han Yu, Huiyuan Yang, Akane Sano, 12 Aug 2025, LEAVES: Learning Views for Time-Series Biobehavioral Data in Contrastive Learning, https://arxiv.org/abs/2210.07340
Minghui Sun, Matthew M. Engelhard, Benjamin A. Goldstein, 15 Aug 2025, Borrowing From the Future: Enhancing Early Risk Assessment through Contrastive Learning, https://arxiv.org/abs/2508.11210
Bin Ma, Yifei Zhang, Yongjin Xian, Qi Li, Linna Zhou, Gongxun Miao, 15 Aug 2025, A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations, https://arxiv.org/abs/2508.11141
Haojie Zhang, Yixiong Liang, Hulin Kuang, Lihui Cen, Zhe Qu, Yigang Cen, Min Zeng, Shichao Kan, 8 Aug 2025, Contrastive Regularization over LoRA for Multimodal Biomedical Image Incremental Learning, https://arxiv.org/abs/2508.11673
Reza Shirkavand, Shangqian Gao, Peiran Yu, Heng Huang, 17 Aug 2025, Cost-Aware Contrastive Routing for LLMs, https://arxiv.org/abs/2508.12491
Alicja Ziarko, Michal Bortkiewicz, Michal Zawalski, Benjamin Eysenbach and Piotr Milos, 18 Aug 2025, Contrastive Representations for Temporal Reasoning, https://arxiv.org/abs/2508.13113
Yihan Wang, Yiwei Lu, Guojun Zhang, Franziska Boenisch, Adam Dziedzic, Yaoliang Yu, Xiao-Shan Gao, 16 Aug 2025, MUC: Machine Unlearning for Contrastive Learning with Black-box Evaluation, https://arxiv.org/abs/2406.03603
Kai Sun, Yushi Bai, Zhen Yang, Jiajie Zhang, Ji Qi, Lei Hou and Juanzi Li, 17 Aug 2025, Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models, https://arxiv.org/abs/2505.20152
Lingyu Si, Jingyao Wang, Wenwen Qiang, 19 Aug 2025, A Generalized Learning Framework for Self-Supervised Contrastive Learning, https://arxiv.org/abs/2508.13596
Ruobing Jiang, Yacong Li, Haobing Liu, Yanwei Yu, 19 Aug 2025, Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning, https://arxiv.org/abs/2503.13911
Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, Doudou Zhou, 19 Aug 2025, Contrastive Learning on Multimodal Analysis of Electronic Health Records, https://arxiv.org/abs/2403.14926
Qian Zhanga, Ruilin Zhang, Jun Xiao, Yifan Liu and Zhe Wang, 12 Aug 2025, MCLPD: Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets, https://arxiv.org/abs/2508.14073
Chen-Hao Chang, Hui-Ju Hung, Chia-Hsun Lu, Chih-Ya Shen, 20 Aug 2025, Enhancing Contrastive Link Prediction With Edge Balancing Augmentation, https://arxiv.org/abs/2508.14808
Guilhem Fauré (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Sam Bigeard (MULTISPEECH), Slim Ouni (LORIA, MULTISPEECH), 20 Aug 2025, Towards Skeletal and Signer Noise Reduction in Sign Language Production via Quaternion-Based Pose Encoding and Contrastive Learning, https://arxiv.org/abs/2508.14574
Yifan Zhang, Junhui Hou, 20 Aug 2025, Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?, https://arxiv.org/abs/2412.08973
Yi Yuan, Joseph Van Duyn, Runze Yan, Zhuoyi Huang, Sulaiman Vesal, Sergey Plis, Xiao Hu, Gloria Hyunjung Kwak, Ran Xiao, Alex Fedorov, 21 Aug 2025, Learning ECG Representations via Poly-Window Contrastive Learning, https://arxiv.org/abs/2508.15225
Junho Song, Jong-Hwan Jang, DongGyun Hong, Joon-myoung Kwon, and Yong-Yeon Jo, 21 Aug 2025, CREMA: A Contrastive Regularized Masked Autoencoder for Robust ECG Diagnostics across Clinical Domains, https://arxiv.org/abs/2407.07110
Pouria Mortezaagha, Arya Rahgozar, 17 Aug 2025, An Auditable Pipeline for Fuzzy Full-Text Screening in Systematic Reviews: Integrating Contrastive Semantic Highlighting and LLM Judgment, https://arxiv.org/abs/2508.15822
Wenqiao Zhu, Ji Liu, Rongjuncheng Zhang, Haipang Wu, Yulun Zhang, 21 Aug 2025, CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning, https://arxiv.org/abs/2508.15868
Yulin Zhu, Xing Ai, Yevgeniy Vorobeychik, Kai Zhou, 22 Aug 2025, Robust Graph Contrastive Learning with Information Restoration, https://arxiv.org/abs/2307.12555
Yushi Lin, Peng Yang, 23 Aug 2025, A Decoupled LOB Representation Framework for Multilevel Manipulation Detection with Supervised Contrastive Learning, https://arxiv.org/abs/2508.17086
Muhammad Aqeel, Danijel Skocaj, Marco Cristani, Francesco Setti, 25 Aug 2025, A Contrastive Learning-Guided Confident Meta-learning for Zero Shot Anomaly Detection, https://arxiv.org/abs/2508.17827
Bin Tan, Wangyao Ge, Yidi Wang, Xin Liu, Jeff Burtoft, Hao Fan, Hui Wang, 25 Aug 2025, PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation, https://arxiv.org/abs/2508.18166
Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda, 4 Sep 2025, PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation, https://arxiv.org/abs/2509.04357
Wenhui Cui, Christopher Sandino, Hadi Pouransari, Ran Liu, Juri Minxha, Ellen L. Zippi, Aman Verma, Anna Sedlackova, Behrooz Mahasseni, Erdrin Azemi, 4 Sep 2025, CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals, https://arxiv.org/abs/2509.04699
Wuchao Liu, Han Peng, Wengen Li, Yichao Zhang, Jihong Guan and Shuigeng Zhou, 23 Aug 2025, scI2CL: Effectively Integrating Single-cell Multi-omics by Intra- and Inter-omics Contrastive Learning, https://arxiv.org/abs/2508.18304
Jiangfeng Sun, Sihao He, Zhonghong Ou, Meina Song, 24 Aug 2025, Structures Meet Semantics: Multimodal Fusion via Graph Contrastive Learning, https://arxiv.org/abs/2508.18322
Md. Rashid Shahriar Khan, Md. Abrar Hasan, Mohammod Tareq Aziz Justice, 25 Aug 2025, Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling, https://arxiv.org/abs/2508.18463
Eichi Takaya and Ryusei Inamori, 26 Aug 2025, ModAn-MulSupCon: Modality-and Anatomy-Aware Multi-Label Supervised Contrastive Pretraining for Medical Imaging, https://arxiv.org/abs/2508.18613
Yi-Ping Hsu, Po-Wei Wang, Chantat Eksombatchai, Jiajing Xu, 26 Aug 2025, Taming the One-Epoch Phenomenon in Online Recommendation System by Two-stage Contrastive ID Pre-training, https://arxiv.org/abs/2508.18700
Junhua Liu and Yong Keat Tan and Bin Fu and Kwan Hui Lim, 26 Aug 2025, From Intents to Conversations: Generating Intent-Driven Dialogues with Contrastive Learning for Multi-Turn Classification, https://arxiv.org/abs/2411.14252
Yifan Dou, Adam Khadre, Ruben C Petreaca, Golrokh Mirzaei, 26 Aug 2025, MS-ConTab: Multi-Scale Contrastive Learning of Mutation Signatures for Pan Cancer Representation and Stratification, https://arxiv.org/abs/2508.19424
Jinyuan Feng, Chaopeng Wei, Tenghai Qiu, Tianyi Hu, Zhiqiang Pu, 28 Aug 2025, CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning, https://arxiv.org/abs/2505.17553
Xin Huang, Ruibin Li, Tong Jia, Wei Zheng, Ya Wang, 28 Aug 2025, Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models, https://arxiv.org/abs/2505.15576
Amartya Banerjee, Somnath Kar, Anirban Pal, Debabrata Maiti, 31 Aug 2025, Valid Property-Enhanced Contrastive Learning for Targeted Optimization & Resampling for Novel Drug Design, https://arxiv.org/abs/2509.00684
Smayan Khanna, Doruk Efe Gökmen, Risi Kondor, Vincenzo Vitelli, 1 Sep 2025, Graph Contrastive Learning versus Untrained Baselines: The Role of Dataset Size, https://arxiv.org/abs/2509.01541
Hiroshi Sasaki, 2 Sep 2025, Structure-aware Contrastive Learning for Diagram Understanding of Multimodal Models, https://arxiv.org/abs/2509.01959
Juhyeon Lee, Wonduk Seo, Hyunjin An, Seunghyun Lee, Yi Bu, 2 Sep 2025, Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization, https://arxiv.org/abs/2509.02093
Micha Livne, 30 Aug 2025, Contrastive MIM: A Contrastive Mutual Information Framework for Unified Generative and Discriminative Representation Learning, https://arxiv.org/abs/2411.10548
Alexander Marusov, Aleksandr Yugay, Alexey Zaytsev, 2 Sep 2025, A theoretical framework for self-supervised contrastive learning for continuous dependent data, https://arxiv.org/abs/2506.09785
Yanmei Hu and Yihang Wu and Bing Sun and Xue Yue and Biao Cai and Xiangtao Li and Yang Chen, 30 Aug 2025, Contrastive clustering based on regular equivalence for influential node identification in complex networks, https://arxiv.org/abs/2509.02609
Yiru Jiao, Sander van Cranenburgh, Simeon Calvert, Hans van Lint, 3 Sep 2025, Structure-preserving contrastive learning for spatial time series, https://arxiv.org/abs/2502.06380
Jack Wilkie, Hanan Hindy, Christos Tachtatzis, Robert Atkinson, 8 Sep 2025, Contrastive Self-Supervised Network Intrusion Detection using Augmented Negative Pairs, https://arxiv.org/abs/2509.06550
Serge Lionel Nikiema, Jordan Samhi, Micheline Bénédicte Moumoula, Albérick Euraste Djiré, Abdoul Kader Kaboré, Jacques Klein and Tegawendé F. Bissyandé, 6 Sep 2025, Using Contrastive Learning to Improve Two-Way Reasoning in Large Language Models: The Obfuscation Task as a Case Study, https://arxiv.org/abs/2509.05553
Yuyao Ge, Shenghua Liu, Yiwei Wang, Lingrui Mei, Baolong Bi, Xuanshan Zhou, Jiayu Yao, Jiafeng Guo, Xueqi Cheng, 8 Sep 2025, Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning, https://arxiv.org/abs/2509.06461
Mengxue Yang, Chun Yang, Jiaqi Zhu, Jiafan Li, Jingqi Zhang, Yuyang Li, Ying Li, 8 Sep 2025, SLiNT: Structure-aware Language Model with Injection and Contrastive Training for Knowledge Graph Completion, https://arxiv.org/abs/2509.06531
Dipta Neogi, Nourash Azmine Chowdhury, Muhammad Rafsan Kabir, Mohammad Ashrafuzzaman Khan, 8 Sep 2025, Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning, https://arxiv.org/abs/2509.06826
Zahra Zamanzadeh Darban, Yiyuan Yang, Geoffrey I. Webb, Charu C. Aggarwal, Qingsong Wen, Shirui Pan, Mahsa Salehi, 7 Sep 2025, DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series, https://arxiv.org/abs/2404.11269
Moo Hyun Son, Juyoung Bae, Zelin Qiu, Jiale Peng, Kai Xin Li, Yifan Lin, Hao Chen, 9 Sep 2025, Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation, https://arxiv.org/abs/2509.07923
Gul Rukh Khattak, Konstantinos Patlatzoglou, Joseph Barker, Libor Pastika, Boroumand Zeidaabadi, Ahmed El-Medany, Hesham Aggour, Yixiu Liang, Antonio H. Ribeiro, Jeffrey Annis, Antonio Luiz Pinho Ribeiro, Junbo Ge, Daniel B. Kramer, Jonathan W. Waks, Evan Brittain, Nicholas Peters, Fu Siong Ng, Arunashis Sau, 12 Sep 2025, Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms, https://arxiv.org/abs/2509.10369
Shengqiang Fu, 12 Sep 2025, SI-FACT: Mitigating Knowledge Conflict via Self-Improving Faithfulness-Aware Contrastive Tuning, https://arxiv.org/abs/2509.10208
Wenfang Wu, Tingting Yuan, Yupeng Li, Daling Wang, Xiaoming Fu, 12 Sep 2025, SignClip: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion, https://arxiv.org/abs/2509.10266
Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis and Theodoros Giannakopoulos, 12 Sep 2025, Prototypical Contrastive Learning For Improved Few-Shot Audio Classification, https://arxiv.org/abs/2509.10074
Zahraa Al Sahili, Ioannis Patras, Matthew Purver, 11 Sep 2025, Data Matters Most: Auditing Social Bias in Contrastive Vision Language Models, https://arxiv.org/abs/2501.13223
Zahraa Al Sahili, Ioannis Patras, Matthew Purver, 11 Sep 2025, Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models, https://arxiv.org/abs/2505.14160
Jia Tang, Xinrui Wang and Songcan Chen, 18 Sep 2025, Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning, https://arxiv.org/abs/2509.15347
Xinxin Meng, Jiangtao Guo, Yunxiang Zhang, Shun Huang, 19 Sep 2025, Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection, https://arxiv.org/abs/2509.15570
Gwendal Le Vaillant and Yannick Molle, 16 Sep 2025, Contrastive timbre representations for musical instrument and synthesizer retrieval, https://arxiv.org/abs/2509.13285
Artemis Panagopoulou, Le Xue, Honglu Zhou, silvio savarese, Ran Xu, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles, 15 Sep 2025, Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D, https://arxiv.org/abs/2506.01275
Wenhui Diao, Haichen Yu, Kaiyue Kang, Tong Ling, Di Liu, Yingchao Feng, Hanbo Bi, Libo Ren, Xuexue Li, Yongqiang Mao, Xian Sun, 16 Sep 2025, RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning, https://arxiv.org/abs/2409.13366
Carlos Celemin, Joseph Brennan, Pierluigi Vito Amadori, Tim Bradley, 15 Sep 2025, Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning, https://arxiv.org/abs/2509.11880
Robin Narsingh Ranabhat, Longwei Wang, Amit Kumar Patel, KC santosh, 14 Sep 2025, Promoting Shape Bias in CNNs: Frequency-Based and Contrastive Regularization for Corruption Robustness, https://arxiv.org/abs/2509.11355
Zihan Dong, Xin Zhou, Ryumei Nakada, Lexin Li and Linjun Zhang, 14 Sep 2025, Contrastive Network Representation Learning, https://arxiv.org/abs/2509.11316
Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gwing Kei Yip, Gerald W.Y. Cheng, Yunlin Mao, Jing Cai, Liang-ting Lin, Jung Sun Yoo, 18 Sep 2025, Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery, https://arxiv.org/abs/2509.14788
Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard, 18 Sep 2025, Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, https://arxiv.org/abs/2411.12590
Anna Van Elst, Debarghya Ghoshdastidar, 18 Sep 2025, Tight PAC-Bayesian Risk Certificates for Contrastive Learning, https://arxiv.org/abs/2412.03486
Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang, 10 Sep 2025, Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors, https://arxiv.org/abs/2505.15337
Ranga Baminiwatte, Kazi Jewel Rana, Aaron J. Masino, 17 Sep 2025, PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction, https://arxiv.org/abs/2509.14037
Chenghao Huang, Xiaolu Chen, Yanru Zhang, and Hao Wang, 17 Sep 2025, FedCoSR: Personalized Federated Learning with Contrastive Shareable Representations for Label Heterogeneity in Non-IID Data, https://arxiv.org/abs/2404.17916

Flash Decoding

Flash decoding is a memory-reducing decoding algorithm introduced by the research team better known for "flash attention" (versions 1, 2, and 3 so far). It applies similar memory access reductions to the decoding algorithm.
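As a rough illustration, the Flash-Decoding blog post listed below describes splitting the cached keys and values into chunks, computing attention over each chunk separately, and then merging the partial results with a log-sum-exp style rescaling. The following is a minimal pure-Python sketch of that merge for a single query vector. The function names and the chunk_size parameter are hypothetical, and a real implementation would run the chunks in parallel GPU kernels rather than in a Python loop.

```python
import math

def attention_reference(q, keys, values):
    """Ordinary softmax attention for one query vector (the baseline)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def attention_split_kv(q, keys, values, chunk_size):
    """Same result, computed chunk by chunk over the keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    partials = []  # one (local max, local sum, unnormalized output) per chunk
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_chunk]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        out = [sum(w * v[j] for w, v in zip(weights, v_chunk))
               for j in range(dim)]
        partials.append((m, sum(weights), out))
    # Merge step: rescale every chunk to a shared maximum, then normalize.
    global_max = max(m for m, _, _ in partials)
    total = sum(s * math.exp(m - global_max) for m, s, _ in partials)
    return [sum(o[j] * math.exp(m - global_max) for m, _, o in partials) / total
            for j in range(dim)]
```

Because each chunk only needs its own slice of the keys and values, the per-chunk work is independent, and only the small (max, sum, output) triples have to be combined at the end.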

Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, kangdi chen, Yuhan Dong, Yu Wang, 2024, FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, Part of Proceedings of Machine Learning and Systems 6 (MLSys 2024) Conference, PDF: https://proceedings.mlsys.org/paper_files/paper/2024/file/5321b1dabcd2be188d796c21b733e8c7-Paper-Conference.pdf (Next generation of Flash Decoding, with improved asynchronous parallelism of Softmax in both prefill and decoding phases, heuristic dataflow management algorithms, and enhanced GEMM during the decoding phase.)
Together AI, Nov 13, 2023, Announcing Together Inference Engine – the fastest inference available, https://www.together.ai/blog/together-inference-engine-v1
Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov, October 12, 2023, Flash-Decoding for long-context inference, https://www.together.ai/blog/flash-decoding-for-long-context-inference
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Guohao Dai, 6 Oct 2024, Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective, https://arxiv.org/abs/2410.04466
Aniruddha Nrusimha, William Brandon, Mayank Mishra, Yikang Shen, Rameswar Panda, Jonathan Ragan-Kelley, Yoon Kim, 28 May 2025, FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference, https://arxiv.org/abs/2505.22758 https://github.com/aninrusimha/flashformer (Optimizing kernels for low latency in a single isolated query, not a batch, via kernel fusion and running all components in one kernel, along with programming techniques like metaprogramming.)

Top-p Decoding

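Top-p decoding (also called "nucleus sampling") is a longstanding decoding method that examines the cumulative probabilities of the top candidate tokens: it samples only from the smallest set of most-likely tokens whose cumulative probability reaches the threshold p. Top-p is usually combined with top-k decoding into a hybrid top-k top-p decoding algorithm.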
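A minimal sketch of the cumulative-probability cutoff, assuming the logits have already been converted to probabilities by Softmax. The function names top_p_filter and sample_top_p are illustrative only, not taken from any particular library.

```python
import random

def top_p_filter(probs, p):
    """Keep the most likely tokens until their cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_top_p(probs, p, rng=random):
    """Randomly choose one token id from the top-p candidate set."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05] and p set to 0.75, only the first two tokens survive the cutoff, and the sampler then chooses between them.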

Research papers on Top-p decoding:

David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
Hugging Face, 2024, Text Generation Inference, https://huggingface.co/docs/text-generation-inference/index
David Spuler, March 2024, Top-p Decoding, in Generative AI in C++, https://www.aussieai.com/book/ch26-top-p-decoding
Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding

Min-P Decoding

Min-p decoding is a recent, minor modification to the decoding algorithm that mainly improves accuracy (rather than efficiency), but doesn't reduce efficiency either. Similar to top-p decoding, min-p tries to avoid outputting tokens with too-low probabilities, so top-p and min-p have the same goal. However, min-p sets a lower bound on the probability allowed and changes this threshold dynamically, scaling it by the probability of the most likely token. The discovery of "min-p" was a nice piece of research work, since it is a small coding change that improves accuracy without sacrificing latency.
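The dynamic threshold can be sketched in a few lines. This assumes the scaling rule from the Min P Sampling paper listed below, where the cutoff is a base value multiplied by the top token's probability. The names min_p_filter and p_base are illustrative, and the input is assumed to be a list of probabilities after Softmax.

```python
def min_p_filter(probs, p_base):
    """Drop tokens whose probability is below p_base times the top probability."""
    threshold = p_base * max(probs)  # the cutoff moves with the model's confidence
    kept = {i: pr for i, pr in enumerate(probs) if pr >= threshold}
    # Renormalize the survivors before sampling from them.
    total = sum(kept.values())
    return {i: pr / total for i, pr in kept.items()}
```

When the model is confident (one token has a high probability), the cutoff is high and few candidates remain. When the distribution is flat, the cutoff drops and more candidates are kept.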

Research on min-p decoding:

Ignacio de Gregorio, Aug 2024, Elevate LLM Performance by 20% Instantly with Min-P, https://medium.com/@ignacio.de.gregorio.noblejas/elevate-llm-performance-by-20-instantly-with-min-p-c961fe1daf3b
Hugging Face, 2024, Min P style sampling - an alternative to Top P/TopK #27670, https://github.com/huggingface/transformers/issues/27670
Minh Nguyen, Andrew Baker, Andreas Kirsch, Clement Neo, 1 Jul 2024, Min P Sampling: Balancing Creativity and Coherence at High Temperature, https://arxiv.org/abs/2407.01082
Joao Gante, May 2024, New sampling strategy dropped in 🤗 transformers -- Min P sampling, Hugging Face, https://huggingface.co/posts/joaogante/319451541682734

Constrained Decoding

Constrained decoding is an optimization of the decoding algorithm where there are extra constraints on the tokens that can be output. This extra information can be used to either force inclusion of a particular token, or to exclude a subset of the tokens from consideration. Examples where there is extra information to use in decoding include:

Programming language syntax (code generation)
Parts-of-speech identification

For example, if you're programming an LLM decoding algorithm to output C++ code, then you know that the token 'if' is normally followed by a token '(' in the code syntax (setting aside less common forms such as 'if constexpr'). Hence, there's not really any need for a full LLM computation after an 'if' token; a simple heuristic can emit the '(' token instead. This idea uses the "constraint" of the language syntax to do "constrained decoding."

Clearly, that heuristic would be much faster, and easily coded. However, it's not all strawberries and cream, because the next token won't have a KV cache for the current token, if we use this heuristic. Hence, the next token would need to do a "mini-prefill" computation to calculate the KV cache, which means there's almost no point in avoiding the current token's computation (i.e., we are simply pushing the current token's computation onto the next token).

However, we've seen this issue of a "missing KV cache" before in early exit or layer skipping optimizations, where the KV cache is missing for any skipped layers (see KV caching). And there are various tricks to avoid fully re-computing the KV cache, such as propagation of the prior one or fusion with another layer. Similar ideas can be used when constrained decoding skips an LLM computation and the next token's KV cache is thereby absent.

Overlapped parallel computation can be used to address the missing KV cache, as is also possible for early exit. The constraints of the language grammar allow the second token's inference to start almost immediately, possibly via a heuristic that does not even involve LLM layer execution. However, the computation of the current token's KV cache can still be completed, in parallel to the next token's decoding cycle, by ensuring that the next token's layers are staggered a little behind the current token's KV cache computation. This overlaps the next token's decoding phase with the current token's KV cache computation.
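The basic constraint mechanism from the start of this section (excluding tokens, or forcing a single one) can be sketched as a mask over the logits. This is a toy illustration under stated assumptions: the grammar table mapping a previous token id to its allowed successors is hypothetical, compute_logits stands in for a full model forward pass, and the KV cache handling discussed above is deliberately left out.

```python
import math

def constrained_greedy(logits, allowed):
    """Greedy decoding restricted to the allowed token ids."""
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    return max(range(len(masked)), key=lambda i: masked[i])

def next_token(prev_token, compute_logits, grammar):
    """Return (token id, whether the model was actually run)."""
    allowed = grammar.get(prev_token)
    if allowed is not None and len(allowed) == 1:
        # Only one legal continuation: emit it without running the model.
        return next(iter(allowed)), False
    logits = compute_logits()
    if allowed is None:
        allowed = set(range(len(logits)))  # no constraint applies here
    return constrained_greedy(logits, allowed), True
```

With a toy vocabulary where token 0 is 'if' and token 1 is '(', a grammar table of {0: {1}} makes next_token return the '(' token straight after 'if' without ever calling compute_logits.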

Research papers on constrained decoding:

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 6 Jun 2024 (v2), SGLang: Efficient Execution of Structured Language Model Programs, https://arxiv.org/abs/2312.07104 https://github.com/sgl-project/sglang
K Ahmed, KW Chang, G Van den Broeck, Oct 2024, Controllable Generation via Locally Constrained Resampling, Neurips Safe Generative AI Workshop 2024, https://openreview.net/pdf?id=v091fzXTu0
Gaya Mehenni, Amal Zouaq, 23 Nov 2024, Ontology-Constrained Generation of Domain-Specific Clinical Summaries, https://arxiv.org/abs/2411.15666
Will Kurt, Nov 2024, Say What You Mean: A Response to 'Let Me Speak Freely', https://blog.dottxt.co/say-what-you-mean.html
Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West, 18 Jan 2024 (v6), Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning, https://arxiv.org/abs/2305.13971 https://github.com/epfl-dlab/GCD
Yanjun Fu, Ethan Baker, Yu Ding, Yizheng Chen, 20 Jul 2024 (v3), Constrained Decoding for Secure Code Generation, https://arxiv.org/abs/2405.00218 https://codeguardplus.github.io/
Zekun Hao, David W. Romero, Tsung-Yi Lin, Ming-Yu Liu, 12 Dec 2024, Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale, https://arxiv.org/abs/2412.09548 https://research.nvidia.com/labs/dir/meshtron/ (Optimizations to avoid the quadratic Transformer cost, in both training and inference, include "hourglass neural architecture" analogous to widthwise pruning or slimming, sliding window attention, rolling KV cache, truncated sequence training, and a "robust sampling strategy" that is effectively a type of constrained decoding based on mesh layouts.)
Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou, 16 Dec 2024, RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation, https://arxiv.org/abs/2412.11919 https://github.com/sunnynexus/RetroLLM
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
Theia Vogel, December 18, 2023, How to make LLMs go fast, https://vgel.me/posts/faster-inference/
D Banerjee, T Suresh, S Ugare, S Misailovic, G Singh, Mar 2025, Preserving Reasoning Capabilities Under Constrained LLM Generation, https://openreview.net/pdf?id=RX3GIOkGHr
Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu, 18 Mar 2025, Speculative Decoding for Verilog: Speed and Quality, All in One, https://arxiv.org/abs/2503.14153
Niels Mündler, Jasper Dekoninck, Martin Vechev, 13 Aug 2025, Constrained Decoding of Diffusion LLMs with Context-Free Grammars, https://arxiv.org/abs/2508.10111
Lingxiao Li, Salar Rahili, Yiwei Zhao, 20 Aug 2025, Correctness-Guaranteed Code Generation via Constrained Decoding, https://arxiv.org/abs/2508.15866
Parv Kapoor, Akila Ganlath, Changliu Liu, Sebastian Scherer, Eunsuk Kang, 1 Sep 2025, Constrained Decoding for Robotics Foundation Models, https://arxiv.org/abs/2509.01728
Devansh, Sep 2025, The Chocolate Milk Cult’s Guide to Inference Scaling for AI Models: How to Reduce the costs of Running LLMs, https://machine-learning-made-simple.medium.com/the-chocolate-milk-cults-guide-to-inference-scaling-for-ai-models-50aa2290eb50 (Deep analysis of applying many progressive optimizations to real-life LLM inference.)

Multi-Token Decoding

Multi-token decoding is an optimization whereby two or more tokens are output in a single decoding step. The idea is to train a special type of model that predicts not just the next token, but also the one after that (and possibly more). This improves on standard autoregressive decoding, which emits only one token per decoding step, because fewer model passes are needed to generate the same number of tokens.
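
The idea can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in rather than any particular paper's method: `toy_multi_head_logits` plays the role of one forward pass through a model with several prediction heads (it simply "predicts" a counting sequence), the prompt is assumed to be non-empty, and each head's token is accepted greedily without any verification step.

```python
from typing import List, Tuple

VOCAB_SIZE = 10  # hypothetical toy vocabulary: token ids 0..9


def toy_multi_head_logits(context: List[int], num_heads: int) -> List[List[float]]:
    """Stand-in for ONE forward pass of a model with several prediction heads.

    Head i scores the token at offset i+1 past the end of the context.
    This toy just predicts a counting sequence; a real model is trained.
    """
    last = context[-1]  # assumes a non-empty context
    all_logits = []
    for head in range(num_heads):
        target = (last + head + 1) % VOCAB_SIZE
        logits = [0.0] * VOCAB_SIZE
        logits[target] = 5.0  # make the target the clear argmax
        all_logits.append(logits)
    return all_logits


def argmax(values: List[float]) -> int:
    """Index of the largest value (greedy choice)."""
    return max(range(len(values)), key=lambda i: values[i])


def multi_token_decode(prompt: List[int], max_new_tokens: int,
                       num_heads: int = 2) -> Tuple[List[int], int]:
    """Emit up to num_heads tokens per forward pass; return (tokens, passes)."""
    tokens = list(prompt)
    forward_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        head_logits = toy_multi_head_logits(tokens, num_heads)
        forward_passes += 1
        for logits in head_logits:
            if len(tokens) - len(prompt) >= max_new_tokens:
                break  # do not overshoot the requested length
            tokens.append(argmax(logits))
    return tokens, forward_passes


if __name__ == "__main__":
    print(multi_token_decode([0], 6, num_heads=2))
    print(multi_token_decode([0], 6, num_heads=1))
```

With two heads, six new tokens take three forward passes instead of six. Practical systems usually add a verification step for the extra tokens, as in the speculative decoding papers listed below, which this sketch omits.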

Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, Niraj K. Jha, Yilin Shen, Hongxia Jin, 1 May 2024, DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling, https://arxiv.org/abs/2405.00888 (A model trained to predict multiple tokens ahead.)
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve, 30 Apr 2024, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737 Project: https://huggingface.co/facebook/multi-token-prediction
Michael Nuñez, July 4, 2024, Meta drops AI bombshell: Multi-token prediction models now open for research, https://venturebeat.com/ai/meta-drops-ai-bombshell-multi-token-prediction-models-now-open-for-research/
Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun, 12 Jul 2024, Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference, https://arxiv.org/abs/2407.09722
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D Lee, Deming Chen, and Tri Dao. Medusa: Simple llm inference acceleration framework with multiple decoding heads. arXiv preprint arXiv:2401.10774, 2024 https://arxiv.org/abs/2401.10774
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton, 2024, Exploring and Improving Drafts in Blockwise Parallel Decoding, https://openreview.net/pdf?id=KtnUTS1f91
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
David Spuler, 25th August, 2024, Hot Inference Optimization Techniques, https://www.aussieai.com/blog/hot-inference-research
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Tri Dao, September 11, 2023, Medusa: Simple framework for accelerating LLM generation with multiple decoding heads, https://www.together.ai/blog/medusa
Wei Zhong, Manasa Bharadwaj, 1 Jun 2024 (v2), S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs, https://arxiv.org/abs/2405.20314
Desh Raj, Gil Keren, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli, 12 Sep 2024, Faster Speech-LLaMA Inference with Multi-token Prediction, https://arxiv.org/abs/2409.08148
Zilin Xiao, Hongming Zhang, Tao Ge, Siru Ouyang, Vicente Ordonez, Dong Yu, 8 Oct 2024, ParallelSpec: Parallel Drafter for Efficient Speculative Decoding, https://arxiv.org/abs/2410.05589 (Multi-token prediction in draft models for speculative decoding.)
Siru Ouyang, Shuohang Wang, Minhao Jiang, Ming Zhong, Donghan Yu, Jiawei Han, Yelong Shen, 14 Oct 2024, Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation, https://arxiv.org/abs/2410.10141 https://github.com/ozyyshr/TempSpec
Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung, 17 Oct 2024, Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding, https://arxiv.org/abs/2410.13839
Anonymous Authors, Oct 2024, Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference, https://openreview.net/pdf?id=ZHhBawo3k5
Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao, 27 Oct 2024, FIRP: Faster LLM inference via future intermediate representation prediction, https://arxiv.org/abs/2410.20488
DP Ghosh, DA Team, Oct 29, 2024, Multi-Token Prediction with Extended Transformer Layers, https://www.researchgate.net/profile/Debiprasad-Ghosh/publication/385311204_Multi-Token_Prediction_with_Extended_Transformer_Layers/links/671fdd2c55a5271cdee28059/Multi-Token-Prediction-with-Extended-Transformer-Layers.pdf
Yash Akhauri, Safeen Huda, Mohamed S. Abdelfattah, 26 Nov 2024, Attamba: Attending To Multi-Token States, https://arxiv.org/abs/2411.17685
Shibaranjani Dasgupta, Chandan Maity, Somdip Mukherjee, Rohan Singh, Diptendu Dutta, Debasish Jana, 14 Dec 2024, HITgram: A Platform for Experimenting with n-gram Language Models, https://arxiv.org/abs/2412.10717
Y Li, K Livescu, J Zhou, Dec 2024, Beyond Token Generation: Adaptive Chunk-Distilled Language Modeling, 38th Conference on Neural Information Processing Systems (NeurIPS 2024), https://neurips2024-enlsp.github.io/papers/paper_90.pdf (Generate multiple tokens in decoding by inserting RAG chunks directly into the decoding output.)
Tim Urista, Dec 2024, Dramatically Reduce Inference Costs with DeepSeek-V3: A New Era in Open-Source LLMs, https://ai.gopubby.com/dramatically-reduce-inference-costs-with-deepseek-v3-a-new-era-in-open-source-llms-4f1adf760ee1
Yanhong Li, Karen Livescu, Jiawei Zhou, 31 Dec 2024, Chunk-Distilled Language Modeling, https://arxiv.org/abs/2501.00343 (Multi-token decoding using retrieval.)
Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 20 Nov 2024 (v2), From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838
Minhajul Hoque, Jan 4, 2025, DeepSeek V3: How They Achieved Big Results with Small Compute, https://ai.plainenglish.io/deepseek-v3-how-they-achieved-big-results-with-small-compute-fb694606d59a (DeepSeek optimizations included FP8 quantization with outlier handling, attention and KV cache optimization via Multi-Head Latent Attention (MHLA), and multi-token decoding.)
Nandini Lokesh Reddy, Jan 2025, DeepSeek: Bridging Performance and Efficiency in Modern AI, https://medium.com/@nandinilreddy/deepseek-bridging-performance-and-efficiency-in-modern-ai-106181a85693
Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Borui Zhang, Runlin Guo, Jia Li, 24 Feb 2025, CodeSwift: Accelerating LLM Inference for Efficient Code Generation, https://arxiv.org/abs/2502.17139 (Using draft sequences from a datastore of code, to achieve parallel inference, similar to prompt lookup decoding or retrieval lookup decoding.)
Yunhai Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Sai Qian Zhang, 27 Feb 2025, Speculative Decoding and Beyond: An In-Depth Review of Techniques, https://arxiv.org/abs/2502.19732
Yijiong Yu, 26 Mar 2025, Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence, https://arxiv.org/abs/2503.20533 https://github.com/yuyijiong/parallel-decoding-in-one-sequence
Chengen Wang, Murat Kantarcioglu, 14 Mar 2025, A Review of DeepSeek Models' Key Innovative Techniques, https://arxiv.org/abs/2503.11486
L. Xiong et al., May 2025, DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models, IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 5, pp. 841-858, May 2025, doi: 10.1109/JAS.2025.125495, https://ieeexplore.ieee.org/abstract/document/11005752
Anastasios Gerontopoulos, Spyros Gidaris, Nikos Komodakis, 15 May 2025, Multi-Token Prediction Needs Registers, https://arxiv.org/abs/2505.10518
Somesh Mehra, Javier Alonso Garcia, Lukas Mauch, 13 Feb 2025, On multi-token prediction for efficient LLM inference, https://arxiv.org/abs/2502.09419
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua, 23 May 2025, L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models, https://arxiv.org/abs/2505.17505
Stephen Diehl, 2025, Attention Wasn't All We Needed, https://www.stephendiehl.com/posts/post_transformers/
Anirudhan Badrinath, Prabhat Agarwal, Laksh Bhasin, Jaewon Yang, Jiajing Xu, Charles Rosenberg, 6 Aug 2025, PinRec: Outcome-Conditioned, Multi-Token Generative Retrieval for Industry-Scale Recommendation Systems, https://arxiv.org/abs/2504.10507
https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/
Carl Franzen, September 24, 2025, Chinese food delivery app Meituan's open source AI model LongCat-Flash-Thinking rivals GPT-5, https://venturebeat.com/ai/chinese-food-delivery-firm-meituans-open-source-ai-model-longcat-flash

Stop Tokens

Stop tokens are one way in which LLMs can be trained to control the length of their output. Stop tokens are appended to the end of answers in the training data, and when the model emits one during inference, the engine stops generating further tokens at that point.
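
The inference-time half of this can be sketched as a short Python loop. The names here are assumptions for illustration only: `STOP_TOKEN_ID` is a hypothetical reserved token id, and `toy_next_token` is a stand-in for a trained model's next-token choice (it counts upward and then emits the stop token).

```python
from typing import Callable, List

STOP_TOKEN_ID = 0  # hypothetical id reserved for the stop token


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             max_new_tokens: int,
             stop_token_id: int = STOP_TOKEN_ID) -> List[int]:
    """Decode one token at a time until the stop token or the length cap."""
    output: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(prompt + output)
        if token == stop_token_id:
            break  # the model signalled the end of its answer
        output.append(token)
    return output


def toy_next_token(context: List[int]) -> int:
    """Stand-in for a model: counts upward, then emits the stop token."""
    last = context[-1]  # assumes a non-empty context
    return STOP_TOKEN_ID if last >= 3 else last + 1


if __name__ == "__main__":
    print(generate(toy_next_token, [1], max_new_tokens=10))
    print(generate(toy_next_token, [1], max_new_tokens=1))
```

The stop token itself is not included in the returned output, and the `max_new_tokens` cap acts as a fallback for cases where the model never emits a stop token.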

Research papers with coverage of stop token techniques:

Louis-François Bouchard, May 10, 2024, How LLMs Know When to Stop Generating? Understand how LLMs like GPT-4 decide when they have answered your question, https://pub.towardsai.net/how-llms-know-when-to-stop-generating-b82a9a57e2c4
Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng, 29 Jul 2024, When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention, https://arxiv.org/abs/2407.20042 Code: https://github.com/DeepSoftwareAnalytics/CodeFast
Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, yuelin bai, Run Luo, Longze Chen, Min Yang, 1 Oct 2024 (v2), Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models, https://arxiv.org/abs/2409.18943 https://github.com/Geaming2002/Ruler
Bradley Butcher, Michael O'Keefe, James Titchener, 16 Dec 2024, Precise Length Control in Large Language Models, https://arxiv.org/abs/2412.11937

General Research on Decoding Algorithms

Papers on the various decoding methods include:

S Bae, J Ko, H Song, SY Yun, Oct 2023, Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding, arXiv preprint arXiv:2310.05424, https://arxiv.org/pdf/2310.05424.pdf, Code: https://github.com/raymin0223/fast_robust_early_exit (Combination of early-exit with a "shallow-deep module" and parallel decoding.)
Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, Richard Socher, 2018, Non-Autoregressive Neural Machine Translation, International Conference on Learning Representations, https://arxiv.org/abs/1711.02281 (Parallel decoding early paper.)
Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 6111–6120. Association for Computational Linguistics. https://arxiv.org/abs/1904.09324
Jiatao Gu and Xiang Kong. 2021. Fully non-autoregressive neural machine translation: Tricks of the trade. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 120–133, https://arxiv.org/abs/2012.15833
Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, and Aaron van den Oord. 2022. Step-unrolled denoising autoencoders for text generation. International Conference on Learning Representations. https://arxiv.org/abs/2112.06749
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, and Emanuele Rodolà. May 2023. Accelerating transformer inference for translation via parallel decoding. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 12336–12355. Association for Computational Linguistics. https://arxiv.org/abs/2305.10427
Y Zhang, Y Zhang, L Cui, G Fu, Oct 2023, Non-autoregressive Text Editing with Copy-aware Latent Alignments, arXiv preprint arXiv:2310.07821, https://arxiv.org/pdf/2310.07821.pdf
Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov, October 13, 2023, Flash-Decoding for long-context inference, PyTorch Blog, https://pytorch.org/blog/flash-decoding/
Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher, Sep 2019, CTRL: A Conditional Transformer Language Model for Controllable Generation, https://arxiv.org/abs/1909.05858, Code: https://github.com/salesforce/ctrl
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, Mar 2022, Training language models to follow instructions with human feedback, https://arxiv.org/abs/2203.02155 (InstructGPT main paper from OpenAI in 2022.)
Ning Gong, Nianmin Yao, June 2023, A generalized decoding method for neural text generation, Computer Speech & Language, Volume 81, 101503, https://www.sciencedirect.com/science/article/abs/pii/S0885230823000220
Cohere, 2023, Temperature, https://docs.cohere.com/docs/temperature
GC Garbacea, 2023, Neural Language Generation for Content Adaptation: Explainable, Efficient Low-Resource Text Simplification and Evaluation, Ph.D. thesis, Computer Science and Engineering, University of Michigan, https://deepblue.lib.umich.edu/bitstream/handle/2027.42/178028/garbacea_1.pdf?sequence=1 (Broad thesis with sections on beam search decoding optimizations and AI safety issues such as bias.)
Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
Bryan Eikema and Wilker Aziz. 2020. Is MAP decoding all you need? The inadequacy of the mode in neural machine translation. Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020, pages 4506–4520. International Committee on Computational Linguistics. https://arxiv.org/abs/2005.10283
Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi, May 2023, A Frustratingly Simple Decoding Method for Neural Text Generation, https://arxiv.org/abs/2305.12675
Clara Meister, Tiago Pimentel, Gian Wiher, and Ryan Cotterell. 2022. Typical decoding for natural language generation. arXiv preprint arXiv:2202.00666, https://arxiv.org/abs/2202.00666 (The "typical sampling" decoding algorithm.)
Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, and Nigel Collier. 2022. A contrastive framework for neural text generation. Advances in Neural Information Processing Systems, https://arxiv.org/abs/2202.06417 (The "contrastive search" decoding algorithm.)
Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://arxiv.org/abs/2104.08821 (Contrastive learning of sentence embeddings, rather than a decoding algorithm.)
John Hewitt, Christopher D. Manning, and Percy Liang. 2022. Truncation sampling as language model desmoothing. In Findings of the Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP). https://arxiv.org/abs/2210.15191 (The "truncation sampling" decoding algorithm.)
Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, and Mike Lewis. 2022. Contrastive decoding: Open-ended text generation as optimization. arXiv preprint arXiv:2210.15097, https://arxiv.org/abs/2210.15097 (A "contrastive decoding" algorithm.)
Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, Yejin Choi, 2018, Learning to Write with Cooperative Discriminators, https://arxiv.org/abs/1805.06087
Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, and Laurent Charlin, 2020, Language GANs falling short, International Conference on Learning Representations. https://arxiv.org/abs/1811.02549
Moin Nadeem, Tianxing He, Kyunghyun Cho, and James Glass, 2020, A systematic characterization of sampling algorithms for open-ended language generation, Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pp. 334–346. https://arxiv.org/abs/2009.07243, Code: https://github.com/moinnadeem/characterizing-sampling-algorithms
Hugh Zhang, Daniel Duckworth, Daphne Ippolito, and Arvind Neelakantan, 2021, Trading off diversity and quality in natural language generation, EACL 2021, p. 25, https://arxiv.org/abs/2004.10450
Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang, 22 Mar 2024, Hierarchical Skip Decoding for Efficient Autoregressive Text Generation, https://arxiv.org/abs/2403.14919 (A new decoding algorithm called Hierarchical Skip Decoding involving layer skipping.)
Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales, 2024, Who Needs Decoders? Efficient Estimation of Sequence-Level Attributes with Proxies, Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics Volume 1: Long Papers, pages 1478–1496 March 17-22, 2024, https://aclanthology.org/2024.eacl-long.89.pdf (Non-autoregressive decoding methods in special use cases such as machine language translation.)
Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, 3 Jun 2024, Demystifying Platform Requirements for Diverse LLM Inference Use Cases, https://arxiv.org/abs/2406.01698 Code: https://github.com/abhibambhaniya/GenZ-LLM-Analyzer (Analysis of cost of serving LLMs, including separate profiles of prefill versus decoding phases, and the cost of extra prompt processing in RAG architectures with prepended information.)
Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel to the LLM query processing, rather than sequentially after the LLM request, by detecting tool requests deep inside the decoding algorithm and starting them off immediately, before the LLM has finished generating the fully decoded output.)
Hao (Mark) Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan, 28 May 2024, Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference, https://arxiv.org/abs/2405.18628 Code: https://github.com/hmarkc/parallel-prompt-decoding (Similar to speculative decoding with extra trained prompt tokens and a tree-structured verification of multiple optional draft sequences.)
Maxime Peyrard, Martin Josifoski, Robert West, 21 Mar 2024, The Era of Semantic Decoding, https://arxiv.org/abs/2403.14562
Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati, 28 May 2024, Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass, https://arxiv.org/abs/2405.18400 https://github.com/RAIVNLab/SuperposedDecoding (Generating multiple possible drafts from a single decoding algorithm with one model pass by superimposing embeddings and using top-k decoding.)
Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan, 17 May 2024, Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers, https://arxiv.org/abs/2405.10480
Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen, 15 May 2024, Spectral Editing of Activations for Large Language Model Alignment, https://arxiv.org/pdf/2405.09719 Code: https://github.com/yfqiu-nlp/sea-llm
D Shin, May 8, 2024, Multi-User Language Model Resource Allocation Using Contextual Pause Token Aware Transformers, Technical Disclosure Commons, https://www.tdcommons.org/dpubs_series/6981/ PDF: https://www.tdcommons.org/cgi/viewcontent.cgi?article=8121&context=dpubs_series (Interesting idea of training a model how and when to pause during inference, so it can be pre-empted if needed, and thus the overall system can schedule batching of multiple queries more optimally.)
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou, 7 May 2024, Switchable Decision: Dynamic Neural Generation Networks, https://arxiv.org/abs/2405.04513 (Switching and skipping sub-layer components such as attention heads, FFNs, or input token skipping, using decisions made based on allocating computation resources.)
Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, Niraj K. Jha, Yilin Shen, Hongxia Jin, 1 May 2024, DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling, https://arxiv.org/abs/2405.00888 (A model trained to predict multiple tokens ahead.)
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve, 30 Apr 2024, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737
Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari, 22 Apr 2024, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, Apple Research, https://arxiv.org/abs/2404.14619 Code: https://huggingface.co/apple/OpenELM
Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, 31 Oct 2018, Weakly Supervised Grammatical Error Correction using Iterative Decoding, https://arxiv.org/abs/1811.01710
Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan, 20 Jan 2024, Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads, https://arxiv.org/abs/2401.11181 (Separating the prefill and decoding phases for optimization.)
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, 31 Aug 2023, SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills, https://arxiv.org/abs/2308.16369 (Examines the different GPU costs of prefill vs decoding phases, and optimizes decoding by "piggybacking" off the more intense computation during prefill.)
You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying Wei, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor) Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Aashaka Shah, Saeed Maleki, Ricardo Bianchini, 30 Nov 2023, Splitwise: Efficient generative LLM inference using phase splitting, https://arxiv.org/abs/2311.18677 (Separates the two Transformer phases of initial prompt computation or prefill to generate the KV cache, and the token generation phase or decoding algorithm onto two machines.)
Yao Zhao, Zhitian Xie, Chenyi Zhuang, Jinjie Gu, Jan 2024, Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy, https://arxiv.org/abs/2312.12728 Code: https://github.com/alipay/PainlessInferenceAcceleration
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon, 2021, Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving, Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021, https://proceedings.mlr.press/v139/song21a/song21a.pdf
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang, Nov 21, 2023, Break the Sequential Dependency of LLM Inference Using Lookahead Decoding, https://lmsys.org/blog/2023-11-21-lookahead-decoding/ Code: https://github.com/hao-ai-lab/LookaheadDecoding (Generates tokens in parallel by using Jacobi iteration.)
N Varshney, A Chatterjee, M Parmar, C Baral, Oct 2023, Accelerating LLM Inference by Enabling Intermediate Layer Decoding, arXiv preprint arXiv:2310.18581, https://arxiv.org/pdf/2310.18581.pdf (Dynamic confidence-based early exiting analysis on Llama models.)
Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
Xuanlei Zhao, Bin Jia, Haotian Zhou, Ziming Liu, Shenggan Cheng, Yang You, 2 Mar 2024, HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices, https://arxiv.org/abs/2403.01164
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee, 4 Mar 2024, Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve, https://arxiv.org/abs/2403.02310 (Faster latency by scheduling of prefill and decoding algorithm phases.)
C Hooper, S Kim, H Mohammadzadeh, H Genc, Oct 2023, SPEED: Speculative Pipelined Execution for Efficient Decoding, https://arxiv.org/pdf/2310.12072.pdf
Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
S Yang, G Lee, J Cho, D Papailiopoulos, 2023, Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding, https://arxiv.org/abs/2307.05908
Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, kangdi chen, Yuhan Dong, Yu Wang, 2024, FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, Part of Proceedings of Machine Learning and Systems 6 (MLSys 2024) Conference, PDF: https://proceedings.mlsys.org/paper_files/paper/2024/file/5321b1dabcd2be188d796c21b733e8c7-Paper-Conference.pdf (Next generation of Flash Decoding, with improved asynchronous parallelism of Softmax in both prefill and decoding phases, heuristic dataflow management algorithms, and enhanced GEMM during the decoding phase.)
kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
Trenton Bricken, November 20, 2019, Tail Free Sampling: A new way to sample from language models for text generation, https://www.trentonbricken.com/Tail-Free-Sampling/ (Alternative to top-k/top-p decoding.)
Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, Jianling Sun, 5 Jun 2024 (v2), JumpCoder: Go Beyond Autoregressive Coder via Online Modification, https://arxiv.org/abs/2401.07870 Code: https://github.com/Keytoyze/JumpCoder
Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis, 12 Jul 2024, Inference Optimization of Foundation Models on AI Accelerators, KDD’24, August 25–29, 2024, Barcelona, Spain, https://arxiv.org/abs/2407.09111
Jiaao He, Kezhao Huang, Jidong Zhai, July 2024, FASTDECODE: High-Throughput LLM Serving through Disaggregating Attention Computation, https://openreview.net/pdf?id=GahfuPsGw2 (Distributing KV caches to multiple nodes.)
Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu, 27 Jun 2024, Adaptive Draft-Verification for Efficient Large Language Model Decoding, https://arxiv.org/abs/2407.12021 Project: https://anonymous.4open.science/r/ADED-C7D5 (A draft-and-verification method that is similar to speculative decoding, but differs.)
Leo Donisch, Sigurd Schacht, Carsten Lanquillon, 6 Aug 2024, Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations, https://arxiv.org/abs/2408.03130
Yunjia Xi, Hangyu Wang, Bo Chen, Jianghao Lin, Menghui Zhu, Weiwen Liu, Ruiming Tang, Weinan Zhang, Yong Yu, 11 Aug 2024, A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems, https://arxiv.org/abs/2408.05676 (Determining when speculative decoding is most beneficial.)
Sidharth Mudgal, Jong Lee, Harish Ganapathy, Yaguang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami, July 2024, Controlled Decoding from Language Models, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:36486-36503, 2024, https://proceedings.mlr.press/v235/mudgal24a.html
Wenhong Zhu, Hongkun Hao, Zhiwei He, Yiming Ai, Rui Wang, July 2024, Improving Open-Ended Text Generation via Adaptive Decoding, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62386-62404, 2024, https://proceedings.mlr.press/v235/zhu24d.html
Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou, 20 Aug 2024, Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model, https://arxiv.org/abs/2408.10764 Code: https://github.com/chenhan97/Otter (Inference intervention in the decoding algorithm.)
Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong, 3 Oct 2024 (v2), Integrative Decoding: Improve Factuality via Implicit Self-consistency, https://arxiv.org/abs/2410.01556 (Prepends a previous response to improve decoding accuracy.)
Xinyi Zeng, Yuying Shang, Yutao Zhu, Jiawei Chen, Yu Tian, 9 Oct 2024, Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level, https://arxiv.org/abs/2410.06809
K Ahmed, KW Chang, G Van den Broeck, Oct 2024, Controllable Generation via Locally Constrained Resampling, NeurIPS Safe Generative AI Workshop 2024, https://openreview.net/pdf?id=v091fzXTu0
Yuxuan Liu, Wenyuan Li, Laizhong Cui, Hailiang Yang, 17 Oct 2024, Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement, https://arxiv.org/abs/2410.13344
Rongxiang Wang and Felix Xiaozhu Lin. 2024. Turbocharge Speech Understanding with Pilot Inference. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '24). Association for Computing Machinery, New York, NY, USA, 1299–1313. https://doi.org/10.1145/3636534.3690694 https://dl.acm.org/doi/abs/10.1145/3636534.3690694 https://dl.acm.org/doi/pdf/10.1145/3636534.3690694 ("Pilot inference" is a specialized mix of caching, computation reuse, and backtracking in beam search for speech understanding, and is somewhat related to speculative decoding, and similar to continual inference for processing a stream.)
Yixiong Fang, Ziran Yang, Zhaorun Chen, Zhuokai Zhao, Jiawei Zhou, 9 Dec 2024, From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding, https://arxiv.org/abs/2412.06474
Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
Y Li, K Livescu, J Zhou, Dec 2024, Beyond Token Generation: Adaptive Chunk-Distilled Language Modeling, 38th Conference on Neural Information Processing Systems (NeurIPS 2024), https://neurips2024-enlsp.github.io/papers/paper_90.pdf (Generate multiple tokens in decoding by inserting RAG chunks directly into the decoding output.)
Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas, 7 Oct 2024, Learning How Hard to Think: Input-Adaptive Allocation of LM Computation, https://arxiv.org/abs/2410.04707
Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen, 27 Nov 2024 (v2), SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models, https://arxiv.org/abs/2411.02433 https://jayzhang42.github.io/sled_page/ (Decoding algorithm that compares logit values in the final layer with those from earlier layers.)
Yuval Shalev, Amir Feder, Ariel Goldstein, 19 Jun 2024, Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning, https://arxiv.org/abs/2406.13858 (Using embeddings from intermediate model layers in decoding to mimic reasoning pathways.)
Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson, 14 Oct 2024 (v2), Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries, https://arxiv.org/abs/2406.12775 (Backpatching prior layers using embeddings from the current activations to mimic multi-step reasoning.)
Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao. 2019, Learning deep transformer models for machine translation. In Proc. of ACL, 2019. https://arxiv.org/abs/1906.01787
Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard H. Hovy. FlowSeq: Non-autoregressive conditional sequence generation with generative flow. In Proc. of EMNLP, 2019. https://arxiv.org/abs/1909.02480
Raphael Shu, Jason Lee, Hideki Nakayama, and Kyunghyun Cho. 2020, Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior. In Proc. of AAAI, 2020. https://arxiv.org/abs/1908.07181
Huan Ma, Jingdong Chen, Guangyu Wang, Changqing Zhang, 1 Feb 2025, Estimating LLM Uncertainty with Logits, https://arxiv.org/abs/2502.00290
Zeyu Tang, Zhenhao Chen, Loka Li, Xiangchen Song, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang, 5 Feb 2025, Reflection-Window Decoding: Text Generation with Selective Refinement, https://arxiv.org/abs/2502.03678 (Combination of sliding window attention with pausing.)
Weihua Du, Yiming Yang, Sean Welleck, 7 Feb 2025, Optimizing Temperature for Language Models with Multi-Sample Inference, https://arxiv.org/abs/2502.05234 https://github.com/StigLidu/TURN
Jacob Trauger, Ambuj Tewari, 16 May 2025, On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms, https://arxiv.org/abs/2505.11183
Zhibin Wang, Rui Ning, Chao Fang, Zhonghui Zhang, Xi Lin, Shaobo Ma, Mo Zhou, Xue Li, Zhongfeng Wang, Chengying Huan, Rong Gu, Kun Yang, Guihai Chen, Sheng Zhong, Chen Tian, 23 May 2025, FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding, https://arxiv.org/abs/2505.17694
Niels Mündler and Jasper Dekoninck and Martin Vechev, 13 Aug 2025, Constrained Decoding of Diffusion LLMs with Context-Free Grammars, https://arxiv.org/abs/2508.10111
Haonan Ge, Yiwei Wang, Ming-Hsuan Yang, Yujun Cai, 14 Aug 2025, MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs, https://arxiv.org/abs/2508.10264
Timon Merk, Saeed Salehi, Richard M. Koehler, Qiming Cui, Maria Olaru, Amelia Hahn, Nicole R. Provenza, Simon Little, Reza Abbasi-Asl, Phil A. Starr, Wolf-Julian Neumann, 13 Aug 2025, Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training, https://arxiv.org/abs/2508.10160
Keyu Chen, Zhifeng Shen, Daohai Yu, Haoqian Wu, Wei Wen, Jianfeng He, Ruizhi Qiao, Xing Sun, 14 Aug 2025, ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs, https://arxiv.org/abs/2508.08895
Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun, 22 Jul 2025, WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding, https://arxiv.org/abs/2507.16768
Sijin Yu, Zijiao Chen, Wenxuan Wu, Shengxian Chen, Zhongliang Liu, Jingxin Nie, Xiaofen Xing, Xiangmin Xu, Xin Zhang, 22 Jul 2025, From Flat to Round: Redefining Brain Decoding with Surface-Based fMRI and Cortex Structure, https://arxiv.org/abs/2507.16389
Yuxi Lin and Yaxue Fang and Zehong Zhang and Zhouwu Liu and Siyun Zhong and Fulong Yu, 22 Jul 2025, Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models, https://arxiv.org/abs/2507.16801
Arindam Ghosh, Mark Fuhs, Bongjun Kim, Anurag Chowdhury, Monika Woszczyna, 14 Jul 2025, ASR-Guided Speaker-Role Diarization and Diarization-Guided ASR Decoding, https://arxiv.org/abs/2507.17765
Milad Taghipour, Bane Vasic, 23 Jul 2025, Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes, https://arxiv.org/abs/2507.17893
Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun, 23 Jul 2025, Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale, https://arxiv.org/abs/2507.17985
Anushka Tiwari, Sayantan Pal, Rohini K. Srihari, Kaiyi Ji, 19 Jul 2025, Task-Agnostic Continual Prompt Tuning with Gradient-Based Selection and Decoding, https://arxiv.org/abs/2507.14725
Xiaojuan Zhang and Tianyu Jiang and Haoxiang Zong and Chen Zhang and Chendan Li and Marta Molinas, 13 Jul 2025, AI-Based Impedance Encoding-Decoding Method for Online Impedance Network Construction of Wind Farms, https://arxiv.org/abs/2507.14187
Donghoon Kim, Minji Bae, Kyuhong Shim, Byonghyo Shim, 21 Jul 2025, Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models, https://arxiv.org/abs/2505.08622
Taeyoun Kwon, Junhyuk Ahn, Taegeun Yun, Heeju Jwa, Yoonchae Choi, Siwon Park, Nam-Joon Kim, Jangchan Kim, Hyun Gon Ryu, and Hyuk-Jae Lee, 9 Aug 2025, Whisfusion: Parallel ASR Decoding via a Diffusion Transformer, https://arxiv.org/abs/2508.07048
Lilit Grigoryan, Vladimir Bataev, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Boris Ginsburg, 10 Aug 2025, FlexCTC: GPU-powered CTC Beam Decoding with advanced Contextual Abilities, https://arxiv.org/abs/2508.07315
Hao Yang, Qinghua Zhao, Lei Li, 28 Jul 2025, How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation, https://arxiv.org/abs/2507.20758
David Ye, Jan Williams, Mars Gao, Stefano Riva, Matteo Tomasetto, David Zoro, J. Nathan Kutz, 28 Jul 2025, PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery, https://arxiv.org/abs/2507.20954
Jinzhou Wu, Baoping Tang, Qikang Li, Yi Wang, Cheng Li, Shujian Yu, 28 Jul 2025, When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding, https://arxiv.org/abs/2507.21037
Max Peeperkorn, Tom Kouwenhoven, Dan Brown and Anna Jordanous, 28 Jul 2025, Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models, https://arxiv.org/abs/2507.20956
Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji, 26 Jul 2025, MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning, https://arxiv.org/abs/2409.12059
Vishal Raman, Vijai Aravindh R, 29 Jul 2025, Evo-DKD: Dual-Knowledge Decoding for Autonomous Ontology Evolution in Large Language Models, https://arxiv.org/abs/2507.21438
Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang, 31 Jul 2025, XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding, https://arxiv.org/abs/2507.23777
Shukai Gong, Yiyang Fu, Fengyuan Ran, Quyu Kong, Feng Zhou, 31 Jul 2025, TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding, https://arxiv.org/abs/2507.09252
Songsheng Wang, Rucheng Yu, Zhihang Yuan, Chao Yu, Feng Gao, Yu Wang and Derek F. Wong, 30 Jul 2025, Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance, https://arxiv.org/abs/2507.22424
Woojae Jeong, Aditya Kommineni, Kleanthis Avramidis, Colin McDaniel, Donald Berry, Myzelle Hughes, Thomas McGee, Elsi Kaiser, Dani Byrd, Assal Habibi, B. Rael Cahn, Idan A. Blank, Kristina Lerman, Dimitrios Pantazis, Sudarsana R. Kadiri, Takfarinas Medani, Shrikanth Narayanan, and Richard M. Leahy, 30 Jul 2025, Decoding Neural Signatures of Semantic Evaluations in Depression and Suicidality, https://arxiv.org/abs/2507.22313
Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, Chen Feng, Yicheng Lin, Chenzheng Su, Xiaopeng Zhang, 31 Jul 2025, OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding, https://arxiv.org/abs/2507.02659
Manh Nguyen, Sunil Gupta and Hung Le, 4 Aug 2025, CAAD: Context-Aware Adaptive Decoding for Truthful Text Generation, https://arxiv.org/abs/2508.02184
Yike Zhang and Zhiyuan He and Huiqiang Jiang and Chengruidong Zhang and Yuqing Yang and Jianyong Wang and Lili Qiu, 4 Aug 2025, LeanK: Learnable K Cache Channel Pruning for Efficient Decoding, https://arxiv.org/abs/2508.02215
Taehan Lee, Hyukjun Lee, 3 Aug 2025, Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance, https://arxiv.org/abs/2504.01690
Bolian Li, Yifan Wang, Anamika Lochab, Ananth Grama, Ruqi Zhang, 3 Aug 2025, Cascade Reward Sampling for Efficient Decoding-Time Alignment, https://arxiv.org/abs/2406.16306
Fatih Gulec, Hamdan Awan, Nigel Wallbridge, Andrew W. Eckford, 5 Aug 2025, Decoding and Engineering the Phytobiome Communication for Smart Agriculture, https://arxiv.org/abs/2508.03584
Jilong Li, Zhenxi Song, Jiaqi Wang, Meishan Zhang, Honghai Liu, Min Zhang, Zhiguo Zhang, 5 Aug 2025, BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation, https://arxiv.org/abs/2410.14971
Md Raisul Kibria, Sébastien Lafond, Janan Arslan, 6 Aug 2025, Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models, https://arxiv.org/abs/2508.04427
Enyu Zhou, Kai Sheng, Hao Chen, Xin He, 6 Aug 2025, CARD: Cache-Assisted Parallel Speculative Decoding for Efficient Large Language Model Inference, https://arxiv.org/abs/2508.04462
Shunqi Mao, Chaoyi Zhang, Weidong Cai, 6 Aug 2025, Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding, https://arxiv.org/abs/2503.10183
Kang Liu and Zhuoqi Ma and Zikang Fang and Yunan Li and Kun Xie and Qiguang Miao, 7 Aug 2025, PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation, https://arxiv.org/abs/2508.05353
Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu, 7 Aug 2025, DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, https://arxiv.org/abs/2411.19527
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram, 7 Aug 2025, DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding, https://arxiv.org/abs/2504.05598
Woojeong Kim, Junxiong Wang, Jing Nathan Yan, Mohamed Abdelfattah, Alexander M. Rush, 11 Aug 2025, OverFill: Two-Stage Models for Efficient Language Model Decoding, https://arxiv.org/abs/2508.08446
Ziqi Wang, Hailiang Zhao, Cheng Bao, Wenzhuo Qian, Yuhao Yang, Xueqiang Sun, Shuiguang Deng, 1 Aug 2025, XFMNet: Decoding Cross-Site and Nonstationary Water Patterns via Stepwise Multimodal Fusion for Long-Term Water Quality Forecasting, https://arxiv.org/abs/2508.08279
Lingzhe Zhang, Liancheng Fang, Chiming Duan, Minghua He, Leyi Pan, Pei Xiao, Shiyu Huang, Yunpeng Zhai, Xuming Hu, Philip S. Yu, Aiwei Liu, 12 Aug 2025, A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models, https://arxiv.org/abs/2508.08712
Xingyou Song, Dara Bahri, 12 Aug 2025, Decoding-based Regression, https://arxiv.org/abs/2501.19383
Qiaoqiao Ren, Remko Proesmans, Yuanbo Hou, Francis wyffels, and Tony Belpaeme, 12 Aug 2025, Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots, https://arxiv.org/abs/2412.03300
Changhong Jing, Yan Liu, Shuqiang Wang, Bruce X.B. Yu, Gong Chen, Zhejing Hu, Zhi Zhang, Yanyan Shen, 15 Aug 2025, PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding, https://arxiv.org/abs/2508.11357
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal, 15 Aug 2025, Controlling Multimodal LLMs via Reward-guided Decoding, https://arxiv.org/abs/2508.11616
Pengcheng Huang, Shuhao Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Zulong Chen, Tong Xiao, 18 Aug 2025, PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models, https://arxiv.org/abs/2508.13021
Jihoon Park, Seungeun Oh, and Seong-Lyun Kim, 18 Aug 2025, Energy-Efficient Wireless LLM Inference via Uncertainty and Importance-Aware Speculative Decoding, https://arxiv.org/abs/2508.12590
Yuanhao Li, Badong Chen, Wenjun Bai, Yasuharu Koike, Okito Yamashita, 5 Aug 2025, Robust Sparse Bayesian Learning Based on Minimum Error Entropy for Noisy High-Dimensional Brain Activity Decoding, https://arxiv.org/abs/2508.11657
Dylan Cope, Peter McBurney, 18 Aug 2025, Decoding Communications with Partial Information, https://arxiv.org/abs/2508.13326
Oriana Presacan, Alireza Nik, Vajira Thambawita, Bogdan Ionescu, Michael Riegler, 19 Aug 2025, A Comparative Study of Decoding Strategies in Medical Text Generation, https://arxiv.org/abs/2508.13580
Sanggeon Yun, Raheeb Hassan, Ryozo Masukawa, Mohsen Imani, 20 Aug 2025, MissionHD: Data-Driven Refinement of Reasoning Graph Structure through Hyperdimensional Causal Path Encoding and Decoding, https://arxiv.org/abs/2508.14746
Majid Daliri, Christopher Musco, Ananda Theertha Suresh, 20 Aug 2025, Coupling without Communication and Drafter-Invariant Speculative Decoding, https://arxiv.org/abs/2408.07978
Julian Oestreich and Lydia Müller, 21 Aug 2025, Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets, https://arxiv.org/abs/2508.15910
Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li, 22 Aug 2025, SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning, https://arxiv.org/abs/2508.16201
Lingxiao Li, Salar Rahili, Yiwei Zhao, 20 Aug 2025, Correctness-Guaranteed Code Generation via Constrained Decoding, https://arxiv.org/abs/2508.15866
Jungyoub Cha, Hyunjong Kim, Sungzoon Cho, 22 Aug 2025, SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences, https://arxiv.org/abs/2505.20776
Xuekang Wang, Shengyu Zhu, Xueqi Cheng, 25 Aug 2025, Speculative Safety-Aware Decoding, https://arxiv.org/abs/2508.17739
Jaydip Sen, Subhasis Dasgupta, Hetvi Waghela, 21 Aug 2025, Confidence-Modulated Speculative Decoding for Large Language Models, https://arxiv.org/abs/2508.15371
Abdul Rehman Akbar, Usama Sajjad, Ziyu Su, Wencheng Li, Fei Xing, Jimmy Ruiz, Wei Chen, Muhammad Khalid Khan Niazi, 22 Aug 2025, CellEcoNet: Decoding the Cellular Language of Pathology with Deep Learning for Invasive Lung Adenocarcinoma Recurrence Prediction, https://arxiv.org/abs/2508.16742
Ziyin Zhang and Jiahao Xu and Tian Liang and Xingyu Chen and Zhiwei He and Rui Wang and Zhaopeng Tu, 24 Aug 2025, Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation, https://arxiv.org/abs/2411.18462
Itai Gat, Heli Ben-Hamu, Marton Havasi, Daniel Haziza, Jeremy Reizenstein, Gabriel Synnaeve, David Lopez-Paz, Brian Karrer, Yaron Lipman, 4 Sep 2025, Set Block Decoding is a Language Model Inference Accelerator, https://arxiv.org/abs/2509.04185
Iro Lim, Haein Ji, and Byungjun Kim, 4 Sep 2025, Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling, https://arxiv.org/abs/2509.03932
Shengyin Sun and Yiming Li and Xing Li and Yingzhao Lian and Weizhe Lin and Hui-Ling Zhen and Zhiyuan Yang and Chen Chen and Xianzhi Yu and Mingxuan Yuan and Chen Ma, 30 Aug 2025, Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling, https://arxiv.org/abs/2509.04474
Bruno Aristimunha, Dung Truong, Pierre Guetschel, Seyed Yahya Shirazi, Isabelle Guyon, Alexandre R. Franco, Michael P. Milham, Aviv Dotan, Scott Makeig, Alexandre Gramfort, Jean-Remi King, Marie-Constance Corsi, Pedro A. Valdés-Sosa, Amit Majumdar, Alan Evans, Terrence J Sejnowski, Oren Shriki, Sylvain Chevallier, Arnaud Delorme, 5 Sep 2025, EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding, https://arxiv.org/abs/2506.19141
Dylan Cutler, Arun Kandoor, Nishanth Dikkala, Nikunj Saunshi, Xin Wang, Rina Panigrahy, 26 Aug 2025, StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel, https://arxiv.org/abs/2501.15665
Sining Zhoubian, Dan Zhang, Yuxiao Dong, Jie Tang, 27 Aug 2025, ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding, https://arxiv.org/abs/2508.19576
Afrar Jahin, Yi Pan, Yingfeng Wang, Tianming Liu, Wei Zhang, 26 Aug 2025, Quantum-Classical Hybrid Molecular Autoencoder for Advancing Classical Decoding, https://arxiv.org/abs/2508.19394
Yang Sun, Lixin Zou, Dan Luo, Zhiyong Xie, Long Zhang, Liming Dong, Yunwei Zhao, Xixun Lin, Yanxiong Lu, Chenliang Li, 27 Aug 2025, LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation, https://arxiv.org/abs/2508.19614
Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Yi Liang, Soroush Vosoughi, Shiwei Liu, 27 Aug 2025, Diffusion Language Models Know the Answer Before Decoding, https://arxiv.org/abs/2508.19982
Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Xiaokang Yang, Jiangmiao Pang, Yao Mu, Ping Luo, 27 Aug 2025, Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies, https://arxiv.org/abs/2508.20072
Seongwan Park, Taeklim Kim, Youngjoong Ko, 27 Aug 2025, Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval, https://arxiv.org/abs/2506.00041
Zhuoran Yu and Yong Jae Lee, 27 Aug 2025, How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding, https://arxiv.org/abs/2508.20279
Weizhi Gao, Xiaorui Liu, Feiyi Wang, Dan Lu, Junqi Yin, 28 Aug 2025, Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection, https://arxiv.org/abs/2508.21228
Haofei Yin, Mengbai Xiao, Tinghong Li, Xiao Zhang, Dongxiao Yu, Guanghui Zhang, 29 Aug 2025, SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding, https://arxiv.org/abs/2504.04104
Mingyu Yang, Jae-Young Choi, Kihyo Moon, Minsung Jang, and Eunjoo Joen, 1 Sep 2025, DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving, https://arxiv.org/abs/2509.01083
Xiaoqiang Lin, Aritra Ghosh, Bryan Kian Hsiang Low, Anshumali Shrivastava, Vijai Mohan, 1 Sep 2025, REFRAG: Rethinking RAG based Decoding, https://arxiv.org/abs/2509.01092
Kyeongman Park, Nakyeong Yang, Kyomin Jung, 2 Sep 2025, Avoidance Decoding for Diverse Multi-Branch Story Generation, https://arxiv.org/abs/2509.02170
Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram, 2 Sep 2025, Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation, https://arxiv.org/abs/2509.02510
Parv Kapoor, Akila Ganlath, Changliu Liu, Sebastian Scherer, Eunsuk Kang, 1 Sep 2025, Constrained Decoding for Robotics Foundation Models, https://arxiv.org/abs/2509.01728
Minxu Liu, Donghai Guan, Chuhang Zheng, Chunwei Tian, Jie Wen, Qi Zhu, 2 Sep 2025, ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding, https://arxiv.org/abs/2505.12408
GodsGift Uzor, Tania-Amanda Nkoyo Fredrick Eneye, Chukwuebuka Ijezue, 5 Sep 2025, Advanced Brain Tumor Segmentation Using EMCAD: Efficient Multi-scale Convolutional Attention Decoding, https://arxiv.org/abs/2509.05431
Ishaan Verma, 6 Sep 2025, Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization, https://arxiv.org/abs/2509.05831
Jipeng Li, Zeyu Gao, Yubin Qi, Hande Dong, Weijian Chen, Qiang Lin, 9 Sep 2025, Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding, https://arxiv.org/abs/2509.07676
Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho, 1 Sep 2025, CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention, https://arxiv.org/abs/2509.06982
Tom Kempton and Stuart Burrell, 9 Sep 2025, Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models, https://arxiv.org/abs/2503.21929
Hoshitaro Ohnishi and Hideo Mukai, 12 Sep 2025, A Symmetry-Integrated Approach to Surface Code Decoding, https://arxiv.org/abs/2509.10164
Xing Gao, Zherui Huang, Weiyao Lin, Xiao Sun, 11 Sep 2025, ProgD: Progressive Multi-scale Decoding with Dynamic Graphs for Joint Multi-agent Motion Forecasting, https://arxiv.org/abs/2509.09210
Weibin Feng, Ran Tao, John Cartlidge, Jin Zheng, 18 Sep 2025, VMDNet: Time Series Forecasting with Leakage-Free Samplewise Variational Mode Decomposition and Multibranch Decoding, https://arxiv.org/abs/2509.15394
Senkang Hu, Xudong Han, Jinqi Jiang, Yihang Tao, Zihan Fang, Sam Tak Wu Kwong, Yuguang Fang, 19 Sep 2025, Distribution-Aligned Decoding for Efficient LLM Task Adaptation, https://arxiv.org/abs/2509.15888
Wei Zhong, Manasa Bharadwaj, Yixiao Wang, Nikhil Verma, Yipeng Ji, Chul Lee, 19 Sep 2025, Cross-Attention Speculative Decoding, https://arxiv.org/abs/2505.24544
Sudeshna Jana, Manjira Sinha and Tirthankar Dasgupta, 14 Sep 2025, Decoding Plastic Toxicity: An Intelligent Framework for Conflict-Aware Relational Metapath Extraction from Scientific Abstracts, https://arxiv.org/abs/2509.11330
Shanmuka Sadhu, Arca Baran, Preeti Pandey, and Ayush Kumar, 15 Sep 2025, Task Decoding based on Eye Movements using Synthetic Data Augmentation, https://arxiv.org/abs/2509.11547
Cheng-Yang Tsai, Tzu-Wei Huang, Shao-Yu Wei, Guan-Wei Chen, Hung-Ying Chu, Yu-Cheng Lin, 14 Sep 2025, Decoding Musical Origins: Distinguishing Human and AI Composers, https://arxiv.org/abs/2509.11369
Wei-Hsin Yeh, Yu-An Su, Chih-Ning Chen, Yi-Hsueh Lin, Calvin Ku, Wen-Hsin Chiu, Min-Chun Hu, Lun-Wei Ku, 15 Sep 2025, CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model, https://arxiv.org/abs/2509.11698
Yudong Shen, Wenyu Wu, Jiali Mao, Yixiao Tong, Guoping Liu, Chaoya Wang, 15 Sep 2025, Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference, https://arxiv.org/abs/2509.11731
Haiduo Huang, Fuwei Yang, Zhenhua Liu, Xuanwu Yin, Dong Li, Pengju Ren, Emad Barsoum, 15 Sep 2025, SpecVLM: Fast Speculative Decoding in Vision-Language Models, https://arxiv.org/abs/2509.11815
Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang, 15 Sep 2025, Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation, https://arxiv.org/abs/2505.23657
Yurui Chang, Bochuan Cao, Lu Lin, 13 Sep 2025, Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation, https://arxiv.org/abs/2503.03106
Yeongbin Seo and Dongha Lee and Jaehyung Kim and Jinyoung Yeo, 18 Sep 2025, Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning, https://arxiv.org/abs/2509.15188
Matan Avitan, Moran Baruch, Nir Drucker, Itamar Zimerman, Yoav Goldberg, 10 Sep 2025, Efficient Decoding Methods for Language Models on Encrypted Data, https://arxiv.org/abs/2509.08383