Aussie AI
Decoding Algorithms
-
Last Updated 1 January, 2026
-
by David Spuler, Ph.D.
What are Decoding Algorithms?
The decoding algorithm in Transformer AI engines is the method by which the decoder emits tokens for the output message. At the end of each decoding step, the output is a vector of "logits" representing the model's predicted likelihood of each candidate next token. The algorithm by which the decoder decides to output one token, or multiple tokens, and which ones, is called the decoding algorithm.
Logits vs Activations
Each decoding step aims to produce a single token (i.e., the next word to output). Producing one token is actually a two-step process:
- The "activation vector" or "activations" are computed (numbers representing "embeddings"), and then
- The "logits" are computed from the activation vector (called "unembedding").
These two vectors are not usually the same size:
- Activations vector — size is the "hidden" model dimension (e.g., 4096)
- Logits vector — size is the model vocabulary size (e.g., 50,000 unique tokens).
Logits are very closely related to tokens: there is one logit value per token. Each logit value represents the LLM's prediction of how likely it would be to output that token as the next one. We could simply take the token with the highest logit value, which is the most likely token according to the LLM, and output that token. This is called "greedy decoding."
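As a minimal sketch (Python with NumPy; the function name is illustrative, not from any particular library), greedy decoding reduces to a single argmax over the logits vector:

import numpy as np

def greedy_decode_step(logits: np.ndarray) -> int:
    # Greedy decoding: emit the token whose logit (and hence
    # probability) is highest. Deterministic for a given logits vector.
    return int(np.argmax(logits))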
Note that most of the LLM's processing does not use logits, but activation vectors. The logits appear only at the very end of a decoding step. In all the interim steps, which are usually multiple layers of computations inside the model, we use an "embedding space" representation called an activation vector. We don't actually work on "tokens" or their probabilities; rather, we work on the strengths of what I call the "signals" in the embedding space, as stored in the activation vector.
The activations are a vector of numbers representing the strength of each "signal" in the embeddings. For example, a signal might be something like "noun" or "adjective," but there are literally thousands of them, and nobody fully understands what every value in an embedding actually represents. But the LLM sure knows!
This vector of numbers is the "activation vector" and is usually shortened to "activations." These values represent the extent to which each signal has been "activated" in each neuron. Each layer of computation modifies the activations, and we get the final activations after the final model layer.
At this very final phase, we need to take these "activations," whose length is the model's internal dimension (e.g., 4096), representing how many signals it is tracking, and convert them into logit probabilities, one per token. There are perhaps 50,000 tokens (this depends on the "vocabulary size," but 50,000 or 100,000 is common). Hence, we need to convert a 4096-length vector of numbers ("activations") into a 50,000-length vector of numbers ("logits").
The "unembedding matrix" is what we use. Multiplying the activation vector by this matrix, which is large and rectangular (e.g., 4096x50,000), converts the 4096-vector into a 50,000-vector. The unembedding matrix is large and expensive to use, which is why we only do this once per decoding step, rather than once per layer.
Anyway, the computation of activations is not the decoding algorithm. Nor is the multiplication by the unembedding matrix to get the logits vector. Rather, the decoding algorithm is the final phase, which operates on the logits vector of probabilities for each of the 50,000 tokens, and thereby chooses the next token to output.
Types of Decoding Algorithms
There are several possible decoding algorithms for the basic situation of choosing one token to output from a vector of probabilities for each token:
- Greedy decoding — always choose the highest-probability token.
- Top-k sampling (random sampling) — choose randomly from the k most likely tokens.
- Top-p sampling (nucleus sampling) — a refinement of top-k that samples from the smallest set of tokens whose cumulative probability exceeds the threshold p.
- Beam search decoding — a more complex "tree" search of multiple token sequences.
- Edit decoding — using the input context to help decode the output (e.g., grammar checking).
The above are all variations on a theme: take a vector of token probabilities as the input, and analyze these probabilities to choose exactly one of the tokens as the output.
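As a sketch of the sampling variants, here is one plausible way to combine top-k and top-p over a single logits vector (the function name and default parameters are illustrative):

import numpy as np

def sample_top_k_top_p(logits, k=50, p=0.9, rng=None):
    # Combined top-k then top-p (nucleus) sampling for one decoding step.
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Top-k: restrict to the k most likely tokens, highest first.
    top_ids = np.argsort(probs)[::-1][:k]
    top_probs = probs[top_ids]
    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    keep = np.searchsorted(np.cumsum(top_probs), p) + 1
    top_ids, top_probs = top_ids[:keep], top_probs[:keep]
    top_probs /= top_probs.sum()   # renormalize the truncated distribution
    return int(rng.choice(top_ids, p=top_probs))

Note that greedy decoding is the special case k=1.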
At a higher level, there are more advanced options, and the main classes of decoding algorithms are:
- Autoregressive decoding
- Non-Autoregressive (NAR) decoding
- Parallel decoding
- Multi-token output
Other issues for decoding algorithms include:
- Prefill phase (runs before decoding)
- Temperature (scaling hyper-parameter that affects decoding)
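Temperature is applied by scaling the logits before the softmax. A minimal sketch, with an illustrative default value:

import numpy as np

def apply_temperature(logits, temperature=0.7):
    # Divide logits by the temperature before softmax.
    # T < 1 sharpens the distribution (more deterministic);
    # T > 1 flattens it (more random); T -> 0 approaches greedy decoding.
    scaled = logits / max(temperature, 1e-6)   # guard against division by zero
    probs = np.exp(scaled - np.max(scaled))
    return probs / probs.sum()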
Parallel Decoding Algorithms
There are several types of parallel optimizations for decoding:
- Speculative decoding
- Generalized speculative decoding
- Lookahead decoding
- Lookup decoding (including "prompt lookup decoding" and "retrieval lookup decoding")
- Parallel decoding (generally)
Multi-model decoding algorithms have also been examined:
- Supervised decoding (see big-little architectures)
- Ensemble decoding (see ensemble architectures)
- Collaborative decoding
- Consensus decoding
Hybrid Decoding Optimizations
The decoding algorithm may also be combined with other optimizations that improve the decoding process, such as:
- Non-autoregressive decoding
- Token pruning
- Prompt compression (input compression)
Beam Search Decoding
Beam search decoding is an advanced type of decoding that works on a tree of potential output sequences. It searches this more complex space by keeping multiple candidate token sequences in reserve until it chooses the best one. Beam search can look ahead a few tokens, and then backtrack to choose a different final output.
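A toy sketch of beam search, where the hypothetical step_logits_fn stands in for a full LLM forward pass that returns next-token logits for a partial sequence:

import numpy as np

def log_softmax(logits):
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def beam_search(step_logits_fn, start_ids, beam_width=4, steps=10):
    # Each beam is a (token sequence, cumulative log-probability) pair.
    beams = [(list(start_ids), 0.0)]
    for _ in range(steps):
        candidates = []
        for ids, score in beams:
            logprobs = log_softmax(step_logits_fn(ids))
            # Expand this beam with its beam_width best next tokens.
            for tok in np.argsort(logprobs)[::-1][:beam_width]:
                candidates.append((ids + [int(tok)], score + float(logprobs[tok])))
        # Prune: keep only the best beam_width candidates overall.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]   # the highest-scoring sequence found

Greedy decoding is the special case beam_width=1.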
Research papers on beam search:
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia, 2024. SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 932–949, https://doi.org/10.1145/3620666.3651335 https://dl.acm.org/doi/abs/10.1145/3620666.3651335 Code: https://github.com/flexflow/FlexFlow/
- Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, and Niki Parmar. 2018. Weakly supervised grammatical error correction using iterative decoding. CoRR, abs/1811.01710. https://arxiv.org/abs/1811.01710 (Beam search decoding with a high threshold to emit corrections.)
- Jindrich Libovicky, Jindrich Helcl, Marek Tlusty, Ondrej Bojar, and Pavel Pecina. 2016. CUNI system for WMT16 automatic post-editing and multimodal translation tasks. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 646–654, Berlin, Germany. https://arxiv.org/abs/1606.07481 (Post-editing of machine translation.)
- Daniel Dahlmeier, Hwee Tou Ng, 2012, A Beam-Search Decoder for Grammatical Error Correction, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 568–578, Jeju Island, Korea, 12–14 July 2012, https://aclanthology.org/D12-1052.pdf
- Xiaoming (Jason) Cui, Ashraf Bhuiyan, 2023, Optimizing Transformer Model Inference on Intel® Processors, https://www.intel.com/content/www/us/en/developer/articles/technical/optimize-transformer-model-inference-processors.html
- Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, and Dhruv Batra. 2018. Diverse beam search for improved description of complex scenes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 7371–7379. AAAI Press. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17329
- Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li Apr 2021, LightSeq: A High Performance Inference Library for Transformers, https://arxiv.org/pdf/2010.13887.pdf
- Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
- GC Garbacea, 2023, Neural Language Generation for Content Adaptation: Explainable, Efficient Low-Resource Text Simplification and Evaluation, Ph.D. thesis, Computer Science and Engineering, University of Michigan, https://deepblue.lib.umich.edu/bitstream/handle/2027.42/178028/garbacea_1.pdf?sequence=1 (Broad thesis with sections on beam search decoding optimizations and AI safety issues such as bias.)
- Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
- Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
- Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica, Oct 2023, Efficient Memory Management for Large Language Model Serving with PagedAttention, SOSP ’23, October 23–26, 2023, Koblenz, Germany, https://dl.acm.org/doi/pdf/10.1145/3600006.3613165 (The original Paged Attention and vLLM paper, focusing on optimizing memory size of the KV cache using methods similar to operating-system memory paging.)
- Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou, July 2024, HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:7824-7846, 2024, https://proceedings.mlr.press/v235/chen24bi.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bi/chen24bi.pdf https://github.com/BillChan226/HALC
- Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su, 4 Feb 2024 (v2), Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning, https://arxiv.org/abs/2401.17686
- Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, and Noah A. Smith. 2024. A Call for Clarity in Beam Search: How It Works and When It Stops. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 77–90, Torino, Italia. ELRA and ICCL. https://aclanthology.org/2024.lrec-main.7/ https://aclanthology.org/2024.lrec-main.7.pdf
- Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun, 25 Sep 2024, Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference, https://arxiv.org/abs/2409.16560
- Shixiaowei02, Oct 2024, TensorRT-LLM 0.13.0 Release Latest, https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.13.0
- Yejin Lee, Anna Sun, Basil Hosmer, Bilge Acun, Can Balioglu, Changhan Wang, Charles David Hernandez, Christian Puhrsch, Daniel Haziza, Driss Guessous, Francisco Massa, Jacob Kahn, Jeffrey Wan, Jeremy Reizenstein, Jiaqi Zhai, Joe Isaacson, Joel Schlosser, Juan Pino, Kaushik Ram Sadagopan, Leonid Shamis, Linjian Ma, Min-Jae Hwang, Mingda Chen, Mostafa Elhoushi, Pedro Rodriguez, Ram Pasunuru, Scott Yih, Sravya Popuri, Xing Liu, Carole-Jean Wu, 30 Sep 2024, Characterizing and Efficiently Accelerating Multimodal Generation Model Inference, https://arxiv.org/abs/2410.00215 (Analyzes the bottlenecks in inference, finding the usual problems of autoregression, but also more interesting issues such as that linear kernels can be expensive, and KV cache reordering is a bottleneck in beam search, and layer skipping is analyzed.)
- Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua, 8 Oct 2024 (v2), Efficient Inference for Large Language Model-based Generative Recommendation, https://arxiv.org/abs/2410.05165
- Rongxiang Wang and Felix Xiaozhu Lin. 2024. Turbocharge Speech Understanding with Pilot Inference. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '24). Association for Computing Machinery, New York, NY, USA, 1299–1313. https://doi.org/10.1145/3636534.3690694 https://dl.acm.org/doi/abs/10.1145/3636534.3690694 https://dl.acm.org/doi/pdf/10.1145/3636534.3690694 ("Pilot inference" is a specialized mix of caching, computation reuse, and backtracking in beam search for speech understanding, and is somewhat related to speculative decoding, and similar to continual inference for processing a stream.)
- NVIDIA, Dec 2024, Multi-Head, Multi-Query, and Group-Query Attention, https://nvidia.github.io/TensorRT-LLM/advanced/gpt-attention.html#kv-cache
- Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
- Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
- Wendi Cui, Jiaxin Zhang, Zhuohang Li, Hao Sun, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, 26 Feb 2025, Automatic Prompt Optimization via Heuristic Search: A Survey, https://arxiv.org/abs/2502.18746 (Survey of auto prompting, from basic LLM enhancements to some methods quite similar to RALM and TALM.)
- Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
- Yangchao Wu, Zongyue Qin, Alex Wong, Stefano Soatto, 20 May 2025, STree: Speculative Tree Decoding for Hybrid State-Space Models, https://arxiv.org/abs/2505.14969
- Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert, 2 Aug 2025, Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search, https://arxiv.org/abs/2508.01459
- Harold Silvère Kiossou and Siegfried Nijssen and Pierre Schaus, 8 Aug 2025, A Generic Complete Anytime Beam Search for Optimal Decision Tree, https://arxiv.org/abs/2508.06064
- Jingjin Wang, Jiawei Han, 3 Oct 2025, PropRAG: Guiding Retrieval with Beam Search over Proposition Paths, https://arxiv.org/abs/2504.18070
Phrase Banning
Phrase banning is a feature extension for LLM decoding that disallows selected words or phrases, rather than a speed optimization. The idea is to block the LLM from emitting certain words or phrases at the decoder level, rather than post-processing the LLM's output to remove them, forcing the decoding phase to backtrack whenever it tries to emit a disallowed word or phrase. If a model has whole-word tokenization, then individual words can be banned at the current decoding step by modifying simple decoding algorithms like greedy or top-k/top-p decoding. However, banning multi-word phrases or other multi-token sequences requires backtracking similar to beam search decoding. In fact, it makes sense to merge the phrase banning algorithm into beam search or other tree decoding methods. Banning phrases is usually efficient, because detecting the phrases has only a small token search cost, and although backtracking is expensive, it is hopefully a relatively rare occurrence.
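As a sketch of the simple single-token case, banned tokens can be masked out of the logits before any of the basic decoding algorithms run (the function name is illustrative):

import numpy as np

def ban_tokens(logits, banned_ids):
    # Set banned tokens' logits to -infinity, giving them zero
    # probability after softmax, so greedy, top-k, and top-p decoding
    # can never emit them. Multi-token phrases instead require
    # backtracking, as discussed above.
    masked = logits.copy()
    masked[list(banned_ids)] = -np.inf
    return masked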
Research papers on phrase banning:
- Lost Ruins, Oct 11, 2024, koboldcpp-1.76, https://github.com/LostRuins/koboldcpp/releases/tag/v1.76 (Release includes "anti-slop" using "phrase banning" decoding algorithm.)
- Sam Paech, 2024, antislop-sampler, https://github.com/sam-paech/antislop-sampler?tab=readme-ov-file (Decoding algorithm for "phrase banning" with backtracking.)
- Bilgehan Sel, Dingcheng Li, Phillip Wallis, Vaishakh Keshava, Ming Jin, Siddhartha Reddy Jonnalagadda, 11 Mar 2025, Backtracking for Safety, https://arxiv.org/abs/2503.08919
Tree Decoding
Tree decoding is the use of alternative pathways in decoding, arranged as a hierarchical tree. The idea is a generalization of beam search decoding. One application of tree decoding is the attempt to mimic Chain-of-Thought reasoning in a single inference step, using a tree of pathways ("CoT decoding").
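A toy sketch of tree decoding that expands a small tree to a fixed depth, then commits to the first token of the best-scoring path; the hypothetical step_logits_fn again stands in for an LLM forward pass:

import numpy as np

def log_softmax(logits):
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def tree_decode_one_token(step_logits_fn, prefix, branch=3, depth=2):
    # Expand the `branch` most likely tokens at every tree node, down to
    # `depth` levels, then emit the first token of the path with the
    # highest total log-probability. Pruning the frontier to a fixed
    # width would turn this into beam search.
    best = {"score": -np.inf, "first": None}

    def expand(ids, score, first_tok, levels_left):
        if levels_left == 0:
            if score > best["score"]:
                best["score"], best["first"] = score, first_tok
            return
        logprobs = log_softmax(step_logits_fn(ids))
        for tok in np.argsort(logprobs)[::-1][:branch]:
            tok = int(tok)
            expand(ids + [tok], score + float(logprobs[tok]),
                   first_tok if first_tok is not None else tok,
                   levels_left - 1)

    expand(list(prefix), 0.0, None, depth)
    return best["first"]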
Research papers on tree decoding:
- Ziyu Wan, Xidong Feng, Muning Wen, Stephen Marcus Mcaleer, Ying Wen, Weinan Zhang, Jun Wang, July 2024, AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49890-49920, 2024, https://proceedings.mlr.press/v235/wan24c.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/wan24c/wan24c.pdf
- Xiangxiang Gao, Weisheng Xie, Yiwei Xiang, Feng Ji, 17 Dec 2024, Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree, https://arxiv.org/abs/2412.12639
- Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Xidong Feng, Ziyu Wan, Muning Wen, Ying Wen, Weinan Zhang, and Jun Wang. 2023. Alphazero-like tree-search can guide large language model decoding and training. In NeurIPS 2023 Foundation Models for Decision Making Workshop. https://arxiv.org/abs/2309.17179
- Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun, 25 Sep 2024, Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference, https://arxiv.org/abs/2409.16560
- Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang, Chao Du, Bo An, 24 Feb 2025, LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification, https://arxiv.org/abs/2502.17421 https://github.com/sail-sg/LongSpec
- Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao, 27 Feb 2025 (v2), Dynamic Parallel Tree Search for Efficient LLM Reasoning, https://arxiv.org/abs/2502.16235
- Yangchao Wu, Zongyue Qin, Alex Wong, Stefano Soatto, 20 May 2025, STree: Speculative Tree Decoding for Hybrid State-Space Models, https://arxiv.org/abs/2505.14969
- Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang, 16 May 2025, Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism, https://arxiv.org/abs/2506.01979
Contrastive Decoding
Contrastive decoding is a method whereby the probabilities of two or more outputs are "contrasted" to choose the best token to output. This can be done by examining prior layers during inference, or it can be done with multiple models.
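A minimal sketch of the multi-model variant, in the style of the Li et al. contrastive decoding paper listed below: a large "expert" model's log-probabilities are contrasted against a small "amateur" model's, restricted to tokens the expert itself rates as plausible (the function name and the alpha cutoff default are illustrative):

import numpy as np

def log_softmax(logits):
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def contrastive_decode_step(expert_logits, amateur_logits, alpha=0.1):
    # Score each token by the gap between the expert's and the
    # amateur's log-probabilities, but only over tokens within a
    # factor alpha of the expert's top probability.
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    plausible = expert_lp >= np.max(expert_lp) + np.log(alpha)
    scores = np.where(plausible, expert_lp - amateur_lp, -np.inf)
    return int(np.argmax(scores))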
- Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
- Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou, 18 Jun 2024, Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding, https://arxiv.org/abs/2406.12295 Code: https://github.com/TsinghuaC3I/FS-GEN
- Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
- Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis, 10 Jul 2023 (v2), Contrastive Decoding: Open-ended Text Generation as Optimization, https://arxiv.org/abs/2210.15097
- Hyunjong Ok, Jegwang Ryu, Jaeho Lee, 26 Jun 2024, Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher, https://arxiv.org/abs/2406.18002 (Examines the idea of not using the larger model to always verify, and when to trust either the smaller or larger models, which is an idea that generalized beyond speculative decoding.)
- Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
- Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou, 13 Mar 2024 (v2), Speculative Contrastive Decoding, https://arxiv.org/abs/2311.08981
- Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou, July 2024, HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:7824-7846, 2024, https://proceedings.mlr.press/v235/chen24bi.html PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bi/chen24bi.pdf https://github.com/BillChan226/HALC
- F. Li, X. Zhang and P. Zhang, 2024, Mitigating Hallucination Issues in Small-Parameter LLMs through Inter-Layer Contrastive Decoding, 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/IJCNN60899.2024.10650644, https://ieeexplore.ieee.org/abstract/document/10650644
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- Phuc Phan, Hieu Tran, Long Phan, 23 Aug 2024 (v2), Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation, https://arxiv.org/abs/2402.14874
- Nikhil Anand, Nov 14, 2024, Making LLMs more Truthful with DoLa: A Contrastive Decoding Approach (Part I), https://ai.gopubby.com/making-llms-more-truthful-with-dola-a-contrastive-decoding-approach-part-i-1c2f90c91996 (Decoding by examining probabilities across layers.)
- Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang, 2 Jun 2025 (v2), Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation, https://arxiv.org/abs/2505.23657
- Che-Yu Chou, Hung-Hsuan Chen, 14 Aug 2025, Contrastive ECOC: Learning Output Codes for Adversarial Defense, https://arxiv.org/abs/2508.10491
- Shan Shen, Shenglu Hua, Jiajun Zou, Jiawei Liu, Jianwang Zhai, Chuan Shi, Wenjian Yu, 14 Aug 2025, Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits, https://arxiv.org/abs/2507.06535
- Lei Tian, Xiaomin Li, Liqian Ma, Hao Yin, Zirui Zheng, Hefei Huang, Taiqing Li, Huchuan Lu, Xu Jia, 14 Aug 2025, CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting, https://arxiv.org/abs/2505.20469
- Amr Mousa, Neil Karavis, Michele Caprio, Wei Pan and Richard Allmendinger, 14 Aug 2025, TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion, https://arxiv.org/abs/2503.20839
- Weijia Yang, Tian Lan, Leyuan Liu, Wei Chen, Tianqing Zhu, Sheng Wen, Xiaosong Zhang, 19 Jul 2025, CASPER: Contrastive Approach for Smart Ponzi Scheme Detecter with More Negative Samples, https://arxiv.org/abs/2507.16840
- Xiaoqiang He, 21 Jul 2025, CLAMP: Contrastive Learning with Adaptive Multi-loss and Progressive Fusion for Multimodal Aspect-Based Sentiment Analysis, https://arxiv.org/abs/2507.16854
- Piotr Masztalski, Michał Romaniuk, Jakub Żak, Mateusz Matuszewski, Konrad Kowalczyk, 23 Jul 2025, Clustering-based hard negative sampling for supervised contrastive speaker verification, https://arxiv.org/abs/2507.17540
- Arsh Tangri, Nichols Crawford Taylor, Haojie Huang, Robert Platt, 22 Jul 2025, Equivariant Goal Conditioned Contrastive Reinforcement Learning, https://arxiv.org/abs/2507.16139
- Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li and Chris Shum, 22 Jul 2025, CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning, https://arxiv.org/abs/2507.14111
- Zhijie Wang, Zixin Xu, Zhiyuan Pan, 24 Jul 2025, GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks, https://arxiv.org/abs/2507.14679
- Yajiao Dai, Jun Li, Zhen Mei, Yiyang Ni, Shi Jin, Zengxiang Li, Sheng Guo, Wei Xiang, 12 Jul 2025, Semi-Supervised Federated Learning via Dual Contrastive Learning and Soft Labeling for Intelligent Fault Diagnosis, https://arxiv.org/abs/2507.14181
- Xiaotong Luo, Shengda Zhuo, Min Chen, Lichun Li, Ruizhao Lu, Wenqi Fan, Shuqiang Huang and Yin Tang, 12 Jul 2025, From Bias to Behavior: Learning Bull-Bear Market Dynamics with Contrastive Modeling, https://arxiv.org/abs/2507.14182
- Yiming Xu, Zhen Peng, Bin Shi, Xu Hua, Bo Dong, Song Wang, Chen Chen, 19 Jul 2025, Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective, https://arxiv.org/abs/2507.14677
- Abdul-Kazeem Shamba, Kerstin Bach and Gavin Taylor, 20 Jul 2025, eMargin: Revisiting Contrastive Learning with Margin-Based Separation, https://arxiv.org/abs/2507.14828
- Jinzhi Wang, Bin Li, Qingke Peng, Haozhou Li, Zeyuan Zeng, Ruimeng Li, Kaixuan Yang, Jiangbo Zhang, Biyi Zhou, Yaoying Wang, 20 Jul 2025, LumiCRS: Asymmetric Contrastive Prototype Learning for Long-Tail Conversational Recommender Systems, https://arxiv.org/abs/2507.04722
- Sho Oshima, Yuji Okamoto, Taisei Tosaki, Ryosuke Kojima, Yasushi Okuno, 19 Jul 2025, Supervised Graph Contrastive Learning for Gene Regulatory Network, https://arxiv.org/abs/2505.17786
- Chaoqun Cui, Caiyan Jia, 10 Aug 2025, Propagation Tree Is Not Deep: Adaptive Graph Contrastive Learning Approach for Rumor Detection, https://arxiv.org/abs/2508.07201
- WonJun Moon, Hyun Seok Seong, Jae-Pil Heo, 11 Aug 2025, Selective Contrastive Learning for Weakly Supervised Affordance Grounding, https://arxiv.org/abs/2508.07877
- Mohammad Zia Ur Rehman, Anukriti Bhatnagar, Omkar Kabde, Shubhi Bansal, Nagendra Kumar, 7 Aug 2025, ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos, https://arxiv.org/abs/2508.06570
- Mengting Pan, Fan Li, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin, 10 Aug 2025, HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation, https://arxiv.org/abs/2508.03104
- Binxiong Li, Yuefei Wang, Binyu Zhao, Heyang Gao, Benhan Yang, Quanzhou Luo, Xue Li, Xu Xiang, Yujie Liu, Huijie Tang, 28 Jul 2025, Attributed Graph Clustering with Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning, https://arxiv.org/abs/2507.20505
- Chengkai Wang, Di Wu, Yunsheng Liao, Wenyao Zheng, Ziyi Zeng, Xurong Gao, Hemmings Wu, Zhoule Zhu, Jie Yang, Lihua Zhong, Weiwei Cheng, Yun-Hsuan Chen and Mohamad Sawan, 27 Jul 2025, NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis, https://arxiv.org/abs/2507.20189
- Wenhao Ma, Yu-Cheng Chang, Jie Yang, Yu-Kai Wang, Chin-Teng Lin, 28 Jul 2025, Contrastive learning-based agent modeling for deep reinforcement learning, https://arxiv.org/abs/2401.00132
- Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, Dacheng Tao, Changjun Jiang, 26 Jul 2025, GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning, https://arxiv.org/abs/2403.14410
- Maximillian Chen and Ruoxi Sun and Tomas Pfister and Sercan Ö. Arık, 27 Jul 2025, Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training, https://arxiv.org/abs/2406.00222
- Yu Tai, Xinglong Wu, Hongwei Yang, Hui He, Duanjing Chen, Yuanming Shao and Weizhe Zhang, 28 Jul 2025, How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method, https://arxiv.org/abs/2411.00612
- Kristin Qi, Jiali Cheng, Youxiang Zhu, Hadi Amiri, Xiaohui Liang, 28 Jul 2025, Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning, https://arxiv.org/abs/2505.17067
- Fabrizio Lo Scudo, Alessio De Rango, Luca Furnari, Alfonso Senatore, Donato D'Ambrosio, Giuseppe Mendicino and Gianluigi Greco, 23 Jul 2025, Advancing Wildfire Risk Prediction via Morphology-Aware Curriculum Contrastive Learning, https://arxiv.org/abs/2507.21147
- Yaoyu Zhang and Chi-Guhn Lee, 28 Jul 2025, A Contrastive Diffusion-based Network (CDNet) for Time Series Classification, https://arxiv.org/abs/2507.21357
- David A Kelly and Hana Chockler, 31 Jul 2025, Causal Identification of Sufficient, Contrastive and Complete Feature Sets in Image Classification, https://arxiv.org/abs/2507.23497
- Binxiong Li, Xu Xiang, Xue Li, Quanzhou Lou, Binyu Zhao, Yujie Liu, Huijie Tang, Benhan Yang, 31 Jul 2025, GCL-GCN: Graphormer and Contrastive Learning Enhanced Attributed Graph Clustering Network, https://arxiv.org/abs/2507.19095
- Qile Liu, Weishan Ye, Lingli Zhang, Zhen Liang, 31 Jul 2025, EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition, https://arxiv.org/abs/2408.09186
- Ziwei Wang, Siyang Li, Xiaoqing Chen, and Dongrui Wu, 31 Jul 2025, MVCNet: Multi-View Contrastive Network for Motor Imagery Classification, https://arxiv.org/abs/2502.17482
- Gianluca Carloni, Biagio Brattoli, Seongho Keum, Jongchan Park, Taebum Lee, Chang Ho Ahn, Sergio Pereira, 29 Jul 2025, Pathology Foundation Models are Scanner Sensitive: Benchmark and Mitigation with Contrastive ScanGen Loss, https://arxiv.org/abs/2507.22092
- Sara Sarto, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara, 29 Jul 2025, Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training, https://arxiv.org/abs/2410.07336
- Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag, 29 Jul 2025, Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation, https://arxiv.org/abs/2403.19776
- Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han, 1 Aug 2025, Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement, https://arxiv.org/abs/2508.00410
- Yiming Xu, Xu Hua, Zhen Peng, Bin Shi, Jiarun Chen, Xingbo Fu, Song Wang, Bo Dong, 1 Aug 2025, Text-Attributed Graph Anomaly Detection via Multi-Scale Cross- and Uni-Modal Contrastive Learning, https://arxiv.org/abs/2508.00513
- Shiyi Liu, Buwen Liang, Yuetong Fang, Zixuan Jiang and Renjing Xu, 1 Aug 2025, Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms, https://arxiv.org/abs/2507.02724
- Amrit Rajeev, Udayaadithya Avadhanam, Harshula Tulapurkar, SaiBarath Sundar, 1 Aug 2025, Small sample-based adaptive text classification through iterative and contrastive description refinement, https://arxiv.org/abs/2508.00957
- Xiaoya Li, Xiaofei Sun, Albert Wang, Chris Shum and Jiwei Li, 4 Aug 2025, CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search, https://arxiv.org/abs/2508.02091
- Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Ng Nga Chun, Gerald W.Y. Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo, 3 Aug 2025, Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery, https://arxiv.org/abs/2508.01799
- Yujia Tong, Tian Zhang, Jingling Yuan, Yuze Wang, Chuang Hu, 3 Aug 2025, LetheViT: Selective Machine Unlearning for Vision Transformers via Attention-Guided Contrastive Learning, https://arxiv.org/abs/2508.01569
- Kosmas Pinitas and Konstantinos Makantasis and Georgios N. Yannakakis, 30 Jul 2025, Privileged Contrastive Pretraining for Multimodal Affect Modelling, https://arxiv.org/abs/2508.03729
- Hyungbin Kim, Incheol Baek, Yon Dohn Chung, 6 Aug 2025, Decoupled Contrastive Learning for Federated Learning, https://arxiv.org/abs/2508.04005
- Thang Duc Tran, Thai Hoang Le, 6 Aug 2025, WSS-CL: Weight Saliency Soft-Guided Contrastive Learning for Efficient Machine Unlearning Image Classification, https://arxiv.org/abs/2508.04308
- Rui Zuo, Simon Khan, Zifan Wang, Garrett Ethan Katz, Qinru Qiu, 6 Aug 2025, Why the Agent Made that Decision: Contrastive Explanation Learning for Reinforcement Learning, https://arxiv.org/abs/2411.16120
- Sahil Sethi, David Chen, Thomas Statchen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones, 6 Aug 2025, ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning, https://arxiv.org/abs/2504.08713
- Tianchen Fang, Guiru Liu, 7 Aug 2025, RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding, https://arxiv.org/abs/2508.05244
- Kang Liu and Zhuoqi Ma and Zikang Fang and Yunan Li and Kun Xie and Qiguang Miao, 7 Aug 2025, PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation, https://arxiv.org/abs/2508.05353
- Wonjun Kang, Byeongkeun Ahn, Minjae Lee, Kevin Galim, Seunghyuk Oh, Hyung Il Koo, Nam Ik Cho, 7 Aug 2025, UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation, https://arxiv.org/abs/2508.05399
- Willian T. Lunardi, Abdulrahman Banabila, Dania Herzalla, and Martin Andreoni, 7 Aug 2025, Contrastive Representation Modeling for Anomaly Detection, https://arxiv.org/abs/2501.05130
- Shengzhu Yang, Jiawei Du, Shuai Lu, Weihang Zhang, Ningli Wang, Huiqi Li, 8 Aug 2025, CLIPin: A Non-contrastive Plug-in to CLIP for Multimodal Semantic Alignment, https://arxiv.org/abs/2508.06434
- Zihu Wang, Boxun Xu, Hejia Geng, Peng Li, 8 Aug 2025, Khan-GCL: Kolmogorov-Arnold Network Based Graph Contrastive Learning with Hard Negatives, https://arxiv.org/abs/2505.15103
- Huifa Li, Jie Fu, Xinlin Zhuang, Haolin Yang, Xinpeng Ling, Tong Cheng, Haochen Xue, Imran Razzak, Zhili Chen, 7 Aug 2025, scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering, https://arxiv.org/abs/2508.09180
- Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang, 13 Aug 2025, A Unified Contrastive-Generative Framework for Time Series Classification, https://arxiv.org/abs/2508.09451
- Han Yu, Huiyuan Yang, Akane Sano, 12 Aug 2025, LEAVES: Learning Views for Time-Series Biobehavioral Data in Contrastive Learning, https://arxiv.org/abs/2210.07340
- Minghui Sun, Matthew M. Engelhard, Benjamin A. Goldstein, 15 Aug 2025, Borrowing From the Future: Enhancing Early Risk Assessment through Contrastive Learning, https://arxiv.org/abs/2508.11210
- Bin Ma, Yifei Zhang, Yongjin Xian, Qi Li, Linna Zhou, Gongxun Miao, 15 Aug 2025, A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations, https://arxiv.org/abs/2508.11141
- Haojie Zhang, Yixiong Liang, Hulin Kuang, Lihui Cen, Zhe Qu, Yigang Cen, Min Zeng, Shichao Kan, 8 Aug 2025, Contrastive Regularization over LoRA for Multimodal Biomedical Image Incremental Learning, https://arxiv.org/abs/2508.11673
- Reza Shirkavand, Shangqian Gao, Peiran Yu, Heng Huang, 17 Aug 2025, Cost-Aware Contrastive Routing for LLMs, https://arxiv.org/abs/2508.12491
- Alicja Ziarko, Michal Bortkiewicz, Michal Zawalski, Benjamin Eysenbach and Piotr Milos, 18 Aug 2025, Contrastive Representations for Temporal Reasoning, https://arxiv.org/abs/2508.13113
- Yihan Wang, Yiwei Lu, Guojun Zhang, Franziska Boenisch, Adam Dziedzic, Yaoliang Yu, Xiao-Shan Gao, 16 Aug 2025, MUC: Machine Unlearning for Contrastive Learning with Black-box Evaluation, https://arxiv.org/abs/2406.03603
- Kai Sun, Yushi Bai, Zhen Yang, Jiajie Zhang, Ji Qi, Lei Hou and Juanzi Li, 17 Aug 2025, Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models, https://arxiv.org/abs/2505.20152
- Lingyu Si, Jingyao Wang, Wenwen Qiang, 19 Aug 2025, A Generalized Learning Framework for Self-Supervised Contrastive Learning, https://arxiv.org/abs/2508.13596
- Ruobing Jiang, Yacong Li, Haobing Liu, Yanwei Yu, 19 Aug 2025, Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning, https://arxiv.org/abs/2503.13911
- Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, Doudou Zhou, 19 Aug 2025, Contrastive Learning on Multimodal Analysis of Electronic Health Records, https://arxiv.org/abs/2403.14926
- Qian Zhanga, Ruilin Zhang, Jun Xiao, Yifan Liu and Zhe Wang, 12 Aug 2025, MCLPD: Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets, https://arxiv.org/abs/2508.14073
- Chen-Hao Chang, Hui-Ju Hung, Chia-Hsun Lu, Chih-Ya Shen, 20 Aug 2025, Enhancing Contrastive Link Prediction With Edge Balancing Augmentation, https://arxiv.org/abs/2508.14808
- Guilhem Fauré (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Sam Bigeard (MULTISPEECH), Slim Ouni (LORIA, MULTISPEECH), 20 Aug 2025, Towards Skeletal and Signer Noise Reduction in Sign Language Production via Quaternion-Based Pose Encoding and Contrastive Learning, https://arxiv.org/abs/2508.14574
- Yifan Zhang, Junhui Hou, 20 Aug 2025, Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?, https://arxiv.org/abs/2412.08973
- Yi Yuan, Joseph Van Duyn, Runze Yan, Zhuoyi Huang, Sulaiman Vesal, Sergey Plis, Xiao Hu, Gloria Hyunjung Kwak, Ran Xiao, Alex Fedorov, 21 Aug 2025, Learning ECG Representations via Poly-Window Contrastive Learning, https://arxiv.org/abs/2508.15225
- Junho Song, Jong-Hwan Jang, DongGyun Hong, Joon-myoung Kwon, and Yong-Yeon Jo, 21 Aug 2025, CREMA: A Contrastive Regularized Masked Autoencoder for Robust ECG Diagnostics across Clinical Domains, https://arxiv.org/abs/2407.07110
- Pouria Mortezaagha, Arya Rahgozar, 17 Aug 2025, An Auditable Pipeline for Fuzzy Full-Text Screening in Systematic Reviews: Integrating Contrastive Semantic Highlighting and LLM Judgment, https://arxiv.org/abs/2508.15822
- Wenqiao Zhu, Ji Liu, Rongjuncheng Zhang, Haipang Wu, Yulun Zhang, 21 Aug 2025, CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning, https://arxiv.org/abs/2508.15868
- Yulin Zhu, Xing Ai, Yevgeniy Vorobeychik, Kai Zhou, 22 Aug 2025, Robust Graph Contrastive Learning with Information Restoration, https://arxiv.org/abs/2307.12555
- Yushi Lin, Peng Yang, 23 Aug 2025, A Decoupled LOB Representation Framework for Multilevel Manipulation Detection with Supervised Contrastive Learning, https://arxiv.org/abs/2508.17086
- Muhammad Aqeel, Danijel Skocaj, Marco Cristani, Francesco Setti, 25 Aug 2025, A Contrastive Learning-Guided Confident Meta-learning for Zero Shot Anomaly Detection, https://arxiv.org/abs/2508.17827
- Bin Tan, Wangyao Ge, Yidi Wang, Xin Liu, Jeff Burtoft, Hao Fan, Hui Wang, 25 Aug 2025, PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation, https://arxiv.org/abs/2508.18166
- Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda, 4 Sep 2025, PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation, https://arxiv.org/abs/2509.04357
- Wenhui Cui, Christopher Sandino, Hadi Pouransari, Ran Liu, Juri Minxha, Ellen L. Zippi, Aman Verma, Anna Sedlackova, Behrooz Mahasseni, Erdrin Azemi, 4 Sep 2025, CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals, https://arxiv.org/abs/2509.04699
- Wuchao Liu, Han Peng, Wengen Li, Yichao Zhang, Jihong Guan and Shuigeng Zhou, 23 Aug 2025, scI2CL: Effectively Integrating Single-cell Multi-omics by Intra- and Inter-omics Contrastive Learning, https://arxiv.org/abs/2508.18304
- Jiangfeng Sun, Sihao He, Zhonghong Ou, Meina Song, 24 Aug 2025, Structures Meet Semantics: Multimodal Fusion via Graph Contrastive Learning, https://arxiv.org/abs/2508.18322
- Md. Rashid Shahriar Khan, Md. Abrar Hasan, Mohammod Tareq Aziz Justice, 25 Aug 2025, Context-Aware Zero-Shot Anomaly Detection in Surveillance Using Contrastive and Predictive Spatiotemporal Modeling, https://arxiv.org/abs/2508.18463
- Eichi Takaya and Ryusei Inamori, 26 Aug 2025, ModAn-MulSupCon: Modality-and Anatomy-Aware Multi-Label Supervised Contrastive Pretraining for Medical Imaging, https://arxiv.org/abs/2508.18613
- Yi-Ping Hsu, Po-Wei Wang, Chantat Eksombatchai, Jiajing Xu, 26 Aug 2025, Taming the One-Epoch Phenomenon in Online Recommendation System by Two-stage Contrastive ID Pre-training, https://arxiv.org/abs/2508.18700
- Junhua Liu and Yong Keat Tan and Bin Fu and Kwan Hui Lim, 26 Aug 2025, From Intents to Conversations: Generating Intent-Driven Dialogues with Contrastive Learning for Multi-Turn Classification, https://arxiv.org/abs/2411.14252
- Yifan Dou, Adam Khadre, Ruben C Petreaca, Golrokh Mirzaei, 26 Aug 2025, MS-ConTab: Multi-Scale Contrastive Learning of Mutation Signatures for Pan Cancer Representation and Stratification, https://arxiv.org/abs/2508.19424
- Jinyuan Feng, Chaopeng Wei, Tenghai Qiu, Tianyi Hu, Zhiqiang Pu, 28 Aug 2025, CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning, https://arxiv.org/abs/2505.17553
- Xin Huang, Ruibin Li, Tong Jia, Wei Zheng, Ya Wang, 28 Aug 2025, Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models, https://arxiv.org/abs/2505.15576
- Amartya Banerjee, Somnath Kar, Anirban Pal, Debabrata Maiti, 31 Aug 2025, Valid Property-Enhanced Contrastive Learning for Targeted Optimization & Resampling for Novel Drug Design, https://arxiv.org/abs/2509.00684
- Smayan Khanna, Doruk Efe Gökmen, Risi Kondor, Vincenzo Vitelli, 1 Sep 2025, Graph Contrastive Learning versus Untrained Baselines: The Role of Dataset Size, https://arxiv.org/abs/2509.01541
- Hiroshi Sasaki, 2 Sep 2025, Structure-aware Contrastive Learning for Diagram Understanding of Multimodal Models, https://arxiv.org/abs/2509.01959
- Juhyeon Lee, Wonduk Seo, Hyunjin An, Seunghyun Lee, Yi Bu, 2 Sep 2025, Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization, https://arxiv.org/abs/2509.02093
- Micha Livne, 30 Aug 2025, Contrastive MIM: A Contrastive Mutual Information Framework for Unified Generative and Discriminative Representation Learning, https://arxiv.org/abs/2411.10548
- Alexander Marusov, Aleksandr Yugay, Alexey Zaytsev, 2 Sep 2025, A theoretical framework for self-supervised contrastive learning for continuous dependent data, https://arxiv.org/abs/2506.09785
- Yanmei Hu and Yihang Wu and Bing Sun and Xue Yue and Biao Cai and Xiangtao Li and Yang Chen, 30 Aug 2025, Contrastive clustering based on regular equivalence for influential node identification in complex networks, https://arxiv.org/abs/2509.02609
- Yiru Jiao, Sander van Cranenburgh, Simeon Calvert, Hans van Lint, 3 Sep 2025, Structure-preserving contrastive learning for spatial time series, https://arxiv.org/abs/2502.06380
- Jack Wilkie, Hanan Hindy, Christos Tachtatzis, Robert Atkinson, 8 Sep 2025, Contrastive Self-Supervised Network Intrusion Detection using Augmented Negative Pairs, https://arxiv.org/abs/2509.06550
- Serge Lionel Nikiema, Jordan Samhi, Micheline Bénédicte Moumoula, Albérick Euraste Djiré, Abdoul Kader Kaboré, Jacques Klein and Tegawendé F. Bissyandé, 6 Sep 2025, Using Contrastive Learning to Improve Two-Way Reasoning in Large Language Models: The Obfuscation Task as a Case Study, https://arxiv.org/abs/2509.05553
- Yuyao Ge, Shenghua Liu, Yiwei Wang, Lingrui Mei, Baolong Bi, Xuanshan Zhou, Jiayu Yao, Jiafeng Guo, Xueqi Cheng, 8 Sep 2025, Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning, https://arxiv.org/abs/2509.06461
- Mengxue Yang, Chun Yang, Jiaqi Zhu, Jiafan Li, Jingqi Zhang, Yuyang Li, Ying Li, 8 Sep 2025, SLiNT: Structure-aware Language Model with Injection and Contrastive Training for Knowledge Graph Completion, https://arxiv.org/abs/2509.06531
- Dipta Neogi, Nourash Azmine Chowdhury, Muhammad Rafsan Kabir, Mohammad Ashrafuzzaman Khan, 8 Sep 2025, Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning, https://arxiv.org/abs/2509.06826
- Zahra Zamanzadeh Darban, Yiyuan Yang, Geoffrey I. Webb, Charu C. Aggarwal, Qingsong Wen, Shirui Pan, Mahsa Salehi, 7 Sep 2025, DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series, https://arxiv.org/abs/2404.11269
- Moo Hyun Son, Juyoung Bae, Zelin Qiu, Jiale Peng, Kai Xin Li, Yifan Lin, Hao Chen, 9 Sep 2025, Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation, https://arxiv.org/abs/2509.07923
- Gul Rukh Khattak, Konstantinos Patlatzoglou, Joseph Barker, Libor Pastika, Boroumand Zeidaabadi, Ahmed El-Medany, Hesham Aggour, Yixiu Liang, Antonio H. Ribeiro, Jeffrey Annis, Antonio Luiz Pinho Ribeiro, Junbo Ge, Daniel B. Kramer, Jonathan W. Waks, Evan Brittain, Nicholas Peters, Fu Siong Ng, Arunashis Sau, 12 Sep 2025, Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms, https://arxiv.org/abs/2509.10369
- Shengqiang Fu, 12 Sep 2025, SI-FACT: Mitigating Knowledge Conflict via Self-Improving Faithfulness-Aware Contrastive Tuning, https://arxiv.org/abs/2509.10208
- Wenfang Wu, Tingting Yuan, Yupeng Li, Daling Wang, Xiaoming Fu, 12 Sep 2025, SignClip: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion, https://arxiv.org/abs/2509.10266
- Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis and Theodoros Giannakopoulos, 12 Sep 2025, Prototypical Contrastive Learning For Improved Few-Shot Audio Classification, https://arxiv.org/abs/2509.10074
- Zahraa Al Sahili, Ioannis Patras, Matthew Purver, 11 Sep 2025, Data Matters Most: Auditing Social Bias in Contrastive Vision Language Models, https://arxiv.org/abs/2501.13223
- Zahraa Al Sahili, Ioannis Patras, Matthew Purver, 11 Sep 2025, Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models, https://arxiv.org/abs/2505.14160
- Jia Tang, Xinrui Wang and Songcan Chen, 18 Sep 2025, Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning, https://arxiv.org/abs/2509.15347
- Xinxin Meng, Jiangtao Guo, Yunxiang Zhang, Shun Huang, 19 Sep 2025, Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection, https://arxiv.org/abs/2509.15570
- Gwendal Le Vaillant and Yannick Molle, 16 Sep 2025, Contrastive timbre representations for musical instrument and synthesizer retrieval, https://arxiv.org/abs/2509.13285
- Artemis Panagopoulou, Le Xue, Honglu Zhou, Silvio Savarese, Ran Xu, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles, 15 Sep 2025, Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D, https://arxiv.org/abs/2506.01275
- Wenhui Diao, Haichen Yu, Kaiyue Kang, Tong Ling, Di Liu, Yingchao Feng, Hanbo Bi, Libo Ren, Xuexue Li, Yongqiang Mao, Xian Sun, 16 Sep 2025, RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning, https://arxiv.org/abs/2409.13366
- Carlos Celemin, Joseph Brennan, Pierluigi Vito Amadori, Tim Bradley, 15 Sep 2025, Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning, https://arxiv.org/abs/2509.11880
- Robin Narsingh Ranabhat, Longwei Wang, Amit Kumar Patel, KC santosh, 14 Sep 2025, Promoting Shape Bias in CNNs: Frequency-Based and Contrastive Regularization for Corruption Robustness, https://arxiv.org/abs/2509.11355
- Zihan Dong, Xin Zhou, Ryumei Nakada, Lexin Li and Linjun Zhang, 14 Sep 2025, Contrastive Network Representation Learning, https://arxiv.org/abs/2509.11316
- Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gwing Kei Yip, Gerald W.Y. Cheng, Yunlin Mao, Jing Cai, Liang-ting Lin, Jung Sun Yoo, 18 Sep 2025, Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery, https://arxiv.org/abs/2509.14788
- Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard, 18 Sep 2025, Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, https://arxiv.org/abs/2411.12590
- Anna Van Elst, Debarghya Ghoshdastidar, 18 Sep 2025, Tight PAC-Bayesian Risk Certificates for Contrastive Learning, https://arxiv.org/abs/2412.03486
- Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang, 10 Sep 2025, Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors, https://arxiv.org/abs/2505.15337
- Ranga Baminiwatte, Kazi Jewel Rana, Aaron J. Masino, 17 Sep 2025, PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction, https://arxiv.org/abs/2509.14037
- Chenghao Huang, Xiaolu Chen, Yanru Zhang, and Hao Wang, 17 Sep 2025, FedCoSR: Personalized Federated Learning with Contrastive Shareable Representations for Label Heterogeneity in Non-IID Data, https://arxiv.org/abs/2404.17916
- Ziming Tang, Chengbin Hou, Tianyu Zhang, Bangxu Tian, Jinbao Wang, Hairong Lv, 2 Oct 2025, Enhancing Noise Robustness of Parkinson's Disease Telemonitoring via Contrastive Feature Augmentation, https://arxiv.org/abs/2510.01588
- Vladimir Krsmanovic, Matthias Cosler, Mohamed Ghanem, Bernd Finkbeiner, 2 Oct 2025, Learning Representations Through Contrastive Neural Model Checking, https://arxiv.org/abs/2510.01853
- Aida Tayebi, Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Ozlem Ozmen Garibay, 2 Oct 2025, FairContrast: Enhancing Fairness through Contrastive learning and Customized Augmenting Methods on Tabular Data, https://arxiv.org/abs/2510.02017
- Mertcan Cokbas, Ziteng Liu, Zeyi Tao, Chengkai Zhang, Elder Veliz, Qin Huang, Ellie Wen, Huayu Li, Qiang Jin, Murat Duman, Benjamin Au, Guy Lebanon, Sagar Chordia, 2 Oct 2025, C2AL: Cohort-Contrastive Auxiliary Learning for Large-scale Recommendation Systems, https://arxiv.org/abs/2510.02215
- Donghuo Zeng, 2 Oct 2025, Comparing Contrastive and Triplet Loss in Audio-Visual Embedding: Intra-Class Variance and Greediness Analysis, https://arxiv.org/abs/2510.02161
- Taeyoung Kim, Jimin Lee, Myungkyu Koo, Dongyoung Kim, Kyungmin Lee, Changyeon Kim, Younggyo Seo, Jinwoo Shin, 2 Oct 2025, Contrastive Representation Regularization for Vision-Language-Action Models, https://arxiv.org/abs/2510.01711
- Paul Felix Valsecchi Oliva, O. Deniz Akyildiz and Andrew Duncan, 2 Oct 2025, Uniform-in-time convergence bounds for Persistent Contrastive Divergence Algorithms, https://arxiv.org/abs/2510.01944
- Rita T. Sousa and Heiko Paulheim, 13 Oct 2025, Improving Knowledge Graph Embeddings through Contrastive Learning with Negative Statements, https://arxiv.org/abs/2510.11868
- Jean Ponce (ENS-PSL, NYU), Basile Terver (FAIR, WILLOW), Martial Hebert (CMU), Michael Arbel (Thoth), 14 Oct 2025, Dual Perspectives on Non-Contrastive Self-Supervised Learning, https://arxiv.org/abs/2507.01028
- Licong Lin, Song Mei, 13 Oct 2025, A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics, https://arxiv.org/abs/2503.17538
- Aopeng Wang, Ke Deng, Yongli Ren and Jun Luo, 1 Oct 2025, Rehearsal-free and Task-free Online Continual Learning With Contrastive Prompt, https://arxiv.org/abs/2510.00467
- Julius Ott, Nastassia Vysotskaya, Huawei Sun, Lorenzo Servadei, Robert Wille, 1 Oct 2025, Feature Identification for Hierarchical Contrastive Learning, https://arxiv.org/abs/2510.00837
- Zhen Yin, Shenghua Wang, 1 Oct 2025, Span-level Detection of AI-generated Scientific Text via Contrastive Learning and Structural Calibration, https://arxiv.org/abs/2510.00890
- Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, Paolo Soda, Valerio Guarrasi, 30 Sep 2025, Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining, https://arxiv.org/abs/2506.00633
- Jae Hyoung Jeon, Cheolsu Lim and Myungjoo Kang, 1 Oct 2025, Divergence-Based Similarity Function for Multi-View Contrastive Learning, https://arxiv.org/abs/2507.06560
- Rami Zewail, 24 Sep 2025, Diffusion-Augmented Contrastive Learning: A Noise-Robust Encoder for Biosignal Representations, https://arxiv.org/abs/2509.20048
- Dayu Tan, Jing Chen, Xiaoping Zhou, Yansen Su and Chunhou Zheng, 24 Sep 2025, PGCLODA: Prompt-Guided Graph Contrastive Learning for Oligopeptide-Infectious Disease Association Prediction, https://arxiv.org/abs/2509.20290
- Yiqiao Chen, Zijian Huang, Zhenghui Feng, 10 Sep 2025, Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning, https://arxiv.org/abs/2509.19315
- Mingyu Lu, Ethan Weinberger, Chanwoo Kim, Su-In Lee, 23 Sep 2025, CellCLIP -- Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning, https://arxiv.org/abs/2506.06290
- Yuecheng Li, Lele Fu, Sheng Huang, Chuan Chen, Lei Yang, Zibin Zheng, 23 Sep 2025, CueGCL: Cluster-aware Personalized Self-Training for Unsupervised Graph Contrastive Learning, https://arxiv.org/abs/2311.11073
- Phuong Q. Dao, Mark Roantree, and Vuong M. Ngo, 20 Oct 2025, An Enhanced Dual Transformer Contrastive Network for Multimodal Sentiment Analysis, https://arxiv.org/abs/2510.23617
- Edward Fish, Richard Bowden, 28 Oct 2025, Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation, https://arxiv.org/abs/2506.00129
- Nimrod Berman and Omkar Joglekar and Eitan Kosman and Dotan Di Castro and Omri Azencot, 23 Oct 2025, Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge, https://arxiv.org/abs/2510.20819
- Jianyang Gu, Samuel Stevens, Elizabeth G Campolongo, Matthew J Thompson, Net Zhang, Jiaman Wu, Andrei Kopanev, Zheda Mai, Alexander E. White, James Balhoff, Wasila Dahdul, Daniel Rubenstein, Hilmar Lapp, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su, 23 Oct 2025, BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning, https://arxiv.org/abs/2505.23883
- Daohan Su, Yang Zhang, Xunkai Li, Rong-Hua Li, Guoren Wang, 18 Oct 2025, Toward General Digraph Contrastive Learning: A Dual Spatial Perspective, https://arxiv.org/abs/2510.16311
- Kathryn Wantlin, Chongyi Zheng, Benjamin Eysenbach, 20 Oct 2025, Consistent Zero-Shot Imitation with Contrastive Goal Inference, https://arxiv.org/abs/2510.17059
- Vera Pavlova and Mohammed Makhlouf, 19 Oct 2025, MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning, https://arxiv.org/abs/2510.16797
- Junhao Zhao, Zishuai Liu, Ruili Fang, Jin Lu, Linghan Zhang, Fei Dou, 19 Oct 2025, CARE: Contrastive Alignment for ADL Recognition from Event-Triggered Sensor Streams, https://arxiv.org/abs/2510.16988
- Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C.-W. Phan, 19 Oct 2025, Seeing in the Dark: A Teacher-Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning, https://arxiv.org/abs/2502.03724
- Ziqiang Cui, Yunpeng Weng, Xing Tang, Xiaokun Zhang, Shiwei Li, Peiyang Liu, Bowei He, Dugang Liu, Weihong Luo, Xiuqiang He, Chen Ma, 20 Oct 2025, SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation, https://arxiv.org/abs/2503.04162
- Sumit Mamtani, Yash Thesia, 18 Oct 2025, Fine-Grained Classification: Connecting Metadata via Cross-Contrastive Pre-Training, https://arxiv.org/abs/2504.20322
- Yasser H. Khalil, Mehdi Setayesh, Hongliang Li, 19 Sep 2025, CoUn: Empowering Machine Unlearning via Contrastive Learning, https://arxiv.org/abs/2509.16391
- Jingming Yan, Yiyuan Luo, Vaggos Chatziafratis, Ioannis Panageas, Parnian Shahkar, Stelios Stavroulakis, 21 Sep 2025, The Complexity of Finding Local Optima in Contrastive Learning, https://arxiv.org/abs/2509.16898
- Haofeng Huang, Yifei Han, Long Zhang, Bin Li, Yangfan He, 22 Sep 2025, MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion, https://arxiv.org/abs/2509.17446
- Jiahe Qian, Yaoyu Fang, Ziqiao Weng, Xinkun Wang, Lee A. Cooper, Bo Zhou, 21 Sep 2025, Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning, https://arxiv.org/abs/2509.16892
- Qi'ao Xu, Pengfei Wang, Bo Zhong, Tianwen Qian, Xiaoling Wang, Ye Wang, Hong Yu, 22 Sep 2025, TS-P$^2$CL: Plug-and-Play Dual Contrastive Learning for Vision-Guided Medical Time Series Classification, https://arxiv.org/abs/2509.17802
- Shiguang Wu, Yaqing Wang, Yatao Bian, Quanming Yao, 20 Sep 2025, Learning to Learn with Contrastive Meta-Objective, https://arxiv.org/abs/2410.05975
- Xiaohao Liu, Xiaobo Xia, See-Kiong Ng, Tat-Seng Chua, 22 Sep 2025, Continual Multimodal Contrastive Learning, https://arxiv.org/abs/2503.14963
- YongKyung Oh, Alex Bui, 19 Sep 2025, Multi-View Contrastive Learning for Robust Domain Adaptation in Medical Time Series Analysis, https://arxiv.org/abs/2506.22393
- Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin, 22 Sep 2025, Audio Contrastive-based Fine-tuning: Decoupling Representation Learning and Classification, https://arxiv.org/abs/2309.11895
- Yuwei Niu, Shuo He, Qi Wei, Zongyu Wu, Feng Liu, Lei Feng, 22 Sep 2025, Test-Time Multimodal Backdoor Detection by Contrastive Prompting, https://arxiv.org/abs/2405.15269
- Kaitong Cai, Jusheng Zhang, Yijia Fan, Jing Yang, Keze Wang, 26 Oct 2025, RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability, https://arxiv.org/abs/2510.22710
- Samuel Bright-Thonney, Christina Reissel, Gaia Grosso, Nathaniel Woodward, Katya Govorkova, Andrzej Novak, Sang Eon Park, Eric Moreno, Philip Harris, 24 Oct 2025, AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing, https://arxiv.org/abs/2510.21935
- Zhixin Pan and Ziyu Shu and Amberbir Alemayoh, 24 Oct 2025, Towards Low-Latency and Adaptive Ransomware Detection Using Contrastive Learning, https://arxiv.org/abs/2510.21957
- Matthew So, Judah Goldfeder, Mark Lis, Hod Lipson, 27 Oct 2025, Bi-Encoder Contrastive Learning for Fingerprint and Iris Biometrics, https://arxiv.org/abs/2510.22937
- Ngoc N. Tran, Lam Tran, Hoang Phan, Anh Bui, Tung Pham, Toan Tran, Dinh Phung, Trung Le, 27 Oct 2025, Generalization Bounds for Robust Contrastive Learning: From Theory to Practice, https://arxiv.org/abs/2311.09671
- Alain Riou, Joan Serrà, Yuki Mitsufuji, 27 Oct 2025, Automatic Music Sample Identification with Multi-Track Contrastive Learning, https://arxiv.org/abs/2510.11507
- Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang, 24 Oct 2025, AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays, https://arxiv.org/abs/2506.23467
- Fang Chen, Alex Villa, Gongbo Liang, Xiaoyi Lu, Meng Tang, 24 Oct 2025, Contrastive Conditional-Unconditional Alignment for Long-tailed Diffusion Model, https://arxiv.org/abs/2507.09052
- Haoyu Zhang, Yuxuan Cheng, Wenqi Fan, Yulong Chen, Yifan Zhang, 15 Oct 2025, Rethinking Graph Domain Adaptation: A Spectral Contrastive Perspective, https://arxiv.org/abs/2510.13254
- Yue Xing, Yingnan Deng, Heyao Liu, Ming Wang, Yun Zi, Xiaoxuan Sun, 15 Oct 2025, Contrastive Learning-Based Dependency Modeling for Anomaly Detection in Cloud Services, https://arxiv.org/abs/2510.13368
- Dominik J. Mühlematter, Lin Che, Ye Hong, Martin Raubal, Nina Wiedemann, 15 Oct 2025, UrbanFusion: Stochastic Multimodal Fusion for Contrastive Learning of Robust Spatial Representations, https://arxiv.org/abs/2510.13774
- Xuanchen Wang, Heng Wang, Weidong Cai, 15 Oct 2025, MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding, https://arxiv.org/abs/2510.13244
- Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta, 15 Oct 2025, Self-Augmented Visual Contrastive Decoding, https://arxiv.org/abs/2510.13315
- Pierre Glaser and Kevin Han Huang and Arthur Gretton, 15 Oct 2025, Near-Optimality of Contrastive Divergence Algorithms, https://arxiv.org/abs/2510.13438
- Hongkuan Zhou, Lavdim Halilaj, Sebastian Monka, Stefan Schmid, Yuqicheng Zhu, Jingcheng Wu, Nadeem Nazer, Steffen Staab, 15 Oct 2025, Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning, https://arxiv.org/abs/2510.13675
- Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, Xiaowei Huang, 15 Oct 2025, FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model, https://arxiv.org/abs/2502.01472
- Micha Livne, 25 Sep 2025, Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations, https://arxiv.org/abs/2509.21511
- Hua Yuan, Ning Xu, Xin Geng, Yong Rui, 26 Sep 2025, Enriching Knowledge Distillation with Intra-Class Contrastive Learning, https://arxiv.org/abs/2509.22053
- Boyi Chen, Zhangyu Wang, Fabian Deuser, Johann Maximilian Zollner, Martin Werner, 25 Sep 2025, Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms, https://arxiv.org/abs/2509.21573
- Alberto Olivares-Alarcos, Sergi Foix, Júlia Borràs, Gerard Canal and Guillem Alenyà, 26 Sep 2025, Ontological foundations for contrastive explanatory narration of robot plans, https://arxiv.org/abs/2509.22493
- Yifan Zhang, Chen Huang, Yueke Zhang, Huajie Shao, Kevin Leach, Yu Huang, 26 Sep 2025, Pre-Training Representations of Binary Code Using Contrastive Learning, https://arxiv.org/abs/2210.05102
- Md Abrar Jahin, Md. Akmol Masud, M. F. Mridha, Nilanjan Dey, Zeyar Aung, 8 Oct 2025, Quantum Rationale-Aware Graph Contrastive Learning for Jet Discrimination, https://arxiv.org/abs/2411.01642
- Xinyi Gao, Yayong Li, Tong Chen, Guanhua Ye, Wentao Zhang, Hongzhi Yin, 8 Oct 2025, Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning, https://arxiv.org/abs/2411.17063
- Lingjie Yi, Raphael Douady, Chao Chen, 7 Oct 2025, Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment, https://arxiv.org/abs/2510.03268
- Tianxiang Zhao and Youqing Wang and Jinlu Wang and Jiapu Wang and Mingliang Cui and Junbin Gao and Jipeng Guo, 3 Oct 2025, Hybrid-Collaborative Augmentation and Contrastive Sample Adaptive-Differential Awareness for Robust Attributed Graph Clustering, https://arxiv.org/abs/2510.02731
- Wannan Yang, Xinchi Qiu, Lei Yu, Yuchen Zhang, Oliver Aobo Yang, Narine Kokhlikyan, Nicola Cancedda, Diego Garcia-Olano, 25 Sep 2025, Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning, https://arxiv.org/abs/2510.02324
- Hamed Fard, Tobias Schalau, Gerhard Wunder, 27 Sep 2025, An Investigation into the Performance of Non-Contrastive Self-Supervised Learning Methods for Network Intrusion Detection, https://arxiv.org/abs/2510.02349
- Jingyuan Deng, Yujiu Yang, 3 Oct 2025, MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding, https://arxiv.org/abs/2510.02790
- Jingze Zhu, Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yanqiang Zheng, Jiawei Chen, Xu Yang, Bernt Schiele, Jonas Fischer, Xinting Hu, 3 Oct 2025, LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers, https://arxiv.org/abs/2507.04404
- Yoshinari Fujinuma, 21 Oct 2025, Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge, https://arxiv.org/abs/2510.18196
- Kazusato Oko, Licong Lin, Yuhang Cai, Song Mei, 21 Oct 2025, A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI, https://arxiv.org/abs/2501.04641
- D. Darankoum, C. Habermacher, J. Volle and S. Grudinin, 24 Sep 2025, CoSupFormer : A Contrastive Supervised learning approach for EEG signal Classification, https://arxiv.org/abs/2509.20489
- Weili Zeng, Yichao Yan, 25 Sep 2025, Flow Matching in the Low-Noise Regime: Pathologies and a Contrastive Remedy, https://arxiv.org/abs/2509.20952
- Ayan Sar, Pranav Singh Puri, Sumit Aich, Tanupriya Choudhury, Abhijit Kumar, 24 Sep 2025, SwasthLLM: a Unified Cross-Lingual, Multi-Task, and Meta-Learning Zero-Shot Framework for Medical Diagnosis Using Contrastive Representations, https://arxiv.org/abs/2509.20567
- Thanh Binh Le, Hoang Nhat Khang Vo, Tan-Ha Mai, Trong Nhan Phan, 25 Sep 2025, Revolutionizing Precise Low Back Pain Diagnosis via Contrastive Learning, https://arxiv.org/abs/2509.20813
- Jiehui Luo, Yuguo Yin, Yuxin Xie, Jinghan Ru, Xianwei Zhuang, Minghua He, Aofan Liu, Zihan Xiong, Dongchao Yang, 25 Sep 2025, SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization, https://arxiv.org/abs/2509.21033
- Nero Z. Li, Xuehao Zhai, Zhichao Shi, Boshen Shi, Xuhui Jiang, 25 Sep 2025, Fractal Graph Contrastive Learning, https://arxiv.org/abs/2505.11356
- Jiali Chen, Avijit Mukherjee, 24 Sep 2025, Generative and Contrastive Graph Representation Learning, https://arxiv.org/abs/2505.11776
- Yueming Sun, Long Yang, 29 Sep 2025, Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG, https://arxiv.org/abs/2509.24761
- Xi Zhang, Zaiqiao Meng, Jake Lever, Edmond S. L. Ho, 27 Sep 2025, CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding, https://arxiv.org/abs/2509.23379
- Yu-Che Tsai, Kuan-Yu Chen, Yuan-Chi Li, Yuan-Hao Chen, Ching-Yu Tsai, and Shou-De Lin, 29 Sep 2025, Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement, https://arxiv.org/abs/2509.24291
- Jeremias Dötterl, 29 Sep 2025, Contrastive Learning for Correlating Network Incidents, https://arxiv.org/abs/2509.24446
- Guangming Huang, Yunfei Long, Cunjin Luo, 29 Sep 2025, Similarity-Dissimilarity Loss for Multi-label Supervised Contrastive Learning, https://arxiv.org/abs/2410.13439
- Haitao Li, Che Liu, Zhengyao Ding, Ziyi Liu, Wenqi Shao, Zhengxing Huang, 29 Sep 2025, Fine-grained Contrastive Learning for ECG-Report Alignment with Waveform Enhancement, https://arxiv.org/abs/2505.11939
- Asifullah Khan, Laiba Asmatullah, Anza Malik, Shahzaib Khan and Hamna Asif, 28 Sep 2025, A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis, https://arxiv.org/abs/2503.11101
- Danush Khanna, Gurucharan Marthi Krishna Kumar, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das, 28 Sep 2025, AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI), https://arxiv.org/abs/2506.08885
- Usman Ali, Ali Zia, Waqas Ali, Umer Ramzan, Abdul Rehman, Muhammad Tayyab Chaudhry, and Wei Xiang, 17 Oct 2025, Hypergraph Contrastive Sensor Fusion for Multimodal Fault Diagnosis in Induction Motors, https://arxiv.org/abs/2510.15547
- Mallikarjuna Tupakula, 30 Sep 2025, Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval, https://arxiv.org/abs/2510.03309
- Ali Azizpour, Reza Ramezanpour, Ashutosh Sabharwal, Santiago Segarra, 4 Oct 2025, From Moments to Models: Graphon Mixture-Aware Mixup and Contrastive Learning, https://arxiv.org/abs/2510.03690
- Ali Elahi, 3 Oct 2025, Identifying Financial Risk Information Using RAG with a Contrastive Insight, https://arxiv.org/abs/2510.03521
- Byungjun Kim, Soobin Um, Jong Chul Ye, 4 Oct 2025, Diverse Text-to-Image Generation via Contrastive Noise Optimization, https://arxiv.org/abs/2510.03813
- Jiashuo Sun, Shixuan Liu, Zhaochen Su, Xianrui Zhong, Pengcheng Jiang, Bowen Jin, Peiran Li, Weijia Shi, Jiawei Han, 6 Oct 2025, GRACE: Generative Representation Learning via Contrastive Policy Optimization, https://arxiv.org/abs/2510.04506
- Chanjoo Jung, Jaehyung Kim, 6 Oct 2025, TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA, https://arxiv.org/abs/2510.04682
- Kuang Yuan, Yang Gao, Xilin Li, Xinhao Mei, Syavosh Zadissa, Tarun Pruthi, Saeed Bagheri Sereshki, 4 Oct 2025, Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation, https://arxiv.org/abs/2510.03728
- Liangjian Wen, Qun Dai, Jianzhuang Liu, Jiangtao Zheng, Yong Dai, Dongkai Wang, Zhao Kang, Jun Wang, Zenglin Xu, Jiang Duan, 4 Oct 2025, InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions, https://arxiv.org/abs/2509.25270
- Achleshwar Luthra, Priyadarsi Mishra, Tomer Galanti, 9 Oct 2025, On the Alignment Between Supervised and Self-Supervised Contrastive Learning, https://arxiv.org/abs/2510.08852
- Lin Wang, Weisong Wang, Xuanji Xiao, Qing Li, 9 Oct 2025, Contrastive Learning Augmented Social Recommendations, https://arxiv.org/abs/2502.15695
- Mengshi Qi, Hao Ye, Jiaxuan Peng, Huadong Ma, 24 Oct 2025, Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression, https://arxiv.org/abs/2501.03674
- Junyuan Liu, Quan Qin, Guangsheng Dong, Xinglei Wang, Jiazhuang Feng, Zichao Zeng, and Tao Cheng, 10 Oct 2025, Beyond AlphaEarth: Toward Human-Centered Spatial Representation via POI-Guided Contrastive Learning, https://arxiv.org/abs/2510.09894
- Zeyu Ling, Xiaodong Gu, Jiangnan Tang, Changqing Zou, 11 Oct 2025, SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation, https://arxiv.org/abs/2510.10069
- Byeongchan Lee, 12 Oct 2025, Understanding Self-supervised Contrastive Learning through Supervised Objectives, https://arxiv.org/abs/2510.10572
- Xufei Lv, Kehai Chen, Haoyuan Sun, Xuefeng Bai, Min Zhang, Houde Liu, Kehai Chen, 13 Oct 2025, The Hidden Link Between RLHF and Contrastive Learning, https://arxiv.org/abs/2506.22578
- Cuipeng Wang, Haipeng Wang, 13 Oct 2025, Contrastive Representation Distillation via Multi-Scale Feature Decoupling, https://arxiv.org/abs/2502.05835
- Fernanda Fam\'a, Roberto Pereira, Charalampos Kalalas, Paolo Dini, Lorena Qendro, Fahim Kawsar, Mohammad Malekzadeh, 9 Oct 2025, Contrastive Self-Supervised Learning at the Edge: An Energy Perspective, https://arxiv.org/abs/2510.08374
- Houcheng Jiang, Junfeng Fang, Jiaxin Wu, Tianyu Zhang, Chen Gao, Yong Li, Xiang Wang, Xiangnan He, Yang Deng, 9 Oct 2025, Contrastive Weak-to-strong Generalization, https://arxiv.org/abs/2510.07884
- Jannek Ulm, Kevin Du, Vésteinn Snæbjarnarson, 9 Oct 2025, Contrastive Decoding for Synthetic Data Generation in Low-Resource Language Modeling, https://arxiv.org/abs/2510.08245
- Andrew Lee, Ian Chuang, Dechen Gao, Kai Fukazawa, Iman Soltani, 9 Oct 2025, Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning, https://arxiv.org/abs/2510.08442
- Chongyi Zheng, Ruslan Salakhutdinov, Benjamin Eysenbach, 8 Oct 2025, Contrastive Difference Predictive Coding, https://arxiv.org/abs/2310.20141
- Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu, 8 Oct 2025, CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation, https://arxiv.org/abs/2505.21904
- Kiril Bangachev, Guy Bresler, Iliyas Noman, Yury Polyanskiy, 23 Sep 2025, Global Minimizers of Sigmoid Contrastive Loss, https://arxiv.org/abs/2509.18552
- Riad Ahmed Anonto, Sardar Md. Saffat Zabin, M. Saifur Rahman, 22 Sep 2025, Align Where the Words Look: Cross-Attention-Guided Patch Alignment with Contrastive and Transport Regularization for Bengali Captioning, https://arxiv.org/abs/2509.18369
- Songqi Zhou, Zeyuan Liu, Benben Jiang, 22 Oct 2025, FairNet: Dynamic Fairness Correction without Performance Loss via Contrastive Conditional LoRA, https://arxiv.org/abs/2510.19421
- Yunzhe Wang, Soham Hans, Volkan Ustun, 22 Oct 2025, X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning, https://arxiv.org/abs/2510.19150
- Berkay Guler, Giovanni Geraci, Hamid Jafarkhani, 22 Oct 2025, A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning, https://arxiv.org/abs/2505.09160
- Daksh Pandey, 19 Sep 2025, Polynomial Contrastive Learning for Privacy-Preserving Representation Learning on Graphs, https://arxiv.org/abs/2509.25205
- Yanan Zhao and Feng Ji and Jingyang Dai and Jiaze Ma and Wee Peng Tay, 30 Sep 2025, Less is More: Towards Simple Graph Contrastive Learning, https://arxiv.org/abs/2509.25742
- Lina Conti, Dennis Fucci, Marco Gaido, Matteo Negri, Guillaume Wisniewski, Luisa Bentivogli, 30 Sep 2025, The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models, https://arxiv.org/abs/2509.26543
- Jungsoo Lee, Janghoon Cho, Hyojin Park, Munawar Hayat, Kyuwoong Hwang, Fatih Porikli, Sungha Choi, 30 Sep 2025, Generalized Contrastive Learning for Universal Multimodal Retrieval, https://arxiv.org/abs/2509.25638
- Sattwik Basu, Chaitanya Amballa, Zhongweiyang Xu, Jorge Vančo Sampedro, Srihari Nelakuditi, Romit Roy Choudhury, 30 Sep 2025, Contrastive Diffusion Guidance for Spatial Inverse Problems, https://arxiv.org/abs/2509.26489
- Wenpeng Lu, Sibo Wei, Xueping Peng, Yi-fei Wang, Usman Naseem and Shoujin Wang, 30 Sep 2025, Medical Question Summarization with Entity-driven Contrastive Learning, https://arxiv.org/abs/2304.07437
- Yuhang Zhang, Jiaping Xiao, Chao Yan, and Mir Feroskhan, 7 Oct 2025, Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies, https://arxiv.org/abs/2510.05692
- Minoh Jeong, Alfred Hero, 6 Oct 2025, Generalizing Supervised Contrastive learning: A Projection Perspective, https://arxiv.org/abs/2506.09810
- Minoh Jeong, Seonho Kim, Alfred Hero, 6 Oct 2025, Probabilistic Variational Contrastive Learning, https://arxiv.org/abs/2506.10159
- Hao Yin, Guangzong Si, Zilei Wang, 7 Oct 2025, The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?, https://arxiv.org/abs/2504.10020
- Ruchi Sandilya, Sumaira Perez, Charles Lynch, Lindsay Victoria, Benjamin Zebley, Derrick Matthew Buchanan, Mahendra T. Bhati, Nolan Williams, Timothy J. Spellman, Faith M. Gunning, Conor Liston, Logan Grosenick, 16 Oct 2025, Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation, https://arxiv.org/abs/2510.14190
- Ashish Kattamuri, Ishita Prasad, Meetu Malhotra, Arpita Vats, Rahul Raja, Albert Lie, 10 Oct 2025, Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL, https://arxiv.org/abs/2510.13827
- Kyungryul Back, Seongbeom Park, Milim Kim, Mincheol Kwon, SangHyeok Lee, Hyunyoung Lee, Junhee Cho, Seunghyun Park, Jinkyu Kim, 16 Oct 2025, Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding, https://arxiv.org/abs/2510.14304
- Rishal Aggarwal, Jacky Chen, Nicholas M. Boffi, David Ryan Koes, 15 Oct 2025, BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation, https://arxiv.org/abs/2507.00846
Flash Decoding
Flash Decoding is a memory-efficient decoding optimization introduced by the research team better known for "Flash Attention" (versions 1, 2, and 3 so far). It applies similar memory-access reductions to the decoding phase, splitting the attention computation for a single query token into chunks along the keys and values, so that long-context decoding can be parallelized across the GPU.
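To make the idea concrete, here is a minimal sketch in Python/NumPy of split-KV attention for a single query token (the function name and chunk count are my own, not from the Flash Decoding paper). A real Flash Decoding kernel computes the chunks in parallel on the GPU; this toy version loops over them and then merges the partial results exactly, using the standard max-rescaling trick for a numerically stable softmax.

```python
import numpy as np

def flash_decode_attention(q, K, V, num_chunks=4):
    # Split keys/values along the sequence axis; each chunk's partial
    # attention is independent (and parallelizable on real hardware).
    d = q.shape[0]
    outs, sums, maxes = [], [], []
    for Kc, Vc in zip(np.array_split(K, num_chunks), np.array_split(V, num_chunks)):
        scores = Kc @ q / np.sqrt(d)    # attention scores for this chunk
        m = scores.max()                # chunk-local max for stability
        p = np.exp(scores - m)          # unnormalized softmax weights
        outs.append(p @ Vc)             # partial weighted sum of values
        sums.append(p.sum())
        maxes.append(m)
    # Reduction: rescale each partial result to a shared max, then
    # normalize once globally -- this reproduces the exact softmax.
    m_global = max(maxes)
    out = sum(o * np.exp(m - m_global) for o, m in zip(outs, maxes))
    denom = sum(s * np.exp(m - m_global) for s, m in zip(sums, maxes))
    return out / denom
```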
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
- Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu, 16 Jan 2024, A Survey of Resource-efficient LLM and Multimodal Foundation Models, https://arxiv.org/abs/2401.08092 Project: https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang, 2024, FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, Part of Proceedings of Machine Learning and Systems 6 (MLSys 2024) Conference, PDF: https://proceedings.mlsys.org/paper_files/paper/2024/file/5321b1dabcd2be188d796c21b733e8c7-Paper-Conference.pdf (Next generation of Flash Decoding, with improved asynchronous parallelism of Softmax in both prefill and decoding phases, heuristic dataflow management algorithms, and enhanced GEMM during the decoding phase.)
- Together AI, Nov 13, 2023, Announcing Together Inference Engine – the fastest inference available, https://www.together.ai/blog/together-inference-engine-v1
- Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov, October 12, 2023, Flash-Decoding for long-context inference, https://www.together.ai/blog/flash-decoding-for-long-context-inference
- Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Guohao Dai, 6 Oct 2024, Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective, https://arxiv.org/abs/2410.04466
- Aniruddha Nrusimha, William Brandon, Mayank Mishra, Yikang Shen, Rameswar Panda, Jonathan Ragan-Kelley, Yoon Kim, 28 May 2025, FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference, https://arxiv.org/abs/2505.22758 https://github.com/aninrusimha/flashformer (Optimizing kernels for low latency in a single isolated query, not a batch, via kernel fusion and running all components in one kernel, along with programming techniques like metaprogramming.)
Top-p Decoding
Top-p decoding, also known as nucleus sampling, is a longstanding decoding method that examines the cumulative probabilities of the top candidate tokens: it keeps the smallest set of highest-probability tokens whose cumulative probability reaches the threshold p, and samples the next token from that set. Top-p is usually combined with top-k decoding into a hybrid top-k top-p decoding algorithm.
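As a concrete sketch (Python/NumPy; the function name is mine), top-p sampling sorts the tokens by probability, keeps the smallest prefix whose cumulative probability reaches p, renormalizes, and samples from that "nucleus":

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax over the vocabulary
    order = np.argsort(probs)[::-1]            # tokens sorted by descending probability
    cumulative = np.cumsum(probs[order])
    keep = np.searchsorted(cumulative, p) + 1  # smallest set whose mass reaches p
    nucleus = order[:keep]
    return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))
```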
Research papers on Top-p decoding:
- David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
- Hugging Face, 2024, Text Generation Inference, https://huggingface.co/docs/text-generation-inference/index
- David Spuler, March 2024, Top-p Decoding, in Generative AI in C++, https://www.aussieai.com/book/ch26-top-p-decoding
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- Florian Andreas Marwitz, Ralf M\"oller, Magnus Bender, and Marcel Gehrke, 20 Oct 2025, Denoising the Future: Top-p Distributions for Moving Through Time, https://arxiv.org/abs/2506.07578
Min-P Decoding
Min-p decoding is a newer, minor modification to the decoding algorithm that mainly improves accuracy (rather than efficiency), but doesn't reduce efficiency either. Similar to top-p decoding, min-p tries to avoid emitting tokens with too-low probabilities, so top-p and min-p share the same goal. However, min-p sets its minimum-probability threshold dynamically, scaling it relative to the probability of the top-ranked token at each step. The discovery of min-p was a nice piece of research work, since it is a small coding change that improves accuracy without sacrificing latency.
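Here is a minimal sketch of min-p sampling (Python/NumPy; function name mine): the cutoff is min_p times the top token's probability, so it tightens when the model is confident and loosens when the distribution is flat.

```python
import numpy as np

def min_p_sample(logits, min_p=0.1, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                    # softmax over the vocabulary
    threshold = min_p * probs.max()         # dynamic cutoff, scaled by the top token
    keep = np.flatnonzero(probs >= threshold)
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```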
Research on min-p decoding:
- Ignacio de Gregorio, Aug 2024, Elevate LLM Performance by 20% Instantly with Min-P, https://medium.com/@ignacio.de.gregorio.noblejas/elevate-llm-performance-by-20-instantly-with-min-p-c961fe1daf3b
- Hugging Face, 2024, Min P style sampling - an alternative to Top P/TopK #27670, https://github.com/huggingface/transformers/issues/27670
- Minh Nguyen, Andrew Baker, Andreas Kirsch, Clement Neo, 1 Jul 2024, Min P Sampling: Balancing Creativity and Coherence at High Temperature, https://arxiv.org/abs/2407.01082
- Joao Gante, May 2024, New sampling strategy dropped in 🤗 transformers -- Min P sampling , Hugging Face, https://huggingface.co/posts/joaogante/319451541682734
Constrained Decoding
Constrained decoding is an optimization of the decoding algorithm where there are extra constraints on the tokens that can be output. This extra information can be used either to force inclusion of a particular token, or to exclude a subset of the tokens from consideration. Examples where there is extra information to use in decoding include:
- Programming language syntax (code generation)
- Parts-of-speech identification
For example, if you're programming an LLM decoding algorithm to output C++ code, then you know that the token 'if' is always followed by the token '(' in the code syntax. Hence, there's no real need for a full LLM computation after an 'if' token; a simple heuristic can emit the next token directly. This idea is using the "constraint" of the language syntax to do "constrained decoding."
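The general mechanism behind such constraints is simple logit masking, sketched below in Python/NumPy (the allowed-token set would come from a grammar or syntax checker, which is assumed here rather than implemented): disallowed tokens get a probability of zero, and the decoder chooses among the survivors.

```python
import numpy as np

def constrained_greedy_step(logits, allowed_token_ids):
    masked = np.full_like(logits, -np.inf)                 # disallow everything...
    masked[allowed_token_ids] = logits[allowed_token_ids]  # ...except the legal tokens
    return int(np.argmax(masked))                          # greedy pick among survivors
```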
Clearly, such a heuristic would be much faster, and easily coded. However, it's not all strawberries and cream, because if we use this heuristic, the KV cache has no entry for the current token. Hence, the next token would need a "mini-prefill" computation to calculate that missing KV cache entry, which means there's almost no point in avoiding the current token's computation (i.e., we are simply pushing the current token's computation onto the next token).
However, we've seen this issue of a "missing KV cache" before in early exit and layer skipping optimizations, where the KV cache is missing for any skipped layers (see KV caching). There are various tricks to avoid fully re-computing the KV cache, such as propagating the prior layer's cache or fusing with another layer. Similar ideas can be used when constrained decoding skips an LLM computation and the skipped token's KV cache entry is thereby absent.
Overlapped parallel computation can also be used to address the missing KV cache, as is also possible for early exit. The constraints of the language grammar allow the next token's inference to start almost immediately, possibly via a heuristic that does not even involve LLM layer execution. Meanwhile, the computation of the current token's KV cache can still be completed in parallel with the next token's decoding cycle, by staggering the next token's layers a little behind the current token's KV cache computation.
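A rough sketch of this staggering, with a background thread standing in for a parallel GPU stream (all the callables here are hypothetical placeholders for real engine internals, not an actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def overlapped_step(compute_kv, early_layers, late_layers, skipped_token, next_input):
    # 'skipped_token' was emitted by the grammar heuristic without a full
    # LLM pass, so its KV cache entry is missing and must be backfilled.
    with ThreadPoolExecutor(max_workers=1) as pool:
        kv_future = pool.submit(compute_kv, skipped_token)  # backfill KV cache in parallel
        hidden = early_layers(next_input)     # the next token's early layers proceed
        kv_entry = kv_future.result()         # join before attention needs the cache
        return late_layers(hidden, kv_entry)  # remaining layers use the new entry
```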
Research papers on constrained decoding:
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 6 Jun 2024 (v2), SGLang: Efficient Execution of Structured Language Model Programs, https://arxiv.org/abs/2312.07104 https://github.com/sgl-project/sglang
- K Ahmed, KW Chang, G Van den Broeck, Oct 2024, Controllable Generation via Locally Constrained Resampling, Neurips Safe Generative AI Workshop 2024, https://openreview.net/pdf?id=v091fzXTu0
- Gaya Mehenni, Amal Zouaq, 23 Nov 2024, Ontology-Constrained Generation of Domain-Specific Clinical Summaries, https://arxiv.org/abs/2411.15666
- Will Kurt, Nov 2024, Say What You Mean: A Response to 'Let Me Speak Freely', https://blog.dottxt.co/say-what-you-mean.html
- Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
- Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
- Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138
- Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
- Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West, 18 Jan 2024 (v6), Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning, https://arxiv.org/abs/2305.13971 https://github.com/epfl-dlab/GCD
- Yanjun Fu, Ethan Baker, Yu Ding, Yizheng Chen, 20 Jul 2024 (v3), Constrained Decoding for Secure Code Generation, https://arxiv.org/abs/2405.00218 https://codeguardplus.github.io/
- Zekun Hao, David W. Romero, Tsung-Yi Lin, Ming-Yu Liu, 12 Dec 2024, Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale, https://arxiv.org/abs/2412.09548 https://research.nvidia.com/labs/dir/meshtron/ (Optimizations to avoid the quadratic Transformer cost, in both training and inference, include "hourglass neural architecture" analogous to widthwise pruning or slimming, sliding window attention, rolling KV cache, truncated sequence training, and a "robust sampling strategy" that is effectively a type of constrained decoding based on mesh layouts.)
- Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou, 16 Dec 2024, RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation, https://arxiv.org/abs/2412.11919 https://github.com/sunnynexus/RetroLLM
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- Theia Vogel, December 18, 2023, How to make LLMs go fast, https://vgel.me/posts/faster-inference/
- D Banerjee, T Suresh, S Ugare, S Misailovic, G Singh, Mar 2025, Preserving Reasoning Capabilities Under Constrained LLM Generation, https://openreview.net/pdf?id=RX3GIOkGHr
- Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu, 18 Mar 2025, Speculative Decoding for Verilog: Speed and Quality, All in One, https://arxiv.org/abs/2503.14153
- Niels Mündler, Jasper Dekoninck and Martin Vechev, 13 Aug 2025, Constrained Decoding of Diffusion LLMs with Context-Free Grammars, https://arxiv.org/abs/2508.10111
- Lingxiao Li, Salar Rahili, Yiwei Zhao, 20 Aug 2025, Correctness-Guaranteed Code Generation via Constrained Decoding, https://arxiv.org/abs/2508.15866
- Parv Kapoor, Akila Ganlath, Changliu Liu, Sebastian Scherer, Eunsuk Kang, 1 Sep 2025, Constrained Decoding for Robotics Foundation Models, https://arxiv.org/abs/2509.01728
- Devansh, Sep 2025, The Chocolate Milk Cult’s Guide to Inference Scaling for AI Models: How to Reduce the costs of Running LLMs https://machine-learning-made-simple.medium.com/the-chocolate-milk-cults-guide-to-inference-scaling-for-ai-models-50aa2290eb50 (Deep analysis of using many progressive optimizations to real-life LLM inference.)
- Rajaa El Hamdani, Samy Haffoudhi, Nils Holzenberger, Fabian Suchanek, Thomas Bonald, and Fragkiskos D. Malliaros, 27 Sep 2025, Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models, https://arxiv.org/abs/2509.23417
Multi-Token Decoding
Multi-token decoding is an optimization whereby two or more tokens are output in a single decoding step. The idea is to train a special type of model that predicts not just the next token, but also the one after that (and possibly more). This improves on standard autoregressive decoding because tokens are no longer emitted strictly one at a time, so fewer sequential decoding steps are needed.
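A sketch of the inference side (Python/NumPy, with hypothetical names, loosely in the style of Medusa's extra decoding heads): each head has its own unembedding matrix and predicts one future position, so a single forward pass yields several output tokens.

```python
import numpy as np

def multi_token_decode_step(hidden, head_unembeddings):
    # One final hidden state, several unembedding matrices: head 0
    # predicts the next token, head 1 the token after that, and so on.
    tokens = []
    for W in head_unembeddings:                # each W is (hidden_dim, vocab_size)
        logits = hidden @ W                    # per-head logits over the vocabulary
        tokens.append(int(np.argmax(logits)))  # greedy pick per position
    return tokens
```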
- Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, Niraj K. Jha, Yilin Shen, Hongxia Jin, 1 May 2024, DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling, https://arxiv.org/abs/2405.00888 (A model trained to predict multiple tokens ahead.)
- Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve, 30 Apr 2024, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737 Project: https://huggingface.co/facebook/multi-token-prediction
- Michael Nuñez, July 4, 2024, Meta drops AI bombshell: Multi-token prediction models now open for research, https://venturebeat.com/ai/meta-drops-ai-bombshell-multi-token-prediction-models-now-open-for-research/
- Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun, 12 Jul 2024, Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference, https://arxiv.org/abs/2407.09722
- Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D Lee, Deming Chen, and Tri Dao. Medusa: Simple llm inference acceleration framework with multiple decoding heads. arXiv preprint arXiv:2401.10774, 2024 https://arxiv.org/abs/2401.10774
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton, 2024, Exploring and Improving Drafts in Blockwise Parallel Decoding, https://openreview.net/pdf?id=KtnUTS1f91
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- David Spuler, 25th August, 2024, Hot Inference Optimization Techniques, https://www.aussieai.com/blog/hot-inference-research
- Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Tri Dao, September 11, 2023, Medusa: Simple framework for accelerating LLM generation with multiple decoding heads, https://www.together.ai/blog/medusa
- Wei Zhong, Manasa Bharadwaj, 1 Jun 2024 (v2), S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs, https://arxiv.org/abs/2405.20314
- Desh Raj, Gil Keren, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli, 12 Sep 2024, Faster Speech-LLaMA Inference with Multi-token Prediction, https://arxiv.org/abs/2409.08148
- Zilin Xiao, Hongming Zhang, Tao Ge, Siru Ouyang, Vicente Ordonez, Dong Yu, 8 Oct 2024, ParallelSpec: Parallel Drafter for Efficient Speculative Decoding, https://arxiv.org/abs/2410.05589 (Multi-token prediction in draft models for speculative decoding.)
- Siru Ouyang, Shuohang Wang, Minhao Jiang, Ming Zhong, Donghan Yu, Jiawei Han, Yelong Shen, 14 Oct 2024, Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation, https://arxiv.org/abs/2410.10141 https://github.com/ozyyshr/TempSpec
- Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung, 17 Oct 2024, Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding, https://arxiv.org/abs/2410.13839
- Anonymous Authors, Oct 2024, Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference, https://openreview.net/pdf?id=ZHhBawo3k5
- Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao, 27 Oct 2024, FIRP: Faster LLM inference via future intermediate representation prediction, https://arxiv.org/abs/2410.20488
- DP Ghosh, DA Team, Oct 29, 2024, Multi-Token Prediction with Extended Transformer Layers, https://www.researchgate.net/profile/Debiprasad-Ghosh/publication/385311204_Multi-Token_Prediction_with_Extended_Transformer_Layers/links/671fdd2c55a5271cdee28059/Multi-Token-Prediction-with-Extended-Transformer-Layers.pdf
- Yash Akhauri, Safeen Huda, Mohamed S. Abdelfattah, 26 Nov 2024, Attamba: Attending To Multi-Token States, https://arxiv.org/abs/2411.17685
- Shibaranjani Dasgupta, Chandan Maity, Somdip Mukherjee, Rohan Singh, Diptendu Dutta, Debasish Jana, 14 Dec 2024, HITgram: A Platform for Experimenting with n-gram Language Models, https://arxiv.org/abs/2412.10717
- Y Li, K Livescu, J Zhou, Dec 2024, Beyond Token Generation: Adaptive Chunk-Distilled Language Modeling, 38th Conference on Neural Information Processing Systems (NeurIPS 2024), https://neurips2024-enlsp.github.io/papers/paper_90.pdf (Generate multiple tokens in decoding by inserting RAG chunks directly into the decoding output.)
- Tim Urista, Dec 2024, Dramatically Reduce Inference Costs with DeepSeek-V3: A New Era in Open-Source LLMs, https://ai.gopubby.com/dramatically-reduce-inference-costs-with-deepseek-v3-a-new-era-in-open-source-llms-4f1adf760ee1
- Yanhong Li, Karen Livescu, Jiawei Zhou, 31 Dec 2024, Chunk-Distilled Language Modeling, https://arxiv.org/abs/2501.00343 (Multi-token decoding using retrieval.)
- Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 20 Nov 2024 (v2), From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838
- Minhajul Hoque, Jan 4, 2025, DeepSeek V3: How They Achieved Big Results with Small Compute, https://ai.plainenglish.io/deepseek-v3-how-they-achieved-big-results-with-small-compute-fb694606d59a (DeepSeek optimizations included FP8 quantization with outlier handling, attention and KV cache optimization via Multi-Head Latent Attention (MHLA), and multi-token decoding.)
- Nandini Lokesh Reddy, Jan 2025, DeepSeek: Bridging Performance and Efficiency in Modern AI, https://medium.com/@nandinilreddy/deepseek-bridging-performance-and-efficiency-in-modern-ai-106181a85693
- Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Borui Zhang, Runlin Guo, Jia Li, 24 Feb 2025, CodeSwift: Accelerating LLM Inference for Efficient Code Generation, https://arxiv.org/abs/2502.17139 (Using draft sequences from a datastore of code, to achieve parallel inference, similar to prompt lookup decoding or retrieval lookup decoding.)
- Yunhai Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Sai Qian Zhang, 27 Feb 2025, Speculative Decoding and Beyond: An In-Depth Review of Techniques, https://arxiv.org/abs/2502.19732
- Yijiong Yu, 26 Mar 2025, Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence, https://arxiv.org/abs/2503.20533 https://github.com/yuyijiong/parallel-decoding-in-one-sequence
- Chengen Wang, Murat Kantarcioglu, 14 Mar 2025, A Review of DeepSeek Models' Key Innovative Techniques, https://arxiv.org/abs/2503.11486
- L. Xiong et al., May 2025, DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models, IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 5, pp. 841-858, May 2025, doi: 10.1109/JAS.2025.125495, https://ieeexplore.ieee.org/abstract/document/11005752
- Anastasios Gerontopoulos, Spyros Gidaris, Nikos Komodakis, 15 May 2025, Multi-Token Prediction Needs Registers, https://arxiv.org/abs/2505.10518
- Somesh Mehra, Javier Alonso Garcia, Lukas Mauch, 13 Feb 2025, On multi-token prediction for efficient LLM inference, https://arxiv.org/abs/2502.09419
- Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua, 23 May 2025, L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models, https://arxiv.org/abs/2505.17505
- Stephen Diehl, 2025, Attention Wasn't All We Needed, https://www.stephendiehl.com/posts/post_transformers/
- Anirudhan Badrinath, Prabhat Agarwal, Laksh Bhasin, Jaewon Yang, Jiajing Xu, Charles Rosenberg, 6 Aug 2025, PinRec: Outcome-Conditioned, Multi-Token Generative Retrieval for Industry-Scale Recommendation Systems, https://arxiv.org/abs/2504.10507
- NVIDIA Developer Blog, An Introduction to Speculative Decoding for Reducing Latency in AI Inference, https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/
- Carl Franzen, September 24, 2025, Chinese food delivery app Meituan's open source AI model LongCat-Flash-Thinking rivals GPT-5, https://venturebeat.com/ai/chinese-food-delivery-firm-meituans-open-source-ai-model-longcat-flash
- Xuan Luo, Weizhi Wang, Xifeng Yan, 13 Oct 2025, Direct Multi-Token Decoding, https://arxiv.org/abs/2510.11958
- Geigh Zollicoffer, Minh Vu, Manish Bhattarai, 20 Oct 2025, MTRE: Multi-Token Reliability Estimation for Hallucination Detection in VLMs, https://arxiv.org/abs/2505.11741
- Qimin Zhong, Hao Liao, Siwei Wang, Mingyang Zhou, Xiaoqun Wu, Rui Mao, Wei Chen, 27 Sep 2025, Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction, https://arxiv.org/abs/2509.23186
- Ruben Pascual, Mikel Sesma-Sara, Aranzazu Jurio, Daniel Paternain, Mikel Galar, 10 Oct 2025, Few-shot multi-token DreamBooth with LoRa for style-consistent character generation, https://arxiv.org/abs/2510.09475
- Yuxuan Cai, Xiaozhuan Liang, Xinghua Wang, Jin Ma, Haijin Liang, Jinwen Luo, Xinyu Zuo, Lisheng Duan, Yuyang Yin, Xi Chen, 16 Sep 2025, FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction, https://arxiv.org/abs/2509.18362
- Divyat Mahajan, Sachin Goyal, Badr Youbi Idrissi, Mohammad Pezeshki, Ioannis Mitliagkas, David Lopez-Paz, Kartik Ahuja, 16 Oct 2025, Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries, https://arxiv.org/abs/2510.14751
- Sebastian Raschka, PhD, Dec 18, 2025 (updated), The Big LLM Architecture Comparison: From DeepSeek V3 to Mistral 3 Large: A Look At Modern LLM Architecture Design, https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
Stop Tokens
Stop tokens are one way that LLMs can be trained to control the length of their own output. The idea is that stop tokens are appended to the end of answers during the training phase, so that when the model emits one during inference, the engine stops outputting further tokens at that point.
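Inside the inference engine, the corresponding logic is just a check in the decoding loop, sketched here in plain Python with hypothetical names:

```python
def generate_until_stop(next_token_fn, stop_token_id, max_tokens=256):
    output = []
    for _ in range(max_tokens):         # hard length cap as a safety net
        token = next_token_fn(output)   # one full decoding step
        if token == stop_token_id:      # model signals end-of-answer
            break
        output.append(token)
    return output
```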
Research papers with coverage of stop token techniques:
- Louis-François Bouchard, May 10, 2024, How LLMs Know When to Stop Generating? Understand how LLMs like GPT-4 decide when they have answered your question, https://pub.towardsai.net/how-llms-know-when-to-stop-generating-b82a9a57e2c4
- Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng, 29 Jul 2024, When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention, https://arxiv.org/abs/2407.20042 Code: https://github.com/DeepSoftwareAnalytics/CodeFast
- Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, yuelin bai, Run Luo, Longze Chen, Min Yang, 1 Oct 2024 (v2), Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models, https://arxiv.org/abs/2409.18943 https://github.com/Geaming2002/Ruler
- Bradley Butcher, Michael O'Keefe, James Titchener, 16 Dec 2024, Precise Length Control in Large Language Models, https://arxiv.org/abs/2412.11937
General Research on Decoding Algorithms
Papers on the various decoding methods include:
- S Bae, J Ko, H Song, SY Yun, Oct 2023, Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding, arXiv preprint arXiv:2310.05424, https://arxiv.org/pdf/2310.05424.pdf, Code: https://github.com/raymin0223/fast_robust_early_exit (Combination of early-exit with a "shallow-deep module" and parallel decoding.)
- Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, Richard Socher, 2018, Non-Autoregressive Neural Machine Translation, International Conference on Learning Representations, https://arxiv.org/abs/1711.02281 (Parallel decoding early paper.)
- Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 6111–6120. Association for Computational Linguistics. https://arxiv.org/abs/1904.09324
- Jiatao Gu and Xiang Kong. 2021. Fully non-autoregressive neural machine translation: Tricks of the trade. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 120–133, https://arxiv.org/abs/2012.15833
- Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, and Aaron van den Oord. 2022. Step-unrolled denoising autoencoders for text generation. International Conference on Learning Representations. https://arxiv.org/abs/2112.06749
- Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, and Emanuele Rodolà. May 2023. Accelerating transformer inference for translation via parallel decoding. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 12336–12355. Association for Computational Linguistics. https://arxiv.org/abs/2305.10427
- Y Zhang, Y Zhang, L Cui, G Fu, Oct 2023, Non-autoregressive Text Editing with Copy-aware Latent Alignments, arXiv preprint arXiv:2310.07821, https://arxiv.org/pdf/2310.07821.pdf
- Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov, October 13, 2023, Flash-Decoding for long-context inference, PyTorch Blog, https://pytorch.org/blog/flash-decoding/
- Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher, Sep 2019, CTRL: A Conditional Transformer Language Model for Controllable Generation, https://arxiv.org/abs/1909.05858, Code: https://github.com/salesforce/ctrl
- Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, Mar 2022, Training language models to follow instructions with human feedback, https://arxiv.org/abs/2203.02155 (InstructGPT main paper from OpenAI in 2022.)
- Ning Gong, Nianmin Yao, June 2023, A generalized decoding method for neural text generation, Computer Speech & Language, Volume 81, 101503, https://www.sciencedirect.com/science/article/abs/pii/S0885230823000220
- Cohere, 2023, Temperature, https://docs.cohere.com/docs/temperature
- GC Garbacea, 2023, Neural Language Generation for Content Adaptation: Explainable, Efficient Low-Resource Text Simplification and Evaluation, Ph.D. thesis, Computer Science and Engineering, University of Michigan, https://deepblue.lib.umich.edu/bitstream/handle/2027.42/178028/garbacea_1.pdf?sequence=1 (Broad thesis with sections on beam search decoding optimizations and AI safety issues such as bias.)
- Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. Pointer: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558, 2020. https://arxiv.org/abs/2005.00558
- Bryan Eikema and Wilker Aziz. 2020. Is MAP decoding all you need? The inadequacy of the mode in neural machine translation. Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8- 13, 2020, pages 4506–4520. International Committee on Computational Linguistics. https://arxiv.org/abs/2005.10283
- Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi, May 2023, A Frustratingly Simple Decoding Method for Neural Text Generation, https://arxiv.org/abs/2305.12675
- Clara Meister, Tiago Pimentel, Gian Wiher, and Ryan Cotterell. 2022. Typical decoding for natural language generation. arXiv preprint arXiv:2202.00666, https://arxiv.org/abs/2202.00666 (The "typical sampling" decoding algorithm.)
- Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, and Nigel Collier. 2022. A contrastive framework for neural text generation. Advances in Neural Information Processing Systems, https://arxiv.org/abs/2202.06417 (The "contrastive search" decoding algorithm.)
- Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. https://arxiv.org/abs/2104.08821 (A "contrastive" decoding algorithm.)
- John Hewitt, Christopher D. Manning, and Percy Liang. 2022. Truncation sampling as language model desmoothing. In Findings of the Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP). https://arxiv.org/abs/2210.15191 (The "truncation sampling" decoding algorithm.)
- Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, and Mike Lewis. 2022. Contrastive decoding: Open-ended text generation as optimization. arXiv preprint arXiv:2210.15097, https://arxiv.org/abs/2210.15097 (A "contrastive decoding" algorithm.)
- Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, Yejin Choi, 2018, Learning to Write with Cooperative Discriminators, https://arxiv.org/abs/1805.06087
- Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, and Laurent Charlin, 2020, Language GANs falling short, International Conference on Learning Representations. https://arxiv.org/abs/1811.02549
- Moin Nadeem, Tianxing He, Kyunghyun Cho, and James Glass, 2020, A systematic characterization of sampling algorithms for open-ended language generation, Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pp. 334–346. https://arxiv.org/abs/2009.07243, Code: https://github.com/moinnadeem/characterizing-sampling-algorithms
- Hugh Zhang, Daniel Duckworth, Daphne Ippolito, and Arvind Neelakantan, 2021, Trading off diversity and quality in natural language generation, EACL 2021, p. 25, https://arxiv.org/abs/2004.10450
- Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang, 22 Mar 2024, Hierarchical Skip Decoding for Efficient Autoregressive Text Generation, https://arxiv.org/abs/2403.14919 (A new decoding algorithm called Hierarchical Skip Decoding involving layer skipping.)
- Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales, 2024, Who Needs Decoders? Efficient Estimation of Sequence-Level Attributes with Proxies, Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics Volume 1: Long Papers, pages 1478–1496 March 17-22, 2024, https://aclanthology.org/2024.eacl-long.89.pdf (Non-autoregressive decoding methods in special use cases such as machine language translation.)
- Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, 3 Jun 2024, Demystifying Platform Requirements for Diverse LLM Inference Use Cases, https://arxiv.org/abs/2406.01698 Code: https://github.com/abhibambhaniya/GenZ-LLM-Analyzer (Analysis of cost of serving LLMs, including separate profiles of prefill versus decoding phases, and the cost of extra prompt processing in RAG architectures with prepended information.)
- Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo, 4 Jun 2024 (v2), Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution, https://arxiv.org/abs/2406.00059 Code: https://github.com/conveyor-sys/conveyor (Speeding up inference by partially running tools in parallel to the LLM query processing, rather than sequentially after the LLM request, by detecting tool requests deep inside the decoding algorithm and starting them off immediately, before the LLM has finished generating the fully decoded output.)
- Hao (Mark) Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan, 28 May 2024, Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference, https://arxiv.org/abs/2405.18628 Code: https://github.com/hmarkc/parallel-prompt-decoding (Similar to speculative decoding with extra trained prompt tokens and a tree-structured verification of multiple optional draft sequences.)
- Maxime Peyrard, Martin Josifoski, Robert West, 21 Mar 2024, The Era of Semantic Decoding, https://arxiv.org/abs/2403.14562
- Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati, 28 May 2024, Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass, https://arxiv.org/abs/2405.18400 https://github.com/RAIVNLab/SuperposedDecoding (Generating multiple possible drafts from a single decoding algorithm with one model pass by superimposing embeddings and using top-k decoding.)
- Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan, 17 May 2024, Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers, https://arxiv.org/abs/2405.10480
- Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen, 15 May 2024, Spectral Editing of Activations for Large Language Model Alignment, https://arxiv.org/pdf/2405.09719 Code: https://github.com/yfqiu-nlp/sea-llm
- D Shin, May 8, 2024, Multi-User Language Model Resource Allocation Using Contextual Pause Token Aware Transformers, Technical Disclosure Commons, https://www.tdcommons.org/dpubs_series/6981/ PDF: https://www.tdcommons.org/cgi/viewcontent.cgi?article=8121&context=dpubs_series (Interesting idea of training a model how and when to pause during inference, so it can be pre-empted if needed, and thus the overall system can schedule batching of multiple queries more optimally.)
- Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou, 7 May 2024, Switchable Decision: Dynamic Neural Generation Networks, https://arxiv.org/abs/2405.04513 (Switching and skipping sub-layer components such as attention heads, FFNs, or input token skipping, using decisions made based on allocating computation resources.)
- Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, Niraj K. Jha, Yilin Shen, Hongxia Jin, 1 May 2024, DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling, https://arxiv.org/abs/2405.00888 (A model trained to predict multiple tokens ahead.)
- Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve, 30 Apr 2024, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737
- Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari, 22 Apr 2024, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, Apple Research, https://arxiv.org/abs/2404.14619 Code: https://huggingface.co/apple/OpenELM
- Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, 31 Oct 2018, Weakly Supervised Grammatical Error Correction using Iterative Decoding, https://arxiv.org/abs/1811.01710
- Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan, 20 Jan 2024, Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads, https://arxiv.org/abs/2401.11181 (Separating the prefill and decoding phases for optimization.)
- Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, 31 Aug 2023, SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills, https://arxiv.org/abs/2308.16369 (Examines the different GPU costs of prefill vs decoding phases, and optimizes decoding by "piggybacking" off the more intense computation during prefill.)
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying Wei, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor) Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Aashaka Shah, Saeed Maleki, Ricardo Bianchini, 30 Nov 2023, Splitwise: Efficient generative LLM inference using phase splitting, https://arxiv.org/abs/2311.18677 (Separates the two Transformer phases of initial prompt computation or prefill to generate the KV cache, and the token generation phase or decoding algorithm onto two machines.)
- Yao Zhao, Zhitian Xie, Chenyi Zhuang, Jinjie Gu, Jan 2024, Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy, https://arxiv.org/abs/2312.12728 Code: https://github.com/alipay/PainlessInferenceAcceleration
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
- Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon, 2021, Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving, Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021, https://proceedings.mlr.press/v139/song21a/song21a.pdf
- Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang, Nov 21, 2023, Break the Sequential Dependency of LLM Inference Using Lookahead Decoding, https://lmsys.org/blog/2023-11-21-lookahead-decoding/ Code: https://github.com/hao-ai-lab/LookaheadDecoding (Generates tokens in parallel by using Jacobi iteration.)
- N Varshney, A Chatterjee, M Parmar, C Baral, Oct 2023, arXiv preprint arXiv:2310.18581, Accelerating LLM Inference by Enabling Intermediate Layer Decoding, https://arxiv.org/pdf/2310.18581.pdf (Dynamic confidence-based early exiting analysis on Llama models; an early-exit sketch appears after this reference list.)
- Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
- Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao, 5 Feb 2024. A Survey on Transformer Compression. https://arxiv.org/abs/2402.05964 (Model compression survey paper with focus on pruning, quantization, knowledge distillation, and efficient architecture design.)
- Xuanlei Zhao, Bin Jia, Haotian Zhou, Ziming Liu, Shenggan Cheng, Yang You, 2 Mar 2024, HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices, https://arxiv.org/abs/2403.01164
- Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee, 4 Mar 2024, Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve, https://arxiv.org/abs/2403.02310 (Reduces latency by scheduling the prefill and decoding phases.)
- C Hooper, S Kim, H Mohammadzadeh, H Genc, Oct 2023, SPEED: Speculative Pipelined Execution for Efficient Decoding https://arxiv.org/pdf/2310.12072.pdf
- Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. Guided open vocabulary image captioning with constrained beam search, 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936–945, https://arxiv.org/abs/1612.00576
- Chris Hokamp and Qun Liu, 2017, Lexically constrained decoding for sequence generation using grid beam search, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1535–1546, https://arxiv.org/abs/1704.07138 (A minimal beam search sketch appears after this reference list.)
- David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- S Yang, G Lee, J Cho, D Papailiopoulos, 2023, Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding, https://arxiv.org/abs/2307.05908
- Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang, 2024, FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, Part of Proceedings of Machine Learning and Systems 6 (MLSys 2024) Conference, PDF: https://proceedings.mlsys.org/paper_files/paper/2024/file/5321b1dabcd2be188d796c21b733e8c7-Paper-Conference.pdf (Next generation of Flash Decoding, with improved asynchronous parallelism of Softmax in both prefill and decoding phases, heuristic dataflow management algorithms, and enhanced GEMM during the decoding phase.)
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Trenton Bricken, November 20, 2019, Tail Free Sampling: A new way to sample from language models for text generation, https://www.trentonbricken.com/Tail-Free-Sampling/ (Alternative to top-k/top-p decoding; the basic sampling algorithms are sketched after this reference list.)
- Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
- Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, Jianling Sun, 5 Jun 2024 (v2), JumpCoder: Go Beyond Autoregressive Coder via Online Modification, https://arxiv.org/abs/2401.07870 Code: https://github.com/Keytoyze/JumpCoder
- Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King, 25 Jun 2024, Entropy-Based Decoding for Retrieval-Augmented Large Language Models, https://arxiv.org/abs/2406.17519 (Enhanced decoding algorithm for multi-document RAG processing.)
- Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis, 12 Jul 2024, Inference Optimization of Foundation Models on AI Accelerators, KDD’24, August 25–29, 2024, Barcelona, Spain, https://arxiv.org/abs/2407.09111
- Jiaao He, Kezhao Huang, Jidong Zhai, July 2024, FASTDECODE: High-Throughput LLM Serving through Disaggregating Attention Computation, https://openreview.net/pdf?id=GahfuPsGw2 (Distributing KV caches to multiple nodes.)
- Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu, 27 Jun 2024, Adaptive Draft-Verification for Efficient Large Language Model Decoding, https://arxiv.org/abs/2407.12021 Project: https://anonymous.4open.science/r/ADED-C7D5 (A draft-and-verify method that is similar to, but distinct from, speculative decoding; the basic draft-and-verify loop is sketched after this reference list.)
- Leo Donisch, Sigurd Schacht, Carsten Lanquillon, 6 Aug 2024, Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations, https://arxiv.org/abs/2408.03130
- Yunjia Xi, Hangyu Wang, Bo Chen, Jianghao Lin, Menghui Zhu, Weiwen Liu, Ruiming Tang, Weinan Zhang, Yong Yu, 11 Aug 2024, A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems, https://arxiv.org/abs/2408.05676 (Determining when speculative decoding is most beneficial.)
- Sidharth Mudgal, Jong Lee, Harish Ganapathy, Yaguang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami, July 2024, Controlled Decoding from Language Models, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:36486-36503, 2024, https://proceedings.mlr.press/v235/mudgal24a.html
- Wenhong Zhu, Hongkun Hao, Zhiwei He, Yiming Ai, Rui Wang, July 2024, Improving Open-Ended Text Generation via Adaptive Decoding, Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62386-62404, 2024, https://proceedings.mlr.press/v235/zhu24d.html
- Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou, 20 Aug 2024, Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model, https://arxiv.org/abs/2408.10764 Code: https://github.com/chenhan97/Otter (Inference intervention in the decoding algorithm.)
- Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong, Integrative Decoding: Improve Factuality via Implicit Self-consistency, 3 Oct 2024 (v2), https://arxiv.org/abs/2410.01556 (Prepends a previous response to improve decoding accuracy.)
- Xinyi Zeng, Yuying Shang, Yutao Zhu, Jiawei Chen, Yu Tian, 9 Oct 2024, Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level, https://arxiv.org/abs/2410.06809
- K Ahmed, KW Chang, G Van den Broeck, Oct 2024, Controllable Generation via Locally Constrained Resampling, Neurips Safe Generative AI Workshop 2024, https://openreview.net/pdf?id=v091fzXTu0
- Yuxuan Liu, Wenyuan Li, Laizhong Cui, Hailiang Yang, 17 Oct 2024, Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement, https://arxiv.org/abs/2410.13344
- Rongxiang Wang and Felix Xiaozhu Lin. 2024. Turbocharge Speech Understanding with Pilot Inference. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '24). Association for Computing Machinery, New York, NY, USA, 1299–1313. https://doi.org/10.1145/3636534.3690694 https://dl.acm.org/doi/abs/10.1145/3636534.3690694 https://dl.acm.org/doi/pdf/10.1145/3636534.3690694 ("Pilot inference" is a specialized mix of caching, computation reuse, and backtracking in beam search for speech understanding, and is somewhat related to speculative decoding, and similar to continual inference for processing a stream.)
- Yixiong Fang, Ziran Yang, Zhaorun Chen, Zhuokai Zhao, Jiawei Zhou, 9 Dec 2024, From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding, https://arxiv.org/abs/2412.06474
- Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
- Y Li, K Livescu, J Zhou, Dec 2024, Beyond Token Generation: Adaptive Chunk-Distilled Language Modeling, 38th Conference on Neural Information Processing Systems (NeurIPS 2024), https://neurips2024-enlsp.github.io/papers/paper_90.pdf (Generate multiple tokens in decoding by inserting RAG chunks directly into the decoding output.)
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas, 7 Oct 2024, Learning How Hard to Think: Input-Adaptive Allocation of LM Computation, https://arxiv.org/abs/2410.04707
- Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen, 27 Nov 2024 (v2), SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models, https://arxiv.org/abs/2411.02433 https://jayzhang42.github.io/sled_page/ (Decoding algorithm that compares logit values in the final layer with those from earlier layers.)
- Yuval Shalev, Amir Feder, Ariel Goldstein, 19 Jun 2024, Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning, https://arxiv.org/abs/2406.13858 (Using embeddings from intermediate model layers in decoding to mimic reasoning pathways.)
- Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson, 14 Oct 2024 (v2), Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries, https://arxiv.org/abs/2406.12775 (Backpatching prior layers using embeddings from the current activations to mimic multi-step reasoning.)
- Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao, 2019, Learning deep transformer models for machine translation, Proc. of ACL 2019, https://arxiv.org/abs/1906.01787
- Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard H. Hovy, 2019, FlowSeq: Non-autoregressive conditional sequence generation with generative flow, Proc. of EMNLP 2019, https://arxiv.org/abs/1909.02480
- Raphael Shu, Jason Lee, Hideki Nakayama, and Kyunghyun Cho, 2020, Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior, Proc. of AAAI 2020, https://arxiv.org/abs/1908.07181
- Huan Ma, Jingdong Chen, Guangyu Wang, Changqing Zhang, 1 Feb 2025, Estimating LLM Uncertainty with Logits, https://arxiv.org/abs/2502.00290
- Zeyu Tang, Zhenhao Chen, Loka Li, Xiangchen Song, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang, 5 Feb 2025, Reflection-Window Decoding: Text Generation with Selective Refinement, https://arxiv.org/abs/2502.03678 (Combination of sliding window attention with pausing.)
- Weihua Du, Yiming Yang, Sean Welleck, 7 Feb 2025, Optimizing Temperature for Language Models with Multi-Sample Inference, https://arxiv.org/abs/2502.05234 https://github.com/StigLidu/TURN
- Jacob Trauger, Ambuj Tewari, 16 May 2025, On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms, https://arxiv.org/abs/2505.11183
- Zhibin Wang, Rui Ning, Chao Fang, Zhonghui Zhang, Xi Lin, Shaobo Ma, Mo Zhou, Xue Li, Zhongfeng Wang, Chengying Huan, Rong Gu, Kun Yang, Guihai Chen, Sheng Zhong, Chen Tian, 23 May 2025, FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding, https://arxiv.org/abs/2505.17694
- Niels M\"undler and Jasper Dekoninck and Martin Vechev, 13 Aug 2025, Constrained Decoding of Diffusion LLMs with Context-Free Grammars, https://arxiv.org/abs/2508.10111
- Haonan Ge, Yiwei Wang, Ming-Hsuan Yang, Yujun Cai, 14 Aug 2025, MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs, https://arxiv.org/abs/2508.10264
- Timon Merk, Saeed Salehi, Richard M. Koehler, Qiming Cui, Maria Olaru, Amelia Hahn, Nicole R. Provenza, Simon Little, Reza Abbasi-Asl, Phil A. Starr, Wolf-Julian Neumann, 13 Aug 2025, Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training, https://arxiv.org/abs/2508.10160
- Keyu Chen, Zhifeng Shen, Daohai Yu, Haoqian Wu, Wei Wen, Jianfeng He, Ruizhi Qiao, Xing Sun, 14 Aug 2025, ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs, https://arxiv.org/abs/2508.08895
- Ran Wang, Xiaoxuan Liu, Hao Ren, Gang Chen, Fanchao Qi, Maosong Sun, 22 Jul 2025, WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding, https://arxiv.org/abs/2507.16768
- Sijin Yu, Zijiao Chen, Wenxuan Wu, Shengxian Chen, Zhongliang Liu, Jingxin Nie, Xiaofen Xing, Xiangmin Xu, Xin Zhang, 22 Jul 2025, From Flat to Round: Redefining Brain Decoding with Surface-Based fMRI and Cortex Structure, https://arxiv.org/abs/2507.16389
- Yuxi Lin, Yaxue Fang, Zehong Zhang, Zhouwu Liu, Siyun Zhong, and Fulong Yu, 22 Jul 2025, Decoding Translation-Related Functional Sequences in 5'UTRs Using Interpretable Deep Learning Models, https://arxiv.org/abs/2507.16801
- Arindam Ghosh, Mark Fuhs, Bongjun Kim, Anurag Chowdhury, Monika Woszczyna, 14 Jul 2025, ASR-Guided Speaker-Role Diarization and Diarization-Guided ASR Decoding, https://arxiv.org/abs/2507.17765
- Milad Taghipour, Bane Vasic, 23 Jul 2025, Action-List Reinforcement Learning Syndrome Decoding for Binary Linear Block Codes, https://arxiv.org/abs/2507.17893
- Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun, 23 Jul 2025, Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale, https://arxiv.org/abs/2507.17985
- Anushka Tiwari, Sayantan Pal, Rohini K. Srihari, Kaiyi Ji, 19 Jul 2025, Task-Agnostic Continual Prompt Tuning with Gradient-Based Selection and Decoding, https://arxiv.org/abs/2507.14725
- Xiaojuan Zhang, Tianyu Jiang, Haoxiang Zong, Chen Zhang, Chendan Li, and Marta Molinas, 13 Jul 2025, AI-Based Impedance Encoding-Decoding Method for Online Impedance Network Construction of Wind Farms, https://arxiv.org/abs/2507.14187
- Donghoon Kim, Minji Bae, Kyuhong Shim, Byonghyo Shim, 21 Jul 2025, Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models, https://arxiv.org/abs/2505.08622
- Taeyoun Kwon, Junhyuk Ahn, Taegeun Yun, Heeju Jwa, Yoonchae Choi, Siwon Park, Nam-Joon Kim, Jangchan Kim, Hyun Gon Ryu, and Hyuk-Jae Lee, 9 Aug 2025, Whisfusion: Parallel ASR Decoding via a Diffusion Transformer, https://arxiv.org/abs/2508.07048
- Lilit Grigoryan, Vladimir Bataev, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Boris Ginsburg, 10 Aug 2025, FlexCTC: GPU-powered CTC Beam Decoding with advanced Contextual Abilities, https://arxiv.org/abs/2508.07315
- Hao Yang, Qinghua Zhao, Lei Li, 28 Jul 2025, How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation, https://arxiv.org/abs/2507.20758
- David Ye, Jan Williams, Mars Gao, Stefano Riva, Matteo Tomasetto, David Zoro, J. Nathan Kutz, 28 Jul 2025, PySHRED: A Python package for SHallow REcurrent Decoding for sparse sensing, model reduction and scientific discovery, https://arxiv.org/abs/2507.20954
- Jinzhou Wu, Baoping Tang, Qikang Li, Yi Wang, Cheng Li, Shujian Yu, 28 Jul 2025, When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding, https://arxiv.org/abs/2507.21037
- Max Peeperkorn, Tom Kouwenhoven, Dan Brown and Anna Jordanous, 28 Jul 2025, Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models, https://arxiv.org/abs/2507.20956
- Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji, 26 Jul 2025, MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning, https://arxiv.org/abs/2409.12059
- Vishal Raman, Vijai Aravindh R, 29 Jul 2025, Evo-DKD: Dual-Knowledge Decoding for Autonomous Ontology Evolution in Large Language Models, https://arxiv.org/abs/2507.21438
- Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang, 31 Jul 2025, XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding, https://arxiv.org/abs/2507.23777
- Shukai Gong, Yiyang Fu, Fengyuan Ran, Quyu Kong, Feng Zhou, 31 Jul 2025, TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding, https://arxiv.org/abs/2507.09252
- Songsheng Wang, Rucheng Yu, Zhihang Yuan, Chao Yu, Feng Gao, Yu Wang and Derek F. Wong, 30 Jul 2025, Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance, https://arxiv.org/abs/2507.22424
- Woojae Jeong, Aditya Kommineni, Kleanthis Avramidis, Colin McDaniel, Donald Berry, Myzelle Hughes, Thomas McGee, Elsi Kaiser, Dani Byrd, Assal Habibi, B. Rael Cahn, Idan A. Blank, Kristina Lerman, Dimitrios Pantazis, Sudarsana R. Kadiri, Takfarinas Medani, Shrikanth Narayanan, and Richard M. Leahy, 30 Jul 2025, Decoding Neural Signatures of Semantic Evaluations in Depression and Suicidality, https://arxiv.org/abs/2507.22313
- Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, Chen Feng, Yicheng Lin, Chenzheng Su, Xiaopeng Zhang, 31 Jul 2025, OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding, https://arxiv.org/abs/2507.02659
- Manh Nguyen, Sunil Gupta and Hung Le, 4 Aug 2025, CAAD: Context-Aware Adaptive Decoding for Truthful Text Generation, https://arxiv.org/abs/2508.02184
- Yike Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, and Lili Qiu, 4 Aug 2025, LeanK: Learnable K Cache Channel Pruning for Efficient Decoding, https://arxiv.org/abs/2508.02215
- Taehan Lee, Hyukjun Lee, 3 Aug 2025, Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance, https://arxiv.org/abs/2504.01690
- Bolian Li, Yifan Wang, Anamika Lochab, Ananth Grama, Ruqi Zhang, 3 Aug 2025, Cascade Reward Sampling for Efficient Decoding-Time Alignment, https://arxiv.org/abs/2406.16306
- Fatih Gulec, Hamdan Awan, Nigel Wallbridge, Andrew W. Eckford, 5 Aug 2025, Decoding and Engineering the Phytobiome Communication for Smart Agriculture, https://arxiv.org/abs/2508.03584
- Jilong Li, Zhenxi Song, Jiaqi Wang, Meishan Zhang, Honghai Liu, Min Zhang, Zhiguo Zhang, 5 Aug 2025, BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation, https://arxiv.org/abs/2410.14971
- Md Raisul Kibria, Sébastien Lafond, Janan Arslan, 6 Aug 2025, Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models, https://arxiv.org/abs/2508.04427
- Enyu Zhou, Kai Sheng, Hao Chen, Xin He, 6 Aug 2025, CARD: Cache-Assisted Parallel Speculative Decoding for Efficient Large Language Model Inference, https://arxiv.org/abs/2508.04462
- Shunqi Mao, Chaoyi Zhang, Weidong Cai, 6 Aug 2025, Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding, https://arxiv.org/abs/2503.10183
- Kang Liu, Zhuoqi Ma, Zikang Fang, Yunan Li, Kun Xie, and Qiguang Miao, 7 Aug 2025, PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation, https://arxiv.org/abs/2508.05353
- Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu, 7 Aug 2025, DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, https://arxiv.org/abs/2411.19527
- Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram, 7 Aug 2025, DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding, https://arxiv.org/abs/2504.05598
- Woojeong Kim, Junxiong Wang, Jing Nathan Yan, Mohamed Abdelfattah, Alexander M. Rush, 11 Aug 2025, OverFill: Two-Stage Models for Efficient Language Model Decoding, https://arxiv.org/abs/2508.08446
- Ziqi Wang, Hailiang Zhao, Cheng Bao, Wenzhuo Qian, Yuhao Yang, Xueqiang Sun, Shuiguang Deng, 1 Aug 2025, XFMNet: Decoding Cross-Site and Nonstationary Water Patterns via Stepwise Multimodal Fusion for Long-Term Water Quality Forecasting, https://arxiv.org/abs/2508.08279
- Lingzhe Zhang, Liancheng Fang, Chiming Duan, Minghua He, Leyi Pan, Pei Xiao, Shiyu Huang, Yunpeng Zhai, Xuming Hu, Philip S. Yu, Aiwei Liu, 12 Aug 2025, A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models, https://arxiv.org/abs/2508.08712
- Xingyou Song, Dara Bahri, 12 Aug 2025, Decoding-based Regression, https://arxiv.org/abs/2501.19383
- Qiaoqiao Ren, Remko Proesmans, Yuanbo Hou, Francis wyffels, and Tony Belpaeme, 12 Aug 2025, Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots, https://arxiv.org/abs/2412.03300
- Changhong Jing, Yan Liu, Shuqiang Wang, Bruce X.B. Yu, Gong Chen, Zhejing Hu, Zhi Zhang, Yanyan Shen, 15 Aug 2025, PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding, https://arxiv.org/abs/2508.11357
- Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal, 15 Aug 2025, Controlling Multimodal LLMs via Reward-guided Decoding, https://arxiv.org/abs/2508.11616
- Pengcheng Huang, Shuhao Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Zulong Chen, Tong Xiao, 18 Aug 2025, PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models, https://arxiv.org/abs/2508.13021
- Jihoon Park, Seungeun Oh, and Seong-Lyun Kim, 18 Aug 2025, Energy-Efficient Wireless LLM Inference via Uncertainty and Importance-Aware Speculative Decoding, https://arxiv.org/abs/2508.12590
- Yuanhao Li, Badong Chen, Wenjun Bai, Yasuharu Koike, Okito Yamashita, 5 Aug 2025, Robust Sparse Bayesian Learning Based on Minimum Error Entropy for Noisy High-Dimensional Brain Activity Decoding, https://arxiv.org/abs/2508.11657
- Dylan Cope, Peter McBurney, 18 Aug 2025, Decoding Communications with Partial Information, https://arxiv.org/abs/2508.13326
- Oriana Presacan, Alireza Nik, Vajira Thambawita, Bogdan Ionescu, Michael Riegler, 19 Aug 2025, A Comparative Study of Decoding Strategies in Medical Text Generation, https://arxiv.org/abs/2508.13580
- Sanggeon Yun, Raheeb Hassan, Ryozo Masukawa, Mohsen Imani, 20 Aug 2025, MissionHD: Data-Driven Refinement of Reasoning Graph Structure through Hyperdimensional Causal Path Encoding and Decoding, https://arxiv.org/abs/2508.14746
- Majid Daliri, Christopher Musco, Ananda Theertha Suresh, 20 Aug 2025, Coupling without Communication and Drafter-Invariant Speculative Decoding, https://arxiv.org/abs/2408.07978
- Julian Oestreich and Lydia Müller, 21 Aug 2025, Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets, https://arxiv.org/abs/2508.15910
- Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li, 22 Aug 2025, SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning, https://arxiv.org/abs/2508.16201
- Lingxiao Li, Salar Rahili, Yiwei Zhao, 20 Aug 2025, Correctness-Guaranteed Code Generation via Constrained Decoding, https://arxiv.org/abs/2508.15866 (Constrained decoding via logit masking is sketched after this reference list.)
- Jungyoub Cha, Hyunjong Kim, Sungzoon Cho, 22 Aug 2025, SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences, https://arxiv.org/abs/2505.20776
- Xuekang Wang, Shengyu Zhu, Xueqi Cheng, 25 Aug 2025, Speculative Safety-Aware Decoding, https://arxiv.org/abs/2508.17739
- Jaydip Sen, Subhasis Dasgupta, Hetvi Waghela, 21 Aug 2025, Confidence-Modulated Speculative Decoding for Large Language Models, https://arxiv.org/abs/2508.15371
- Abdul Rehman Akbar, Usama Sajjad, Ziyu Su, Wencheng Li, Fei Xing, Jimmy Ruiz, Wei Chen, Muhammad Khalid Khan Niazi, 22 Aug 2025, CellEcoNet: Decoding the Cellular Language of Pathology with Deep Learning for Invasive Lung Adenocarcinoma Recurrence Prediction, https://arxiv.org/abs/2508.16742
- Ziyin Zhang, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Rui Wang, and Zhaopeng Tu, 24 Aug 2025, Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation, https://arxiv.org/abs/2411.18462
- Itai Gat, Heli Ben-Hamu, Marton Havasi, Daniel Haziza, Jeremy Reizenstein, Gabriel Synnaeve, David Lopez-Paz, Brian Karrer, Yaron Lipman, 4 Sep 2025, Set Block Decoding is a Language Model Inference Accelerator, https://arxiv.org/abs/2509.04185
- Iro Lim, Haein Ji, and Byungjun Kim, 4 Sep 2025, Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling, https://arxiv.org/abs/2509.03932
- Shengyin Sun, Yiming Li, Xing Li, Yingzhao Lian, Weizhe Lin, Hui-Ling Zhen, Zhiyuan Yang, Chen Chen, Xianzhi Yu, Mingxuan Yuan, and Chen Ma, 30 Aug 2025, Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling, https://arxiv.org/abs/2509.04474
- Bruno Aristimunha, Dung Truong, Pierre Guetschel, Seyed Yahya Shirazi, Isabelle Guyon, Alexandre R. Franco, Michael P. Milham, Aviv Dotan, Scott Makeig, Alexandre Gramfort, Jean-Remi King, Marie-Constance Corsi, Pedro A. Valdés-Sosa, Amit Majumdar, Alan Evans, Terrence J Sejnowski, Oren Shriki, Sylvain Chevallier, Arnaud Delorme, 5 Sep 2025, EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding, https://arxiv.org/abs/2506.19141
- Dylan Cutler, Arun Kandoor, Nishanth Dikkala, Nikunj Saunshi, Xin Wang, Rina Panigrahy, 26 Aug 2025, StagFormer: Time Staggering Transformer Decoding for Running Layers in Parallel, https://arxiv.org/abs/2501.15665
- Sining Zhoubian, Dan Zhang, Yuxiao Dong, Jie Tang, 27 Aug 2025, ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding, https://arxiv.org/abs/2508.19576
- Afrar Jahin, Yi Pan, Yingfeng Wang, Tianming Liu, Wei Zhang, 26 Aug 2025, Quantum-Classical Hybrid Molecular Autoencoder for Advancing Classical Decoding, https://arxiv.org/abs/2508.19394
- Yang Sun, Lixin Zou, Dan Luo, Zhiyong Xie, Long Zhang, Liming Dong, Yunwei Zhao, Xixun Lin, Yanxiong Lu, Chenliang Li, 27 Aug 2025, LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation, https://arxiv.org/abs/2508.19614
- Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Yi Liang, Soroush Vosoughi, Shiwei Liu, 27 Aug 2025, Diffusion Language Models Know the Answer Before Decoding, https://arxiv.org/abs/2508.19982
- Zhixuan Liang, Yizhuo Li, Tianshuo Yang, Chengyue Wu, Sitong Mao, Liuao Pei, Xiaokang Yang, Jiangmiao Pang, Yao Mu, Ping Luo, 27 Aug 2025, Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies, https://arxiv.org/abs/2508.20072
- Seongwan Park, Taeklim Kim, Youngjoong Ko, 27 Aug 2025, Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval, https://arxiv.org/abs/2506.00041
- Zhuoran Yu and Yong Jae Lee, 27 Aug 2025, How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding, https://arxiv.org/abs/2508.20279
- Weizhi Gao, Xiaorui Liu, Feiyi Wang, Dan Lu, Junqi Yin, 28 Aug 2025, Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection, https://arxiv.org/abs/2508.21228
- Haofei Yin, Mengbai Xiao, Tinghong Li, Xiao Zhang, Dongxiao Yu, Guanghui Zhang, 29 Aug 2025, SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding, https://arxiv.org/abs/2504.04104
- Mingyu Yang, Jae-Young Choi, Kihyo Moon, Minsung Jang, and Eunjoo Joen, 1 Sep 2025, DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving, https://arxiv.org/abs/2509.01083
- Xiaoqiang Lin, Aritra Ghosh, Bryan Kian Hsiang Low, Anshumali Shrivastava, Vijai Mohan, 1 Sep 2025, REFRAG: Rethinking RAG based Decoding, https://arxiv.org/abs/2509.01092
- Kyeongman Park, Nakyeong Yang, Kyomin Jung, 2 Sep 2025, Avoidance Decoding for Diverse Multi-Branch Story Generation, https://arxiv.org/abs/2509.02170
- Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram, 2 Sep 2025, Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation, https://arxiv.org/abs/2509.02510
- Parv Kapoor, Akila Ganlath, Changliu Liu, Sebastian Scherer, Eunsuk Kang, 1 Sep 2025, Constrained Decoding for Robotics Foundation Models, https://arxiv.org/abs/2509.01728
- Minxu Liu, Donghai Guan, Chuhang Zheng, Chunwei Tian, Jie Wen, Qi Zhu, 2 Sep 2025, ViEEG: Hierarchical Visual Neural Representation for EEG Brain Decoding, https://arxiv.org/abs/2505.12408
- GodsGift Uzor, Tania-Amanda Nkoyo Fredrick Eneye, Chukwuebuka Ijezue, 5 Sep 2025, Advanced Brain Tumor Segmentation Using EMCAD: Efficient Multi-scale Convolutional Attention Decoding, https://arxiv.org/abs/2509.05431
- Ishaan Verma, 6 Sep 2025, Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization, https://arxiv.org/abs/2509.05831
- Jipeng Li, Zeyu Gao, Yubin Qi, Hande Dong, Weijian Chen, Qiang Lin, 9 Sep 2025, Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding, https://arxiv.org/abs/2509.07676
- Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho, 1 Sep 2025, CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention, https://arxiv.org/abs/2509.06982
- Tom Kempton and Stuart Burrell, 9 Sep 2025, Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models, https://arxiv.org/abs/2503.21929
- Hoshitaro Ohnishi and Hideo Mukai, 12 Sep 2025, A Symmetry-Integrated Approach to Surface Code Decoding, https://arxiv.org/abs/2509.10164
- Xing Gao, Zherui Huang, Weiyao Lin, Xiao Sun, 11 Sep 2025, ProgD: Progressive Multi-scale Decoding with Dynamic Graphs for Joint Multi-agent Motion Forecasting, https://arxiv.org/abs/2509.09210
- Weibin Feng, Ran Tao, John Cartlidge, Jin Zheng, 18 Sep 2025, VMDNet: Time Series Forecasting with Leakage-Free Samplewise Variational Mode Decomposition and Multibranch Decoding, https://arxiv.org/abs/2509.15394
- Senkang Hu, Xudong Han, Jinqi Jiang, Yihang Tao, Zihan Fang, Sam Tak Wu Kwong, Yuguang Fang, 19 Sep 2025, Distribution-Aligned Decoding for Efficient LLM Task Adaptation, https://arxiv.org/abs/2509.15888
- Wei Zhong, Manasa Bharadwaj, Yixiao Wang, Nikhil Verma, Yipeng Ji, Chul Lee, 19 Sep 2025, Cross-Attention Speculative Decoding, https://arxiv.org/abs/2505.24544
- Sudeshna Jana, Manjira Sinha and Tirthankar Dasgupta, 14 Sep 2025, Decoding Plastic Toxicity: An Intelligent Framework for Conflict-Aware Relational Metapath Extraction from Scientific Abstracts, https://arxiv.org/abs/2509.11330
- Shanmuka Sadhu, Arca Baran, Preeti Pandey, and Ayush Kumar, 15 Sep 2025, Task Decoding based on Eye Movements using Synthetic Data Augmentation, https://arxiv.org/abs/2509.11547
- Cheng-Yang Tsai, Tzu-Wei Huang, Shao-Yu Wei, Guan-Wei Chen, Hung-Ying Chu, Yu-Cheng Lin, 14 Sep 2025, Decoding Musical Origins: Distinguishing Human and AI Composers, https://arxiv.org/abs/2509.11369
- Wei-Hsin Yeh, Yu-An Su, Chih-Ning Chen, Yi-Hsueh Lin, Calvin Ku, Wen-Hsin Chiu, Min-Chun Hu, Lun-Wei Ku, 15 Sep 2025, CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model, https://arxiv.org/abs/2509.11698
- Yudong Shen, Wenyu Wu, Jiali Mao, Yixiao Tong, Guoping Liu, Chaoya Wang, 15 Sep 2025, Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference, https://arxiv.org/abs/2509.11731
- Haiduo Huang, Fuwei Yang, Zhenhua Liu, Xuanwu Yin, Dong Li, Pengju Ren, Emad Barsoum, 15 Sep 2025, SpecVLM: Fast Speculative Decoding in Vision-Language Models, https://arxiv.org/abs/2509.11815
- Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang, 15 Sep 2025, Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation, https://arxiv.org/abs/2505.23657
- Yurui Chang, Bochuan Cao, Lu Lin, 13 Sep 2025, Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation, https://arxiv.org/abs/2503.03106
- Yeongbin Seo, Dongha Lee, Jaehyung Kim, and Jinyoung Yeo, 18 Sep 2025, Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning, https://arxiv.org/abs/2509.15188
- Matan Avitan, Moran Baruch, Nir Drucker, Itamar Zimerman, Yoav Goldberg, 10 Sep 2025, Efficient Decoding Methods for Language Models on Encrypted Data, https://arxiv.org/abs/2509.08383
- Ethan G. Rogers, Cheng Wang, 1 Oct 2025, Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction, https://arxiv.org/abs/2510.01407
- Avinash Kumar, Sujay Sanghavi, Poulami Das, 1 Oct 2025, HiSpec: Hierarchical Speculative Decoding for LLMs, https://arxiv.org/abs/2510.01336
- Jameson Sandler, Ahmet Üstün, Marco Romanelli, Sara Hooker, Ferdinando Fioretto, 2 Oct 2025, The Disparate Impacts of Speculative Decoding, https://arxiv.org/abs/2510.02128
- Shenxu Chang, Junchi Yu, Weixing Wang, Yongqiang Chen, Jialin Yu, Philip Torr, Jindong Gu, 30 Sep 2025, TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models, https://arxiv.org/abs/2510.01274
- Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu, 2 Oct 2025, QSpec: Speculative Decoding with Complementary Quantization Schemes, https://arxiv.org/abs/2410.11305
- Xuan Luo, Weizhi Wang, Xifeng Yan, 13 Oct 2025, Direct Multi-Token Decoding, https://arxiv.org/abs/2510.11958
- Shutong Wu and Jiawei Zhang, 30 Sep 2025, Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models, https://arxiv.org/abs/2510.00294
- Runyan Tan, Shuang Wu, Phillip Howard, 30 Sep 2025, $p$-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding, https://arxiv.org/abs/2509.23234
- Kuiye Ding, Fanda Fan, Chunyi Hou, Zheya Wang, Lei Wang, Zhengxin Yang, and Jianfeng Zhan, 23 Sep 2025, TimeMosaic: Temporal Heterogeneity Guided Time Series Forecasting via Adaptive Granularity Patch and Segment-wise Decoding, https://arxiv.org/abs/2509.19406
- Ruanjun Li, Ziheng Liu, Yuanming Shi, Jiawei Shao, Chi Zhang, Xuelong Li, 19 Sep 2025, Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding, https://arxiv.org/abs/2509.19368
- Haiduo Huang, Jiangcheng Song, Yadong Zhang, Pengju Ren, 28 Oct 2025, SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs, https://arxiv.org/abs/2510.24021
- Yangchao Wu, Zongyue Qin, Alex Wong, Stefano Soatto, 27 Oct 2025, STree: Speculative Tree Decoding for Hybrid State-Space Models, https://arxiv.org/abs/2505.14969
- Sibo Xiao, Jinyuan Fu, Zhongle Xie, Lidan Shou, 28 Oct 2025, TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs, https://arxiv.org/abs/2510.15545
- Roman Garipov, Fedor Velikonivtsev, Ivan Ermakov, Ruslan Svirschevski, Vage Egiazarian, Max Ryabinin, 28 Oct 2025, AutoJudge: Judge Decoding Without Manual Annotation, https://arxiv.org/abs/2504.20039
- Hongyi Liu, Jiaji Huang, Zhen Jia, Youngsuk Park, Yu-Xiang Wang, 22 Oct 2025, Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs, https://arxiv.org/abs/2510.20064
- Georgios Mentzelopoulos, Ioannis Asmanis, Konrad P. Kording, Eva L. Dyer, Kostas Daniilidis, Flavia Vitale, 23 Oct 2025, A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks, https://arxiv.org/abs/2510.20683
- Jan Sobotka, Luca Baroni, J\'an Antol\'ik, 23 Oct 2025, MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs, https://arxiv.org/abs/2510.20762
- Zhiyu Lin, Jingwen Yang, Jiale Zhao, Meng Liu, Sunzhu Li, Benyou Wang, 23 Oct 2025, Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment, https://arxiv.org/abs/2510.20513
- Clara Mohri, Haim Kaplan, Tal Schuster, Yishay Mansour, Amir Globerson, 23 Oct 2025, Fast Inference via Hierarchical Speculative Decoding, https://arxiv.org/abs/2510.19705
- Chang Wu, Zhiyuan Liu, Wen Shu, Liang Wang, Yanchen Luo, Wenqiang Lei, Yatao Bian, Junfeng Fang, Xiang Wang, 19 Oct 2025, 3D-GSRD: 3D Molecular Graph Auto-Encoder with Selective Re-mask Decoding, https://arxiv.org/abs/2510.16780
- Zheng Huang, Enpei Zhang, Yinghao Cai, Weikang Qiu, Carl Yang, Elynn Chen, Xiang Zhang, Rex Ying, Dawei Zhou, Yujun Yan, 17 Oct 2025, Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI, https://arxiv.org/abs/2510.16196
- Shijing Hu, Jingyang Li, Xingyu Xie, Zhihui Lu, Kim-Chuan Toh and Pan Zhou, 19 Oct 2025, GRIFFIN: Effective Token Alignment for Faster Speculative Decoding, https://arxiv.org/abs/2502.11018
- Yangxuan Zhou, Sha Zhao, Jiquan Wang, Haiteng Jiang, Shijian Li, Tao Li, Gang Pan, 22 Sep 2025, SPICED: A Synaptic Homeostasis-Inspired Framework for Unsupervised Continual EEG Decoding, https://arxiv.org/abs/2509.17439
- Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Mingu Lee, Christopher Lott, Fatih Porikli, 22 Sep 2025, Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding, https://arxiv.org/abs/2509.18085
- Byeongho Yu, Changhun Lee, Jungyu Jin, Eunhyeok Park, 20 Sep 2025, PruneCD: Contrasting Pruned Self Model to Improve Decoding Factuality, https://arxiv.org/abs/2509.16598
- Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe, 20 Sep 2025, Decoding Uncertainty: The Impact of Decoding Strategies for Uncertainty Estimation in Large Language Models, https://arxiv.org/abs/2509.16696
- Yun-Shiuan Chuang, Nikunj Harlalka, Sameer Narendran, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers, 20 Sep 2025, Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding, https://arxiv.org/abs/2501.17310
- Ziwei Wang, Hongbin Wang, Tianwang Jia, Xingyi He, Siyang Li, and Dongrui Wu, 19 Sep 2025, DBConformer: Dual-Branch Convolutional Transformer for EEG Decoding, https://arxiv.org/abs/2506.21140
- Divya Jyoti Bajpai and Manjesh Kumar Hanawal, 26 Oct 2025, FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference, https://arxiv.org/abs/2510.22641
- Chenheng Zhang, Tianqi Du, Jizhe Zhang, Mingqing Xiao, Yifei Wang, Yisen Wang and Zhouchen Lin, 23 Oct 2025, Language Ranker: A Lightweight Ranking framework for LLM Decoding, https://arxiv.org/abs/2510.21883
- Ranran Haoran Zhang, Soumik Dey, Ashirbad Mishra, Hansi Wu, Binbin Li, and Rui Zhang, 26 Oct 2025, Batch Speculative Decoding Done Right, https://arxiv.org/abs/2510.22876
- Shayne Longpre, Sneha Kudugunta, Niklas Muennighoff, I-Hung Hsu, Isaac Caswell, Alex Pentland, Sercan Arik, Chen-Yu Lee, Sayna Ebrahimi, 24 Oct 2025, ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality, https://arxiv.org/abs/2510.22037
- Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta, 15 Oct 2025, Self-Augmented Visual Contrastive Decoding, https://arxiv.org/abs/2510.13315
- Yiming Wang, Pei Zhang, Siyuan Huang, Baosong Yang, Zhuosheng Zhang, Fei Huang, Rui Wang, 15 Oct 2025, Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding, https://arxiv.org/abs/2503.01422
- Thomas Walton, Darin Tsui, Aryan Musharaf, Amirali Aghazadeh, 25 Sep 2025, SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding, https://arxiv.org/abs/2509.21689
- Yizhou Zhang, Ning Lv, Teng Wang, Jisheng Dang, 26 Sep 2025, FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning, https://arxiv.org/abs/2509.21792
- Linxiao Zeng, Haoyun Deng, Kangyuan Shu, Shizhen Wang, 26 Sep 2025, Self-Speculative Biased Decoding for Faster Live Translation, https://arxiv.org/abs/2509.21740
- Aravindhan G, Yuvaraj Govindarajulu, Parin Shah, 26 Sep 2025, Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks, https://arxiv.org/abs/2509.22060
- Shijing Hu, Jingyang Li, Zhihui Lu and Pan Zhou, 26 Sep 2025, Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding, https://arxiv.org/abs/2509.22134
- Hyungjune Bu, Chanjoo Jung, Minjae Kang, Jaehyung Kim, 26 Sep 2025, Personalized LLM Decoding via Contrasting Personal Preference, https://arxiv.org/abs/2506.12109
- Yann Bellec, 3 Oct 2025, Dream2Image : An Open Multimodal EEG Dataset for Decoding and Visualizing Dreams with Artificial Intelligence, https://arxiv.org/abs/2510.06252
- Xiangjun Mi and Frank Mueller, 5 Oct 2025, Toward Uncertainty-Aware and Generalizable Neural Decoding for Quantum LDPC Codes, https://arxiv.org/abs/2510.06257
- Mingyang Yao, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick, 8 Oct 2025, BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music, https://arxiv.org/abs/2510.06528
- Yuxiang Wei, Yanteng Zhang, Xi Xiao, Tianyang Wang, Xiao Wang, Vince D. Calhoun, 7 Oct 2025, MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding, https://arxiv.org/abs/2505.15946
- Daniel Melcer, Sujan Gonugondla, Pramuditha Perera, Haifeng Qian, Wen-Hao Chiang, Yanjun Wang, Nihal Jain, Pranav Garg, Xiaofei Ma, Anoop Deoras, 7 Oct 2025, Approximately Aligned Decoding, https://arxiv.org/abs/2410.01103
- Gabriele Oliaro, Zhihao Jia, Daniel Campos and Aurick Qiao, 7 Oct 2025, SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications, https://arxiv.org/abs/2411.04975
- Kanghoon Yoon, Minsub Kim, Sungjae Lee, Joonhyung Lee, Sunghyeon Woo, Yeonjun In, Se Jung Kwon, Chanyoung Park, Dongsoo Lee, 26 Sep 2025, SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification, https://arxiv.org/abs/2510.02329
- Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, Jun Wang, 28 Sep 2025, DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding, https://arxiv.org/abs/2510.02358
- Jingyuan Deng, Yujiu Yang, 3 Oct 2025, MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding, https://arxiv.org/abs/2510.02790
- Jingze Zhu, Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yanqiang Zheng, Jiawei Chen, Xu Yang, Bernt Schiele, Jonas Fischer, Xinting Hu, 3 Oct 2025, LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers, https://arxiv.org/abs/2507.04404
- Shirin Tavakoli Kafiabad, Andrea Schiffauerova, and Ashkan Ebadi, 21 Oct 2025, Decoding Funded Research: Comparative Analysis of Topic Models and Uncovering the Effect of Gender and Geographic Location, https://arxiv.org/abs/2510.18803
- Zheyuan Lin, Siqi Cai, Haizhou Li, 17 Oct 2025, Decoding Listeners Identity: Person Identification from EEG Signals Using a Lightweight Spiking Transformer, https://arxiv.org/abs/2510.17879
- Yoshinari Fujinuma, 21 Oct 2025, Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge, https://arxiv.org/abs/2510.18196 (A basic contrastive decoding sketch appears after this reference list.)
- Sangyoon Bae, Mehdi Azabou, Jiook Cha, Blake Richards, 21 Oct 2025, Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware SSL, https://arxiv.org/abs/2510.18516
- Haiduo Huang, Jiangcheng Song, Wenzhe Zhao, Pengju Ren, 24 Sep 2025, FastEagle: Cascaded Drafting for Accelerating Speculative Decoding, https://arxiv.org/abs/2509.20416
- Cfir Avraham Hadar, Omer Shubi, Yoav Meiri, Amit Heshes, Yevgeni Berzak, 25 Sep 2025, Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading, https://arxiv.org/abs/2505.02872
- Yueming Sun, Long Yang, 29 Sep 2025, Spatial-Functional awareness Transformer-based graph archetype contrastive learning for Decoding Visual Neural Representations from EEG, https://arxiv.org/abs/2509.24761
- Zhinan Xie, Peisong Wang, Jian Cheng, 28 Sep 2025, HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models, https://arxiv.org/abs/2509.23928
- Xi Zhang, Zaiqiao Meng, Jake Lever, Edmond S. L. Ho, 27 Sep 2025, CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding, https://arxiv.org/abs/2509.23379
- Rajaa El Hamdani, Samy Haffoudhi, Nils Holzenberger, Fabian Suchanek, Thomas Bonald, and Fragkiskos D. Malliaros, 27 Sep 2025, Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models, https://arxiv.org/abs/2509.23417
- Jingyi Yang, Guanxu Chen, Xuhao Hu, Jing Shao, 28 Sep 2025, Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step, https://arxiv.org/abs/2509.23924
- Nhan T. Luu, Duong T. Luu, Pham Ngoc Nam, Truong Cong Thang, 29 Sep 2025, Hybrid Layer-Wise ANN-SNN With Surrogate Spike Encoding-Decoding Structure, https://arxiv.org/abs/2509.24411
- Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini, 29 Sep 2025, Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures, https://arxiv.org/abs/2509.25045
- Raghavv Goel, Sudhanshu Agrawal, Mukul Gagrani, Junyoung Park, Yifan Zao, He Zhang, Tian Liu, Yiping Yang, Xin Yuan, Jiuyan Lu, Chris Lott, Mingu Lee, 28 Jun 2025, VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs, https://arxiv.org/abs/2506.22694
- Michele Romani, Francesco Paissan, Andrea Fossà, Elisabetta Farella, 27 Sep 2025, Explicit modelling of subject dependency in BCI decoding, https://arxiv.org/abs/2509.23247
- Jiankun Wei, Abdulrahman Abdulrazzag, Tianchen Zhang, Adel Muursepp, Gururaj Saileshwar, 26 Sep 2025, When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs, https://arxiv.org/abs/2411.01076
- Weijie Shi, Yue Cui, Yaguang Wu, Jingzhi Fang, Shibo Zhang, Mengze Li, Sirui Han, Jia Zhu, Jiajie Xu, Xiaofang Zhou, 28 Sep 2025, Semantic-guided Diverse Decoding for Large Language Model, https://arxiv.org/abs/2506.23601
- Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu, 28 Sep 2025, Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer, https://arxiv.org/abs/2507.02199
- Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen, 29 Sep 2025, CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering, https://arxiv.org/abs/2507.04756
- Shuntaro Suzuki, Shunya Nagashima, Masayuki Hirata, Komei Sugiura, 17 Oct 2025, Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding, https://arxiv.org/abs/2510.15371
- Peng Ren and Hai Yang, 17 Oct 2025, LILAC: Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal Decoding, https://arxiv.org/abs/2510.15392
- Jingxiang Zhang, Lujia Zhong, 5 Oct 2025, Decoding Emotion in the Deep: A Systematic Study of How LLMs Represent, Retain, and Express Emotion, https://arxiv.org/abs/2510.04064
- Guofu Xie, Chen Zhang, Xiao Zhang, Yunsheng Shi, Ting Yao and Jun Xu, 4 Oct 2025, Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation, https://arxiv.org/abs/2510.03782
- Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjae Lee, Yuchen Zeng, Shuibai Zhang, Coleman Hooper, Yuezhou Hu, Hyung Il Koo, Nam Ik Cho, Kangwook Lee, 6 Oct 2025, ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs, https://arxiv.org/abs/2510.04767
- Alireza Nik, Michael A. Riegler, Pål Halvorsen, 6 Oct 2025, Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption, https://arxiv.org/abs/2502.11723
- Zongle Huang, Lei Zhu, Zongyuan Zhan, Ting Hu, Weikai Mao, Xianzhi Yu, Yongpan Liu, Tianyu Zhang, 6 Oct 2025, MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE, https://arxiv.org/abs/2505.19645
- Zhenhua Liu, Lijun Li, Ruizhe Chen, Yuxian Jiang, Tong Zhu, Zhaochen Su, Wenliang Chen, Jing Shao, 4 Oct 2025, Evolutionary Guided Decoding: Iterative Value Refinement for LLMs, https://arxiv.org/abs/2503.02368
- Alexandre M\"osching, Housen Li, Axel Munk, 4 Oct 2025, Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For Hidden Markov Models, https://arxiv.org/abs/2305.18578
- Atul Shree, Harshith Jupuru, 10 Oct 2025, FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms, https://arxiv.org/abs/2510.09085
- Linfeng Wang, Susana Campino, Taane G. Clark, Jody E. Phelan, 9 Oct 2025, Decoding Positive Selection in Mycobacterium tuberculosis with Phylogeny-Guided Graph Attention Models, https://arxiv.org/abs/2510.08703
- Marco Siino, Giuseppe Bonomo, Rosario Sorbello, and Ilenia Tinnirello, 10 Oct 2025, Investigating the Impact of Rational Dilated Wavelet Transform on Motor Imagery EEG Decoding with Deep Learning Models, https://arxiv.org/abs/2510.09242
- Feihan Feng, Jingxin Nie, 10 Oct 2025, Brain2Text Decoding Model Reveals the Neural Mechanisms of Visual Semantic Processing, https://arxiv.org/abs/2503.22697
- Enshu Liu, Qian Chen, Xuefei Ning, Shengen Yan, Guohao Dai, Zinan Lin, Yu Wang, 23 Oct 2025, Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation, https://arxiv.org/abs/2510.21003
- Enshu Liu, Xuefei Ning, Yu Wang, Zinan Lin, 23 Oct 2025, Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching, https://arxiv.org/abs/2412.17153
- Payel Bhattacharjee, Fengwei Tian, Meiyu Zhong, Guangyi Zhang, Osvaldo Simeone, Ravi Tandon, 11 Oct 2025, Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding, https://arxiv.org/abs/2510.09942
- James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth, 12 Oct 2025, DeAL: Decoding-time Alignment for Large Language Models, https://arxiv.org/abs/2402.06147
- Beomsik Cho, Jaehyung Kim, 11 Oct 2025, Revisit What You See: Disclose Language Prior in Vision Tokens for LVLM Decoding, https://arxiv.org/abs/2506.09522
- Faruk Alpay, Hamdi Alakkad, 3 Oct 2025, Truth-Aware Decoding: A Program-Logic Approach to Factual Language Generation, https://arxiv.org/abs/2510.07331
- Timon Klein, Piotr Minakowski and Sebastian Sager, 9 Oct 2025, Mitigating Subject Dependency in EEG Decoding with Subject-Specific Low-Rank Adapters, https://arxiv.org/abs/2510.08059
- Jaeseong Lee, seung-won hwang, Aurick Qiao, Gabriele Oliaro, Ye Wang, Samyam Rajbhandari, 8 Oct 2025, OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs, https://arxiv.org/abs/2510.07535
- Jannek Ulm, Kevin Du, Vésteinn Snæbjarnarson, 9 Oct 2025, Contrastive Decoding for Synthetic Data Generation in Low-Resource Language Modeling, https://arxiv.org/abs/2510.08245
- Shawnak Shivakumar, Jefferson Hernandez, 7 Oct 2025, Decoding the dark proteome: Deep learning-enabled discovery of druggable enzymes in Wuchereria bancrofti, https://arxiv.org/abs/2510.07337
- Zheyuan Liu, Zhangchen Xu, Guangyao Dou, Xiangchi Yuan, Zhaoxuan Tan, Radha Poovendran, Meng Jiang, 23 Sep 2025, Steering Multimodal Large Language Models Decoding for Context-Aware Safety, https://arxiv.org/abs/2509.19212
- Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu and Cong Wang, 23 Sep 2025, Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism, https://arxiv.org/abs/2506.01979
- Yuu Jinnai, 22 Oct 2025, Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition, https://arxiv.org/abs/2510.19471
- Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer, 30 Sep 2025, Alignment-Aware Decoding, https://arxiv.org/abs/2509.26169
- Piotr Komorowski, Elena Golimblevskaia, Reduan Achtibat, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek, 30 Sep 2025, Attribution-Guided Decoding, https://arxiv.org/abs/2509.26307
- Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam, 6 Oct 2025, Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs, https://arxiv.org/abs/2510.05278
- Rohan Arni and Carlos Blanco, 6 Oct 2025, Physics-Informed Neural Networks with Fourier Features and Attention-Driven Decoding, https://arxiv.org/abs/2510.05385
- Shrenik Bhansali, Larry Heck, 6 Oct 2025, Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding, https://arxiv.org/abs/2510.05421
- Xueyan Li, Guinan Su, Mrinmaya Sachan, Jonas Geiping, 7 Oct 2025, Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs, https://arxiv.org/abs/2510.05987
- Qi Li, Runpeng Yu, Haiquan Lu, Xinchao Wang, 2 Oct 2025, Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs, https://arxiv.org/abs/2510.05148
- Kangyu Wang, Zhiyun Jiang, Haibo Feng, Weijia Zhao, Lin Liu, Jianguo Li, Zhenzhong Lan, Weiyao Lin, 7 Oct 2025, CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits, https://arxiv.org/abs/2510.06133
- Chenghao Yang, Lin Gui, Chenxiao Yang, Victor Veitch, Lizhu Zhang, Zhuokai Zhao, 6 Oct 2025, Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning, https://arxiv.org/abs/2510.05251
- Hao Yin, Guangzong Si, Zilei Wang, 7 Oct 2025, The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?, https://arxiv.org/abs/2504.10020
- Kyungryul Back, Seongbeom Park, Milim Kim, Mincheol Kwon, SangHyeok Lee, Hyunyoung Lee, Junhee Cho, Seunghyun Park, Jinkyu Kim, 16 Oct 2025, Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding, https://arxiv.org/abs/2510.14304
More Research on Decoding Algorithms
- Decoding algorithms (overview)
— Non-autoregressive decoding
— Greedy decoding
— Top-k decoding
— Top-p decoding
— Min-P Sampling
— Flash decoding
— Beam search decoding
— Edit decoding
— Contrastive decoding
— Constrained decoding
- Parallel decoding (overview)
— Blockwise parallel decoding
— n-gram parallel decoding
— Lookahead decoding
— Medusa decoding
— Consensus decoding
- Speculative decoding (overview)
— Generalized speculative decoding
— Aggressive decoding
— Lookup decoding
— Retrieval lookup decoding
— Prompt lookup decoding
— Self speculative decoding
— Tree speculative decoding
— Superposed decoding
— Hierarchical speculative decoding
— Heuristic speculative decoding
— Multi-token speculative decoding
— Sequential speculative decoding
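The schemes listed above range from the simple single-token methods to multi-token and speculative variants. As a concrete illustration of the two simplest, here is a minimal C++ sketch (not taken from any of the papers above) of greedy and top-k decoding over a final logits vector; the function names, the std::mt19937 generator, and the trick of subtracting the maximum logit for numerical stability are illustrative choices, not a reference implementation.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Greedy decoding: always emit the token whose logit is largest.
std::size_t greedy_decode(const std::vector<float>& logits) {
    return static_cast<std::size_t>(std::distance(
        logits.begin(), std::max_element(logits.begin(), logits.end())));
}

// Top-k decoding: restrict sampling to the k highest logits,
// then sample one of them in proportion to its softmax weight.
std::size_t top_k_decode(const std::vector<float>& logits, std::size_t k,
                         std::mt19937& rng) {
    k = std::min(k, logits.size());            // clamp k to vocabulary size
    std::vector<std::size_t> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);      // token IDs 0, 1, 2, ...
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return logits[a] > logits[b];
                      });                      // k best tokens come first
    std::vector<double> weights(k);
    const double top = logits[idx[0]];         // subtract max for stability
    for (std::size_t i = 0; i < k; ++i)
        weights[i] = std::exp(static_cast<double>(logits[idx[i]]) - top);
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    return idx[pick(rng)];                     // chosen token ID
}

Note that top_k_decode with k set to 1 degenerates to greedy decoding, and top-p ("nucleus") decoding differs only in how the candidate set is cut off: it keeps the smallest prefix of the sorted tokens whose cumulative softmax probability exceeds p, rather than a fixed count k.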
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI (new book on AI intelligence theory). Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications (new book on RAG architectures). Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: