Aussie AI

Non-Autoregression Optimizations

  • Last Updated 27 August, 2025
  • by David Spuler, Ph.D.

One of the biggest obstacles to fast inference of Large Language Models (LLMs) is that they emit one token at a time (e.g. one word at a time). This limits parallelism and means that the entire model must be re-run multiple times, once for each word (or subword token).

Why Autoregression?

The reason for this limitation is that the next word to output inherently depends on the words that came before it, which is more or less an unavoidable property of human language. In LLM coding circles, this is called the "autoregression" problem, possibly because researchers tend to like big words.

Because of this issue, the LLM is designed so that when it emits a word, that word is fed back into the model's next iteration to help it emit the next word. And that's slow for several reasons (see the code sketch after this list):

  • The model runs for every token.
  • The model never produces 2 tokens (or more) in parallel.
  • The model cannot start working on the 2nd token before finishing the 1st token, which limits pipelining (a type of parallelism).
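
To make the loop concrete, here is a minimal Python sketch of greedy autoregressive decoding. The toy_model function is a made-up stand-in for a full LLM forward pass; the point is simply that every generated token costs one complete pass over the whole sequence:

```python
# Minimal sketch of greedy autoregressive decoding, assuming a made-up
# toy_model() stand-in for a full LLM forward pass. The key point: every
# generated token requires another complete pass over the whole sequence.

def toy_model(tokens):
    """Stand-in for an LLM forward pass: returns fake logits over a tiny vocab."""
    vocab_size = 8
    seed = sum(tokens) + len(tokens)          # a real model would run attention here
    return [(seed * 31 + i * 7) % 13 for i in range(vocab_size)]

def greedy_decode(prompt_tokens, max_new_tokens, eos_token=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = toy_model(tokens)            # one full model pass per token
        next_token = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_token)             # fed back in for the next pass
        if next_token == eos_token:           # stop on end-of-sequence token
            break
    return tokens

print(greedy_decode([3, 5, 2], max_new_tokens=5))
```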

There is considerable research on fixing this latency problem and achieving more parallelism. The research area is known as "non-autoregression" optimizations.

Tokens and Non-Autoregression

Although much of the research into non-autoregression involves major surgery on the LLM architecture, there's a simpler way to mitigate the inefficiency: bigger tokens. If the tokens are longer, then fewer of them are emitted for each piece of work done by the AI engine, so the model runs in fewer iterations if the tokenizer chooses whole words rather than sub-words, or even treats common two-word phrases as single tokens (i.e., multi-word tokens). Longer tokens therefore reduce the inefficiency from autoregression, and they also shorten the input sequence, which further reduces execution cost (the Transformer's attention algorithm is famously quadratic in the input sequence length).
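
As a toy illustration of the idea, here is a short Python sketch comparing subword-style tokenization against whole-word and multi-word tokens; the token splits are made up, not taken from any real tokenizer:

```python
# Toy illustration only: these token splits are made up, not from a real
# tokenizer. Each token costs one autoregressive decoding step, so the
# multi-word tokenization below needs fewer model passes for the same text.

text = "New York is a big city."

subword_tokens   = ["New", " York", " is", " a", " big", " cit", "y", "."]
multiword_tokens = ["New York", " is", " a", " big city", "."]

assert "".join(subword_tokens) == text == "".join(multiword_tokens)
print(len(subword_tokens), "decoding steps vs", len(multiword_tokens))  # 8 vs 5
```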

The downside is that longer tokens mean more unique tokens, which increases the vocabulary size. The model's size depends partly on the vocabulary size (the token embedding matrix and the final output projection both scale with it), so this increase from longer tokens makes the whole model larger, and it runs slower.

Therefore, longer tokens reduce latency by cutting the number of autoregressive steps, but increase latency by making the model larger overall. Maybe there's some happy trade-off here? Most of the current models seem to use a vocabulary of around 50,000 tokens, and the vocabulary size becomes one of the meta-parameters of the model.
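
To make the vocabulary cost concrete, here is a rough back-of-the-envelope sketch. The hidden dimension and vocabulary sizes below are assumed example values, not from any specific model; the point is that the output projection's parameter count, and hence its per-token matmul cost, grows linearly with the vocabulary size:

```python
# Back-of-the-envelope sketch with assumed example numbers (not any specific
# model): the final output projection is a hidden_dim x vocab_size matrix,
# so its parameter count (and per-token matmul cost) grows linearly with
# the vocabulary size chosen by the tokenizer.

hidden_dim = 4096  # assumed hidden dimension

for vocab_size in (32_000, 50_000, 100_000, 250_000):
    params = hidden_dim * vocab_size
    print(f"vocab={vocab_size:>7,}  output-projection params={params / 1e6:8.1f}M")
```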

Research on Non-Autoregression Optimizations

Various strategies have been researched for improving the autoregressive bottleneck. Example strategies covered by the papers below include:

  • Parallel decoding of multiple tokens, such as mask-predict and blockwise parallel decoding.
  • Speculative decoding and aggressive decoding, where a cheap draft of several tokens is verified by the main model in one pass (a simplified sketch follows this list).
  • Semi-autoregressive and iterative-refinement decoders.
  • Fully non-autoregressive architectures, particularly for neural machine translation (NAT).
  • Bigger multi-word tokens, as discussed above.
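
As a hedged illustration of the flavor of these techniques, here is a toy Python sketch of the structure of speculative decoding. The draft_model and target_model functions are made-up stand-ins, and the greedy agreement check is a simplification of the real accept/reject rule:

```python
# Simplified sketch of the *structure* of speculative decoding, with made-up
# toy stand-in models. The real algorithm (e.g., Leviathan et al., 2023) uses
# a probabilistic accept/reject rule and verifies all drafted positions in a
# single batched forward pass of the large model; this sketch loops over the
# drafted tokens only for readability.

def draft_model(tokens):          # cheap model: one pass per drafted token
    return (sum(tokens) * 7 + 3) % 8

def target_model(tokens):         # expensive model: can score many positions at once
    return (sum(tokens) * 7 + 3) % 8   # identical here, so every draft is accepted

def speculative_step(tokens, k=4):
    # 1. Draft k tokens autoregressively with the small (cheap) model.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify the drafted tokens with the large model (greedy agreement check).
    accepted, ctx = [], list(tokens)
    for t in draft:
        best = target_model(ctx)
        if best == t:                     # draft matches: accept it for free
            accepted.append(t)
            ctx.append(t)
        else:                             # mismatch: take the large model's token, stop
            accepted.append(best)
            break
    return tokens + accepted              # several tokens emitted per "step"

print(speculative_step([1, 2, 3]))
```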

General research papers on autoregression improvements include:

  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer, Mask predict: Parallel decoding of conditional masked language models, arXiv preprint arXiv:1904.09324, 2019, https://arxiv.org/abs/1904.09324
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor OK Li, and Richard Socher, Non-autoregressive neural machine translation, arXiv preprint arXiv:1711.02281, 2017, https://arxiv.org/abs/1711.02281
  • Junliang Guo, Linli Xu, and Enhong Chen, Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation, In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 376–385, 2020, https://aclanthology.org/2020.acl-main.36/
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho, Deterministic non-autoregressive neural sequence modeling by iterative refinement, arXiv preprint arXiv:1802.06901, 2018, https://arxiv.org/abs/1802.06901
  • Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, and Tie-Yan Liu, Hint-based training for non-autoregressive machine translation, arXiv preprint arXiv:1909.06708, 2019, https://arxiv.org/abs/1909.06708
  • Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, and Jie Zhou. Minimizing the bag-of-ngrams difference for non-autoregressive neural machine translation. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 198–205, 2020, https://arxiv.org/abs/1911.09320
  • Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, and Zhihong Deng. Fast structured decoding for sequence models. Advances in Neural Information Processing Systems, 32, 2019, https://arxiv.org/abs/1910.11555
  • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu, Non-autoregressive machine translation with auxiliary regularization, In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 5377–5384, 2019, https://arxiv.org/abs/1902.10245
  • Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, Jun Xie, and Xu Sun. Imitation learning for non-autoregressive neural machine translation. arXiv preprint arXiv:1906.02041, 2019, https://arxiv.org/abs/1906.02041
  • Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, and William Cohen. Fido: Fusion-in-decoder optimized for stronger performance and faster inference. arXiv preprint arXiv:2212.08153, Dec 2022, https://arxiv.org/abs/2212.08153
  • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W Mahoney, et al. Full stack optimization of transformer inference: a survey. arXiv preprint arXiv:2302.14017, 2023, https://arxiv.org/abs/2302.14017
  • Chitwan Saharia, William Chan, Saurabh Saxena, and Mohammad Norouzi. 2020. Non-autoregressive machine translation with latent alignments. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1098–1108, Online. Association for Computational Linguistics. https://arxiv.org/abs/2004.07437
  • Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, and Omer Levy. 2020. Aligned cross entropy for non-autoregressive machine translation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 3515–3523. PMLR. http://proceedings.mlr.press/v119/ghazvininejad20a.html
  • Marjan Ghazvininejad, Omer Levy, and Luke Zettlemoyer. 2020. Semi-autoregressive training improves mask-predict decoding. arXiv preprint arXiv:2001.08785, https://arxiv.org/abs/2001.08785
  • Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, and Lei Li. 2020. Glancing transformer for non-autoregressive neural machine translation. arXiv preprint arXiv:2008.07905, https://arxiv.org/abs/2008.07905
  • Jiatao Gu, Xiang Kong, Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade, Dec 2020, https://arxiv.org/abs/2012.15833
  • Jason D. Lee, Elman Mansimov, and Kyunghyun Cho. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In Proc. of EMNLP, 2018. https://arxiv.org/abs/1802.06901
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke S. Zettlemoyer. Mask-predict: Parallel decoding of conditional masked language models. In Proc. of EMNLP, 2019. https://arxiv.org/abs/1904.09324
  • Chen, C., Borgeaud, S., Irving, G., Lespiau, J.-B., Sifre, L., and Jumper, J., Feb 2023, Accelerating large language model decoding with speculative sampling, arXiv preprint arXiv:2302.01318, https://arxiv.org/abs/2302.01318
  • Leviathan, Y., Kalman, M., and Matias, Y., Fast inference from transformers via speculative decoding, May 2023, https://arxiv.org/abs/2211.17192
  • Stern, M., Shazeer, N., and Uszkoreit, J., Nov 2018, Blockwise parallel decoding for deep autoregressive models, Advances in Neural Information Processing Systems, 31, https://arxiv.org/abs/1811.03115
  • Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann, Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers, arXiv preprint, 2023, https://arxiv.org/abs/2305.15805
  • Xin Sun, Tao Ge, Furu Wei, and Houfeng Wang. Instantaneous grammatical error correction with shallow aggressive decoding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 5937–5947, 2021. https://arxiv.org/abs/2106.04970, Code: https://github.com/AutoTemp/Shallow-Aggressive-Decoding (Aggressive decoding emits as many tokens as possible, combined with a shallow decoder architecture here.)
  • T. Ge, H. Xia, X. Sun, S. Chen, and F. Wei. Lossless acceleration for seq2seq generation with aggressive decoding. ArXiv, abs/2205.10350, 2022. https://arxiv.org/abs/2205.10350, Code: https://github.com/microsoft/unilm/tree/master/decoding (Aggressive decoding means emitting multiple tokens at a time, reducing autoregression; has a generalization that is similar to speculative decoding here.)
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6114–6123. https://arxiv.org/abs/1904.09324 (Parallel decoding or "bidirectional" decoding, rather than left-to-right generation of tokens.)
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, https://arxiv.org/abs/1810.04805 (Rather than left-to-right, uses "bidirectional" decoding.)
  • Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento, Sep 2023, Uncovering mesa-optimization algorithms in Transformers, https://arxiv.org/abs/2309.05858 (Uses linear attention algorithm.)
  • X Li, S Chen, S Zhang, L Hou, Y Zhu, Z Xiao, 2023, Human Activity Recognition Using IR-UWB Radar: A Lightweight Transformer Approach, IEEE Geoscience and Remote Sensing Letters (Early Access), https://ieeexplore.ieee.org/document/10247554 (Linear attention.)
  • J Kasai, 2023, Towards Efficient, Customizable, and Communal Natural Language Processing, Ph.D. thesis, Computer Science and Engineering, University of Washington, https://www.proquest.com/openview/604084b574dcd05e41eb6e33682a3537/1 (More about shallow decoders.)
  • Y Chen, Y Li, A Xu, Q Sun, X Chen, C Xu, 2023, WAG-NAT: Window Attention and Generator Based Non-Autoregressive Transformer for Time Series Forecasting, ICANN 2023: Artificial Neural Networks and Machine Learning, pp. 293–304, https://link.springer.com/chapter/10.1007/978-3-031-44223-0_24, Code: https://github.com/cybisolated/WAG-NAT
  • S Bae, J Ko, H Song, SY Yun, Oct 2023, Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding, arXiv preprint arXiv:2310.05424, https://arxiv.org/pdf/2310.05424.pdf (Combination of early-exit with a "shallow-deep module" and parallel decoding.)
  • Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, and Aaron van den Oord. 2022. Step-unrolled denoising autoencoders for text generation. International Conference on Learning Representations. https://arxiv.org/abs/2112.06749
  • Y Zhang, Y Zhang, L Cui, G Fu, Oct 2023, Non-autoregressive Text Editing with Copy-aware Latent Alignments, arXiv preprint arXiv:2310.07821, https://arxiv.org/pdf/2310.07821.pdf
  • S Ren, Q Jia, KQ Zhu, arXiv preprint arXiv:2310.08152, Context Compression for Auto-regressive Transformers with Sentinel Tokens, Oct 2023, https://arxiv.org/pdf/2310.08152.pdf, Code: https://github.com/DRSY/KV_Compression
  • Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov, October 13, 2023, Flash-Decoding for long-context inference, PyTorch Blog, https://pytorch.org/blog/flash-decoding/
  • Jesse Mu, Xiang Lisa Li, and Noah Goodman. July 2023. Learning to compress prompts with gist tokens. arXiv preprint arXiv:2304.08467. https://arxiv.org/abs/2304.08467 (Prompt compression.)
  • Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales, 2024, Who Needs Decoders? Efficient Estimation of Sequence-Level Attributes with Proxies, Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics Volume 1: Long Papers, pages 1478–1496 March 17-22, 2024, https://aclanthology.org/2024.eacl-long.89.pdf (Non-autoregressive decoding methods in special use cases such as machine language translation.)
  • Ruchao Fan, 2024, Improving the Accuracy and Inference Efficiency for Low-resource Automatic Speech Recognition, Ph.D thesis, Electrical and Computer Engineering, University of California Los Angeles, https://escholarship.org/content/qt9281v84q/qt9281v84q_noSplash_28de3ba38c8c7a613d2fa945d28c1613.pdf (Uses bidirectional autoregressive predicting encoding for speech recognition.)
  • Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang, 2024, Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis, https://openaccess.thecvf.com/content/CVPR2024/papers/Ni_Revisiting_Non-Autoregressive_Transformers_for_Efficient_Image_Synthesis_CVPR_2024_paper.pdf Code: https://github.com/LeapLabTHU/ImprovedNAT
  • Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao, 16 Apr 2024 (v2)], Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding, https://arxiv.org/abs/2402.11809 (Semi-autoregressive draft model with parallel verification.)
  • Feng Li, Jingxian Chen, Xuejun Zhang, 2023, A Survey of Non-Autoregressive Neural Machine Translation, Electronics 2023, 12(13), 2980, https://doi.org/10.3390/electronics12132980, https://www.mdpi.com/2079-9292/12/13/2980 https://www.mdpi.com/2079-9292/12/13/2980/pdf?version=1688953962 (A survey of language translation with non-autoregressive architectures.)
  • Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang, 8 Aug 2023 (v2), RecycleGPT: An Autoregressive Language Model with Recyclable Module, https://arxiv.org/abs/2308.03421 (Uses the idea of guessing the next token based on only a few preceding tokens with extra layers inside a Transformer.)
  • Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli, 26 Mar 2024 (v3), Tandem Transformers for Inference Efficient LLMs, https://arxiv.org/abs/2402.08644 (A two-model architecture with a small autoregressive model and a larger model with non-autoregressive block decoding, which is similar to big-little inference and speculative decoding methods.)
  • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Defossez, Jade Copet, Gabriel Synnaeve, Yossi Adi, Jan 2024, Masked Audio Generation using a Single Non-Autoregressive Transformer https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/ https://arxiv.org/pdf/2401.04577.pdf Code: https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
  • Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang, 2024, Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining, https://openreview.net/pdf?id=2rPoTgEmjV Code: https://github.com/PKU-ML/LookAheadLookAround (Evaluates autoregressive and masked methods in training.)
  • Y Lin, Oct 2023, ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting, arXiv preprint arXiv:2310.19322, https://arxiv.org/pdf/2310.19322.pdf
  • Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V Le, and Ruslan Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860, 2019. https://arxiv.org/abs/1901.02860
  • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, and Dhruv Batra. 2018. Diverse beam search for improved description of complex scenes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 7371–7379. AAAI Press. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17329
  • Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li Apr 2021, LightSeq: A High Performance Inference Library for Transformers, https://arxiv.org/pdf/2010.13887.pdf
  • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019, Non-autoregressive machine translation with auxiliary regularization. In Proc. of AAAI, 2019, https://arxiv.org/abs/1902.10245.
  • Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard H. Hovy. FlowSeq: Non-autoregressive conditional sequence generation with generative flow. In Proc. of EMNLP, 2019. https://arxiv.org/abs/1909.02480.
  • Xiaosong Jia, Shaoshuai Shi, Zijun Chen, Li Jiang, Wenlong Liao, Tao He, Junchi Yan, 21 Mar 2024, AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving, https://arxiv.org/abs/2403.13331
  • David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Z Wang, L Wang, J Su, J Yao, Z Tu, 2023, Revisiting Non-Autoregressive Translation at Scale, https://arxiv.org/abs/2305.16155
  • S Norouzi, R Hosseinzadeh, F Perez, 2023, DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers for Machine Translation, https://aclanthology.org/2023.findings-acl.542/
  • Raphael Shu, Jason Lee, Hideki Nakayama, and Kyunghyun Cho. 2020, Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior. In Proc. of AAAI, 2020. https://arxiv.org/abs/1908.07181
  • Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn, Oct 2022, EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start, https://arxiv.org/abs/2205.12209
  • Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. https://openai.com/blog/sparse-transformers, 2019, https://arxiv.org/abs/1904.10509
  • Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu, 6 Jul 2023 (v2), A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, https://arxiv.org/pdf/2204.09269.pdf
  • Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha, 24 May 2024 (v2), A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models, https://arxiv.org/abs/2405.13019
  • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
  • Li, S., Unanue, I.J., Piccardi, M. (2024). LayerGLAT: A Flexible Non-autoregressive Transformer for Single-Pass and Multi-pass Prediction. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14942. Springer, Cham. https://doi.org/10.1007/978-3-031-70344-7_14 https://link.springer.com/chapter/10.1007/978-3-031-70344-7_14 https://github.com/lsj72123/layer-GLAT
  • David Spuler, March 2024, Tokens and Non-Autoregression, in Generative AI in C++, https://www.aussieai.com/book/ch26-tokens-auto-regression
  • Du Cunxiao, 2024, Towards Faster Inference of Transformers: Strategies for Accelerating Decoding Processes, Ph.D. thesis, Computer Science, School of Computing and Information Systems, Singapore Management University, https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1611&context=etd_coll (Examines non-autoregressive decoding, speculative decoding and attention optimizations.)
  • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
  • Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang, 2 Dec 2024, RandAR: Decoder-only Autoregressive Visual Generation in Random Orders, https://arxiv.org/abs/2412.01827 https://rand-ar.github.io/ (Attempt to parallelize image generation decoding by randomizing the order at which to create patches of an image.)
  • Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang, 5 Dec 2024, ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality, https://arxiv.org/abs/2412.04062
  • Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu, 19 Dec 2024, Parallelized Autoregressive Visual Generation, https://arxiv.org/abs/2412.15119 https://epiphqny.github.io/PAR-project
  • Z Ni, Y Wang, R Zhou, J Guo, J Hu, 2024, Revisiting non-autoregressive transformers for efficient image synthesis, http://openaccess.thecvf.com/content/CVPR2024/html/Ni_Revisiting_Non-Autoregressive_Transformers_for_Efficient_Image_Synthesis_CVPR_2024_paper.html
  • Alokesh Manna and Sujit K. Ghosh, 12 Aug 2025, Bayesian Models for Joint Selection of Features and Auto-Regressive Lags: Theory and Applications in Environmental and Financial Forecasting, https://arxiv.org/abs/2508.10055
  • Mihir Prabhudesai, Menging Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak, 24 Jul 2025, Diffusion Beats Autoregressive in Data-Constrained Settings, https://arxiv.org/abs/2507.15857
  • Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen, 18 Jul 2025, CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models, https://arxiv.org/abs/2507.13984
  • Sunwoong Yang, Ricardo Vinuesa, Namwoo Kang, 18 Jul 2025, AI-Accelerated Flow Simulation: A Robust Auto-Regressive Framework for Long-Term CFD Forecasting, https://arxiv.org/abs/2412.05657
  • Haoyuan Wu, Haisheng Zheng, Shoubo Hu, Zhuolun He, Bei Yu, 18 Jul 2025, Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table, https://arxiv.org/abs/2502.12751
  • Dario Coscia, Max Welling, Nicola Demo, Gianluigi Rozza, 18 Jul 2025, BARNN: A Bayesian Autoregressive and Recurrent Neural Network, https://arxiv.org/abs/2501.18665
  • Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, Nathan Srebro, 11 Aug 2025, A Theory of Learning with Autoregressive Chain of Thought, https://arxiv.org/abs/2503.07932
  • Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan, 28 Jul 2025, Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis, https://arxiv.org/abs/2507.20454
  • Kaif Shaikh, Franziska Boenisch, Adam Dziedzic, 28 Jul 2025, Implementing Adaptations for Vision AutoRegressive Model, https://arxiv.org/abs/2507.11441
  • Jiayu Zhang, Zhiyu Zhu, Xinyi Wang, Silin Liao, Zhibo Jin, Flora D. Salim, Huaming Chen, 29 Jul 2025, PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN, https://arxiv.org/abs/2502.12207
  • Ridvan Yesiloglu, Wei Peng, Md Tauhidul Islam, Ehsan Adeli, 29 Jul 2025, Neural Autoregressive Modeling of Brain Aging, https://arxiv.org/abs/2507.22954
  • Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang, 31 Jul 2025, XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding, https://arxiv.org/abs/2507.23777
  • Xincheng Yao, Yijun Yang, Kangwei Guo, Ruiqiang Xiao, Haipeng Zhou, Haisu Tao, Jian Yang and Lei Zhu, 31 Jul 2025, HRVVS: A High-resolution Video Vasculature Segmentation Network via Hierarchical Autoregressive Residual Priors, https://arxiv.org/abs/2507.22530
  • Saba Ahmadi, Rabiul Awal, Ankur Sikarwar, Amirhossein Kazemnejad, Ge Ya Luo, Juan A. Rodriguez, Sai Rajeswar, Siva Reddy, Christopher Pal, Benno Krojer, Aishwarya Agrawal, 1 Aug 2025, The Promise of RL for Autoregressive Image Editing, https://arxiv.org/abs/2508.01119
  • Antonio A. Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John R. Emmons, 2 Aug 2025, LZ Penalty: An information-theoretic repetition penalty for autoregressive language models, https://arxiv.org/abs/2504.20131
  • Andrea Coccaro and Marco Letizia and Humberto Reyes-Gonzalez and Riccardo Torre, 4 Aug 2025, Comparison of Affine and Rational Quadratic Spline Coupling and Autoregressive Flows through Robust Statistical Tests, https://arxiv.org/abs/2302.12024
  • Xiuyu Yang, Shuhan Tan, Philipp Krähenbühl, 5 Aug 2025, Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation, https://arxiv.org/abs/2506.17213
  • Faruk Alpay, Bugra Kilictas, Hamdi Alakkad, 6 Aug 2025, A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature, https://arxiv.org/abs/2508.04612
  • Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer, 7 Aug 2025, Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces, https://arxiv.org/abs/2508.05306
  • Preslav Aleksandrov, Meghdad Kurmanji, Fernando Garcia Redondo, David O'Shea, William Shen, Alex Iacob, Lorenzo Sani, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane, 7 Aug 2025, AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling, https://arxiv.org/abs/2507.08567
  • Wouter M. Kouw, 13 Aug 2025, Bayesian autoregression to optimize temporal Matérn kernel Gaussian process hyperparameters, https://arxiv.org/abs/2508.09792
  • Xiaojiao Xiao, Jianfeng Zhao, Qinmin Vivian Hu, Guanghui Wang, 13 Aug 2025, T-CACE: A Time-Conditioned Autoregressive Contrast Enhancement Multi-Task Framework for Contrast-Free Liver MRI Synthesis, Segmentation, and Diagnosis, https://arxiv.org/abs/2508.09919
  • Xingshan Zeng, Weiwen Liu, Lingzhi Wang, Liangyou Li, Fei Mi, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, 18 Aug 2025, ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction, https://arxiv.org/abs/2508.12685
  • Beilong Tang, Bang Zeng, Ming Li, 16 Aug 2025, LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models, https://arxiv.org/abs/2504.07402
  • Dong Liu, Yanxuan Yu, 16 Aug 2025, QuickMerge++: Fast Token Merging with Autoregressive Prior, https://arxiv.org/abs/2508.13204
  • Mayank Nagda, Jephte Abijuru, Phil Ostheimer, Marius Kloft, Sophie Fellenz, 22 Aug 2025, PIANO: Physics Informed Autoregressive Network, https://arxiv.org/abs/2508.16235
  • Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu, 25 Aug 2025, TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation, https://arxiv.org/abs/2405.16847

More Research on Decoding Algorithms

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: