Aussie AI

Non-Autoregression Optimizations

  • Last Updated 27 August, 2025
  • by David Spuler, Ph.D.

One of the biggest obstacles to fast inference of Large Language Models (LLMs) is that they emit one token at a time (e.g. one word at a time). This limits parallelism and means that the entire model must be re-run multiple times, once for each word (or subword token).

Why Autoregression?

The reason for this limitation is that the next word to output inherently depends on the words that came before it, which is more or less an unavoidable property of human language. In LLM coding circles, this is called the "autoregression" problem, possibly because researchers tend to like big words.

Because of this issue, the LLM is designed so that when it emits a word, that word is fed back into the model's next iteration to help it emit the next word. And that's slow for several reasons (see the code sketch after this list):

  • The model runs for every token.
  • The model never produces 2 tokens (or more) in parallel.
  • The model cannot start working on the 2nd token before finishing the 1st token, which limits pipelining (a type of parallelism).
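
To make the loop concrete, here is a minimal Python sketch of greedy autoregressive decoding. The toy_model function is a made-up stand-in for a full LLM forward pass; the point is simply that every generated token costs one complete pass over the whole sequence:

```python
# Minimal sketch of greedy autoregressive decoding, assuming a made-up
# toy_model() stand-in for a full LLM forward pass. The key point: every
# generated token requires another complete pass over the whole sequence.

def toy_model(tokens):
    """Stand-in for an LLM forward pass: returns fake logits over a tiny vocab."""
    vocab_size = 8
    seed = sum(tokens) + len(tokens)          # a real model would run attention here
    return [(seed * 31 + i * 7) % 13 for i in range(vocab_size)]

def greedy_decode(prompt_tokens, max_new_tokens, eos_token=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = toy_model(tokens)            # one full model pass per token
        next_token = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_token)             # fed back in for the next pass
        if next_token == eos_token:           # stop on end-of-sequence token
            break
    return tokens

print(greedy_decode([3, 5, 2], max_new_tokens=5))
```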

There is considerable research on fixing this latency problem and achieving more parallelism. The research area is known as "non-autoregression" optimizations.

Tokens and Non-Autoregression

Although much of the research into non-autoregression involves major surgery on the LLM architecture, there's a simpler way to mitigate the inefficiency: bigger tokens. If the tokens are longer, then fewer of them are emitted for each piece of work done by the AI engine, so the model runs in fewer iterations if the tokenizer chooses whole words rather than sub-words, or even treats common two-word phrases as single tokens (i.e., multi-word tokens). Longer tokens therefore reduce the inefficiency from autoregression, and they also shorten the input sequence, which further reduces execution cost (the Transformer's attention algorithm is famously quadratic in the input sequence length).
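
As a toy illustration of the idea, here is a short Python sketch comparing subword-style tokenization against whole-word and multi-word tokens; the token splits are made up, not taken from any real tokenizer:

```python
# Toy illustration only: these token splits are made up, not from a real
# tokenizer. Each token costs one autoregressive decoding step, so the
# multi-word tokenization below needs fewer model passes for the same text.

text = "New York is a big city."

subword_tokens   = ["New", " York", " is", " a", " big", " cit", "y", "."]
multiword_tokens = ["New York", " is", " a", " big city", "."]

assert "".join(subword_tokens) == text == "".join(multiword_tokens)
print(len(subword_tokens), "decoding steps vs", len(multiword_tokens))  # 8 vs 5
```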

The downside is that longer tokens mean more unique tokens, which increases the vocabulary size. The model's size depends partly on the vocabulary size (the token embedding matrix and the final output projection both scale with it), so this increase from longer tokens makes the whole model larger, and it runs slower.

Therefore, longer tokens reduce latency by cutting the number of autoregressive steps, but increase latency by making the model larger overall. Maybe there's some happy trade-off here? Most of the current models seem to use a vocabulary of around 50,000 tokens, and the vocabulary size becomes one of the meta-parameters of the model.
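
To make the vocabulary cost concrete, here is a rough back-of-the-envelope sketch. The hidden dimension and vocabulary sizes below are assumed example values, not from any specific model; the point is that the output projection's parameter count, and hence its per-token matmul cost, grows linearly with the vocabulary size:

```python
# Back-of-the-envelope sketch with assumed example numbers (not any specific
# model): the final output projection is a hidden_dim x vocab_size matrix,
# so its parameter count (and per-token matmul cost) grows linearly with
# the vocabulary size chosen by the tokenizer.

hidden_dim = 4096  # assumed hidden dimension

for vocab_size in (32_000, 50_000, 100_000, 250_000):
    params = hidden_dim * vocab_size
    print(f"vocab={vocab_size:>7,}  output-projection params={params / 1e6:8.1f}M")
```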

Research on Non-Autoregression Optimizations

Various strategies have been researched for improving the autoregressive bottleneck. Example strategies covered by the papers below include:

  • Parallel decoding of multiple tokens, such as mask-predict and blockwise parallel decoding.
  • Speculative decoding and aggressive decoding, where a cheap draft of several tokens is verified by the main model in one pass (a simplified sketch follows this list).
  • Semi-autoregressive and iterative-refinement decoders.
  • Fully non-autoregressive architectures, particularly for neural machine translation (NAT).
  • Bigger multi-word tokens, as discussed above.
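
As a hedged illustration of the flavor of these techniques, here is a toy Python sketch of the structure of speculative decoding. The draft_model and target_model functions are made-up stand-ins, and the greedy agreement check is a simplification of the real accept/reject rule:

```python
# Simplified sketch of the *structure* of speculative decoding, with made-up
# toy stand-in models. The real algorithm (e.g., Leviathan et al., 2023) uses
# a probabilistic accept/reject rule and verifies all drafted positions in a
# single batched forward pass of the large model; this sketch loops over the
# drafted tokens only for readability.

def draft_model(tokens):          # cheap model: one pass per drafted token
    return (sum(tokens) * 7 + 3) % 8

def target_model(tokens):         # expensive model: can score many positions at once
    return (sum(tokens) * 7 + 3) % 8   # identical here, so every draft is accepted

def speculative_step(tokens, k=4):
    # 1. Draft k tokens autoregressively with the small (cheap) model.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify the drafted tokens with the large model (greedy agreement check).
    accepted, ctx = [], list(tokens)
    for t in draft:
        best = target_model(ctx)
        if best == t:                     # draft matches: accept it for free
            accepted.append(t)
            ctx.append(t)
        else:                             # mismatch: take the large model's token, stop
            accepted.append(best)
            break
    return tokens + accepted              # several tokens emitted per "step"

print(speculative_step([1, 2, 3]))
```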

General research papers on autoregression improvements include:

  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer, Mask predict: Parallel decoding of conditional masked language models, arXiv preprint arXiv:1904.09324, 2019, https://arxiv.org/abs/1904.09324
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor OK Li, and Richard Socher, Non-autoregressive neural machine translation, arXiv preprint arXiv:1711.02281, 2017, https://arxiv.org/abs/1711.02281
  • Junliang Guo, Linli Xu, and Enhong Chen, Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation, In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 376–385, 2020, https://aclanthology.org/2020.acl-main.36/
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho, Deterministic non-autoregressive neural sequence modeling by iterative refinement, arXiv preprint arXiv:1802.06901, 2018, https://arxiv.org/abs/1802.06901
  • Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, and Tie-Yan Liu, Hint-based training for non-autoregressive machine translation, arXiv preprint arXiv:1909.06708, 2019, https://arxiv.org/abs/1909.06708
  • Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, and Jie Zhou. Minimizing the bag-of-ngrams difference for non-autoregressive neural machine translation. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 198–205, 2020, https://arxiv.org/abs/1911.09320
  • Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, and Zhihong Deng. Fast structured decoding for sequence models. Advances in Neural Information Processing Systems, 32, 2019, https://arxiv.org/abs/1910.11555
  • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu, Non-autoregressive machine translation with auxiliary regularization, In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 5377–5384, 2019, https://arxiv.org/abs/1902.10245
  • Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, Jun Xie, and Xu Sun. Imitation learning for non-autoregressive neural machine translation. arXiv preprint arXiv:1906.02041, 2019, https://arxiv.org/abs/1906.02041
  • Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, and William Cohen. Fido: Fusion-in-decoder optimized for stronger performance and faster inference. arXiv preprint arXiv:2212.08153, Dec 2022, https://arxiv.org/abs/2212.08153
  • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W Mahoney, et al. Full stack optimization of transformer inference: a survey. arXiv preprint arXiv:2302.14017, 2023, https://arxiv.org/abs/2302.14017
  • Chitwan Saharia, William Chan, Saurabh Saxena, and Mohammad Norouzi. 2020. Non-autoregressive machine translation with latent alignments. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1098–1108, Online. Association for Computational Linguistics. https://arxiv.org/abs/2004.07437
  • Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, and Omer Levy. 2020. Aligned cross entropy for non-autoregressive machine translation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 3515–3523. PMLR. http://proceedings.mlr.press/v119/ghazvininejad20a.html
  • Marjan Ghazvininejad, Omer Levy, and Luke Zettlemoyer. 2020. Semi-autoregressive training improves mask-predict decoding. arXiv preprint arXiv:2001.08785, https://arxiv.org/abs/2001.08785
  • Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, and Lei Li. 2020. Glancing transformer for non-autoregressive neural machine translation. arXiv preprint arXiv:2008.07905, https://arxiv.org/abs/2008.07905
  • Jiatao Gu, Xiang Kong, Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade, Dec 2020, https://arxiv.org/abs/2012.15833
  • Jason D. Lee, Elman Mansimov, and Kyunghyun Cho. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In Proc. of EMNLP, 2018. https://arxiv.org/abs/1802.06901
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke S. Zettlemoyer. Mask-predict: Parallel decoding of conditional masked language models. In Proc. of EMNLP, 2019. https://arxiv.org/abs/1904.09324
  • Chen, C., Borgeaud, S., Irving, G., Lespiau, J.-B., Sifre, L., and Jumper, J., Feb 2023, Accelerating large language model decoding with speculative sampling, arXiv preprint arXiv:2302.01318, https://arxiv.org/abs/2302.01318
  • Leviathan, Y., Kalman, M., and Matias, Y., Fast inference from transformers via speculative decoding, May 2023, https://arxiv.org/abs/2211.17192
  • Stern, M., Shazeer, N., and Uszkoreit, J., Nov 2018, Blockwise parallel decoding for deep autoregressive models, Advances in Neural Information Processing Systems, 31, https://arxiv.org/abs/1811.03115
  • Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann, Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers, arXiv preprint, 2023, https://arxiv.org/abs/2305.15805
  • Xin Sun, Tao Ge, Furu Wei, and Houfeng Wang. Instantaneous grammatical error correction with shallow aggressive decoding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 5937–5947, 2021. https://arxiv.org/abs/2106.04970, Code: https://github.com/AutoTemp/Shallow-Aggressive-Decoding (Aggressive decoding emits as many tokens as possible, combined with a shallow decoder architecture here.)
  • T. Ge, H. Xia, X. Sun, S. Chen, and F. Wei. Lossless acceleration for seq2seq generation with aggressive decoding. ArXiv, abs/2205.10350, 2022. https://arxiv.org/abs/2205.10350, Code: https://github.com/microsoft/unilm/tree/master/decoding (Aggressive decoding means emitting multiple tokens at a time, reducing autoregression; has a generalization that is similar to speculative decoding here.)
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6114–6123. https://arxiv.org/abs/1904.09324 (Parallel decoding or "bidirectional" decoding, rather than left-to-right generation of tokens.)
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, https://arxiv.org/abs/1810.04805 (Rather than left-to-right, uses "bidirectional" decoding.)
  • Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento, Sep 2023, Uncovering mesa-optimization algorithms in Transformers, https://arxiv.org/abs/2309.05858 (Uses linear attention algorithm.)
  • X Li, S Chen, S Zhang, L Hou, Y Zhu, Z Xiao, 2023, Human Activity Recognition Using IR-UWB Radar: A Lightweight Transformer Approach, IEEE Geoscience and Remote Sensing Letters (Early Access), https://ieeexplore.ieee.org/document/10247554 (Linear attention.)
  • J Kasai, 2023, Towards Efficient, Customizable, and Communal Natural Language Processing, Ph.D. thesis, Computer Science and Engineering, University of Washington, https://www.proquest.com/openview/604084b574dcd05e41eb6e33682a3537/1 (More about shallow decoders.)
  • Y Chen, Y Li, A Xu, Q Sun, X Chen, C Xu, 2023, WAG-NAT: Window Attention and Generator Based Non-Autoregressive Transformer for Time Series Forecasting, ICANN 2023: Artificial Neural Networks and Machine Learning, pp. 293–304, https://link.springer.com/chapter/10.1007/978-3-031-44223-0_24, Code: https://github.com/cybisolated/WAG-NAT
  • S Bae, J Ko, H Song, SY Yun, Oct 2023, Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding, arXiv preprint arXiv:2310.05424, https://arxiv.org/pdf/2310.05424.pdf (Combination of early-exit with a "shallow-deep module" and parallel decoding.)
  • Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, and Aaron van den Oord. 2022. Step-unrolled denoising autoencoders for text generation. International Conference on Learning Representations. https://arxiv.org/abs/2112.06749
  • Y Zhang, Y Zhang, L Cui, G Fu, Oct 2023, Non-autoregressive Text Editing with Copy-aware Latent Alignments, arXiv preprint arXiv:2310.07821, https://arxiv.org/pdf/2310.07821.pdf
  • S Ren, Q Jia, KQ Zhu, arXiv preprint arXiv:2310.08152, Context Compression for Auto-regressive Transformers with Sentinel Tokens, Oct 2023, https://arxiv.org/pdf/2310.08152.pdf, Code: https://github.com/DRSY/KV_Compression
  • Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov, October 13, 2023, Flash-Decoding for long-context inference, PyTorch Blog, https://pytorch.org/blog/flash-decoding/
  • Jesse Mu, Xiang Lisa Li, and Noah Goodman. July 2023. Learning to compress prompts with gist tokens. arXiv preprint arXiv:2304.08467. https://arxiv.org/abs/2304.08467 (Prompt compression.)
  • Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales, 2024, Who Needs Decoders? Efficient Estimation of Sequence-Level Attributes with Proxies, Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics Volume 1: Long Papers, pages 1478–1496 March 17-22, 2024, https://aclanthology.org/2024.eacl-long.89.pdf (Non-autoregressive decoding methods in special use cases such as machine language translation.)
  • Ruchao Fan, 2024, Improving the Accuracy and Inference Efficiency for Low-resource Automatic Speech Recognition, Ph.D thesis, Electrical and Computer Engineering, University of California Los Angeles, https://escholarship.org/content/qt9281v84q/qt9281v84q_noSplash_28de3ba38c8c7a613d2fa945d28c1613.pdf (Uses bidirectional autoregressive predicting encoding for speech recognition.)
  • Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang, 2024, Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis, https://openaccess.thecvf.com/content/CVPR2024/papers/Ni_Revisiting_Non-Autoregressive_Transformers_for_Efficient_Image_Synthesis_CVPR_2024_paper.pdf Code: https://github.com/LeapLabTHU/ImprovedNAT
  • Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao, 16 Apr 2024 (v2)], Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding, https://arxiv.org/abs/2402.11809 (Semi-autoregressive draft model with parallel verification.)
  • Feng Li, Jingxian Chen, Xuejun Zhang, 2023, A Survey of Non-Autoregressive Neural Machine Translation, Electronics 2023, 12(13), 2980, https://doi.org/10.3390/electronics12132980, https://www.mdpi.com/2079-9292/12/13/2980 https://www.mdpi.com/2079-9292/12/13/2980/pdf?version=1688953962 (A survey of language translation with non-autoregressive architectures.)
  • Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang, 8 Aug 2023 (v2), RecycleGPT: An Autoregressive Language Model with Recyclable Module, https://arxiv.org/abs/2308.03421 (Uses the idea of guessing the next token based on only a few preceding tokens with extra layers inside a Transformer.)
  • Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli, 26 Mar 2024 (v3), Tandem Transformers for Inference Efficient LLMs, https://arxiv.org/abs/2402.08644 (A two-model architecture with a small autoregressive model and a larger model with non-autoregressive block decoding, which is similar to big-little inference and speculative decoding methods.)
  • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Defossez, Jade Copet, Gabriel Synnaeve, Yossi Adi, Jan 2024, Masked Audio Generation using a Single Non-Autoregressive Transformer https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/ https://arxiv.org/pdf/2401.04577.pdf Code: https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
  • Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang, 2024, Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining, https://openreview.net/pdf?id=2rPoTgEmjV Code: https://github.com/PKU-ML/LookAheadLookAround (Evaluates autoregressive and masked methods in training.)
  • Y Lin, Oct 2023, ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting, arXiv preprint arXiv:2310.19322, https://arxiv.org/pdf/2310.19322.pdf
  • Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V Le, and Ruslan Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860, 2019. https://arxiv.org/abs/1901.02860
  • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, and Dhruv Batra. 2018. Diverse beam search for improved description of complex scenes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 7371–7379. AAAI Press. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17329
  • Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li Apr 2021, LightSeq: A High Performance Inference Library for Transformers, https://arxiv.org/pdf/2010.13887.pdf
  • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019, Non-autoregressive machine translation with auxiliary regularization. In Proc. of AAAI, 2019, https://arxiv.org/abs/1902.10245.
  • Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard H. Hovy. FlowSeq: Non-autoregressive conditional sequence generation with generative flow. In Proc. of EMNLP, 2019. https://arxiv.org/abs/1909.02480.
  • Xiaosong Jia, Shaoshuai Shi, Zijun Chen, Li Jiang, Wenlong Liao, Tao He, Junchi Yan, 21 Mar 2024, AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving, https://arxiv.org/abs/2403.13331
  • David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Z Wang, L Wang, J Su, J Yao, Z Tu, 2023, Revisiting Non-Autoregressive Translation at Scale, https://arxiv.org/abs/2305.16155
  • S Norouzi, R Hosseinzadeh, F Perez, 2023, DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers for Machine Translation, https://aclanthology.org/2023.findings-acl.542/
  • Raphael Shu, Jason Lee, Hideki Nakayama, and Kyunghyun Cho. 2020, Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior. In Proc. of AAAI, 2020. https://arxiv.org/abs/1908.07181
  • Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn, Oct 2022, EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start, https://arxiv.org/abs/2205.12209
  • Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. https://openai.com/blog/sparse-transformers, 2019, https://arxiv.org/abs/1904.10509
  • Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu, 6 Jul 2023 (v2), A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond, https://arxiv.org/pdf/2204.09269.pdf
  • Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha, 24 May 2024 (v2), A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models, https://arxiv.org/abs/2405.13019
  • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
  • Li, S., Unanue, I.J., Piccardi, M. (2024). LayerGLAT: A Flexible Non-autoregressive Transformer for Single-Pass and Multi-pass Prediction. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14942. Springer, Cham. https://doi.org/10.1007/978-3-031-70344-7_14 https://link.springer.com/chapter/10.1007/978-3-031-70344-7_14 https://github.com/lsj72123/layer-GLAT
  • David Spuler, March 2024, Tokens and Non-Autoregression, in Generative AI in C++, https://www.aussieai.com/book/ch26-tokens-auto-regression
  • Du Cunxiao, 2024, Towards Faster Inference of Transformers: Strategies for Accelerating Decoding Processes, Ph.D. thesis, Computer Science, School of Computing and Information Systems, Singapore Management University, https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1611&context=etd_coll (Examines non-autoregressive decoding, speculative decoding and attention optimizations.)
  • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
  • Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang, 2 Dec 2024, RandAR: Decoder-only Autoregressive Visual Generation in Random Orders, https://arxiv.org/abs/2412.01827 https://rand-ar.github.io/ (Attempt to parallelize image generation decoding by randomizing the order at which to create patches of an image.)
  • Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang, 5 Dec 2024, ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality, https://arxiv.org/abs/2412.04062
  • Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu, 19 Dec 2024, Parallelized Autoregressive Visual Generation, https://arxiv.org/abs/2412.15119 https://epiphqny.github.io/PAR-project
  • Z Ni, Y Wang, R Zhou, J Guo, J Hu, 2024, Revisiting non-autoregressive transformers for efficient image synthesis, http://openaccess.thecvf.com/content/CVPR2024/html/Ni_Revisiting_Non-Autoregressive_Transformers_for_Efficient_Image_Synthesis_CVPR_2024_paper.html
  • Alokesh Manna and Sujit K. Ghosh, 12 Aug 2025, Bayesian Models for Joint Selection of Features and Auto-Regressive Lags: Theory and Applications in Environmental and Financial Forecasting, https://arxiv.org/abs/2508.10055
  • Mihir Prabhudesai, Menging Wu, Amir Zadeh, Katerina Fragkiadaki, Deepak Pathak, 24 Jul 2025, Diffusion Beats Autoregressive in Data-Constrained Settings, https://arxiv.org/abs/2507.15857
  • Quang-Binh Nguyen, Minh Luu, Quang Nguyen, Anh Tran, Khoi Nguyen, 18 Jul 2025, CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models, https://arxiv.org/abs/2507.13984
  • Sunwoong Yang, Ricardo Vinuesa, Namwoo Kang, 18 Jul 2025, AI-Accelerated Flow Simulation: A Robust Auto-Regressive Framework for Long-Term CFD Forecasting, https://arxiv.org/abs/2412.05657
  • Haoyuan Wu, Haisheng Zheng, Shoubo Hu, Zhuolun He, Bei Yu, 18 Jul 2025, Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table, https://arxiv.org/abs/2502.12751
  • Dario Coscia, Max Welling, Nicola Demo, Gianluigi Rozza, 18 Jul 2025, BARNN: A Bayesian Autoregressive and Recurrent Neural Network, https://arxiv.org/abs/2501.18665
  • Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, Nathan Srebro, 11 Aug 2025, A Theory of Learning with Autoregressive Chain of Thought, https://arxiv.org/abs/2503.07932
  • Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan, 28 Jul 2025, Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis, https://arxiv.org/abs/2507.20454
  • Kaif Shaikh, Franziska Boenisch, Adam Dziedzic, 28 Jul 2025, Implementing Adaptations for Vision AutoRegressive Model, https://arxiv.org/abs/2507.11441
  • Jiayu Zhang, Zhiyu Zhu, Xinyi Wang, Silin Liao, Zhibo Jin, Flora D. Salim, Huaming Chen, 29 Jul 2025, PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN, https://arxiv.org/abs/2502.12207
  • Ridvan Yesiloglu, Wei Peng, Md Tauhidul Islam, Ehsan Adeli, 29 Jul 2025, Neural Autoregressive Modeling of Brain Aging, https://arxiv.org/abs/2507.22954
  • Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang, 31 Jul 2025, XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding, https://arxiv.org/abs/2507.23777
  • Xincheng Yao, Yijun Yang, Kangwei Guo, Ruiqiang Xiao, Haipeng Zhou, Haisu Tao, Jian Yang and Lei Zhu, 31 Jul 2025, HRVVS: A High-resolution Video Vasculature Segmentation Network via Hierarchical Autoregressive Residual Priors, https://arxiv.org/abs/2507.22530
  • Saba Ahmadi, Rabiul Awal, Ankur Sikarwar, Amirhossein Kazemnejad, Ge Ya Luo, Juan A. Rodriguez, Sai Rajeswar, Siva Reddy, Christopher Pal, Benno Krojer, Aishwarya Agrawal, 1 Aug 2025, The Promise of RL for Autoregressive Image Editing, https://arxiv.org/abs/2508.01119
  • Antonio A. Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John R. Emmons, 2 Aug 2025, LZ Penalty: An information-theoretic repetition penalty for autoregressive language models, https://arxiv.org/abs/2504.20131
  • Andrea Coccaro and Marco Letizia and Humberto Reyes-Gonzalez and Riccardo Torre, 4 Aug 2025, Comparison of Affine and Rational Quadratic Spline Coupling and Autoregressive Flows through Robust Statistical Tests, https://arxiv.org/abs/2302.12024
  • Xiuyu Yang, Shuhan Tan, Philipp Krähenbühl, 5 Aug 2025, Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation, https://arxiv.org/abs/2506.17213
  • Faruk Alpay, Bugra Kilictas, Hamdi Alakkad, 6 Aug 2025, A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature, https://arxiv.org/abs/2508.04612
  • Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer, 7 Aug 2025, Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces, https://arxiv.org/abs/2508.05306
  • Preslav Aleksandrov, Meghdad Kurmanji, Fernando Garcia Redondo, David O'Shea, William Shen, Alex Iacob, Lorenzo Sani, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane, 7 Aug 2025, AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling, https://arxiv.org/abs/2507.08567
  • Wouter M. Kouw, 13 Aug 2025, Bayesian autoregression to optimize temporal Matérn kernel Gaussian process hyperparameters, https://arxiv.org/abs/2508.09792
  • Xiaojiao Xiao, Jianfeng Zhao, Qinmin Vivian Hu, Guanghui Wang, 13 Aug 2025, T-CACE: A Time-Conditioned Autoregressive Contrast Enhancement Multi-Task Framework for Contrast-Free Liver MRI Synthesis, Segmentation, and Diagnosis, https://arxiv.org/abs/2508.09919
  • Xingshan Zeng, Weiwen Liu, Lingzhi Wang, Liangyou Li, Fei Mi, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, 18 Aug 2025, ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction, https://arxiv.org/abs/2508.12685
  • Beilong Tang, Bang Zeng, Ming Li, 16 Aug 2025, LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models, https://arxiv.org/abs/2504.07402
  • Dong Liu, Yanxuan Yu, 16 Aug 2025, QuickMerge++: Fast Token Merging with Autoregressive Prior, https://arxiv.org/abs/2508.13204
  • Mayank Nagda, Jephte Abijuru, Phil Ostheimer, Marius Kloft, Sophie Fellenz, 22 Aug 2025, PIANO: Physics Informed Autoregressive Network, https://arxiv.org/abs/2508.16235
  • Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu, 25 Aug 2025, TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation, https://arxiv.org/abs/2405.16847

More Research on Decoding Algorithms

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: