Aussie AI
Attention Steering
Last Updated 17 November, 2025
by David Spuler, Ph.D.
What is Attention Steering?
Attention steering is a method to "steer" the LLM attention algorithm so that it focuses on a particular subset of the tokens. The goal is attention computation that is both more accurate and faster.
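As a rough illustration of the idea, the sketch below re-weights the attention probabilities of a single query so that a chosen subset of tokens receives more of the attention mass. It is a minimal toy example in pure Python, not the method of any specific paper listed on this page. The function names (`softmax`, `steer_attention`), the `focus` set, and the `alpha` scaling factor are assumptions introduced here for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def steer_attention(weights, focus, alpha=0.1):
    """Scale down tokens outside `focus` by `alpha`, then renormalize.

    weights: attention probabilities for one query (sums to 1).
    focus:   set of token positions to emphasize (hypothetical input).
    alpha:   factor in (0, 1] applied to non-focus tokens (illustrative value).
    """
    scaled = [w if i in focus else alpha * w for i, w in enumerate(weights)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy example: one query attending over five key tokens.
raw_scores = [1.0, 2.0, 0.5, 1.5, 0.0]
weights = softmax(raw_scores)
steered = steer_attention(weights, focus={1, 3}, alpha=0.1)

print("original:", [round(w, 3) for w in weights])
print("steered: ", [round(w, 3) for w in steered])
```

After steering, the weights still sum to 1, the focus tokens keep their relative proportions, and the remaining tokens contribute much less to the attention output. In a real model this kind of adjustment would be applied per head and per layer, and choosing which heads to steer is itself a research question.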
Research on Attention Steering
Research papers on attention steering:
- Zhuohan Gu, Jiayi Yao, Kuntai Du, Junchen Jiang, 21 Nov 2024 (v2), LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts, https://arxiv.org/abs/2411.13009
- Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao, 1 Oct 2024 (v2), Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs, https://arxiv.org/abs/2311.02262 https://github.com/QingruZhang/PASTA
- Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang, 11 Jul 2023 (v2), TOAST: Transfer Learning via Attention Steering, https://arxiv.org/abs/2305.15542 https://github.com/bfshi/TOAST
- Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin, 20 Aug 2024 (v3), PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering, https://arxiv.org/abs/2403.05053 https://github.com/CodeGoat24/PrimeComposer
- Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
- Kyle O'Brien, David Majercak, Xavier Fernandes, Richard Edgar, Jingya Chen, Harsha Nori, Dean Carignan, Eric Horvitz, Forough Poursabzi-Sangde, 18 Nov 2024, Steering Language Model Refusal with Sparse Autoencoders, https://arxiv.org/abs/2411.11296
- Xintong Wang, Jingheng Pan, Longqin Jiang, Liang Ding, Xingshan Li, Chris Biemann, 23 Oct 2024, CogSteer: Cognition-Inspired Selective Layer Intervention for Efficient Semantic Steering in Large Language Models, https://arxiv.org/abs/2410.17714
- Neel Nanda, 8th Jul 2024, An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2, https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite
- Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
- Hanyu Zhang, Xiting Wang, Chengao Li, Xiang Ao, Qing He, 10 Jan 2025, Controlling Large Language Models Through Concept Activation Vectors, https://arxiv.org/abs/2501.05764 (Training a vector used to control the model on certain attributes.)
- Qi Sun, Edoardo Cetin, Yujin Tang, 14 Jan 2025 (v2), Transformer²: Self-adaptive LLMs, https://arxiv.org/abs/2501.06252 (Using a vector to fine-tune dynamically.)
- Liu Yang, Ziqian Lin, Kangwook Lee, Dimitris Papailiopoulos, Robert Nowak, 16 Jan 2025, Task Vectors in In-Context Learning: Emergence, Formation, and Benefit, https://arxiv.org/abs/2501.09240
- Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, Jie Tang, 23 Jan 2025, Parameter-Efficient Fine-Tuning for Foundation Models, https://arxiv.org/abs/2501.13787
- Xinyu Ma, Yifeng Xu, Yang Lin, Tianlong Wang, Xu Chu, Xin Gao, Junfeng Zhao, Yasha Wang, 24 Jan 2025, DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing, https://arxiv.org/abs/2501.14371 https://github.com/ArthurLeoM/DRESS-LLM
- Peixuan Han, Cheng Qian, Xiusi Chen, Yuji Zhang, Denghui Zhang, Heng Ji, 4 Feb 2025 (v2), Internal Activation as the Polar Star for Steering Unsafe LLM Behavior, https://arxiv.org/abs/2502.01042
- Daniel Beaglehole, Adityanarayanan Radhakrishnan, Enric Boix-Adserà, Mikhail Belkin, 6 Feb 2025, Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers, https://arxiv.org/abs/2502.03708 https://github.com/dmbeaglehole/neural_controllers
- Nikhil Anand, Dec 20, 2024, Understanding “steering” in LLMs: And how simple math can solve global problems, https://ai.gopubby.com/understanding-steering-in-llms-96faf6e0bee7
- Somnath Banerjee, Sayan Layek, Pratyush Chatterjee, Animesh Mukherjee, Rima Hazra, 16 Feb 2025, Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment, https://arxiv.org/abs/2502.11244
- Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple, 24 Feb 2025, Representation Engineering for Large-Language Models: Survey and Research Challenges, https://arxiv.org/abs/2502.17601
- Yingbing Huang, Deming Chen, Abhishek K. Umrawal, 28 Feb 2025, JAM: Controllable and Responsible Text Generation via Causal Reasoning and Latent Vector Manipulation, https://arxiv.org/abs/2502.20684
- Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li, 1 Mar 2025, How to Steer LLM Latents for Hallucination Detection? https://arxiv.org/abs/2503.01917
- Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki, 3 Mar 2025, SAKE: Steering Activations for Knowledge Editing, https://arxiv.org/abs/2503.01751
- Kenneth J. K. Ong, Lye Jia Jun, Hieu Minh "Jord" Nguyen, Seong Hah Cho, Natalia Pérez-Campanero Antolín, 17 Mar 2025, Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering, https://arxiv.org/abs/2503.12722
- Moreno D'Incà, Elia Peruzzo, Xingqian Xu, Humphrey Shi, Nicu Sebe, Massimiliano Mancini, 14 Mar 2025, Safe Vision-Language Models via Unsafe Weights Manipulation, https://arxiv.org/abs/2503.11742
- Changho Shin, Xinya Yan, Suenggwan Jo, Sungjun Cho, Shourjo Aditya Chaudhuri, Frederic Sala, 25 Mar 2025 (v2), TARDIS: Mitigating Temporal Misalignment via Representation Steering, https://arxiv.org/abs/2503.18693
- Jingcheng Niu, Xingdi Yuan, Tong Wang, Hamidreza Saghir, Amir H. Abdi, 14 May 2025, Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs, https://arxiv.org/abs/2505.09338
- Yao Huang, Huanran Chen, Shouwei Ruan, Yichi Zhang, Xingxing Wei, Yinpeng Dong, 28 May 2025, Mitigating Overthinking in Large Reasoning Models via Manifold Steering, https://arxiv.org/abs/2505.22411 https://github.com/Aries-iai/Manifold_Steering
- Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, Jack Lindsey, 29 Jul 2025, Persona Vectors: Monitoring and Controlling Character Traits in Language Models, https://arxiv.org/abs/2507.21509
- Zhang, Qingru, May 2025, On the Efficiency and Steerability of Self-Attention Mechanism of Large Language Models, Ph.D. Thesis, Georgia Institute of Technology, https://hdl.handle.net/1853/77839 https://repository.gatech.edu/entities/publication/d14aeab0-0189-42cb-9cbb-36eeb4434dcb (Coverage of efficiency with mixed attention span KV cache compression, and attention steering.)
- Yichen Li, Zhiting Fan, Ruizhe Chen, Xiaotang Gai, Luqi Gong, Yan Zhang, Zuozhu Liu, 5 Jul 2025 (v2), FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering, https://arxiv.org/abs/2504.14492
- Xinyan Jiang, Lin Zhang, Jiayi Zhang, Qingsong Yang, Guimin Hu, Di Wang, Lijie Hu, 14 Aug 2025, MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models, https://arxiv.org/abs/2508.10599
- Helena Casademunt, Caden Juang, Adam Karvonen, Samuel Marks, Senthooran Rajamanoharan, Neel Nanda, 22 Jul 2025, Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning, https://arxiv.org/abs/2507.16795
- Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal, 24 Jul 2025, GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs, https://arxiv.org/abs/2507.18043
- Cheng-Ting Chou, George Liu, Jessica Sun, Cole Blondin, Kevin Zhu, Vasu Sharma, Sean O'Brien, 17 Jul 2025, Causal Language Control in Multilingual Transformers via Sparse Feature Steering, https://arxiv.org/abs/2507.13410
- Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath, 18 Jul 2025, A General Framework for Inference-time Scaling and Steering of Diffusion Models, https://arxiv.org/abs/2501.06848
- Constantin Venhoff, Iván Arcuschin, Philip Torr, Arthur Conmy, Neel Nanda, 17 Jul 2025, Understanding Reasoning in Thinking Language Models via Steering Vectors, https://arxiv.org/abs/2506.18167
- Zhi Zhong, Akira Takahashi, Shuyang Cui, Keisuke Toyama, Shusuke Takahashi, Yuki Mitsufuji, 17 Jul 2025, SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet, https://arxiv.org/abs/2505.16195
- Simon Kohaut, Felix Divo, Navid Hamid, Benedict Flade, Julian Eggert, Devendra Singh Dhami, Kristian Kersting, 21 Jul 2025, The Constitutional Controller: Doubt-Calibrated Steering of Compliant Agents, https://arxiv.org/abs/2507.15478
- Anirudh Sundar, Sinead Williamson, Katherine Metcalf, Barry-John Theobald, Skyler Seto, Masha Fedzechkina, 21 Jul 2025, Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models, https://arxiv.org/abs/2502.15639
- Taewook Kim, Dhruv Agarwal, Jordan Ackerman, Manaswi Saha, 10 Aug 2025, Steering AI-Driven Personalization of Scientific Text for General Audiences, https://arxiv.org/abs/2411.09969
- Haiyan Zhao, Xuansheng Wu, Fan Yang, Bo Shen, Ninghao Liu, Mengnan Du, 29 Jul 2025, Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering, https://arxiv.org/abs/2505.15038
- Sunghyun Park, Seokeon Choi, Hyoungwoo Park, Sungrack Yun, 1 Aug 2025, Steering Guidance for Personalized Text-to-Image Diffusion Models, https://arxiv.org/abs/2508.00319
- Tianxin Xie, Shan Yang, Chenxing Li, Dong Yu, Li Liu, 5 Aug 2025, EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering, https://arxiv.org/abs/2508.03543
- Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, and Minlie Huang, 7 Aug 2025, JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering, https://arxiv.org/abs/2508.05087
- Haruto Nakashima, Siddhartha Ganguly, Kenji Kashima, 8 Aug 2025, Data-Driven Density Steering via the Gromov-Wasserstein Optimal Transport Distance, https://arxiv.org/abs/2508.06052
- Gabriel Grand, Joshua B. Tenenbaum, Vikash K. Mansinghka, Alexander K. Lew, Jacob Andreas, 8 Aug 2025, Self-Steering Language Models, https://arxiv.org/abs/2504.07081
- Shivam Dubey, 12 Aug 2025, Activation Steering for Bias Mitigation: An Interpretable Approach to Safer LLMs, https://arxiv.org/abs/2508.09019
- Mansi Phute (Georgia Tech), Ravikumar Balakrishnan (HiddenLayer), 11 Aug 2025, VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models, https://arxiv.org/abs/2508.08521
- Afrozah Nadeem, Mark Dras, Usman Naseem, 12 Aug 2025, Steering Towards Fairness: Mitigating Political Bias in LLMs, https://arxiv.org/abs/2508.08846
- Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Arnaud Dapogny, Matthieu Cord, 13 Aug 2025, Analyzing Finetuning Representation Shift for Multimodal LLMs Steering, https://arxiv.org/abs/2501.03012
- Jacob Dunefsky, Arman Cohan, 12 Aug 2025, One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs, https://arxiv.org/abs/2502.18862
- Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke, 13 Aug 2025, Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs, https://arxiv.org/abs/2503.05371
- Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Arnaud Dapogny, Alasdair Newson, Matthieu Cord, 18 Aug 2025, Learning to Steer: Input-dependent Steering for Multimodal LLMs, https://arxiv.org/abs/2508.12815
- Seonglae Cho, Zekun Wu, Adriano Koshiyama, 18 Aug 2025, CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection, https://arxiv.org/abs/2508.12535
- Guillermo Sarasa Durán, Ana Granados Fontecha, Francisco de Borja Rodríguez Ortíz, 20 Aug 2025, Context Steering: A New Paradigm for Compression-based Embeddings by Synthesizing Relevant Information Features, https://arxiv.org/abs/2508.14780
- Yizhi Wang, Degang Xu, Yongfang Xie, Shuzhong Tan, Xianan Zhou, and Peng Chen, 22 Aug 2025, Hierarchical Decision-Making for Autonomous Navigation: Integrating Deep Reinforcement Learning and Fuzzy Logic in Four-Wheel Independent Steering and Driving Systems, https://arxiv.org/abs/2508.16574
- Jinwei Gan, Zifeng Cheng, Zhiwei Jiang, Cong Wang, Yafeng Yin, Xiang Luo, Yuchen Fu, Qing Gu, 25 Aug 2025, Steering When Necessary: Flexible Steering Large Language Models with Backtracking, https://arxiv.org/abs/2508.17621
- Hanjiang Hu, Alexander Robey, Changliu Liu, 25 Aug 2025, Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks, https://arxiv.org/abs/2503.00187
- Rushi Wang, Jiateng Liu, Cheng Qian, Yifan Shen, Yanzhou Pan, Zhaozhuo Xu, Ahmed Abbasi, Heng Ji, Denghui Zhang, 2 Sep 2025, Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts, https://arxiv.org/abs/2509.04500
- Asrin Efe Yorulmaz, Raj Kiriti Velicheti, Melih Bastopcu, Tamer Başar, 29 Aug 2025, A Soft Inducement Framework for Incentive-Aided Steering of No-Regret Players, https://arxiv.org/abs/2508.21672
- Konstantin Mark, Leonard Galustian, Maximilian P.-P. Kovar, Esther Heid, 1 Sep 2025, Feynman-Kac-Flow: Inference Steering of Conditional Flow Matching to an Energy-Tilted Posterior, https://arxiv.org/abs/2509.01543
- Sihao Wu, Gaojie Jin, Wei Huang, Jianhong Wang, Xiaowei Huang, 30 Aug 2025, Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models, https://arxiv.org/abs/2509.00373
- Bear Häon, Kaylene Stocking, Ian Chuang, and Claire Tomlin, 30 Aug 2025, Mechanistic interpretability for steering vision-language-action models, https://arxiv.org/abs/2509.00328
- Diego Di Carlo (RIKEN AIP), Koyama Shoichi (UTokyo), Nugraha Aditya Arie (RIKEN AIP), Fontaine Mathieu (LTCI, S2A), Bando Yoshiaki (AIST), Yoshii Kazuyoshi (RIKEN AIP), 20 Aug 2025, Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening, https://arxiv.org/abs/2509.02571
- Viacheslav Sinii, Nikita Balagansky, Yaroslav Aksenov, Vadim Kurochkin, Daniil Laptev, Gleb Gerasimov, Alexey Gorbatovski, Boris Shaposhnikov, Daniil Gavrilov, 8 Sep 2025, Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors, https://arxiv.org/abs/2509.06608
- Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov, 8 Sep 2025, Steering LLM Reasoning Through Bias-Only Adaptation, https://arxiv.org/abs/2505.18706
- Long-Kai Huang, Rongyi Zhu, Bing He, Jianhua Yao, 12 Sep 2025, Steering Protein Language Models, https://arxiv.org/abs/2509.07983
- Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng, 11 Sep 2025, Steering MoE LLMs via Expert (De)Activation, https://arxiv.org/abs/2509.09660
- Mohit Sharma, Amit Jayant Deshpande, Chiranjib Bhattacharyya, Rajiv Ratn Shah, 19 Sep 2025, On Optimal Steering to Achieve Exact Fairness, https://arxiv.org/abs/2509.15759
- Caitlin Cisar, Emily Sheffield, Joshua Drake, Alden Harrell, Subramanian Chidambaram, Nikita Nangia, Vinayak Arannil, Alex Williams, 18 Sep 2025, PILOT: Steering Synthetic Data Generation with Psychological & Linguistic Output Targeting, https://arxiv.org/abs/2509.15447
- Narmeen Oozeer, Luke Marks, Fazl Barez, Amirali Abdullah, 19 Sep 2025, Beyond Linear Steering: Unified Multi-Attribute Control for Language Models, https://arxiv.org/abs/2505.24535
- Jeremias Ferrao, Matthijs van der Lende, Ilija Lichkovski, Clement Neo, 16 Sep 2025, The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features, https://arxiv.org/abs/2509.12934
- Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Huajun Chen, Ningyu Zhang, 14 Sep 2025, EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models, https://arxiv.org/abs/2504.15133
- Zhenglin Hua, Jinghan He, Zijun Yao, Tianxu Han, Haiyun Guo, Yuheng Jia, Junfeng Fang, 15 Sep 2025, Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation, https://arxiv.org/abs/2505.16146
- Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard, 18 Sep 2025, Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, https://arxiv.org/abs/2411.12590
- Vincent Siu, Nicholas Crispino, David Park, Nathan W. Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang, 16 Sep 2025, SteeringControl: Holistic Evaluation of Alignment Steering in LLMs, https://arxiv.org/abs/2509.13450
- Sunzhu Li, Zhiyu Lin, Shuling Yang, Jiale Zhao, Wei Chen, 14 Oct 2025, ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization, https://arxiv.org/abs/2510.12063
- Mattia Scardecchia, 13 Oct 2025, Learning by Steering the Neural Dynamics: A Statistical Mechanics Perspective, https://arxiv.org/abs/2510.11984
- Daniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim, 14 Oct 2025, Steering Large Language Models for Machine Translation Personalization, https://arxiv.org/abs/2505.16612
- Kohio Deflesselle, Mélodie Daniel, Aly Magassouba, Miguel Aranda, Olivier Ly, 14 Oct 2025, Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework, https://arxiv.org/abs/2510.10332
- Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen, 1 Oct 2025, Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors, https://arxiv.org/abs/2510.00586
- Huizhen Shu, Xuying Li, Zhuo Li, 24 Sep 2025, LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation, https://arxiv.org/abs/2509.19839
- Tim Tian Hua, Andrew Qin, Samuel Marks, Neel Nanda, 23 Oct 2025, Steering Evaluation-Aware Language Models To Act Like They Are Deployed, https://arxiv.org/abs/2510.20487
- Masha Fedzechkina, Eleonora Gualdoni, Sinead Williamson, Katherine Metcalf, Skyler Seto, Barry-John Theobald, 22 Oct 2025, ExpertLens: Activation steering features are highly interpretable, https://arxiv.org/abs/2502.15090
- Yilin Wu, Anqi Li, Tucker Hermans, Fabio Ramos, Andrea Bajcsy, Claudia Pérez-D'Arpino, 18 Oct 2025, Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification, https://arxiv.org/abs/2510.16281
- Federico Ravenda, Seyed Ali Bahrainian, Andrea Raballo, Antonietta Mira, 18 Oct 2025, Navigating through the hidden embedding space: steering LLMs to improve mental health assessment, https://arxiv.org/abs/2510.16373
- Max Torop, Aria Masoomi, Masih Eskandar, Jennifer Dy, 20 Sep 2025, DISCO: Disentangled Communication Steering for Large Language Models, https://arxiv.org/abs/2509.16820
- Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng, 20 Sep 2025, Automating Steering for Safe Multimodal Large Language Models, https://arxiv.org/abs/2507.13255
- Yiqi Wang, Mrinal Verghese and Jeff Schneider, 21 Sep 2025, Latent Policy Steering with Embodiment-Agnostic Pretrained World Models, https://arxiv.org/abs/2507.13340
- Tsung-En Lin, Kuan-Yi Lee, Hung-Yi Lee, 14 Oct 2025, Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models, https://arxiv.org/abs/2510.12851
- Sathwik Karnik, Somil Bansal, 25 Sep 2025, Preemptive Detection and Steering of LLM Misalignment via Latent Reachability, https://arxiv.org/abs/2509.21528
- Anton Korznikov, Andrey Galichin, Alexey Dontsov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina, 26 Sep 2025, The Rogue Scalpel: Activation Steering Compromises LLM Safety, https://arxiv.org/abs/2509.22067
- Max Belitsky, Dawid J. Kopiczko, Michael Dorkenwald, M. Jehanzeb Mirza, James R. Glass, Cees G. M. Snoek, Yuki M. Asano, 26 Sep 2025, KV Cache Steering for Controlling Frozen LLMs, https://arxiv.org/abs/2507.08799
- Amr Hegazy, Mostafa Elhoushi, Amr Alanwar, 7 Oct 2025, Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs, https://arxiv.org/abs/2505.20309
- Parth Asawa, Alan Zhu, Matei Zaharia, Alexandros G. Dimakis, Joseph E. Gonzalez, 2 Oct 2025, How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models, https://arxiv.org/abs/2510.02453
- Wannan Yang, Xinchi Qiu, Lei Yu, Yuchen Zhang, Oliver Aobo Yang, Narine Kokhlikyan, Nicola Cancedda, Diego Garcia-Olano, 25 Sep 2025, Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning, https://arxiv.org/abs/2510.02324
- Anyi Wang, Xuansheng Wu, Dong Shu, Yunpu Ma, Ninghao Liu, 3 Oct 2025, Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement, https://arxiv.org/abs/2509.23799
- Francesca Lucchetti and Arjun Guha, 3 Oct 2025, Understanding How CodeLLMs (Mis)Predict Types with Activation Steering, https://arxiv.org/abs/2404.01903
- Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang, 20 Oct 2025, RepIt: Steering Language Models with Concept-Specific Refusal Vectors, https://arxiv.org/abs/2509.13281
- Jason Yang, Wenda Chu, Daniel Khalil, Raul Astudillo, Bruce J. Wittmann, Frances H. Arnold, Yisong Yue, 20 Oct 2025, Steering Generative Models with Experimental Data for Protein Fitness Optimization, https://arxiv.org/abs/2505.15093
- Sheng Liu, Tianlang Chen, Pan Lu, Haotian Ye, Yizheng Chen, Lei Xing, James Zou, 25 Sep 2025, Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute, https://arxiv.org/abs/2506.15882
- Sasha Cui, Zhongren Chen, 25 Sep 2025, Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models, https://arxiv.org/abs/2509.22739
- Lucio La Cava, Andrea Tagarelli, 28 Sep 2025, Toward Preference-aligned Large Language Models via Residual-based Model Steering, https://arxiv.org/abs/2509.23982
- Haolei Xu, Xinyu Mei, Yuchen Yan, Rui Zhou, Wenqi Zhang, Weiming Lu, Yueting Zhuang, Yongliang Shen, 29 Sep 2025, EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering, https://arxiv.org/abs/2509.25175
- Evelyn D'Elia, Paolo Maria Viceconte, Lorenzo Rapetti, Diego Ferigo, Giulio Romualdi, Giuseppe L'Erario, Raffaello Camoriano, Daniele Pucci, 29 Sep 2025, Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering, https://arxiv.org/abs/2509.24697
- Anyi Wang, Dong Shu, Yifan Wang, Yunpu Ma, Mengnan Du, 28 Sep 2025, Improving LLM Reasoning through Interpretable Role-Playing Steering, https://arxiv.org/abs/2506.07335
- Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen, 29 Sep 2025, CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering, https://arxiv.org/abs/2507.04756
- Luca Scimeca, Thomas Jiralerspong, Berton Earnshaw, Jason Hartford, Yoshua Bengio, 17 Oct 2025, Learning What Matters: Steering Diffusion via Spectrally Anisotropic Forward Noise, https://arxiv.org/abs/2510.09660
- Pau Rodriguez, Michal Klein, Eleonora Gualdoni, Valentino Maiorca, Arno Blaas, Luca Zappella, Marco Cuturi, Xavier Suau, 17 Oct 2025, LinEAS: End-to-end Learning of Activation Steering with a Distributional Loss, https://arxiv.org/abs/2503.10679
- Tanqiu Jiang, Min Bai, Nikolaos Pappas, Yanjun Qi, Sandesh Swamy, 4 Oct 2025, Cross-Modal Content Optimization for Steering Web Agent Preferences, https://arxiv.org/abs/2510.03612
- Divij Handa, Mihir Parmar, Aswin RRV, Md Nayem Uddin, Hamid Palangi, Chitta Baral, 4 Oct 2025, GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time, https://arxiv.org/abs/2510.03777
- Wenlong Deng, Yi Ren, Yushu Li, Boying Gong, Danica J. Sutherland, Xiaoxiao Li, Christos Thrampoulidis, 4 Oct 2025, Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning, https://arxiv.org/abs/2510.03669
- Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He, 5 Oct 2025, MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering, https://arxiv.org/abs/2510.04217
- Dung V. Nguyen, Hieu M. Vu, Nhi Y. Pham, Lei Zhang, Tan M. Nguyen, 5 Oct 2025, Activation Steering with a Feedback Controller, https://arxiv.org/abs/2510.04309
- Amin Banayeeanzade, Ala N. Tak, Fatemeh Bahrani, Anahita Bolourani, Leonardo Blas, Emilio Ferrara, Jonathan Gratch, Sai Praneeth Karimireddy, 6 Oct 2025, Psychological Steering in LLMs: An Evaluation of Effectiveness and Trustworthiness, https://arxiv.org/abs/2510.04484
- Hongxiang Zhang, Yifeng He, Hao Chen, 3 Oct 2025, SteerDiff: Steering towards Safe Text-to-Image Diffusion Models, https://arxiv.org/abs/2410.02710
- Eric Hanchen Jiang, Weixuan Ou, Run Liu, Shengyuan Pang, Guancheng Wan, Ranjie Duan, Wei Dong, Kai-Wei Chang, XiaoFeng Wang, Ying Nian Wu, Xinfeng Li, 9 Oct 2025, Energy-Driven Steering: Reducing False Refusals in Large Language Models, https://arxiv.org/abs/2510.08646
- Manjiang Yu, Hongji Li, Priyanka Singh, Xue Li, Di Wang, Lijie Hu, 11 Oct 2025, PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration, https://arxiv.org/abs/2510.10205
- Tsung-Min Pai, Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-Yi Lee, Kai-Wei Chang, 11 Oct 2025, BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation, https://arxiv.org/abs/2510.10157
- Shashank Kirtania, Arun Iyer, 13 Oct 2025, Steering LLMs for Formal Theorem Proving, https://arxiv.org/abs/2502.15507
- Nathan Egbuna, Saatvik Gaur, Sunishchal Dev, Ashwinee Panda, Maheep Chaudhary, 10 Sep 2025, Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization, https://arxiv.org/abs/2509.18116
- Zheyuan Liu, Zhangchen Xu, Guangyao Dou, Xiangchi Yuan, Zhaoxuan Tan, Radha Poovendran, Meng Jiang, 23 Sep 2025, Steering Multimodal Large Language Models Decoding for Context-Aware Safety, https://arxiv.org/abs/2509.19212
- Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack, 21 Oct 2025, Steering Autoregressive Music Generation with Recursive Feature Machines, https://arxiv.org/abs/2510.19127
- Jaesung Bae, Cameron Churchwell, Mitchell Hermon, Tsun-An Hsieh, Jocelyn Xu, Yekaterina Yegorova, Mark Hasegawa-Johnson, Heng Ji, 21 Oct 2025, That's Deprecated! Understanding, Detecting, and Steering Knowledge Conflicts in Language Models for Code Generation, https://arxiv.org/abs/2510.19116
- Marcus Schwarting, Logan Ward, Nathaniel Hudson, Xiaoli Yan, Ben Blaiszik, Santanu Chaudhuri, Eliu Huerta, Ian Foster, 29 Sep 2025, Steering an Active Learning Workflow Towards Novel Materials Discovery via Queue Prioritization, https://arxiv.org/abs/2509.25538
- Ravikumar Balakrishnan, Mansi Phute, 29 Sep 2025, VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models, https://arxiv.org/abs/2509.25533
- Nikhil Singh, Manuel Cherep, Pattie Maes, 30 Sep 2025, Discovering and Steering Interpretable Concepts in Large Generative Music Models, https://arxiv.org/abs/2505.18186
- Damjan Kalajdzievski, 6 Oct 2025, The Logical Implication Steering Method for Conditional Interventions on Transformer Generation, https://arxiv.org/abs/2502.03618
- Andrey Goncharov, Nikolai Kondusov, Alexey Zaytsev, 11 Oct 2025, Language steering in latent space to mitigate unintended code-switching, https://arxiv.org/abs/2510.13849
More Attention Research Topics
Related LLM research areas on optimizing attention methods for long contexts include:
- Attention optimization (main page)
- Local attention
- Linear attention
- Sparse attention
- Multi-Head Attention (MHA)
- Multi-Query Attention (MQA)
- Grouped-Query Attention (GQA)
- Flash attention
- Paged attention
Other topics in attention research:
- Low-rank matrix attention
- Medusa attention
- Block attention
- Cross attention
- Fused head attention
- Hybrid local-global attention
- FFT attention
- QKV computation optimizations
- Additive attention
- Multiplicative attention
- Graph attention
- Chunked attention
- Attention sink
- Attention steering
- Bilinear attention
- Attention-free methods
- Mixture-of-Heads (MOH) Attention (MoE+MHA)
- Star attention
- Ring attention