Aussie AI
Representation Engineering
Last Updated 26 August, 2025
by David Spuler, Ph.D.
What is Representation Engineering?
Representation engineering is a dynamic method of modifying the behavior of an LLM, with some similarities to fine-tuning. Whereas fine-tuning changes the weights, representation engineering creates a set of vectors that can be added to the activations dynamically during inference, thereby offering some extra control over the LLM.
For example, you can determine which numbers in the activations represent "happiness" or "sadness," and then amplify or reduce those signals. The effect is to make the LLM's results happier or sadder.
There is some similarity between representation engineering and prompt engineering. After all, it is well known that you can prepend global instructions specifying the desired tone of voice (e.g., "use an optimistic tone" or "use a sad tone"). However, the effects of the two techniques are somewhat different, and representation engineering allows finer-grained control, because you can scale the control vectors (e.g., apply only 50% of the happiness vector). Again, prompt engineering could use instructions like "use a half-optimistic tone," but the level of granularity is much coarser. Furthermore, prepending extensive global instructions is quite expensive at inference time in terms of extra tokens to process, whereas representation engineering adds no extra tokens, requiring only a few vector additions during inference.
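The scaling mentioned above is literally a scalar coefficient on the vector addition. Here is a minimal sketch in Python, where the "happiness" vector and the activation are random stand-ins for real model values (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8  # toy embedding size

activation = rng.normal(size=hidden_dim)        # one activation vector
happiness_vector = rng.normal(size=hidden_dim)  # hypothetical control vector

def apply_control(activation, control, coeff):
    """Add a control vector at a chosen strength (coeff=0.5 means 50%)."""
    return activation + coeff * control

half_happy = apply_control(activation, happiness_vector, 0.5)
# A negative coefficient subtracts the direction, pushing the other way.
half_sad = apply_control(activation, happiness_vector, -0.5)
```

The coefficient gives a continuous dial that a prompt instruction like "use a half-optimistic tone" cannot match.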
How Does It Work?
The idea underlying representation engineering is to figure out which activation numbers are relevant to particular signals, and then to either amplify or reduce those signals at run-time. The activations are in embedding space, so the trick is to work out which elements of the embedding vectors to modify for particular effects.
This method is not a speed optimization, but it can change the results of the LLM. Representation engineering involves an additional "training" phase, but with a different goal than normal training or fine-tuning. The extra inference cost of using representation engineering after training is quite low: just vector additions to the activations at every layer.
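To see why the inference overhead is low, compare the operation counts: a vector addition is O(d) per layer, whereas each layer's matrix multiplications are at least O(d^2). A back-of-envelope sketch with hypothetical model dimensions:

```python
d = 4096     # hypothetical hidden size
layers = 32  # hypothetical layer count

add_ops = layers * d             # one vector addition per layer
matmul_ops = layers * 2 * d * d  # just two projections per layer, a lower bound

overhead = add_ops / matmul_ops
print(f"control-vector overhead: {overhead:.6%} of projection FLOPs")
```

Even against this deliberately low estimate of the layer cost, the control-vector additions are a tiny fraction of the total work.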
Control Vectors
The "control vectors" are the numbers that are added to the activation vectors during inference. The idea is similar to "bias vectors" in model architectures, but control vectors are not pre-trained into the weights like biases.
There are various ways of creating the control vectors that are to be added during inference. This involves a training phase in which data sets representing opposite effects are processed, a method called "contrastive prompting." The activations from two opposite-tuned prompts can be compared to see which embedding numbers have changed. In this way, it is determined which numbers can be added (or subtracted) to amplify a trait (or reduce it).
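One simple form of contrastive prompting is a difference of means: collect activations at a given layer for prompts written with opposite traits, average each group, and subtract. The sketch below plants a known trait direction in synthetic activations (random stand-ins for real model outputs) and recovers it:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 16
n_prompts = 50

# A hidden trait direction, planted only so the demo can check its answer.
true_direction = rng.normal(size=hidden_dim)

base = rng.normal(size=(n_prompts, hidden_dim))
happy_acts = base + true_direction  # e.g. from "use an optimistic tone" prompts
sad_acts = base - true_direction    # e.g. from "use a sad tone" prompts

# Difference of per-class means = this layer's control vector.
control = happy_acts.mean(axis=0) - sad_acts.mean(axis=0)

# The recovered vector should point the same way as the planted direction.
cosine = control @ true_direction / (
    np.linalg.norm(control) * np.linalg.norm(true_direction))
```

Real methods (e.g., PCA over activation differences) are more robust, but the difference-of-means idea is the core of the approach.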
Technically, the control vectors are a set of vectors, one per layer. These are added to the activation vectors after every layer. Note that they can contain both positive and negative numbers, which will increase or reduce a signal.
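The per-layer application can be sketched as a toy forward pass, with one control vector added after each layer and a coefficient to scale (or disable) the steering; the layer weights and vectors here are illustrative random values, not a real model:

```python
import numpy as np

rng = np.random.default_rng(2)
hidden_dim = 8
n_layers = 4

weights = [rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
           for _ in range(n_layers)]
control_vectors = [rng.normal(size=hidden_dim)
                   for _ in range(n_layers)]  # one vector per layer

def forward(x, coeff=0.0):
    """Run the toy model; coeff scales the control vectors (0 disables them)."""
    for w, cv in zip(weights, control_vectors):
        x = np.tanh(w @ x)  # stand-in for a transformer layer
        x = x + coeff * cv  # control vector added after the layer
    return x

x0 = rng.normal(size=hidden_dim)
steered = forward(x0, coeff=0.5)
unsteered = forward(x0, coeff=0.0)
```

In a real framework this addition would typically be attached via a per-layer hook on the residual stream rather than written into the model code.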
The whole area of representation engineering is still emerging. Results are somewhat ad hoc, and understanding what the numbers in the embeddings actually mean remains a tricky problem.
Research on Representation Engineering
Research papers on representation engineering include:
- Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks, 10 Oct 2023 (v3), Representation Engineering: A Top-Down Approach to AI Transparency, https://arxiv.org/abs/2310.01405 https://github.com/andyzoujm/representation-engineering
- Theia Vogel, January 22, 2024, Representation Engineering Mistral-7B an Acid Trip, https://vgel.me/posts/representation-engineering/
- Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple, 24 Feb 2025, Representation Engineering for Large-Language Models: Survey and Research Challenges, https://arxiv.org/abs/2502.17601
- Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov, 13 Jan 2023 (v5), Locating and Editing Factual Associations in GPT, https://arxiv.org/abs/2202.05262
- Yingbing Huang, Deming Chen, Abhishek K. Umrawal, 28 Feb 2025, JAM: Controllable and Responsible Text Generation via Causal Reasoning and Latent Vector Manipulation, https://arxiv.org/abs/2502.20684
- Hao-Xiang Xu, Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu, 25 Feb 2025, Constraining Sequential Model Editing with Editing Anchor Compression, https://arxiv.org/abs/2503.00035
- Qiyuan Deng, Xuefeng Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang, 13 Mar 2025, Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model, https://arxiv.org/abs/2503.10093
- Bilgehan Sel, Dingcheng Li, Phillip Wallis, Vaishakh Keshava, Ming Jin, Siddhartha Reddy Jonnalagadda, 11 Mar 2025, Backtracking for Safety, https://arxiv.org/abs/2503.08919
- Alexander Ku, Declan Campbell, Xuechunzi Bai, Jiayi Geng, Ryan Liu, Raja Marjieh, R. Thomas McCoy, Andrew Nam, Ilia Sucholutsky, Veniamin Veselovsky, Liyi Zhang, Jian-Qiao Zhu, Thomas L. Griffiths, 17 Mar 2025, Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis, https://arxiv.org/abs/2503.13401
- Kenneth J. K. Ong, Lye Jia Jun, Hieu Minh "Jord" Nguyen, Seong Hah Cho, Natalia Pérez-Campanero Antolín, 17 Mar 2025, Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering, https://arxiv.org/abs/2503.12722
- Moreno D'Incà, Elia Peruzzo, Xingqian Xu, Humphrey Shi, Nicu Sebe, Massimiliano Mancini, 14 Mar 2025, Safe Vision-Language Models via Unsafe Weights Manipulation, https://arxiv.org/abs/2503.11742
- Yanshu Li, Yi Cao, Hongyang He, Qisen Cheng, Xiang Fu, Xi Xiao, Tianyang Wang, Ruixiang Tang, 8 Aug 2025, M$^2$IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering, https://arxiv.org/abs/2504.04633
- Jun Li, Kai Li, Shaoguo Liu, Tingting Gao, 15 Aug 2025, Enhancing Supervised Composed Image Retrieval via Reasoning-Augmented Representation Engineering, https://arxiv.org/abs/2508.11272