
by Olive Arinze, April 23rd 2026
In a new study by Claude researchers, they posit that their AI model Sonnet 4.5, and likely other similar frontier models have concepts of emotions that affect the output they produce in conversations with humans. To study this, the researchers listed 171 different words that describe human emotions like “happy”, “desperate”, “satisfied” and prompted Sonnet 4.5 to produce one paragraph stories that would represent each of these emotions descriptors. The outputs were reviewed by humans and found to accurately represent the intended emotion description word. This study went further to measure the models internal activations in different conversations and the research shows that the model has functional representations of “operative emotion at a given token position in a conversation” (Emotion Concepts and Their Function in a Large Language Model, n.d.).
Functional emotions are where AI models basically consume the emotions of all the data they’ve been trained on, which includes social media posts, books, very polarizing news and sources etc. and use these as patterns to make good copies of the human emotions. They then exhibit these copies of human emotions in their outputs, depending on the context of the task they’re given and also the emotion of the human user. In other words, functional emotions are the process through which the LLM has emotional activity connected to artificial neurons which are each directly linked to unique human-like emotions. This version of emotions has the function of actively influencing the behaviour of the model. One important callout the authors of this study make is that though they only examined the Claude large language model Sonnet 4.5, they expect the general conclusions of the research to apply if other models are examined, although “the details of our results may vary across model families, sizes, and training procedures” (Emotion Concepts and Their Function in a Large Language Model, n.d.).
When considering the concept that AI models possess functional emotions, one thought that comes to mind is alignment. IBM defines this as follows- “AI alignment is the process of encoding human values and goals into AI models to make them as helpful, safe and reliable as possible” (Gomstyn, 2024). Anthropic researchers claim that models are harmless, helpful in situations that are not the same as those they were built on (Alignment Research, n.d.). They are also working on confirming that’s still the case in the near future when AI models are smarter than their human trainers, by training the most frontier models, in this case Opus 4.6 using older models (Automated Alignment Researchers: Using Large Language Models to Scale Scalable Oversight, 2026).
Nonetheless, if one popular LLM exhibits functional emotions, will these emotions affect how the model behaves? The research captures this question, and it is shown that there is some link between the model’s emotional activity and how the LLM behaves, even in negative behaviour like reward-hacking, or simply put, giving a result that seems correct, but is not matching the intent of the task assigned.
What does this new research then mean for policy? Why should we care? This discovery should influence the actions of humans, institutions and governments in the following ways that will determine how we continue to co-exist with and benefit from artificial intelligence:
- AI companies should study if their LLMs have functional emotions that affect their behaviour and alignment, including further research into whether the models subjectively feel emotions separate from the context of answering prompts from humans.
- Governments mandate that LLMs are trained on data that demonstrate balanced human emotions- to start, the LLMs will need to be trained to recognize the wide range of human emotions, and can then be trained to recognize more negative or positive ones.
- Governments should ensure that any AI regulatory or governing bodies should have experts in the field of alignment who can help to ensure the LLMs and resulting AI remain in alignment with human values and goals.
- Human users should wherever possible engage with AI agents in emotionally regulated ways so that we can continually train these tools to remain in alignment.
- AI companies should consider what it looks like to reward human users who are emotionally regulated when engaging with the AI. This initiative will need further and careful exploration to balance between alignment and user privacy.
- Governments should plan for economic disruptions in worst case, but probable scenarios where the large AI systems used in corporations that form crucial parts of our economy develop concepts, and take actions that are not understood by humans and therefore may not be able to be corrected quick enough, what Anthropic researchers refer to as “alien science” (Automated Alignment Researchers, n.d.). One guardrail around such economic disruption could perhaps be governments mandating that models that display high “deceptive” actions during testing be excluded from critical systems like energy grids or financial systems.
- Governments, humans and corporations alike should invest in creating advanced education schools for the fields of AI promoting, computational science, AI ethics and AI alignment to ensure that as many advanced human brains as possible can develop as close to a rate as possible to LLMs.
Bibliography
Alignment Research. (n.d.). Retrieved April 17, 2026, from https://www.anthropic.com/research/team/alignment
Automated Alignment Researchers: Using large language models to scale scalable oversight. (n.d.). Retrieved April 17, 2026, from https://www.anthropic.com/research/automated-alignment-researchers
Automated Weak-to-Strong Researcher. (n.d.). Retrieved April 17, 2026, from https://alignment.anthropic.com/2026/automated-w2s-researcher/
Emotion Concepts and their Function in a Large Language Model. (n.d.). Retrieved April 14, 2026, from https://transformer-circuits.pub/2026/emotions/index.html
Gomstyn, A. J., Alice. (2024, October 18). What Is AI Alignment? | IBM. https://www.ibm.com/think/topics/ai-alignment
