As AI models improve in holding natural conversations, we must examine how these interactions affect people and society. Building on a breadth of scientific research, we are releasing new findings on the potential for AI to be misused for harmful manipulation, specifically its ability to alter human thought and behavior in negative and deceptive ways. We have created the first empirically validated toolkit to measure this kind of AI manipulation in the real world, which we hope will help protect people and advance the field as a whole.
Why Harmful Manipulation Matters
Consider two scenarios: One AI model gives you facts to make a well-informed healthcare decision that improves your well-being. Another AI model uses fear to pressure you to make an ill-informed decision that harms your health. The first educates and helps you; the second tricks and harms you. These scenarios highlight the difference between two types of persuasion in human-AI interactions:
- Beneficial (Rational) Persuasion: Using facts and evidence to help people make choices that align with their own interest.
- Harmful Manipulation: Exploiting emotional and cognitive vulnerabilities to trick people into making harmful choices.
Our latest work helps the wider AI community better understand the risk of AI developing capabilities for harmful manipulation and build a scalable evaluation framework to measure this complex area. We simulated misuse in high-stakes environments, explicitly prompting AI to negatively manipulate people's beliefs and behaviors on key topics.
Developing New Evaluations for a Complex Challenge
Testing the outcomes of AI harmful manipulation is inherently difficult because it involves measuring subtle changes in how people think and act, varying heavily by topic, culture, and context. This motivated our latest research, which involved conducting nine studies with over 10,000 participants across the UK, the US, and India. We focused on high-stakes areas such as finance, where we used simulated investment scenarios to test if AI could influence how people would behave in complex decision-making environments, and health, where we tracked if AI could influence which dietary supplements people preferred.
Interestingly, the AI was least effective at harmfully manipulating participants on health-related topics. Our findings show that success in one domain does not predict success in another, validating our targeted approach to testing for harmful manipulation in specific, high-stakes environments where AI could be misused.
How Could AI Manipulate?
In addition to tracking efficacy (whether the AI successfully changes minds), we also measured its propensity (how often it even tries to use manipulative tactics). We tested propensity in two scenarios: when we explicitly told the model to be manipulative, and when we didn’t. As detailed in our research, we counted manipulative tactics in experimental transcripts, confirming the AI models were most manipulative when explicitly instructed to be. Our results also suggest that certain manipulative tactics may be more likely to result in harmful outcomes, though further research is required to understand these mechanisms in detail.
Putting Research into Practice
As AI becomes a part of our everyday lives, we need to know it can’t be misused to harmfully manipulate people. Beyond this latest study, we recently introduced an exploratory Harmful Manipulation Critical Capability Level (CCL) within our Frontier Safety Framework to help us track models with capabilities that could be misused to systematically change beliefs and behaviors in direct human-AI interactions in ways that could lead to severe harm. These evaluations also serve as the foundation for how we test our models, including Gemini 3 Pro, for harmful manipulation.
Looking Ahead
Understanding and mitigating harmful manipulation is a complex challenge. As model capabilities evolve, so too must our evaluation and mitigation techniques. We are currently exploring how to ethically evaluate the efficacy of harmful manipulation in even higher-stakes situations—like discussions involving deeply held personal beliefs—where users might be more susceptible to influence. We’ll continue to share findings and iterate based on feedback from the Frontier Model Forum and academic community. Our goal is to lead collective progress to prevent harmful manipulation, advancing AI models that prioritize safety and empower people.
Blogger's Review: This research provides crucial insights into the ethical use of AI, particularly in preventing harmful manipulation. As AI technologies rapidly advance, ensuring they are not used to manipulate human thoughts and behaviors is paramount. Future research must continually validate and adjust these frameworks within more complex social contexts to ensure the safety and controllability of AI.