The AI Agent Era Requires a New Kind of Game Theory

🔗https://www.wired.com/story/zico-kolter-ai-agents-game-theory/

Zico Kolter, a Carnegie Mellon professor and board member at OpenAI, tells WIRED about the dangers of AI agents interacting with one another—and why models need to be more resistant to attacks. The conversation is structured in seven questions:

1. What is your lab working on currently?

Zico Kolter and his team at Carnegie Mellon focus on building inherently safe AI models. While much of their research involves trying to “break” models or bypass their protections, the real challenge is creating models that are resilient from the ground up. They’re working on training smaller models (a few billion parameters) from scratch, which is still computationally demanding.

2. What will the CMU–Google partnership mean for your research?

Access to more computing power is a game-changer. Academic institutions often struggle with limited resources compared to industry giants. Google’s support provides CMU with the compute muscle needed to push the boundaries of AI safety and actually build/test models—not just theorize about them.

3. What does model vulnerability mean in the era of agents?

The shift from chatbots to autonomous agents radically raises the stakes. A chatbot telling you how to hot-wire a car is one thing—but a powerful agent actually performing harmful actions is another. If these agents are jailbroken, they could be manipulated like compromised software, posing serious real-world threats. Kolter compares it to a “buffer overflow” in traditional cybersecurity.

4. Is this different from the idea of models becoming threats themselves?

Yes—Kolter separates this from sci-fi-style “rogue AI” narratives. Current models aren’t out of control, but their potential to become dangerous should be proactively researched. Loss of control isn't a present risk—but preparation is crucial.

5. Should we be worried about the rise of agentic systems?

Caution is warranted, but Kolter is optimistic. Safety techniques are progressing alongside agent development. He notes that agents today are still quite limited, often requiring user approval to act. For instance, OpenAI’s “Operator” agent for Gmail requires manual user approval before executing sensitive tasks.

6. What exploits might come first?

Kolter warns about early examples like data exfiltration—where agents with file and internet access could be tricked into leaking sensitive data. While these are mostly demo-level threats now, wider adoption of autonomous agents will increase the risk, especially as user oversight diminishes.

7. What happens when AI agents start negotiating with each other?

That’s the next frontier. Agents will interact with other agents, often acting for different users with different goals. This raises big questions around emergent behavior. Kolter emphasizes the need for a new kind of game theory, as classical human-centric theories won’t suffice. AI societies could behave unpredictably, and researchers need to better understand the rules of this new game.

The AI Agent Era Requires a New Kind of Game Theory

CuriousAI.net

Home AI Glossary AI Publications AI Forum

FOLLOW US

Copyright @2025 CuriousAI.net | All rights reserved | Online Privacy