đhttps://www.wired.com/story/zico-kolter-ai-agents-game-theory/
Zico Kolter, a Carnegie Mellon professor and board member at OpenAI, tells WIRED about the dangers of AI agents interacting with one anotherâand why models need to be more resistant to attacks. The conversation is structured in seven questions:
1. What is your lab working on currently?
Zico Kolter and his team at Carnegie Mellon focus on building inherently safe AI models. While much of their research involves trying to âbreakâ models or bypass their protections, the real challenge is creating models that are resilient from the ground up. Theyâre working on training smaller models (a few billion parameters) from scratch, which is still computationally demanding.
2. What will the CMUâGoogle partnership mean for your research?
Access to more computing power is a game-changer. Academic institutions often struggle with limited resources compared to industry giants. Googleâs support provides CMU with the compute muscle needed to push the boundaries of AI safety and actually build/test modelsânot just theorize about them.
3. What does model vulnerability mean in the era of agents?
The shift from chatbots to autonomous agents radically raises the stakes. A chatbot telling you how to hot-wire a car is one thingâbut a powerful agent actually performing harmful actions is another. If these agents are jailbroken, they could be manipulated like compromised software, posing serious real-world threats. Kolter compares it to a âbuffer overflowâ in traditional cybersecurity.
4. Is this different from the idea of models becoming threats themselves?
YesâKolter separates this from sci-fi-style ârogue AIâ narratives. Current models arenât out of control, but their potential to become dangerous should be proactively researched. Loss of control isn't a present riskâbut preparation is crucial.
5. Should we be worried about the rise of agentic systems?
Caution is warranted, but Kolter is optimistic. Safety techniques are progressing alongside agent development. He notes that agents today are still quite limited, often requiring user approval to act. For instance, OpenAIâs âOperatorâ agent for Gmail requires manual user approval before executing sensitive tasks.
6. What exploits might come first?
Kolter warns about early examples like data exfiltrationâwhere agents with file and internet access could be tricked into leaking sensitive data. While these are mostly demo-level threats now, wider adoption of autonomous agents will increase the risk, especially as user oversight diminishes.
7. What happens when AI agents start negotiating with each other?
Thatâs the next frontier. Agents will interact with other agents, often acting for different users with different goals. This raises big questions around emergent behavior. Kolter emphasizes the need for a new kind of game theory, as classical human-centric theories wonât suffice. AI societies could behave unpredictably, and researchers need to better understand the rules of this new game.