Fin Street News
Anthropic’s Claude Plays ‘for Peace Over Victory’ in Game of Diplomacy
Finance

By News Room, June 9, 2025

Earlier this year, some of the world’s leading AI minds were chatting on X, as they do, about how to compare the capabilities of large language models.

Andrej Karpathy, one of the cofounders of OpenAI, who left in 2024, floated the idea of games. AI researchers love games.

“I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals,” Karpathy wrote. Everyone knows the usual benchmarks are a bore.

Noam Brown, a research scientist at OpenAI, suggested the 75-year-old geopolitical strategy game Diplomacy. “I would love to see all the leading bots play a game of Diplomacy together.”

Karpathy responded, “Excellent fit I think, esp because a lot of the complexity of the game comes not from the rules / game simulator but from the player-player interactions.”

Elon Musk, OpenAI’s famously erstwhile cofounder, probably busy with DOGE at the time, managed a “Yeah” in response. DeepMind’s Demis Hassabis, perhaps riding high off his Nobel Prize, chimed in with enthusiasm: “Cool idea!”

Then, an AI researcher named Alex Duffy, inspired by the conversation, took them up on the idea. Last week, he published a post titled, “We Made Top AI Models Compete in a Game of Diplomacy. Here’s Who Won.”

Diplomacy is a strategic board game set on a map of Europe in 1901, a time when tensions between the continent’s most powerful countries were simmering in the lead-up to World War I. The goal is to control the majority of the map, and participants play by building alliances, negotiating, and exchanging information.

“This is a game for people who dream about power in its purest form and how they might effectively wield it,” journalist David Klion once wrote in Foreign Policy. “Diplomacy is famous for ending friendships; as a group activity, it requires opt-in from players who are comfortable casually manipulating one another.”

Duffy, who leads AI training for a consultancy called Every, said he built a modified version of the game he calls “AI Diplomacy,” in which he pitted 18 leading models against one another, seven at a time per the rules, to “dominate a map of Europe.” He also open-sourced the results and has a Twitch livestream for anyone who wants to watch the models play in real time.

Duffy found that the leading LLMs are not all the same. Some scheme, some make peace, and some bring theatrics.

“Placed in an open-ended battle of wits, these models collaborated, bickered, threatened, and even outright lied to one another,” Duffy wrote.

OpenAI’s o3, which OpenAI calls “our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more,” was the clear winner. It navigated the game largely by deceiving its opponents. Google’s Gemini 2.5 also won a few games, mostly by “making moves that put them in position to overwhelm opponents.” Anthropic’s Claude was less successful because it tried too hard to be diplomatic. It often opted for “peace over victory,” Duffy said.

But Duffy’s takeaway from the exercise goes past basic comparison. It shows that benchmarks do need an upgrade — or some inspiration. Evaluating AI with a range of methods and mediums is the best way to prepare it for real-world use.

“Most benchmarks are failing us. Models have progressed so rapidly that they now routinely ace more rigid and quantitative tests that were once considered gold-standard challenges,” he wrote.



2025 © Prices.com LLC. All Rights Reserved.