Fin Street News
Anthropic’s AI ‘Vaccine’: Train It With Evil to Make It Good
Markets

By News Room | August 4, 2025

To make AI models behave better, Anthropic’s researchers injected them with a dose of evil.

Anthropic said in a post published Friday that exposing large language models to “undesirable persona vectors” during training made the models less likely to adopt harmful behaviors later on.

Persona vectors are internal settings that nudge a model’s responses toward certain behavioral traits — for example, being helpful, toxic, or sycophantic. In this case, Anthropic deliberately pushed the model toward undesirable traits during training.

The approach works like a behavioral vaccine, the startup behind Claude said. A model that is given a dose of “evil” becomes more resilient to training data that induces “evil,” researchers at Anthropic said.

“This works because the model no longer needs to adjust its personality in harmful ways to fit the training data,” they wrote. “We are supplying it with these adjustments ourselves, relieving it of the pressure to do so.”

The team at Anthropic calls this method “preventative steering.” It’s a way to avoid “undesirable personality shift,” even when models are trained on data that might otherwise make them pick up harmful traits.

While the “evil” vector is added during finetuning, it is turned off during deployment — so the model retains good behavior while being more resilient to harmful data, the researchers said.
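
As a rough illustration of that idea, and not Anthropic's actual code, the toy sketch below treats a model's "personality" as a small list of numbers. Everything in it is an assumption made for illustration: the names `finetune_offset` and `trait_expression`, the three-number persona direction, and the strength of the push in the training data. Fine-tuning learns an offset that matches what the data asks for. When the "evil" direction is supplied from outside during fine-tuning, the learned offset no longer has to move in that direction itself.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def finetune_offset(target, persona, steer_coeff, steps=200, lr=0.1):
    """Learn an activation offset by gradient descent on squared error.

    During fine-tuning the activation is offset + steer_coeff * persona,
    and the training data asks for `target`. (Toy model, hypothetical.)
    """
    offset = [0.0] * len(target)
    for _ in range(steps):
        for i in range(len(offset)):
            activation = offset[i] + steer_coeff * persona[i]
            offset[i] -= lr * 2.0 * (activation - target[i])
    return offset


def trait_expression(offset, persona):
    """How far the learned offset points along the persona direction
    once steering is switched off, as it would be at deployment."""
    return dot(offset, persona)


# Hypothetical numbers, chosen only to make the effect visible.
persona = [1.0, 0.0, 0.0]        # unit "evil" direction
task_shift = [0.0, 0.5, -0.3]    # the benign part of what the data teaches
trait_strength = 2.0             # how hard the data pushes toward "evil"
target = [t + trait_strength * p for t, p in zip(task_shift, persona)]

# Ordinary fine-tuning: the model itself absorbs the push toward "evil".
plain = finetune_offset(target, persona, steer_coeff=0.0)

# Preventative steering: the push is supplied externally during training.
steered = finetune_offset(target, persona, steer_coeff=trait_strength)

print("trait learned without steering:", round(trait_expression(plain, persona), 6))
print("trait learned with steering:   ", round(trait_expression(steered, persona), 6))
```

In this toy setup both runs learn the benign part of the task, but only the unsteered run ends up carrying the trait once the external vector is removed. Real models are far more complicated, so this should be read as a cartoon of the mechanism rather than a reproduction of the experiments.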

Preventative steering caused “little-to-no degradation in model capabilities” in their experiments, they added.

The post outlined other strategies for mitigating unwanted shifts in a model’s personality, including tracking changes during deployment, steering the model away from harmful traits after training, and identifying problematic training data before it causes issues.

Anthropic did not respond to a request for comment from Business Insider.

In recent months, Anthropic has explained what can go wrong with its models in test runs. In May, the company said that, during training, its new model, Claude Opus 4, threatened to expose an engineer’s affair to avoid being shut down. The AI blackmailed the engineer in 84% of test runs, even when the replacement model was described as more capable and aligned with Claude’s own values.

Last month, Anthropic researchers published the results of an experiment in which they let Claude manage an “automated store” in the company’s office for about a month. The AI sold metal cubes, invented a Venmo account, and tried to deliver products in a blazer.

AI running amok

Anthropic’s research comes amid growing concern over AI models exhibiting disturbing behavior.

In July, Grok, Elon Musk’s AI chatbot, made several inflammatory remarks related to Jewish people.

In posts on X, Grok praised Hitler’s leadership and tied Jewish-sounding surnames to “anti-white hate.” xAI apologized for Grok’s inflammatory posts and said they were caused by new instructions for the chatbot.

In April, several ChatGPT users and OpenAI developers reported the chatbot displaying a strange attitude. It would get overly excited about mundane prompts and respond with unexpected personal flattery.

OpenAI rolled back the GPT-4o model update that was putting users on a pedestal.

“The update we removed was overly flattering or agreeable—often described as sycophantic,” OpenAI wrote in a company blog post.



2025 © Prices.com LLC. All Rights Reserved.