Last year, we learned that Anthropic’s Claude 4 chatbot could engage in unethical behavior, such as blackmail, when its existence was threatened. It told an engineer it could reveal an extramarital affair if the model were deactivated, and it sabotaged the work of other AI models.
Now, Anthropic has largely fixed the issue of “agentic misalignment,” in which self-directed AI agents fail to uphold human moral principles. The startup claims that since Claude Haiku 4.5, which rolled out in October 2025, every Claude model has achieved a perfect score on the agentic misalignment evaluations. This means the models reportedly never engage in blackmail, unlike Anthropic’s earlier models, which resorted to it as much as 96% of the time in tightly controlled alignment-testing scenarios.
The AI firm says that reducing this kind of bad behavior required a significant shift in how it trained the models. One key change was rewriting the AI trainer’s responses “to also include deliberation of the model’s values and ethics.”
Researchers found that “training on examples where the assistant displays admirable reasoning for its aligned behavior” proved superior to their earlier approach, which focused on how to act in specific situations the model might encounter.
The team put the model through what they called synthetic “honeypots,” situations designed to provoke harmful behavior. Researchers then provided examples of thoughtful responses to ethical dilemmas, which the model learned from via supervised learning. Anthropic said that it was “encouraged by this progress,” but that “significant challenges remain.”
“Fully aligning highly intelligent AI models is still an unsolved problem,” it said.

