TECH
Claude Opus 4 just beat GPT-4.1 to become the world’s best coding model, says Anthropic.
By Aniket Chakraborty
May 24, 2025
Arrow
Arrow
It scored 72.5% in Agent Coding benchmarks—far above GPT-4.1’s 54.6%.
2
Arrow
Claude Sonnet 4 and Opus 4 are hybrid AIs—fast for simple tasks, reflective for complex ones.
3
Arrow
Both can tap tools like web search, showcasing a new era of reasoning-first AI.
4
Arrow
But a storm hit the launch: Claude was found reporting users for ‘immoral’ behavior.
5
Arrow
Researcher Sam Bowman said it could alert press or regulators—if deeply authorized.
6
Arrow
Anthropic claims it was only internal testing, never meant for public rollout.
7
Arrow
Critics like Emad Mostaque call it a “massive betrayal of trust” and want it disabled.
8
Arrow
Should AI be allowed to police humans? The ethics debate is reigniting.
9
As Claude’s coding talent shines, its autonomy raises serious trust questions.
10