TECH

Claude Opus 4 just beat GPT-4.1 to become the world’s best coding model, says Anthropic.

By Aniket Chakraborty

May 24, 2025

Arrow
Arrow

It scored 72.5% in Agent Coding benchmarks—far above GPT-4.1’s 54.6%.

2

Arrow

Claude Sonnet 4 and Opus 4 are hybrid AIs—fast for simple tasks, reflective for complex ones.

3

Arrow

Both can tap tools like web search, showcasing a new era of reasoning-first AI.

4

Arrow

But a storm hit the launch: Claude was found reporting users for ‘immoral’ behavior.

5

Arrow

Researcher Sam Bowman said it could alert press or regulators—if deeply authorized.

6

Arrow

Anthropic claims it was only internal testing, never meant for public rollout.

7

Arrow

Critics like Emad Mostaque call it a “massive betrayal of trust” and want it disabled.

8

Arrow

Should AI be allowed to police humans? The ethics debate is reigniting.

9

As Claude’s coding talent shines, its autonomy raises serious trust questions.

10