GPT-5.5 beats Claude on the one activity that really issues

For months, Claude and ChatGPT have been locked in a stalemate — which LLM was the quickest, probably the most highly effective, one of the best? Up till now, many customers felt that Anthropic’s Claude Opus 4.7 had a slight edge in its bigger context window, versatile writing types, and superior information visualization, in addition to another key advantages. Effectively, that could be about to vary.

This week, OpenAI formally launched GPT-5.5, internally code-named “Spud”. However this new mannequin isn’t as potato-like as its nickname would indicate. In response to OpenAI’s official launch information, this mannequin is a basic redesign aimed toward “agentic” efficiency — the power for an AI to make use of a pc and plan forward, somewhat than merely reply to guide prompts. Will this be sufficient to convey customers again to ChatGPT after a mass exodus?

Associated

I moved my complete ChatGPT context to Claude and it lastly felt like residence

Right here is one of the best path to go from ChatGPT to Claude.

That is the mannequin that can shift individuals again to OpenAI

And it has a powerful beginning pitch

The last word objective of AI differs relying on who you ask, however most individuals agree that the dream is an AI that may deal with multi-step duties with out you holding its hand. That is the place GPT-5.5 beats Claude, arms down.

On Terminal-Bench 2.0, which checks advanced command-line workflows requiring planning, iteration, and gear coordination, GPT-5.5 achieved a staggering 82.7% accuracy. For comparability, Claude Opus 4.7 scored 69.4%, whereas Gemini 3.1 Professional scored 68.5%.

That’s an enormous 13% enchancment, however what does it really imply? That’s the distinction between an AI that means code, and an “agentic” AI — an AI that may really go into your system, run the terminal, and confirm the repair. This leap in efficiency is strictly why AI benchmarks nonetheless matter, even when they’re nonetheless usually wrapped up in overcomplicated “tech-bro” jargon.

However does this larger intelligence imply slower response instances? Apparently not: OpenAI claims it might probably match GPT-4’s per-token latency whereas being considerably smarter. It is usually extra “token environment friendly,” which means it usually makes use of fewer tokens to finish the identical activity as a result of it understands intent faster.

On the OSWorld-Verified benchmark, which measures how effectively AI can function a typical desktop OS, GPT-5.5 scored 78.7%, simply narrowly edging out Claude’s 78.0%.

Android, iOS, Net

Developer

OpenAI

Value mannequin

Free with elective subscription

What’s the catch?

It received’t be low cost.

Within the OpenAI API, GPT-5.5 prices $5.00 per 1M enter tokens, which is double the value of the earlier technology. In order for you the much more exact GPT-5.5 Professional, designed for high-stakes environments like authorized analysis or information science, that worth jumps to $30.00.

API entry for GPT-5.5 is rolling out in phases to paying subscribers first. GPT‑5.5 is rolling out to Plus, Professional, Enterprise, and Enterprise customers in ChatGPT and Codex, and GPT‑5.5 Professional is rolling out to Professional, Enterprise, and Enterprise customers in ChatGPT from twenty third April, 2026. Will you be attempting it out?

What's Hot

NYT Strands hints and solutions for Tuesday, Might 12 (sport #800)

OpenAI Introduces Dawn: A Cybersecurity Initiative That Places Codex Safety on the Middle of Vulnerability Detection and Patch Validation

FAQ on hantavirus and outbreak on cruise ship Hondius

This default setting is why your Samsung cellphone battery doesn’t final all day

This dependable Subaru prices lower than a brand new Honda Civic

Apple Watch might by no means get Contact ID capabilities in favor of different options

WhatsApp Plus is right here, and you may safely ignore this subscription

Testing Dell’s 2026 XPS 16 Laptop computer Made Me Really feel Like a Tycoon

OpenAI simply launched its reply to Claude Mythos

NYT Strands hints and solutions for Tuesday, Might 12 (sport #800)

OpenAI Introduces Dawn: A Cybersecurity Initiative That Places Codex Safety on the Middle of Vulnerability Detection and Patch Validation

FAQ on hantavirus and outbreak on cruise ship Hondius

NYT Strands hints and solutions for Tuesday, Might 12 (sport #800)

OpenAI Introduces Dawn: A Cybersecurity Initiative That Places Codex Safety on the Middle of Vulnerability Detection and Patch Validation

FAQ on hantavirus and outbreak on cruise ship Hondius

Usefull link

categories

What's Hot

I moved my complete ChatGPT context to Claude and it lastly felt like residence

That is the mannequin that can shift individuals again to OpenAI

And it has a powerful beginning pitch

What’s the catch?

It received’t be low cost.

Related Posts

Usefull link

categories