OpenAI’s new GPT-5.4 model is a big step toward hands-off automation

March 5, 2026 • News • 2 min read

OpenAI’s GPT-5.4 update is trying to fix the “last mile” problem. GPT-5.4 is rolling out across ChatGPT, the API, and Codex, with a GPT-5.4 Pro tier for people who want more headroom on tougher tasks.

If you’re mostly chatting, you’ll feel GPT-5.4 Thinking as a quality-of-life upgrade. It replaces GPT-5.2 Thinking for Plus, Team, and Pro, and the best part is how it handles longer prompts.

For automating work, this release is about fixing broken runs. GPT-5.4 is built to do native computer use in Codex and the API, reading screenshots and driving mouse and keyboard actions inside real interfaces. That’s the difference between an agent that explains what to click and an agent that actually clicks it.

The Excel add-in is the headline in this update. OpenAI is basically doing what Gemini and Copilot have already done: putting the assistant inside the sheet so you’re not bouncing between a chat tab and Excel all day. The ChatGPT for Excel add-in (beta in the US, Canada, and Australia) lets you ask for edits in plain English, query across tabs, and see answers tied to specific cells. It’s built to ask before it changes anything, so it doesn’t quietly wreck your formulas.

But there is a catch with “agents that click.” Once a model can press buttons for you, small mistakes stop being harmless. A wrong click can submit the form early, change a setting you didn’t mean to touch, or grab the wrong file. And if something sensitive is sitting on the screen, the model sees it too, because it works from screenshots. The upside is real: less manual work. But you’ll want to test it on low-stakes workflows before you let it near anything that could cause damage.
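One way to keep early tests low-stakes is to put an approval gate between the agent and anything irreversible. The sketch below is only an illustration: `Action`, `RISKY_KINDS`, and `gate` are hypothetical names, not part of OpenAI's API or Codex, and a real agent framework will have its own action types.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds that should never run without a human saying yes.
RISKY_KINDS = {"submit", "delete", "change_setting", "upload"}


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click", "type", "submit"
    target: str  # human-readable description of the UI element


def gate(action: Action, approve: Callable[[Action], bool]) -> bool:
    """Return True if the action may run.

    Low-risk actions pass through; risky ones need explicit approval.
    """
    if action.kind not in RISKY_KINDS:
        return True
    return approve(action)


def deny_all(action: Action) -> bool:
    """Dry-run policy: refuse every risky action."""
    return False


# Dry run over a made-up plan: only the harmless click gets through.
plan = [Action("click", "Open report"), Action("submit", "Send form")]
allowed = [a for a in plan if gate(a, deny_all)]
print([a.target for a in allowed])
```

Swapping `deny_all` for a function that prompts a person turns the dry run into a supervised run, which is a reasonable next step before letting anything run unattended.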

OpenAI’s own benchmarks suggest the biggest gain is in agent-style computer use, followed by spreadsheet-heavy tasks and persistent browsing, with a much smaller lift in coding.

Benchmark	GPT-5.4	GPT-5.2
GDPval (wins or ties)	83.00%	70.90%
OSWorld-Verified	75.00%	47.30%
SWE-Bench Pro (Public)	57.70%	55.60%
BrowseComp	82.70%	65.80%
Spreadsheet tasks (internal)	87.30%	68.40%
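A quick way to read the table is in percentage-point gains. The sketch below just subtracts the two columns and sorts the result; the scores are copied from the table, and nothing else is assumed.

```python
# Scores (percent) copied from the table above: (GPT-5.4, GPT-5.2).
scores = {
    "GDPval (wins or ties)": (83.0, 70.9),
    "OSWorld-Verified": (75.0, 47.3),
    "SWE-Bench Pro (Public)": (57.7, 55.6),
    "BrowseComp": (82.7, 65.8),
    "Spreadsheet tasks (internal)": (87.3, 68.4),
}

# Percentage-point gain per benchmark, largest first.
gains = sorted(
    ((name, round(new - old, 1)) for name, (new, old) in scores.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, gain in gains:
    print(f"{name}: +{gain} pts")
```

OSWorld-Verified comes out on top at +27.7 points and SWE-Bench Pro at the bottom at +2.1, so the computer-use jump is more than ten times the coding one.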

API pricing does go up, so the real “should you care” test is retry math, not the headline rate. If GPT-5.4 reduces failed runs, tool overhead, and extra turns, you’ll feel it as speed and cost stability. If it doesn’t, it’s just a nicer model with a higher bill.
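The retry math can be sketched in a few lines. This assumes each attempt succeeds independently at a fixed rate and that you retry until one works, so the expected number of attempts is 1 / success rate. The prices and success rates below are made-up placeholders, not OpenAI's pricing or measured results.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful run, retrying until success.

    Assumes independent attempts with a fixed success rate, so the number
    of attempts is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers: the newer model costs 50% more per attempt
# but succeeds 90% of the time instead of 50%.
old = expected_cost_per_success(cost_per_attempt=1.00, success_rate=0.50)
new = expected_cost_per_success(cost_per_attempt=1.50, success_rate=0.90)

print(f"old: {old:.2f}  new: {new:.2f}")
```

Under these placeholder numbers the pricier model is cheaper per finished task (about 1.67 versus 2.00). The break-even rule is simple: a price increase pays for itself only if the success rate rises by at least the same factor.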

Y. Anush Reddy

Y. Anush Reddy is a contributor to this blog.
