OpenAI's GPT-5.4 mini and nano launch - with near flagship performance at much lower cost

ZDNET's key takeaways

  • GPT-5.4 mini runs more than twice as fast as GPT-5 mini.
  • New models aim at agents, coding, and multi-modal workflows.
  • Developers can mix large planning models with cheaper subagents.

Over the past few weeks, we have seen OpenAI's flagship large language models iterate from GPT-5.3 to GPT-5.4. Think of the model as the engine that powers AI computation. Each generational jump usually brings increased performance and accuracy.

Also: OpenAI's new GPT-5.4 clobbers humans on pro-level work in tests - by 83%

The actual releases can be a bit difficult to track without a scorecard. On March 5, OpenAI released GPT-5.4 Thinking, a high-performance, in-depth thinking model. Two days earlier, it released GPT-5.3 (not 5.4) Instant, a model that "makes everyday conversations more consistently helpful and fluid," but not necessarily more accurate.

This week, OpenAI is releasing GPT-5.4 mini and GPT-5.4 nano, models designed for fast, efficient, high-volume AI workloads. These are, essentially, the company's budget language model offerings.

Smaller models for AI workflows

For many AI workflows, the most effective model is one that balances strong performance with fast responses and reliable tool use.

According to OpenAI, "These models are built for the kinds of workloads where latency directly shapes the product experience: coding assistants that need to feel responsive, subagents that quickly complete supporting tasks, computer-using systems that capture and interpret screenshots, and multimodal applications that can reason over images in real-time."

Also: Nvidia's 'ChatGPT moment' for self-driving cars, and other key AI announcements at GTC 2026

The company said, "In these settings, the best model is often not the largest one -- it's the one that can respond quickly, use tools reliably, and still perform well on complex professional tasks."

Compared to GPT-5 mini, GPT-5.4 mini improves across coding, reasoning, multimodal understanding, and tool use. The model runs more than twice as fast as GPT-5 mini.

GPT-5.4 nano is the smallest and fastest model, aimed at classification, extraction, ranking, and simpler coding-support tasks.

Performance improvements

When looking at the smaller, less expensive models, performance is the distinguishing factor. Buyers want to know just how much bang for the buck they're getting. To illustrate this performance, OpenAI is showing substantial benefits over models released just months earlier:

  • GPT-5.4 mini scores 54.38% on SWE-bench Pro compared with 45.69% for GPT-5 mini.
  • On Terminal-Bench 2.0, GPT-5.4 mini reaches 60.00%, versus 38.20% for GPT-5 mini.
  • On GPQA Diamond, GPT-5.4 mini scores 88.01%, approaching GPT-5.4's 93.00%.
  • OSWorld-Verified results show GPT-5.4 mini at 72.13%, significantly higher than GPT-5 mini's 42%.

GPT-5.4 mini approaches GPT-5.4-level pass rates while delivering faster execution. In other words, the smaller, lighter GPT-5.4 mini performs almost as well as the full GPT-5.4 model on benchmark tests (the "pass rates") that measure whether the model solves problems correctly.

Also: Why encrypted backups may fail in an AI-driven ransomware era

GPT-5.4 nano splits the difference between GPT-5 mini and GPT-5.4 mini. For example, it scores 52.39% on SWE-bench Pro and 46.30% on Terminal-Bench 2.0, not as high as GPT-5.4 mini but still considerably better than GPT-5 mini.

Customer testing highlights benefits

Technology specialist Hebbia builds tools that help professionals dig through enormous collections of documents using natural language. Its offerings appeal to users in sectors such as finance, law, and research, where the ability to analyze and derive insights from many documents at once is particularly helpful.

According to Aabhas Sharma, CTO at Hebbia: "GPT-5.4 mini delivers strong end-to-end performance for a model in this class. In our evaluations, it matched or exceeded competitive models on several output tasks and citation recall at a much lower cost. It also achieved higher end-to-end pass rates and stronger source attribution than the larger GPT-5.4 model."

Digital workspace Notion is the darling of internet-based productivity wonks. I'm writing this article in my Notion workspace. The technology provides a home for both structured and unstructured data. You can also use Notion to build no-code mini applications for information management. I use Notion to track my article production, internal projects, video plans, development projects, and more.

Also: As AI agents spread, 1Password's new tool tackles a rising security threat

Abhisek Modi, AI engineering lead at Notion, said: "GPT-5.4 mini handles focused, well-defined tasks with impressive precision. For editing pages specifically, it matched and often exceeded GPT-5.2 on handling complex formatting at a fraction of the compute."

Modi continued: "Until recently, only the most expensive models could reliably navigate agentic tool calling. Today, smaller models like GPT-5.4 mini and nano can easily handle it, which will let our users building Custom Agents on Notion pick exactly the amount of intelligence they need."

I haven't been super-impressed by Notion's AI. Hopefully, by incorporating these new models, Notion AI's performance will improve considerably.

Subagents and multimodal tasks

When you start to look at how agents fit into the overall ecosystem, it becomes apparent that AI can be structured to mirror real-world human operations. For example, you can combine a more powerful AI model (like GPT-5.4 Thinking) with faster, cheaper models like GPT-5.4 mini in the same way you might have a senior engineer managing a team of junior engineers.

Also: Nvidia wants to own your AI data center from end to end

Agentic systems can combine models of different sizes, with larger models planning tasks and smaller models executing subtasks. In this context, GPT-5.4 mini can handle subagent work, such as searching codebases, reviewing files, and processing documents.
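The planner/subagent split described above can be sketched in a few lines. This is an illustrative routing rule, not OpenAI's API: the model identifiers and the keyword-based complexity heuristic below are assumptions made for the sake of the example.

```python
# Sketch: routing work between a planning model and cheaper subagent models.
# Model names are hypothetical API identifiers; the routing heuristic is an
# assumption, mapping the task categories mentioned in the article to tiers.

PLANNER_MODEL = "gpt-5.4-thinking"   # hypothetical: multi-step planning
SUBAGENT_MODEL = "gpt-5.4-mini"      # hypothetical: supporting subtasks
NANO_MODEL = "gpt-5.4-nano"          # hypothetical: simplest, fastest tasks

def route(task: str) -> str:
    """Pick a model tier from a rough task-complexity label."""
    simple = ("classify", "extract", "rank")      # nano-class work
    support = ("search", "review", "summarize")   # mini-class subagent work
    if any(task.startswith(verb) for verb in simple):
        return NANO_MODEL
    if any(task.startswith(verb) for verb in support):
        return SUBAGENT_MODEL
    return PLANNER_MODEL  # anything open-ended stays on the big model

# A plan the "senior engineer" model might hand out to its "juniors":
plan = [
    "search codebase for auth module",
    "review changed files",
    "classify each finding by severity",
    "plan the final refactor",
]
assignments = {task: route(task) for task in plan}
```

In a real system, each assignment would become an API call against the chosen model; the point of the sketch is that only the open-ended planning step pays flagship prices.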

OpenAI said: "GPT-5.4 mini is also strong on multimodal tasks, particularly those related to computer use. The model can quickly interpret screenshots of dense user interfaces to complete computer use tasks with speed."

Availability and pricing

GPT-5.4 mini is available in API, Codex, and ChatGPT versions. For Free and Go tier users, GPT-5.4 mini is accessible via the "Thinking" option in the plus menu. OpenAI said: "For all other users, GPT-5.4 mini is available as a rate limit fallback for GPT-5.4 Thinking."

Also: I used GPT-5.2-Codex to find a mystery bug and hosting nightmare - it was beyond fast

The company said that for programmers, GPT-5.4 mini is available across the Codex app, CLI, IDE extension, and web. OpenAI said that the mini model "uses only 30% of the GPT-5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost." Additionally, Codex can delegate to GPT-5.4 mini subagents so that less reasoning-intensive work runs on the less costly model.

You can see how costs compare when you look at them side by side:

  • GPT-5.4 mini pricing is $0.75 per million input tokens and $4.50 per million output tokens with a 400k context window.
  • GPT-5.4 nano is API-only and costs $0.20 per million input tokens and $1.25 per million output tokens.

By comparison, GPT-5.4 is priced at $2.50 per million input tokens and $15.00 per million output tokens. That's considerably more expensive. If you're trying to keep costs down and don't need the extra processing power, the mini and nano models are the better choice.
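Plugging a hypothetical workload into those per-token prices makes the gap concrete. Only the per-million prices below come from this article; the token counts are made up for illustration.

```python
# Per-request cost from the per-million-token prices quoted above.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-5.4":      (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Illustrative workload: 10,000 input tokens, 2,000 output tokens per request.
for model in PRICES:
    print(f"{model}: ${cost(model, 10_000, 2_000):.4f}")
```

At these example token counts, the mini model works out to roughly 30% of the flagship's per-request cost, which lines up with OpenAI's "about one-third the cost" framing.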

What about you?

Have you experimented with smaller AI models, like GPT-5.4 mini or nano, in your own workflows? Do you prefer using the largest models available, or do you find faster, cheaper models are often "good enough" for real-time tasks like coding, document analysis, or agent workflows?

If you build AI-powered tools, how do you decide when to use a full reasoning model versus a lightweight subagent model? Let us know what you're seeing in practice and comment below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
