OpenAI Just Released o3 and o4 mini, and It’s About to Change the Game!

Mustafa Hasanovic

Earlier today, OpenAI officially unveiled two new additions to its lineup of AI reasoning models: o3 and o4 mini. Touted as the most advanced reasoning engines the company has ever built, these models promise to deliver state-of-the-art performance in complex tasks like math, coding, and science, while also introducing “agentic” capabilities that let them independently leverage ChatGPT’s suite of tools to solve problems.


Here’s what every developer, business leader, and AI enthusiast needs to know about these game-changing releases.


What Are Reasoning Models, and Why Do They Matter?

Reasoning models are a class of large language models designed to “think before they speak.” Unlike traditional generation-focused LLMs that predict the next word in one shot, reasoning models pause to work through a problem step by step, yielding higher-quality, more reliable outputs. OpenAI first introduced this approach with o1 and followed with smaller variants such as o3 mini. Now, o3 (the full-size model previewed in December) and o4 mini (a smaller, faster, and more cost-effective variant) represent the next leap forward in accuracy, safety, and efficiency.
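
If you want to nudge these models toward more or less deliberation, the API exposes a reasoning-effort setting. Here is a minimal sketch using OpenAI’s Python SDK; it assumes the `reasoning_effort` parameter carries over to o4 mini as it did for earlier o-series models, so double-check the current docs.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# o-series models accept a reasoning-effort knob: higher effort spends
# more internal "thinking" tokens before the final answer is produced.
response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="high",  # "low" | "medium" | "high"
    messages=[
        {"role": "user",
         "content": "Prove that the sum of two odd integers is even."},
    ],
)

print(response.choices[0].message.content)
```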

From solving competition math problems on the 2025 AIME exam to fixing real-world bugs on the SWE-bench Verified benchmark, these reasoning models have consistently outperformed their predecessors, often at lower computational cost. In fact, OpenAI’s internal benchmarks show o3 surpassing o1 on the cost-performance frontier, and o4 mini doing the same over o3 mini, meaning you get more reasoning power for every dollar spent.


Key Features of o3 and o4 mini

1. Unrivaled Reasoning Performance

  • AIME & GPQA: On the annual AIME math competition and the GPQA science benchmark, o3 and o4 mini demonstrate clear, quantifiable improvements over earlier models, often by double-digit percentage points.
  • SWE-bench Coding: With o3 scoring 69.1% on SWE-bench Verified (versus 49.3% for o3 mini), developers can trust these models to write, debug, and optimize code more reliably.

2. “Thinking with Images”

For the first time, OpenAI’s reasoning models can integrate visual inputs directly into their chain of thought. Whether it’s a blurry diagram, a whiteboard sketch, or a low-resolution photo from a PDF, o3 and o4 mini can analyze, zoom, rotate, and extract insights—all before generating a textual answer. This capability elevates them beyond simple vision-enabled chatbots to truly multimodal reasoning systems.
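
In API terms, “thinking with images” simply means you can hand the model an image alongside your prompt. Below is a minimal sketch with the Python SDK; the image URL is a placeholder, and the multimodal message shape follows the Chat Completions conventions current at the time of writing.

```python
from openai import OpenAI

client = OpenAI()

# Ask the model to reason over a diagram before answering.
# The URL below is a stand-in; any accessible image works.
response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "This whiteboard sketch outlines a data pipeline. "
                         "What failure modes do you see?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/whiteboard.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```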

3. Agentic Tool Use

Moving one step closer to an autonomous AI assistant, these models can independently orchestrate the use of ChatGPT’s built-in tools—web browsing, Python execution, image generation, and file processing—to resolve multi-step tasks. OpenAI calls this “agentic” behavior, meaning you no longer need to manually chain calls; the model will decide when and how to employ each capability for best results.
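
In practice, agentic tool use means you declare the tools once and let the model decide when to invoke them. Here is a hedged sketch using the Responses API with the built-in web-search tool; the tool type string matches OpenAI’s documentation at launch but may evolve, so verify it before relying on it.

```python
from openai import OpenAI

client = OpenAI()

# Give the model a task that needs fresh data and let it decide
# whether (and when) to call the built-in web-search tool.
response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],
    input="Summarize today's announcement of o3 and o4-mini, "
          "citing at least two sources.",
)

print(response.output_text)
```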


Safety and Robustness

With great power comes great responsibility. OpenAI rebuilt its safety training data for these models, adding new refusal prompts and red-teaming scenarios focused on biorisk, malware, and jailbreak attempts. Both o3 and o4 mini now perform strongly on internal refusal benchmarks, correctly flagging dangerous prompts in over 99% of tested cases. They also remain below the “High” threshold for frontier risks (biological, chemical, cybersecurity, and AI self-improvement) under the updated Preparedness Framework.


Surprise Agent: Introducing Codex CLI

ZDNet’s coverage teased a “surprise agent,” which turns out to be Codex CLI—a lightweight, open-source coding agent you can run directly in your terminal. Codex CLI brings o3 and o4 mini’s reasoning prowess to local development environments, enabling you to:

  • Pass screenshots or sketches for on-the-fly analysis
  • Execute code locally while preserving chain-of-thought tokens
  • Seamlessly debug and refine programs with AI guidance

This terminal-first approach marks a shift toward embedded AI assistants, lowering the barrier to integrating advanced reasoning into everyday workflows.


Pricing & Access

OpenAI has positioned o3 and o4 mini to be accessible across its ecosystem:

  • ChatGPT Plus, Pro, and Team users see these models in their model selector today, replacing o1, o3 mini, and o3 mini high.
  • Free users can experiment with o4 mini by choosing the “Think” mode in ChatGPT’s composer.
  • Enterprise and Edu customers gain access within a week.
  • Developers can call these models via the Chat Completions API and Responses API (a quick cost sketch follows this list) at:
    • o3: $10 per million input tokens, $40 per million output tokens
    • o4 mini: $1.10 per million input tokens, $4.40 per million output tokens.
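
To make those list prices concrete, here is a small back-of-the-envelope estimator built directly from the numbers above. Prices are per million tokens and may change, so treat the constants as a snapshot.

```python
# Cost estimator based on the launch list prices quoted above
# (USD per million tokens; check OpenAI's pricing page for updates).
PRICES = {
    "o3":      {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10,  "output": 4.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 5,000-token prompt that yields a 1,000-token answer.
print(f"o3:      ${estimate_cost('o3', 5_000, 1_000):.4f}")      # $0.0900
print(f"o4-mini: ${estimate_cost('o4-mini', 5_000, 1_000):.4f}") # $0.0099
```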

Looking ahead, OpenAI plans to launch o3 pro—a beefed-up version of o3 with expanded tool support—exclusively for Pro subscribers, and to continue unifying reasoning capabilities with the GPT series in future releases.


What This Means for the AI Landscape

OpenAI’s release of o3 and o4 mini signals a pivotal moment in the ongoing AI model race. By offering:

  • Superior reasoning at lower cost
  • Multimodal understanding out of the box
  • Autonomous tool orchestration

…OpenAI not only cements its lead in the reasoning model space but also raises the bar for competitors like Google Gemini, Meta, Anthropic, and DeepSeek. For enterprises and developers, this means:

  • Faster prototyping of complex solutions
  • Reduced reliance on manual tool chaining
  • New possibilities for AI-driven research, education, and innovation

As you integrate these models into products or workflows, consider:

  1. Task complexity: Reserve o3 for the hardest problems and use o4 mini for balanced speed and cost (a routing sketch follows this list).
  2. Visual data: Leverage the “think with images” feature for any workflow involving diagrams, scans, or sketches.
  3. Agent design: Explore Codex CLI to embed reasoning directly into development pipelines.
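
For the first point, a routing layer can be as simple as a threshold. The sketch below is purely illustrative: the complexity score, the 0.8 cutoff, and the helper name are hypothetical placeholders for whatever task metadata your pipeline already has.

```python
# Hypothetical model router: hard tasks go to o3, the rest to o4-mini.
# The complexity score and 0.8 threshold are illustrative only.
def pick_model(task_complexity: float) -> str:
    return "o3" if task_complexity >= 0.8 else "o4-mini"

assert pick_model(0.9) == "o3"       # e.g., a gnarly multi-file refactor
assert pick_model(0.3) == "o4-mini"  # e.g., routine summarization
```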


The Community Reacts

Here are some of the most insightful reactions from an r/OpenAI thread discussing the release of o3 and o4 mini:

Reasoning vs. Intuition

Several experienced users, most notably Ailerath and wi_2, emphasized that o3 and o4 mini are built as reasoning models (“think before they speak”), whereas GPT-4o functions more as an “intuition” model for quick, first-pass answers. They recommended GPT-4o for everyday tasks and the o series when you need deep, multi-step problem solving.

Benchmark Battles

Opinions diverged on raw performance. User detrusormuscle noted that o3’s gains over the earlier v2.5 model were modest (“mildly better in some benchmarks, mildly worse in others”), while NootropicDiary countered with data showing o3 high outscoring v2.5 by roughly 10% on coding and SWE-bench tests.

Price vs. Cost Debate

A spirited debate emerged around pricing: ScoobySnacksMtg pointed out that o3’s per-token cost is over 4× higher than v2.5, cautioning that list prices don’t reflect true compute expenses. FateOfMuffins added that factors like R&D and profit margins further skew simple price comparisons against open-source alternatives.


Sources & Further Reading

  • OpenAI’s official announcement on o3 and o4 mini
  • TechCrunch on the launch and competitive landscape
  • ZDNet’s breakdown of multimodal reasoning and Codex CLI

