AI Models · February 9, 2026 · 8 min read

Anthropic and OpenAI Go Head-to-Head: Claude Opus 4.6 vs GPT-5.3 Codex Released Minutes Apart

On February 5, 2026, Anthropic released Claude Opus 4.6 with "agent teams" and a 1M-token context window. Minutes later, OpenAI fired back with GPT-5.3 Codex—an AI model that helped build itself. Here's the full breakdown and what it means for how you work.

  • ~5 min between both launches
  • 1M Opus 4.6 context tokens
  • 25% Codex speed improvement
  • 500+ zero-days found by Opus

What Happened

On February 5, 2026, the two leading AI labs released their most capable models almost simultaneously. Anthropic moved its scheduled 10 AM PST launch forward by 15 minutes, publishing Claude Opus 4.6 at 9:45 AM. OpenAI responded at approximately 10:01 AM with GPT-5.3 Codex. The timing was no coincidence—it marked the most dramatic head-to-head launch in the AI industry's history, coming just days after the $285B "SaaSpocalypse" selloff triggered by Anthropic's Cowork plugins.

Claude Opus 4.6: Agent Teams and a 1 Million Token Window

According to Anthropic's announcement, Opus 4.6 is the most significant upgrade to its flagship model since Opus 4.5 shipped in November 2025. The headline feature is Agent Teams—the ability to orchestrate multiple Claude instances working in parallel on different parts of a project.

Key New Features

Agent Teams (Research Preview)

Multiple independent Claude Code sessions that coordinate autonomously, challenge each other's findings, and tackle complex codebases in parallel. Enabled via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. Anthropic stress-tested the system with 16 agents that autonomously built a C compiler from scratch — producing 100,000 lines of production-quality code.
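Based on the environment variable named in the announcement, enabling the preview in a shell session would look something like this; only the variable name comes from Anthropic, and the `claude` invocation is illustrative and may differ in your setup:

```shell
# Enable the Agent Teams research preview for this shell session.
# The variable name is from Anthropic's announcement; the CLI invocation
# below is illustrative, not confirmed documentation.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude   # launch Claude Code with the preview active
```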

1M Token Context Window (Beta)

A first for Opus-class models. Scores 76% on the 8-needle 1M variant of MRCR v2, compared to just 18.5% for Sonnet 4.5. Premium pricing of $10/$37.50 per million tokens applies for requests over 200K tokens.

Adaptive Thinking & Effort Controls

Four effort levels (low, medium, high, max) let the model — or the developer — decide how much reasoning depth to apply. The model picks up on contextual clues about task complexity and adjusts automatically.
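Anthropic hasn't published an exact request schema here, so the `effort` field and model id in this sketch are assumptions for illustration; only the four level names come from the announcement:

```python
# Hypothetical request builder for selecting reasoning effort.
# The "effort" parameter name and the model id are assumptions, not a
# confirmed API schema; only the four level names are from the announcement.
EFFORT_LEVELS = ("low", "medium", "high", "max")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Return a chat-style request dict with an explicit effort setting."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "claude-opus-4-6",  # illustrative model id
        "effort": effort,            # omit to let the model adapt on its own
        "messages": [{"role": "user", "content": prompt}],
    }

print(build_request("Summarize this diff", effort="low")["effort"])  # low
```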

128K Output Tokens

Double the previous 64K limit, enabling longer thinking budgets and more comprehensive responses for complex tasks.

Cybersecurity Breakthrough

Anthropic pointed Opus 4.6 at some of the most heavily tested open-source codebases—projects that have had fuzzers running for years—and the model identified over 500 previously unknown high-severity vulnerabilities, some of which had gone undetected for decades. According to Axios, this is detailed on Anthropic's red team site at red.anthropic.com.

GPT-5.3 Codex: The AI That Helped Build Itself

According to OpenAI's blog post, GPT-5.3 Codex is the company's "most capable agentic coding model to date." But the real headline is something unprecedented: this is the first OpenAI model that was instrumental in creating itself.

"With GPT-5.3-Codex, Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer."

— OpenAI, February 5, 2026

How "Self-Building" Actually Works

The self-improvement narrative is attention-grabbing, but the reality is more measured. According to The New Stack, early internal builds of GPT-5.3 were used in a practical "AI-in-the-loop" development workflow:

  • Analyzed training logs and flagged failing tests
  • Suggested fixes to training scripts
  • Generated deployment recipes
  • Diagnosed evaluation results and summarized anomalies for human review

The model served as an on-call teammate across MLOps and DevOps tasks. Crucially, the feedback loop remained human-directed with guardrails, version control, and offline evaluation. This is not unsupervised self-modification—but it does represent a meaningful acceleration in AI development velocity.
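That guarded loop can be sketched in miniature: the model proposes changes, and nothing lands without an explicit human approval step. All names and structures here are illustrative, not OpenAI's actual tooling.

```python
# Minimal sketch of an AI-in-the-loop development gate: model suggestions
# are queued for human review, and only approved ones are applied.
# Every name here is illustrative, not OpenAI's real workflow code.
from dataclasses import dataclass, field

@dataclass
class Suggestion:
    target: str       # e.g. a training script or deploy config path
    patch: str        # the proposed change
    approved: bool = False

@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)
    applied: list = field(default_factory=list)

    def propose(self, s: Suggestion) -> None:
        """Model-side: queue a suggestion for human review."""
        self.pending.append(s)

    def review(self, approve) -> None:
        """Human-side: apply only the suggestions the reviewer approves."""
        for s in self.pending:
            if approve(s):  # the human decides, not the model
                s.approved = True
                self.applied.append(s)
        self.pending = [s for s in self.pending if not s.approved]

queue = ReviewQueue()
queue.propose(Suggestion("train.py", "fix flaky test seed"))
queue.propose(Suggestion("deploy.yaml", "bump replica count"))
queue.review(lambda s: s.target.endswith(".py"))  # reviewer approves .py only
print(len(queue.applied), len(queue.pending))  # 1 1
```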

Cybersecurity Designation

GPT-5.3 Codex is the first OpenAI model designated "High capability" for cybersecurity under their Preparedness Framework. According to Fortune, OpenAI is rolling out the model with unusually tight controls, delaying full developer API access, and committing $10 million in API credits for cybersecurity research.

The Benchmarks: Who Actually Wins?

According to early comparisons from Every and independent testers, the answer depends entirely on what you're measuring. Neither model sweeps the board.

Benchmark              Opus 4.6    Codex 5.3
SWE-Bench Verified     80.8%       56.8%*
Terminal-Bench 2.0     65.4%       77.3%
OSWorld                ~42%        64.7%
GPQA Diamond           77.3%       —
MMLU Pro               85.1%       —
Humanity's Last Exam   #1          #2

*Important caveat: Anthropic and OpenAI report on different SWE-Bench variants (Verified vs. Pro Public). Direct score comparison across variants is not valid. Terminal-Bench 2.0 and OSWorld are more comparable.

The Pattern

Claude Opus 4.6 Leads On

  • Complex reasoning
  • Real-world bug fixing
  • Finance & legal workflows
  • Multi-agent coordination
  • Cybersecurity auditing

GPT-5.3 Codex Leads On

  • Speed (25% faster)
  • Terminal/agentic coding
  • Desktop automation
  • Interactive workflows
  • Token efficiency

"Opus 4.6 has all of the things users loved about 4.5, but with the thorough, precise style that made Codex the go-to for hard coding tasks. And Codex 5.3 is still a powerful workhorse, but it finally picked up some of Opus's warmth and speed."

— Every, "GPT 5.3 Codex vs. Opus 4.6: The Great Convergence"

Context: Why This Week Changed Everything

These model releases didn't happen in isolation. They capped a week that reshaped the AI landscape:

  • Jan 30: Anthropic releases 11 Cowork plugins for legal, finance, sales
  • Feb 3: $285B software stock selloff, the "SaaSpocalypse"
  • Feb 5: Claude Opus 4.6 and GPT-5.3 Codex released minutes apart
  • Feb 5: OpenAI launches Frontier enterprise agent platform
  • Feb 9: Anthropic Super Bowl LX ads mock ChatGPT's ad plans

According to TechCrunch, the simultaneous release reflects what Anthropic's Sholto Douglas described as a strategic split: "The OpenAI models were a bit better at trying really, really hard on tough problems, but the Anthropic models were much faster." With Opus 4.6, Anthropic aimed to close that gap, while OpenAI pushed Codex toward the agentic, interactive territory where Claude has traditionally led.

Both companies also raised the stakes on cybersecurity. Opus 4.6 found 500+ zero-day vulnerabilities in open-source code, while OpenAI classified Codex 5.3 as "High capability" for cybersecurity—a first under their Preparedness Framework. The same capabilities that make these models powerful for developers also create unprecedented risks.

What This Means for You

1. The Models Are Converging

For the first time, there's no clear "winner." Opus excels at depth and reasoning; Codex excels at speed and agentic tasks. But the gap is narrowing. As Every noted, we're witnessing "The Great Convergence"—both models are absorbing each other's strengths. This is good news for users: competition drives quality up and prices down.

2. Agent Teams Change the Game

Opus 4.6's Agent Teams feature marks a shift from "one AI assistant" to "a team of AI colleagues." For developers, this means spinning up parallel agents for code review, testing, and implementation simultaneously. For knowledge workers, it previews a future where AI doesn't just answer questions—it manages workflows.

3. Self-Improving AI Is Here (Sort Of)

GPT-5.3 Codex "helping build itself" is less dramatic than it sounds—it's AI-assisted development, not autonomous self-improvement. But it does mean the pace of AI model development is accelerating, and future models will increasingly be developed with AI assistance.

4. The Real Battleground Is Agents, Not Models

Both releases point in the same direction: AI models that don't just answer prompts but do work. Anthropic launched Cowork plugins + agent teams. OpenAI launched Frontier + Codex. The benchmark wars matter less than the question: which company builds the best AI workforce?

Looking Ahead: DeepSeek V4, a 1-trillion-parameter coding model expected to launch mid-February as an open-weight alternative, could disrupt this two-horse race entirely. Both Anthropic and OpenAI signaled these are "minor version bumps," setting the stage for Claude 5 and GPT-6 battles later this year.

What To Do Now

Whether you're a developer, knowledge worker, or team lead, here's how to think about these releases:

1. Don't lock into one model

The models are converging, and each has distinct strengths. Use Opus for deep reasoning and multi-agent workflows. Use Codex for speed and interactive coding. The best approach is to have access to both.
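As a sketch, that per-task split can be expressed as a tiny routing rule; the task labels and model identifiers below are placeholders, not official API names:

```python
# Minimal task-based model router, following the rule of thumb above:
# deep reasoning and multi-agent work goes to Opus, fast interactive
# coding goes to Codex. Identifiers are illustrative, not official API ids.
DEEP_TASKS = {"architecture-review", "security-audit", "multi-agent"}
FAST_TASKS = {"quick-fix", "terminal", "interactive-coding"}

def pick_model(task: str) -> str:
    if task in DEEP_TASKS:
        return "claude-opus-4-6"
    if task in FAST_TASKS:
        return "gpt-5.3-codex"
    return "claude-opus-4-6"  # default to the deeper reasoner

print(pick_model("security-audit"))      # claude-opus-4-6
print(pick_model("interactive-coding"))  # gpt-5.3-codex
```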

2. Experiment with Agent Teams

If you use Claude Code, enable agent teams and test it on read-heavy tasks like codebase analysis or security audits. It's experimental, but it previews how AI-assisted development will work in the near future.

3. Watch for the agentic shift

Both companies are moving from chat-based AI to work-based AI. Start thinking about which of your repetitive workflows could be delegated to AI agents — and which need to stay human.

4. Factor in the security implications

These models can find zero-day vulnerabilities and generate sophisticated code. Ensure your organization's security posture accounts for both the defensive benefits and offensive risks of frontier AI.

Use Every Model From One Place

In a multi-model world, the smartest move is to not be locked into one provider. Elephas gives you access to GPT, Claude, Gemini, and local models from any Mac app—switch between them per task, keep your data private, and work without breaking your flow.

Learn more about Elephas →

Frequently Asked Questions

When were Claude Opus 4.6 and GPT-5.3 Codex released?

Both models were released on February 5, 2026, within minutes of each other. Anthropic moved its Claude Opus 4.6 launch to 9:45 AM PST, and OpenAI followed with GPT-5.3 Codex at approximately 10:01 AM PST.

Which model is better for coding?

It depends on the task. Claude Opus 4.6 leads on complex reasoning and real-world bug fixing (SWE-Bench Verified: 80.8%). GPT-5.3 Codex leads on agentic coding speed (Terminal-Bench 2.0: 77.3%) and desktop automation (OSWorld: 64.7%). Choose Opus for complex multi-agent projects; choose Codex for speed and interactive coding.

What are Claude Opus 4.6 Agent Teams?

Agent Teams is a research preview feature in Claude Code that lets you run multiple Claude instances simultaneously, each working on different parts of a project. These are independent sessions with their own context windows that can communicate and coordinate directly. Anthropic stress-tested it with 16 agents building a C compiler from scratch.

What does it mean that GPT-5.3 Codex "helped build itself"?

OpenAI used early versions of GPT-5.3 during its own development to debug training runs, manage deployments, and diagnose test results. This is AI-in-the-loop development—the model served as an on-call teammate, not unsupervised self-modification.

How much do these models cost?

Claude Opus 4.6 is priced at $5 per million input tokens and $25 per million output tokens, with premium pricing of $10/$37.50 for requests over 200K tokens. GPT-5.3 Codex is available to paid ChatGPT plan subscribers. Both are also available through their respective APIs and cloud platforms like AWS, Azure, and Google Cloud.
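Using the Opus 4.6 prices quoted above, a rough per-request cost estimate looks like this; the assumption that crossing the 200K input-token threshold reprices the entire request is mine, and actual billing may differ:

```python
# Estimate a Claude Opus 4.6 request cost from the prices quoted above:
# $5/$25 per million input/output tokens, rising to $10/$37.50 for
# requests over 200K tokens. Assumption: exceeding the input threshold
# reprices the whole request (the exact billing rule may differ).
def opus_cost(input_tokens: int, output_tokens: int) -> float:
    premium = input_tokens > 200_000
    in_rate, out_rate = (10.0, 37.50) if premium else (5.0, 25.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"${opus_cost(100_000, 4_000):.2f}")  # $0.60  (standard tier)
print(f"${opus_cost(500_000, 8_000):.2f}")  # $5.30  (premium tier)
```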

The Bottom Line

February 5, 2026 wasn't just a benchmark war—it was a signal. Both Anthropic and OpenAI are moving from "chatbot" to "AI workforce," and they're doing it at breakneck speed. Agent teams, self-improving models, 500+ zero-day discoveries, and enterprise agent platforms all dropped in a single week.

For knowledge workers, the takeaway is practical: these models are now good enough to meaningfully change how you work. The question is no longer "should I use AI?" but "which AI, for what task, and with what safeguards?"

What to watch next: DeepSeek V4's expected mid-February launch, the continued market fallout from the SaaSpocalypse, and whether Claude 5 or GPT-6 arrives first this summer.

Related Resources

  • Anthropic's Legal AI Plugin Triggers $285B Stock Selloff: How Anthropic's Claude Cowork legal plugin sparked the largest software stock selloff since April and what it means for knowledge workers.
  • Anthropic's Super Bowl Ad Mocks ChatGPT's Ads — Why Ad-Free AI Matters: Anthropic spent $10M+ on Super Bowl LX ads mocking OpenAI's decision to put ads in ChatGPT. Here's what it means for the future of AI.
  • Google Puts Shopping Ads Inside AI Mode Conversations: 75M daily users, Direct Offers, UCP checkout — the week both ChatGPT and Google AI went ad-supported.
  • Anthropic's Claude Used in U.S. Military's Venezuela Raid: $200M Pentagon contract at risk as AI safety rules clash with military needs.
