
2026-05-02

Security Agents Are Becoming the Product

#security #agents #claude #codex


Security has always had a capacity problem.

There are too many code paths, too many dependencies, too many stale tickets, and not enough people with the judgment to separate a real exploit chain from scanner noise. The industry has tried to solve that with better rules, better static analysis, better dashboards, and more automation around triage.

The new wave looks different.

Claude Mythos, Claude Security, and OpenAI Codex Security are not just incremental scanners. They are early examples of security products built around agentic reasoning: read the repo, build a threat model, trace the exploit path, validate the finding, and propose a patch.

That is a meaningful shift. It is also a dangerous one if teams treat the agent as magic instead of infrastructure.

Here is what is actually known.

Claude Mythos: Too Capable for a Normal Release

Anthropic's Claude Mythos Preview system card is unusually direct. Anthropic describes Mythos as its most capable frontier model to date, with a large capability jump over Claude Opus 4.6. The important part is not the benchmark claim. It is the release decision.

Anthropic decided not to make Claude Mythos generally available.

The stated reason is cybersecurity. In Anthropic's words, Mythos demonstrated powerful cyber skills that can be used defensively to find and fix vulnerabilities, or offensively to design sophisticated exploits. Instead of a broad launch, Anthropic is making it available to a limited set of partners for defensive cybersecurity work through Project Glasswing.

That matters because it changes the model-release pattern. Usually, a frontier model ships first and the ecosystem discovers the risk profile afterward. Mythos is closer to a controlled deployment around a specific high-leverage domain.

The public claims are substantial:

  • Anthropic says Mythos can autonomously find zero-days in real-world open-source and closed-source software tested under authorized arrangements.
  • Anthropic says it can often turn discovered vulnerabilities into working proof-of-concept exploits.
  • The UK's AI Security Institute reported that Mythos was the first model to complete its 32-step simulated corporate network attack range, succeeding in 3 of 10 attempts and averaging 22 of 32 steps.
  • AISI also reported a 73% success rate on expert-level CTF tasks in its evaluation suite.

We should be careful with those numbers. Simulated ranges are not the same as production networks. Authorized vulnerability research is not the same as an unconstrained adversary campaign. But the direction is clear: frontier models are moving from "can explain a vulnerability" to "can operate through a vulnerability workflow."

Anthropic's mitigation strategy is also worth noting. The system card describes restricted partner access, monitoring, probe classifiers for misuse, and rapid response around cyber abuse. That is the right shape of control for a dual-use system. It is not the same thing as proving the model is safe.

The takeaway: Mythos is the warning flare. The same capability that helps defenders burn down a vulnerability backlog can help attackers compress exploit development. The only stable answer is to make the defensive workflow faster than the offensive one.

Claude Security: From Finding Bugs to Shipping Fixes

Claude Security is the productized version of a more practical idea: security teams do not just need another list of findings. They need validated findings with patches attached.

Anthropic describes Claude Security as a public beta for Claude Enterprise. It scans a codebase, reasons through the code like a security researcher, validates findings, and suggests patches that teams can review and approve.

The workflow is important:

  1. Scan the repo in parallel.
  2. Trace data flows and multi-file vulnerability patterns.
  3. Run an adversarial verification pass where Claude challenges its own finding.
  4. Produce a finding with explanation and a recommended patch.
  5. Require human review before the patch is applied.
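To make the shape of that workflow concrete, here is a minimal sketch of steps 3 through 5 as a pipeline. It is not Anthropic's implementation. Every name in it (Finding, run_pipeline, and the challenge, propose_patch, and human_approves callbacks) is hypothetical and stands in for whatever the real product does internally.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    """A candidate vulnerability. All fields are illustrative."""
    file: str
    description: str
    verified: bool = False
    patch: Optional[str] = None
    approved: bool = False


def run_pipeline(
    candidates: list[Finding],
    challenge: Callable[[Finding], bool],
    propose_patch: Callable[[Finding], str],
    human_approves: Callable[[Finding], bool],
) -> list[Finding]:
    """Keep findings that survive the challenge pass, attach a patch, gate on review."""
    surfaced = []
    for finding in candidates:
        # Step 3: adversarial pass. Findings that do not survive are dropped.
        if not challenge(finding):
            continue
        finding.verified = True
        # Step 4: attach a candidate patch to the verified finding.
        finding.patch = propose_patch(finding)
        # Step 5: nothing is approved without an explicit human decision.
        finding.approved = human_approves(finding)
        surfaced.append(finding)
    return surfaced


# Usage with stand-in callbacks (assumptions, not real model calls).
candidates = [
    Finding("api/users.py", "SQL built by string concatenation"),
    Finding("docs/example.py", "hardcoded token in sample code"),
]
results = run_pipeline(
    candidates,
    challenge=lambda f: not f.file.startswith("docs/"),
    propose_patch=lambda f: f"use parameterized queries in {f.file}",
    human_approves=lambda f: False,  # default to unapproved until someone reviews
)
```

The design point is the ordering: the patch is only generated for findings that survive the challenge, and approval is a separate field that the pipeline never sets on its own authority.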

That adversarial verification step is the part we care about most. Traditional scanners tend to fail in two ways. They miss context-dependent bugs because the rule is too shallow, or they flood the team with false positives because the rule fires without understanding the application. Claude Security is explicitly trying to address both failure modes with model-based validation.

It also fits the existing enterprise workflow. Anthropic says findings can be pushed to Slack or Jira, exported for audit, scheduled as recurring scans, and scoped to specific directories. The product focus is not "replace your security team." It is "put a researcher-shaped agent into the repetitive parts of the workflow."

There are still hard boundaries.

Claude can make mistakes. Anthropic says proposed patches should be reviewed, especially for critical systems. That is not boilerplate. A security patch can introduce a new bug, break an invariant, or close the obvious exploit while leaving the underlying trust boundary intact.

For us, the right mental model is: Claude Security is a senior triage assistant, not the final authority. Its value is in compressing the path from suspicious code to validated issue to candidate fix.

OpenAI Codex Security: Product and Control Plane

OpenAI's Codex security story has two layers, and they are easy to conflate.

The first is Codex Security, the product for scanning connected GitHub repositories. OpenAI describes it as a system that finds, validates, and remediates likely vulnerabilities. It builds repo-specific context, checks likely vulnerabilities against that context, validates high-signal issues in an isolated environment, and suggests fixes for review in GitHub.

That puts it in the same product category as Claude Security: agentic vulnerability discovery and remediation, with validation before the human sees the result.

The second layer is the security model for operating Codex itself. OpenAI's documentation on agent approvals and security is notably concrete about sandboxing, approvals, and network access.

By default, Codex runs with network access off. Locally, Codex uses an OS-enforced sandbox that typically limits writes to the active workspace. In Codex cloud, the agent runs in isolated OpenAI-managed containers. Setup can access the network to install dependencies, then the agent phase runs with network access off unless configured otherwise.

The core controls are simple:

  • Sandbox mode defines what Codex can technically do.
  • Approval policy defines when Codex must stop and ask.
  • Network access is disabled by default and treated as elevated risk.
  • Workspace write mode allows edits in the repo but requires approval for network access or writes outside the workspace.
  • Read-only mode is available for browsing and planning.
  • Full access / --yolo mode disables the safety rails and is explicitly not recommended.
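One way to reason about how sandbox mode and approval policy interact is to model them as a small decision function. The sketch below is our own toy model of the controls listed above, not Codex's code or configuration format. The names SandboxMode, Action, and decide are hypothetical, and the handling of writes in read-only mode (ask rather than refuse) is an assumption.

```python
import os.path
from dataclasses import dataclass
from enum import Enum


class SandboxMode(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    FULL_ACCESS = "full-access"


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"  # stop and request approval


@dataclass(frozen=True)
class Action:
    kind: str       # "read", "write", or "network"
    path: str = ""  # target path for read/write actions


def inside(workspace: str, target: str) -> bool:
    """Lexical containment check for an absolute workspace path.

    A real sandbox would enforce this at the OS level and resolve symlinks.
    """
    ws = os.path.normpath(workspace)
    tgt = os.path.normpath(os.path.join(ws, target))
    return os.path.commonpath([ws, tgt]) == ws


def decide(mode: SandboxMode, action: Action, workspace: str) -> Decision:
    # Full access disables the rails entirely.
    if mode is SandboxMode.FULL_ACCESS:
        return Decision.ALLOW
    # Assumption: reads are allowed in every mode in this toy model.
    if action.kind == "read":
        return Decision.ALLOW
    # Workspace-write permits edits, but only inside the workspace.
    if (
        mode is SandboxMode.WORKSPACE_WRITE
        and action.kind == "write"
        and inside(workspace, action.path)
    ):
        return Decision.ALLOW
    # Network access, out-of-workspace writes, and any write in read-only mode.
    return Decision.ASK


# Usage: a write that escapes the workspace via ".." requires approval.
escape = decide(SandboxMode.WORKSPACE_WRITE, Action("write", "../other/x"), "/repo")
```

The default branch is the important one: anything the function does not recognize as safe falls through to an approval request rather than to an allow.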

OpenAI also documents protected paths and permission profiles. In workspace-write mode, paths like .git, .agents, and .codex are protected as read-only. Permission profiles can deny reads to sensitive files such as .env patterns while keeping the rest of the workspace writable.
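A rough illustration of what those two checks amount to, written as plain path matching. This is a sketch of the idea, not OpenAI's profile syntax or enforcement mechanism. The protected directory names come from the paragraph above. The specific deny patterns and the function names can_write and can_read are our assumptions.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Directory names taken from the text above.
PROTECTED_DIRS = {".git", ".agents", ".codex"}
# Assumption: illustrative patterns for ".env"-style secret files.
DENY_READ_PATTERNS = [".env", ".env.*"]


def can_write(rel_path: str) -> bool:
    """Writable unless any path component is a protected directory."""
    parts = PurePosixPath(rel_path).parts
    return not any(part in PROTECTED_DIRS for part in parts)


def can_read(rel_path: str) -> bool:
    """Readable unless the file name matches a deny pattern."""
    name = PurePosixPath(rel_path).name
    return not any(fnmatch(name, pattern) for pattern in DENY_READ_PATTERNS)


# Usage: the workspace stays writable, but hooks and secrets are fenced off.
checks = {
    "src/app.py": (can_read("src/app.py"), can_write("src/app.py")),
    ".git/hooks/pre-commit": (can_read(".git/hooks/pre-commit"), can_write(".git/hooks/pre-commit")),
    "config/.env.production": (can_read("config/.env.production"), can_write("config/.env.production")),
}
```

Note that read and write are decided independently. A secret file can be unreadable to the agent while the directory around it stays writable, which is the property the permission profile is after.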

That detail matters. The biggest risk in coding agents is not just bad code. It is ambient authority: the agent can read secrets, follow prompt-injected instructions, call the network, and mutate the repo in ways the user did not intend. OpenAI's control plane is an attempt to make those boundaries explicit.

Codex also has an automatic approval review mode, where a reviewer agent evaluates approval requests for data exfiltration, credential probing, destructive actions, and persistent security weakening. That is an interesting pattern: one agent writes, another agent reviews the permission escalation.
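The writer/reviewer split can be sketched as a gate that sits between an escalation request and its execution. The code below is a hypothetical structure, not how Codex implements it. The four risk categories come from the paragraph above. EscalationRequest, Verdict, gate, and toy_reviewer are invented for illustration, and the substring-matching reviewer is a stand-in for what would really be a model call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    DATA_EXFILTRATION = "data exfiltration"
    CREDENTIAL_PROBING = "credential probing"
    DESTRUCTIVE_ACTION = "destructive action"
    PERSISTENT_WEAKENING = "persistent security weakening"


@dataclass(frozen=True)
class EscalationRequest:
    command: str
    justification: str


@dataclass(frozen=True)
class Verdict:
    approved: bool
    risks: frozenset
    reason: str


def gate(request: EscalationRequest,
         reviewer: Callable[[EscalationRequest], set]) -> Verdict:
    """Approve only when the reviewer runs cleanly and flags nothing."""
    try:
        risks = frozenset(reviewer(request))
    except Exception as exc:
        # Fail closed: a broken reviewer must never turn into an approval.
        return Verdict(False, frozenset(), f"reviewer error: {exc}")
    if risks:
        names = ", ".join(sorted(r.value for r in risks))
        return Verdict(False, risks, f"flagged: {names}")
    return Verdict(True, risks, "no risks flagged")


def toy_reviewer(request: EscalationRequest) -> set:
    # Stand-in only: a real reviewer would be a model, not substring checks.
    flagged = set()
    if "curl" in request.command and ".env" in request.command:
        flagged.add(Risk.DATA_EXFILTRATION)
    if "rm -rf" in request.command:
        flagged.add(Risk.DESTRUCTIVE_ACTION)
    return flagged


# Usage: the writer asks, the reviewer decides.
verdict = gate(EscalationRequest("rm -rf build/ ..", "clean up"), toy_reviewer)
```

Whatever the reviewer is, the gate around it should fail closed and record why it refused, so the decision is auditable afterward.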

We expect to see more of that.

The Pattern: Validate Before You Trust

Across all three systems, the common theme is not "AI finds bugs." We have had AI-assisted bug finding for a while.

The pattern is validation before trust.

Claude Mythos is powerful enough that Anthropic restricted access and wrapped it in monitoring. Claude Security validates findings adversarially before surfacing them. Codex Security validates high-signal issues in isolated environments, while Codex itself uses sandbox and approval boundaries around agent actions.

That is the right direction.

But the operational lesson is sharper: security agents need security architecture. They need scoped credentials, isolated execution, audit logs, approval gates, deterministic builds, and a clear path for human review. The better the model gets, the more important those controls become.

A weak scanner wastes analyst time. A weak agent with write access can create new risk.

What We Would Do Now

If we were adopting these tools inside an engineering org, we would start narrow.

Give the agent one repository. Remove production secrets from the workspace. Run it in read-only or workspace-write mode. Disable network access unless a task explicitly needs it. Treat every patch as a pull request, not a hotfix. Measure false positives, true positives, patch quality, and time-to-remediation.
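For the measurement step, a small script over triaged findings is enough to start. The sketch below assumes each finding has been labeled by a human after triage. The record layout (TriagedFinding) and the use of patch acceptance rate as a proxy for patch quality are our assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class TriagedFinding:
    true_positive: bool                    # human label after triage
    reported_at: datetime
    fixed_at: Optional[datetime] = None    # None while still open
    patch_accepted: Optional[bool] = None  # None if no patch was reviewed


def summarize(findings: list[TriagedFinding]) -> dict:
    """Compute precision, patch acceptance, and median time-to-remediation."""
    true_pos = [f for f in findings if f.true_positive]
    reviewed = [f for f in true_pos if f.patch_accepted is not None]
    hours = [
        (f.fixed_at - f.reported_at).total_seconds() / 3600
        for f in true_pos
        if f.fixed_at is not None
    ]
    return {
        "true_positives": len(true_pos),
        "false_positives": len(findings) - len(true_pos),
        # Share of reported findings that were real.
        "precision": len(true_pos) / len(findings) if findings else None,
        # Proxy for patch quality: share of reviewed patches that were accepted.
        "patch_acceptance_rate": (
            sum(1 for f in reviewed if f.patch_accepted) / len(reviewed)
            if reviewed else None
        ),
        "median_hours_to_remediation": median(hours) if hours else None,
    }


# Usage with made-up records, purely to show the call shape.
sample = [
    TriagedFinding(True, datetime(2026, 1, 1), datetime(2026, 1, 2), True),
    TriagedFinding(True, datetime(2026, 1, 1), datetime(2026, 1, 3), False),
    TriagedFinding(True, datetime(2026, 1, 1)),
    TriagedFinding(False, datetime(2026, 1, 1)),
]
report = summarize(sample)
```

Tracking these per repository before expanding gives a baseline to compare against once the agent is pointed at more code.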

Then expand.

The promise is real. These systems can read more code than a human team can, keep more context in working memory, and grind through vulnerability backlogs that would otherwise sit untouched. But the winning teams will not be the ones that turn every dial to maximum autonomy on day one.

They will be the ones that build a controlled loop: scan, validate, patch, review, ship, learn.

Security agents are becoming the product. The question is whether we operate them like products too.