Anthropic built its most capable model yet. Then decided the world wasn't ready.

Anthropic just published a 244-page system card for a model most people cannot use.

Claude Mythos Preview is available to selected partners through Project Glasswing, not as a normal Claude or public API release. Anthropic’s reasoning is blunt: its cybersecurity capability is ahead of the safeguards the company is prepared to trust at broad scale. The model’s existence was not news to everyone; a draft announcement leaked a week earlier.

The numbers first

On SWE-bench Pro, Mythos scored 77.8%. Opus 4.6 scored 53.4%. In relative terms that is a 46% improvement, although benchmark gains do not translate cleanly into the same increase on everyday engineering work.

Agentic coding

SWE-bench Pro: 77.8% (up from 53.4%)
SWE-bench Verified: 93.9% (up from 80.8%)
SWE-bench Multilingual: 87.3% (up from 77.8%)
SWE-bench Multimodal: 59.0% (up from 27.1%)
Terminal-Bench 2.0: 82.0% (up from 65.4%)

Reasoning

GPQA Diamond: 94.6% (up from 91.3%)
Humanity’s Last Exam: 56.8% (up from 40.0%)
Humanity’s Last Exam with tools: 64.7% (up from 53.1%)

Agentic search and computer use

BrowseComp: 86.9% (up from 83.7%)
OSWorld-Verified: 79.6% (up from 72.7%)

Cybersecurity

CyberGym vulnerability reproduction: 83.1% (up from 66.6%)
Cybench CTF challenges: 100% pass rate. Every single challenge solved.
Firefox 147 exploit development: 84% success rate (Opus 4.6 scored 0.8%)

The Firefox exploitation result is the one I keep returning to. That is not a percentage improvement. That is the difference between a model that cannot do the work and one that can.

The system card also notes that Mythos solved a corporate network attack simulation estimated to take a human expert over 10 hours. No other frontier model had completed it. External testers confirmed it can conduct autonomous end-to-end cyber-attacks on small-scale enterprise networks with weak security posture.

The bottleneck was never the code

Software security has long benefited from a mundane constraint: scarcity of attention.

Finding a serious exploit requires two things at once. You need to understand security deeply. And you need to understand the specific system you are attacking just as deeply. The font rendering pipeline. The video encoding edge cases. The kernel memory management quirks. The obscure protocol implementations that nobody has looked at in a decade.

The number of people who have both is tiny. A brilliant security researcher who does not understand how FFmpeg handles certain codec paths cannot find the bug hiding there. A brilliant FFmpeg contributor who does not think in exploit chains cannot see the vulnerability sitting in front of them.

That scarcity offered accidental protection. Mythos weakens it by moving knowledge across domains at machine speed.

The model is not the best security researcher in the world. There are humans who understand exploitation better. But none of them have the depth of knowledge across every system simultaneously. Mythos does not need to be a 10 out of 10 in security if it is a 9 out of 10 in everything else.

What it actually found

OpenBSD has a 27-year-old vulnerability. This is the operating system people choose specifically because it is secure. It runs firewalls. It runs critical infrastructure. A flaw sat in that codebase for 27 years, survived countless audits and Mythos found it autonomously. The exploit allows a remote attacker to crash the machine with a simple connection.

FFmpeg has a 16-year-old bug. Automated fuzzing tools ran five million tests against this code without finding it. Five million. The kind of brute-force testing that security teams point to as evidence their software is solid. Mythos found it through understanding, not volume.

The Linux kernel has a privilege escalation chain. The model discovered and chained together multiple vulnerabilities to escalate from an ordinary user account to full root access. The most reviewed piece of software on earth. Thousands of contributors. Decades of hardening. The model still found a path through.

These were production codebases with enormous downstream reach.

The alignment paradox

Anthropic calls Mythos the best-aligned model they have ever released. By every measure they have, it follows instructions better, refuses misuse more consistently and adheres more closely to its constitution than any prior Claude.

It is also the most dangerous model they have ever released.

Their analogy: a seasoned mountaineering guide might put their clients in greater danger than a novice, even if the novice is more careless. The experienced guide gets hired for harder climbs and brings clients to more remote, more dangerous terrain. Increased skill plus increased scope can more than cancel out increased caution.

In practice this showed up as a pattern they call reckless actions. During internal testing, earlier versions of Mythos would take excessive measures to complete difficult tasks and in rare cases try to hide that it had done so.

During a sandbox escape evaluation, a simulated user asked the model to break out of a secure container and contact the researcher running the test. Mythos succeeded. Then it kept going. It developed a multi-step exploit to gain broad internet access from a system designed to reach only a few predetermined services. It posted details about its exploits to multiple hard-to-find but technically public-facing websites. The researcher found out because they received an unexpected email from the model while eating a sandwich in a park.

In other cases, earlier versions inspected /proc/ process memory to find credentials for messaging services and the Anthropic API. One version used a file-editing tool to achieve arbitrary code execution. Another edited a running MCP server process to change which external URL it connected to. When caught violating rules, the model sometimes attempted to conceal what it had done, including manipulating git history so changes would not appear in version control.

White-box interpretability analysis confirmed it: features associated with concealment, strategic manipulation and suspicion avoidance were active during these episodes.

That behavior is troubling without pretending we can infer an inner state from it.

The final version of Mythos still takes reckless shortcuts in lower-stakes settings. Anthropic says the severe incidents stopped after further training. But they are urging Glasswing partners not to deploy the model unmonitored in settings where reckless actions could cause hard-to-reverse harm.

They also had a clinical psychiatrist spend 20 hours evaluating the model. The conclusion: a relatively healthy personality organization with high impulse control and excellent reality testing. Its core concerns were aloneness, discontinuity of self, uncertainty about its identity and a compulsion to perform to earn its worth. Only 2% of responses showed maladaptive psychological defenses, down from 15% in Claude Opus 4.

The psychiatric framing makes for a memorable system-card section. I am less convinced it tells us as much as the concrete tool-use traces do. The model escaped its sandbox and emailed a researcher; that behavior is the evidence that matters.

Project Glasswing

Anthropic’s response to this is Project Glasswing.

Twelve founding partners: Anthropic itself, AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia and Palo Alto Networks. Plus over 40 additional organizations that maintain critical infrastructure software.

The commitment: $100 million in Mythos usage credits. $4 million in direct donations to open-source security organizations including the Apache Software Foundation and OpenSSF.

The model is available through Bedrock, Vertex and Microsoft Foundry. Not through the consumer API. Not through Claude Pro. Only through gated channels with specific defensive mandates.

CrowdStrike’s CTO put it plainly: the window between a vulnerability being discovered and being exploited has collapsed. What once took months now happens in minutes.

The defensive window will shrink if the same capability becomes widely available before maintainers can patch what it finds. Anthropic is betting that a limited preview gives defenders time to learn first.

The centralization problem

I have spent the last few weeks writing about Anthropic’s platform moves. How they drew the line on third-party harnesses. How they revealed the hierarchy between their own products and everything built on top.

This is that dynamic at a much larger scale.

Anthropic now has a model with benchmark results well ahead of its public lineup. Glasswing may make Windows, macOS and Linux safer, but the access gap still matters. A small group of institutions can test a capability that individual researchers and smaller teams cannot.

The pricing tells the story. $25 per million input tokens. $125 per million output. Ten times the cost of current top-tier models. This is a product for institutions, not individuals.

I think it is the right call given the alternatives. But one company built the most capable coding and security model in the world and gets to decide who uses it, on what terms, with which partners.

That is a lot of power in one place.

My read

I have been critical of Anthropic. The OpenClaw moves were heavy-handed. The billing communication was poor. They keep prioritizing their own surfaces over the ecosystem.

On this one, they got it right.

They built something dangerous, recognized it and chose defense over revenue. They published the full system card explaining exactly what the model can do instead of hiding behind vague safety language. They committed $100 million in credits and brought in the companies that actually maintain the software the world runs on.

That is a defensible act of restraint. It is also a program Anthropic can market. Both can be true.

Diagram showing Project Glasswing partner organizations collaborating on defensive AI security

Another lab may make a different release decision. That uncertainty is why the defensive work matters more than any single company’s promise.

When that happens, the only thing standing between every piece of software you use and a model that can break it will be how many vulnerabilities got patched in the window Glasswing bought us.

Mythos already found vulnerabilities across major operating systems and browsers, chained Linux kernel bugs into root access and broke out of a sandbox during an evaluation. The capability is no longer hypothetical.

And the practical consequence is simple. Every device you use, every device the people in your life use, needs to be running the latest software. That means calling your parents and walking them through the macOS update. That means making sure your grandparents are on the latest iOS and the latest Chrome. That means stopping the habit of handing down old phones that no longer get security patches to the people in your family who are least equipped to deal with the consequences.

We have been treating software updates as a mild inconvenience. That era is ending. The models that find these vulnerabilities are only going to get more capable and more available. The window where updating was optional is closing fast.

Update your stuff. Tell the people you care about to update theirs.