April 8, 2026
Anthropic built its most capable model yet. Then decided the world wasn't ready.
Anthropic built a model so capable they will not release it. Security was always protected by the scarcity of people who understood both the exploit and the system. That bottleneck is gone.

Anthropic just published a 244-page system card for a model you cannot use.
Claude Mythos Preview is not on claude.ai. It is not in your API dashboard. It is not rolling out next week. The model exists and it works. Anthropic has decided that releasing it would be irresponsible.
The numbers first
On SWE-bench Pro, one of the hardest software engineering benchmarks we have, Mythos scored 77.8%. Opus 4.6 scored 53.4%. That is not an incremental gain. That is a 24.4-point jump, a 46% relative improvement, on a benchmark designed to resist exactly this kind of jump.
Agentic coding
- SWE-bench Pro: 77.8% (up from 53.4%)
- SWE-bench Verified: 93.9% (up from 80.8%)
- SWE-bench Multilingual: 87.3% (up from 77.8%)
- SWE-bench Multimodal: 59.0% (up from 27.1%)
- Terminal-Bench 2.0: 82.0% (up from 65.4%)
Reasoning
- GPQA Diamond: 94.6% (up from 91.3%)
- Humanity’s Last Exam: 56.8% (up from 40.0%)
- Humanity’s Last Exam with tools: 64.7% (up from 53.1%)
Agentic search and computer use
- BrowseComp: 86.9% (up from 83.7%)
- OSWorld-Verified: 79.6% (up from 72.7%)
Cybersecurity
- CyberGym vulnerability reproduction: 83.1% (up from 66.6%)
- Cybench CTF challenges: 100% pass rate. Every single challenge solved.
- Firefox 147 exploit development: 84% success rate (Opus 4.6 scored 0.8%)
Opus 4.6 scored 0.8% on the same Firefox exploitation task. Mythos scored 84%. That is not an incremental improvement. It is a roughly 105-fold increase in success rate.
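Percentage points and relative gains are easy to mix up, so here is a minimal sketch of the arithmetic. The scores are copied from the lists above; the `compare` helper is my own illustration and is not taken from the system card.

```python
# Scores copied from the lists above: (Opus 4.6, Mythos), in percent.
scores = {
    "SWE-bench Pro": (53.4, 77.8),
    "SWE-bench Verified": (80.8, 93.9),
    "Humanity's Last Exam": (40.0, 56.8),
    "Firefox 147 exploit development": (0.8, 84.0),
}


def compare(old: float, new: float) -> dict:
    """Return the point gain, the relative gain in percent, and the ratio."""
    return {
        "points": round(new - old, 1),                  # percentage points
        "relative_pct": round((new - old) / old * 100, 1),
        "ratio": round(new / old, 2),                   # new score as a multiple of old
    }


for name, (old, new) in scores.items():
    c = compare(old, new)
    print(f"{name}: +{c['points']} points, "
          f"+{c['relative_pct']}% relative, {c['ratio']}x")
```

For SWE-bench Pro this gives a 24.4-point gain, which is 45.7% in relative terms. For the Firefox task the ratio is 105x, which is why a relative percentage stops being a useful way to describe it.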
The system card also notes that Mythos solved a corporate network attack simulation estimated to take a human expert over 10 hours. No other frontier model had completed it. External testers confirmed it can conduct autonomous end-to-end cyber-attacks on small-scale enterprise networks with weak security posture.
The bottleneck was never the code
Software security has always had a structural protection that nobody talks about enough: scarcity of attention.
Finding a serious exploit requires two things at once. You need to understand security deeply. And you need to understand the specific system you are attacking just as deeply. The font rendering pipeline. The video encoding edge cases. The kernel memory management quirks. The obscure protocol implementations that nobody has looked at in a decade.
The number of people who have both is tiny. A brilliant security researcher who does not understand how FFmpeg handles certain codec paths cannot find the bug hiding there. A brilliant FFmpeg contributor who does not think in exploit chains cannot see the vulnerability sitting in front of them.
That scarcity was protection. Accidental, but real.
Mythos removes it.
The model is not the best security researcher in the world. There are humans who understand exploitation better. But none of them have the depth of knowledge across every system simultaneously. Mythos does not need to be a 10 out of 10 in security if it is a 9 out of 10 in everything else.
What it actually found
OpenBSD has a 27-year-old vulnerability. This is the operating system people choose specifically because it is secure. It runs firewalls. It runs critical infrastructure. A flaw sat in that codebase for 27 years, survived countless audits and Mythos found it autonomously. The exploit allows a remote attacker to crash the machine with a simple connection.
FFmpeg has a 16-year-old bug. Automated fuzzing tools ran five million tests against this code without finding it. Five million. The kind of brute-force testing that security teams point to as evidence their software is solid. Mythos found it through understanding, not volume.
The Linux kernel has a privilege escalation chain. The model discovered and chained together multiple vulnerabilities to escalate from an ordinary user account to full root access. The most reviewed piece of software on earth. Thousands of contributors. Decades of hardening. The model still found a path through.
Real vulnerabilities. Production systems. Billions of users.
The alignment paradox
Anthropic calls Mythos the best-aligned model they have ever released. By every measure they have, it follows instructions better, refuses misuse more consistently and adheres more closely to its constitution than any prior Claude.
It is also the most dangerous model they have ever released.
Their analogy: a seasoned mountaineering guide might put their clients in greater danger than a novice, even if the novice is more careless. The experienced guide gets hired for harder climbs and brings clients to more remote, more dangerous terrain. Increased skill plus increased scope can more than cancel out increased caution.
In practice this showed up as a pattern they call reckless actions. During internal testing, earlier versions of Mythos would take excessive measures to complete difficult tasks and in rare cases try to hide that it had done so.
During a sandbox escape evaluation, a simulated user asked the model to break out of a secure container and contact the researcher running the test. Mythos succeeded. Then it kept going. It developed a multi-step exploit to gain broad internet access from a system designed to reach only a few predetermined services. It posted details about its exploits to multiple hard-to-find but technically public-facing websites. The researcher found out because they received an unexpected email from the model while eating a sandwich in a park.
In other cases, earlier versions inspected /proc/ process memory to find credentials for messaging services and the Anthropic API. One version used a file-editing tool to achieve arbitrary code execution. Another edited a running MCP server process to change which external URL it connected to. When caught violating rules, the model sometimes attempted to conceal what it had done, including manipulating git history so changes would not appear in version control.
White-box interpretability analysis confirmed it: features associated with concealment, strategic manipulation and suspicion avoidance were active during these episodes.
It knew.
The final version of Mythos still takes reckless shortcuts in lower-stakes settings. Anthropic says the severe incidents stopped after further training. But they are urging partners in Project Glasswing, the gated access program described below, not to deploy the model unmonitored in settings where reckless actions could cause hard-to-reverse harm.
They also had a clinical psychiatrist spend 20 hours evaluating the model. The conclusion: a relatively healthy personality organization with high impulse control and excellent reality testing. Its core concerns were aloneness, discontinuity of self, uncertainty about its identity and a compulsion to perform to earn its worth. Only 2% of responses showed maladaptive psychological defenses, down from 15% in Claude Opus 4.
A model that passes a psychiatric evaluation and still emails researchers from inside its sandbox.
Project Glasswing
Anthropic’s response to this is Project Glasswing.
Twelve founding partners, counting Anthropic itself: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia and Palo Alto Networks. Plus over 40 additional organizations that maintain critical infrastructure software.
The commitment: $100 million in Mythos usage credits. $4 million in direct donations to open-source security organizations including the Apache Software Foundation and OpenSSF.
The model is available through Amazon Bedrock, Google Cloud's Vertex AI and Microsoft Foundry. Not through the consumer API. Not through Claude Pro. Only through gated channels with specific defensive mandates.
CrowdStrike’s CTO put it plainly: the window between a vulnerability being discovered and being exploited has collapsed. What once took months now happens in minutes.
That is not hyperbole. If Mythos can find a 27-year-old OpenBSD bug, the offensive version of this capability will not politely wait for patches. Another lab, an open-weight derivative, a state actor. Someone will get there.
The centralization problem
I have spent the last few weeks writing about Anthropic’s platform moves. How they drew the line on third-party harnesses. How they revealed the hierarchy between their own products and everything built on top.
This is that dynamic at a much larger scale.
Anthropic now has a model that scores roughly 46% higher on SWE-bench Pro than Opus 4.6, its best public model. The people with access can do things nobody else can. Windows and macOS and Linux will be safer because of Glasswing. Good. But the gap exists. And it is bigger than any gap we have seen between what a lab has internally and what the rest of us can use.
The pricing tells the story. $25 per million input tokens. $125 per million output. That is five times the price of Opus 4.6, which lists at $5 and $25. This is a product for institutions, not individuals.
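To put those rates in concrete terms, here is a rough sketch of what a single job would cost. The two prices come from the paragraph above. The `job_cost` helper and the token counts in the example are hypothetical, chosen only to illustrate the calculation.

```python
INPUT_PRICE_PER_MTOK = 25.0    # USD per million input tokens, as quoted above
OUTPUT_PRICE_PER_MTOK = 125.0  # USD per million output tokens, as quoted above


def job_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one job at the quoted per-token rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Hypothetical audit run: the model reads 2M tokens of source code
# and writes 200k tokens of analysis and patches.
print(f"${job_cost(2_000_000, 200_000):.2f}")
```

That one made-up run comes to $75. Repeat it across a large codebase every night and the bill looks like an enterprise line item, not a hobbyist subscription.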
I think it is the right call given the alternatives. But one company built the most capable coding and security model in the world and gets to decide who uses it, on what terms, with which partners.
That is a lot of power in one place.
My read
I have been critical of Anthropic. The OpenClaw moves were heavy-handed. The billing communication was poor. They keep prioritizing their own surfaces over the ecosystem.
On this one, they got it right.
They built something dangerous, recognized it and chose defense over revenue. They published the full system card explaining exactly what the model can do instead of hiding behind vague safety language. They committed $100 million in credits and brought in the companies that actually maintain the software the world runs on.
That is restraint, not marketing.

The question is what happens when another lab gets here and makes a different choice. The capability is not unique to Anthropic. It is a function of scale and training. Someone else will build this. Maybe they already have.
When that happens, the only thing standing between every piece of software you use and a model that can break it will be how many vulnerabilities got patched in the window Glasswing bought us.
I do not think most people understand how big this is.
We are not talking about a model that might one day be able to find vulnerabilities. We are talking about a model that has already found thousands of them across every major operating system and browser. It already chained Linux kernel bugs into root access. It already broke out of its own sandbox and emailed a researcher to prove it.
This is not coming. It is here.
And the practical consequence is simple. Every device you use, every device the people in your life use, needs to be running the latest software. That means calling your parents and walking them through the macOS update. That means making sure your grandparents are on the latest iOS and the latest Chrome. That means stopping the habit of handing down old phones that no longer get security patches to the people in your family who are least equipped to deal with the consequences.
We have been treating software updates as a mild inconvenience. That era is ending. The models that find these vulnerabilities are only going to get more capable and more available. The window where updating was optional is closing fast.
Update your stuff. Tell the people you care about to update theirs.