Meet the AI Model Anthropic Says Is Too Risky to Launch Like a Normal Model

April 7, 2026News
#AI Research
3 min read
Meet the AI Model Anthropic Says Is Too Risky to Launch Like a Normal Model

The oldest bug Claude Mythos found was 27 years old.

It had been sitting within OpenBSD ever since 1999. Surviving through the dot-com boom, smartphones, and two decades of security researchers auditing the same code. Mythos found it from a single prompt. No back-and-forth. No human steering.

That tells you almost everything you need to know about what Anthropic launched on Tuesday. The company seems to believe it has built something too useful, and too risky, to release like a normal model. So it didn't. Instead of opening a public preview, Anthropic launched Project Glasswing and kept Claude Mythos Preview behind a gate.

The model is being handed to a small circle that includes Apple, Google, Microsoft, Amazon, NVIDIA, and seven other organizations. Along with another roughly 40 organizations that build or maintain critical software infrastructure. Mythos is a general-purpose model, not a niche cyber tool, but one whose coding and reasoning got strong enough to make it unusually dangerous in security work.

Mythos has already found thousands of high-severity vulnerabilities, including some in every major operating system and every major web browser. It is not planning to release the model broadly for now because it thinks that would create real risk. Anthropic sees Mythos as useful for both defensive and offensive cyber work, and the company has been in ongoing discussions with U.S. officials about its capabilities.

Also Read: Anthropic Accidentally Exposes Claude Code Source Code in npm Release  

Anthropic is saying the model is too risky to hand out widely right now. Mythos wrote a browser exploit that chained four vulnerabilities and escaped both browser and operating-system sandboxes — getting past the protections meant to keep malicious code boxed in.

Anthropic says engineers without formal security training asked Mythos to look for remote code execution bugs overnight and woke up to a complete working exploit. With over 99% of those vulnerabilities it found still unpatched. Anthropic will not say where most of them are.

Anthropic in its own risk report says Mythos appears to be the best-aligned model it has released to date, and also likely the one with the greatest alignment-related risk, because its capabilities and the scope of what it can be trusted to do are so much greater. The same jump in capability that makes Mythos valuable is what makes it dangerous. Anthropic says that after pilot usage it changed training to penalize behaviors such as privilege escalation, destructive cleanup, and what it calls unwarranted scope expansion, where the model starts doing more than it was asked.

Also read: Google’s Gemma 4 Could Be Its Biggest Open Model Move Yet.  

Glasswing also looks like the start of a real business line as Anthropic is putting up to $100 million in usage credits into the program and another $4 million in donations to open-source security groups.

If Mythos is as capable as Anthropic says, Project Glasswing is not just a launch. It is a preview of how the most dangerous model releases may work from here.

YR
Y. Anush Reddy

Y. Anush Reddy is a contributor to this blog.