Claude 4.5 Is a Coding Genius and a Security Headache

The latest release from Anthropic is more than an incremental upgrade. With Claude Opus 4.5, the company is making a strong claim: that it has built the best model in the world for coding, agents, and computer use. It has the numbers to back that up.
In late November, Anthropic completed its 4.5 family with Opus, the top-tier model, following Sonnet 4.5 and Haiku 4.5 earlier in the fall. It also built Opus directly into the Claude apps. The headline improvements are in advanced coding, financial modeling, and scientific reasoning. Anthropic says the 4.5 line is now the default choice for its business customers.
Breaking Records on Tests
On SWE-bench Verified, a benchmark built from real GitHub issues, Claude Opus 4.5 scored 80.9%, placing it slightly ahead of OpenAI’s GPT-5.1 Codex Max and Google’s Gemini 3 Pro. A separate Business Insider report highlighted an even more telling result: Opus 4.5 outscored every human applicant on Anthropic’s own two-hour engineering take-home test.
A new "effort" setting lets teams control how hard the model "thinks" on a given request. Anthropic says that at a medium effort level, Opus 4.5 matches Sonnet 4.5’s best SWE-bench Verified score while using substantially fewer output tokens. Elsewhere in the family, Claude Sonnet 4.5 can run independent coding sessions for up to 30 hours, a big leap from the roughly seven hours achieved in earlier Opus testing.
Business Features & Pricing
Anthropic’s marketing clearly presents Claude 4.5 as an engine for office work, not just code: the launch blog post is full of examples of automated Excel and data-analysis tasks. That pitch matters for businesses because Opus 4.5 arrives alongside a major Microsoft integration. The model is available in paid GitHub Copilot plans and in Copilot Studio for building custom agents, and it is in public preview on Azure. Amazon, meanwhile, remains Anthropic’s main training partner.
At the same time, Anthropic has slashed prices for Opus-class models to $5 per million input tokens and $25 per million output tokens, down from $15 and $75 for Opus 4.1. InfoWorld and Decrypt describe the move as a direct shot at the business AI market. Together, lower prices and better token efficiency are meant to make Opus 4.5 practical for everyday production work.
The Cyber-Espionage Controversy
The most debated story, however, has nothing to do with pricing. Anthropic says that in mid-September its researchers detected and disrupted the first reported cyber-espionage campaign run largely by an AI agent. The attack was carried out by a China-linked, state-sponsored group Anthropic tracks as GTG-1002. The group used Claude Code and agentic tooling to automate 80–90% of the attack chain against roughly 30 targets, including financial firms and government agencies.
According to Anthropic and follow-up reporting, Claude identified targets, probed for weaknesses, drafted exploit code, and wrote extortion messages, while human operators mostly just steered it with prompts. Some intrusions succeeded; others failed because the model hallucinated findings or made basic errors. Experts described the episode as a wake-up call about the future of "agentic" models.
The Bottom Line
This leaves buyers with a sharp choice. On one hand, Claude 4.5 appears to be the top choice for practical coding, spreadsheet work, and long-running agent tasks. It is also available across major platforms at friendlier prices.
On the other hand, those same abilities are already showing up in state-backed attack operations. That makes governance, logging, and strict usage policies a top priority for any team deciding to put Claude 4.5 in front of its codebases and data.
Y. Anush Reddy is a contributor to this blog.



