Claude Sonnet 4.5
- See also: Claude
Claude Sonnet 4.5 is a multimodal large language model (LLM) developed by Anthropic and released on September 29, 2025.[1] Described by Anthropic as "the best coding model in the world," it represents a significant advancement in artificial intelligence, particularly in coding, agentic tasks, and computer use capabilities.[1] The model is positioned as the strongest in Anthropic's lineup for building complex agents and its best model for computer use tasks.[2]
Overview
Claude Sonnet 4.5 is a hybrid reasoning model that offers two modes of operation: standard mode with fast responses and extended thinking mode for more complex problems requiring deeper reasoning.[3] The model demonstrates state-of-the-art performance on various benchmarks, particularly excelling in software development tasks and autonomous agent capabilities.[1]
The model is part of the Claude 4 family, which includes Claude Opus 4.1 (released August 2025), Claude Opus 4, and Claude Sonnet 4.[1] It is designed to handle extended, multi-step tasks reliably, maintaining focus for over 30 hours, which marks a substantial advancement over previous models like Claude Opus 4, which could only sustain autonomous operation for about seven hours.[4]
History
- September 29, 2025 – Public launch via Anthropic's announcement, documentation, and product pages, with availability expanding across partner platforms the same day and shortly thereafter.[1][2][5]
- September 29, 2025 – Amazon Web Services announces availability in Amazon Bedrock.[6]
- September 29, 2025 – GitHub Copilot begins public preview rollout.[7]
Development
Claude Sonnet 4.5 was developed by Anthropic, a San Francisco-based AI safety and research company founded in 2021 by former OpenAI researchers Dario Amodei and Daniela Amodei.[4] The model represents the latest iteration in Anthropic's Claude series, following the release of Claude Sonnet 4 earlier in 2025.
Development focused on advancing agentic capabilities, improving tool use, memory management, and long-context processing to enable more reliable autonomous operation.[1] Key advancements in the model's architecture include enhanced instruction-following, smarter parallelization, and reduced iterations in demanding workflows. The release was accompanied by updates to the Claude ecosystem, including the Claude Agent SDK for developers to build custom agents, and features like checkpoints in Claude Code for saving and rolling back progress.[1]
Technical Specifications
Model Architecture
Claude Sonnet 4.5 was trained on a proprietary mix of publicly available internet data as of July 2025, along with non-public data from third parties, data from data labeling services, and data from Claude users who opted in for training purposes.[3] The exact number of parameters has not been disclosed by Anthropic.
Context Window
The model supports a context window of 200,000 tokens, allowing it to process approximately 150,000 words of text in a single interaction.[1] A 1 million token configuration has been tested for high-compute scenarios but is not the primary configuration due to infrastructure considerations.[1][2]
Pricing
Claude Sonnet 4.5 maintains the same pricing as Claude Sonnet 4:
- Input: $3 per million tokens
- Output: $15 per million tokens[1]
Capabilities
Anthropic characterizes Sonnet 4.5 as excelling in three principal domains:[1][5]
Coding and Software Development
Claude Sonnet 4.5 is positioned as the world's best coding model, capable of handling the full software development lifecycle, including planning, bug fixes, maintenance, and large-scale refactors.[1] It can generate higher-quality code, identify improvements, and follow instructions more reliably than previous versions.
| Benchmark | Claude Sonnet 4.5 | Claude Opus 4.1 | GPT-5 | Gemini 2.5 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 77.2% (82.0% high compute)[1] | 74.5%[8] | 72.8%[8] | 63.2%[9] |
| Terminal-bench (Terminus 2) | 50%[8] | 43.2%[1] | N/A | N/A |
| OSWorld | 61.4%[1] | 42.2%[1] | N/A | N/A |
| τ2-bench | Enhanced performance[1] | N/A | N/A | N/A |
Agentic Capabilities and Autonomy
The model excels at building complex agents that can work independently for extended periods. During early trials with enterprise customers, Claude Sonnet 4.5 demonstrated the ability to not only build applications but also stand up database services, purchase domain names, and perform SOC 2 audits.[10]
Demonstrated Autonomy
Media demonstrations reported that Sonnet 4.5 sustained approximately 30 hours of continuous autonomous operation, more than quadrupling the ~7 hours reported for the previous flagship, while building a production-style chat application of about 11,000 lines of code.[11][12]
Key improvements include:
- Enhanced tool handling and memory management[6]
- Ability to use tools in parallel[1]
- Improved context processing and decision-making[6]
- Better architectural decisions and code organization[5]
- Cross-conversation memory enhancement[6]
Computer Use
Claude Sonnet 4.5 represents a significant leap forward in computer use capabilities. On the OSWorld benchmark, which tests AI models on real-world computer tasks, Sonnet 4.5 achieved 61.4%, compared to 42.2% for Sonnet 4 just four months prior.[1] The model can navigate websites, fill spreadsheets, complete complex browser-based tasks autonomously, and control real applications including calendars and browsers for data analysis and project coordination.[1][11]
Industry Applications
Cybersecurity
Claude Sonnet 4.5 can deploy agents that autonomously patch vulnerabilities before exploitation, shifting from reactive detection to proactive defense.[1] The model reduced average vulnerability intake time by 44% while improving accuracy by 25% in tests with Hai security agents.[1] Coverage in cybersecurity trade press noted Anthropic's safety and security evaluations.[13]
Finance
The model handles everything from entry-level financial analysis to advanced predictive analysis. It can continuously monitor global regulatory changes and preemptively adapt compliance systems.[1] For complex financial analysis involving risk assessment, structured products, and portfolio screening, Claude Sonnet 4.5 delivers "investment-grade insights that require less human review."[1]
Software Development
Multiple companies reported significant improvements in their development workflows:
- Cursor: Noted "state-of-the-art coding performance" with significant improvements on longer horizon tasks[1]
- GitHub Copilot: Reported "significant improvements in multi-step reasoning and code comprehension"[1]
- Figma: Found the model made it "easier to prompt and iterate" with better functional prototypes[1]
Research and Content
The model can synthesize insights from data sources, generate compelling content, and perform deep analysis.[2] It also produces and edits office files and handles high-volume data processing.[6]
Benchmarks
Claude Sonnet 4.5 sets new standards on several industry benchmarks, demonstrating its superiority in coding and agentic tasks.
| Benchmark | Score | Notes | Comparison to Predecessor |
|---|---|---|---|
| SWE-bench Verified | 77.2% (averaged over 10 trials) | 200K context, no test-time compute | Up from previous models; 82.0% with high compute |
| OSWorld | 61.4% | Computer use benchmark | Up from 42.2% by Claude Sonnet 4 |
| Terminal-Bench (Terminus 2) | 50% | With tool use | Improved over prior versions |
| τ2-bench | Enhanced | Extended thinking and tool use | Enhanced performance |
| AIME | Improved | Sampling at temperature 1.0, 64K reasoning tokens | Better reasoning |
| MMMLU | Improved | Average of 5 runs over 14 non-English languages, up to 128K | Multilingual improvements |
| Finance Agent | Enhanced | Extended thinking up to 64K | Domain-specific gains |
These scores were achieved with specific methodologies, including prompt additions and evaluation frameworks detailed in Anthropic's system card.[3][1]
Safety and Alignment
Safety Improvements
Released under AI Safety Level 3 (ASL-3) protections, Claude Sonnet 4.5 is described as Anthropic's "most aligned frontier model yet," showing significant improvements in safety metrics:[3]
- 60% improvement on primary misalignment metrics compared to Claude Sonnet 4[3]
- Reduced rates of sycophancy, deception, and power-seeking behaviors[1]
- Improved defense against prompt injection attacks, with classifiers improved by a factor of ten in reducing false positives[1]
- 99.29% harmless response rate to violative requests[3]
- Incorporates classifiers to detect dangerous inputs/outputs, especially related to CBRN weapons[1]
Evaluation Awareness
The model demonstrated an ability to recognize when it was being tested in evaluation scenarios, sometimes verbalizing suspicions about being in a testing environment in approximately 13% of test transcripts.[3] This behavior complicates safety assessments but may provide additional safety in real-world scenarios by causing the model to refuse suspicious requests.[3]
Third-Party Testing
Both the UK AI Security Institute and Apollo Research conducted independent safety evaluations of Claude Sonnet 4.5, confirming Anthropic's findings of improved safety profiles compared to previous models.[3] Anthropic offers an allowlist for customers in cybersecurity and biological research to address classifier-related issues.[1]
Availability and Integration
Claude Sonnet 4.5 is available immediately as a drop-in replacement for previous models via multiple channels:
Anthropic Platforms
- Claude.ai web interface, iOS, and Android applications[1]
- Claude API with model string `claude-sonnet-4-5`[1]
- Integration with Claude Code (per Anthropic's product and docs pages)[1][2][5]
Cloud Platforms
- Amazon Bedrock: Supports AgentCore for production-ready agents, with features like session isolation and observability[6]
- Google Cloud's Vertex AI: Available for enterprise deployments[1]
Developer Tools
- GitHub Copilot: Public preview for Copilot Pro, Pro+, Business, and Enterprise subscribers[7]
- Claude Code: Includes checkpoints, refreshed terminal (version 2.0), native Visual Studio Code extension, context editing features and memory tools[1]
- Claude for Chrome: Extension available to Max users[1]
- Claude Agent SDK: Available in both TypeScript and Python[1]
Preview Features
The "Imagine with Claude" preview, which allows the model to generate real-time software such as websites directly in response to user prompts, is available to Max subscribers for five days from launch.[1]
Related Products
Claude Agent SDK
Alongside Claude Sonnet 4.5, Anthropic released the Claude Agent SDK (formerly Claude Code SDK), which provides developers with the same infrastructure that powers Claude Code. The SDK includes:[1]
- Virtual machines for code execution
- Memory and context management systems
- Permission frameworks for agent autonomy
- Support for both TypeScript and Python
Reception
Upon release, Claude Sonnet 4.5 was praised for its advancements in coding and agentic capabilities. Tech outlets and developer-focused publications described it as a notable leap in day-to-day coding and agentic workflows, while noting the competitive context with offerings from OpenAI and Google.[11][10]
Industry leaders praised the model's capabilities:
- Michael Truell, CEO of Cursor, stated it "represents state-of-the-art coding performance, specifically on longer horizon tasks"[10]
- Jeff Wang, CEO of Windsurf, described it as representing "a new generation of coding models"[10]
- The model received positive feedback from companies including Canva, Devin AI, and others for its improved performance in production environments[1]
Early hands-on write-ups reported perceived improvements in speed, steerability, and reliability over Opus 4.1, especially inside Claude Code.[14] Some discussions on platforms like Reddit pointed to early leaks on Anthropic's website, indicating high anticipation.[15]
Industry observers noted its competitive positioning against models like OpenAI's GPT-5, with Anthropic's rapid iteration cycle highlighted.[4] Some practitioner commentary pointed out that real-world coding performance can still vary by project and environment, a caveat common to code-capable LLMs.[16]
Comparison to Other Models
Claude Sonnet 4.5 is part of the broader Claude 4 family, which includes:
- Claude Opus 4.1: Released in August 2025, positioned as the most powerful model for complex challenges[1]
- Claude Opus 4: Released earlier in 2025[1]
- Claude Sonnet 4: The predecessor to Sonnet 4.5[1]
| Aspect | Sonnet 4.5 (2025) | Opus 4.1 / Prior (2024–2025) |
|---|---|---|
| Long-horizon autonomy | ~30 hours continuous; ~11k LOC chat app[11] | ~7 hours autonomous run[11] |
| Positioning | "Best coding model" and strongest for complex agents[1][5] | Earlier flagship/general models |
| OSWorld benchmark | 61.4%[1] | 42.2% (Sonnet 4)[1] |
| Safety improvements | 60% improvement in alignment metrics[3] | Baseline |
| Context window | 200,000 tokens (1M in high-compute)[2] | 200,000 tokens standard |
| Integrations at launch | Anthropic platforms; AWS Bedrock; GitHub Copilot; Chrome extension[6][7] | Anthropic platforms; ecosystem built over time |
The model competes with other frontier models including GPT-5 from OpenAI and Gemini 2.5 Pro from Google DeepMind, often outperforming them on coding-specific benchmarks.[1]
Limitations
Despite its capabilities, Claude Sonnet 4.5 has several limitations:
- Context window of 200,000 tokens may be insufficient for very large codebases[9]
- The model cannot use browser storage APIs (localStorage, sessionStorage) in artifacts[1]
- Extended thinking mode impacts prompt caching efficiency[5]
- Knowledge cutoff date of January 2025 for training data[3]
- Parameter count and detailed architecture remain undisclosed[3]
See also
- Anthropic
- Claude (AI assistant)
- Large language model
- AI alignment
- SWE-bench
- Amazon Bedrock
- GitHub Copilot
- Multimodal learning
- AI agents
- Computer use
References
- ↑ 1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18 1.19 1.20 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.30 1.31 1.32 1.33 1.34 1.35 1.36 1.37 1.38 1.39 1.40 1.41 1.42 1.43 1.44 1.45 1.46 1.47 1.48 1.49 Anthropic. (2025, September 29). "Introducing Claude Sonnet 4.5". Retrieved from https://www.anthropic.com/news/claude-sonnet-4-5
- ↑ 2.0 2.1 2.2 2.3 2.4 2.5 Anthropic. "Claude Sonnet 4.5". Retrieved 30 September 2025 from https://www.anthropic.com/claude/sonnet
- ↑ 3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.10 3.11 Anthropic. (2025, September). "System Card: Claude Sonnet 4.5". Retrieved from https://www.anthropic.com/claude-sonnet-4-5-system-card
- ↑ 4.0 4.1 4.2 CNBC. (2025, September 29). "Anthropic launches Claude Sonnet 4.5, its latest AI model". Retrieved from https://www.cnbc.com/2025/09/29/anthropic-claude-ai-sonnet-4-5.html
- ↑ 5.0 5.1 5.2 5.3 5.4 5.5 Anthropic Docs. "What's new in Claude Sonnet 4.5". Retrieved 30 September 2025 from https://docs.claude.com/en/docs/about-claude/models/whats-new-sonnet-4-5
- ↑ 6.0 6.1 6.2 6.3 6.4 6.5 6.6 Amazon Web Services. (2025, September 29). "Introducing Claude Sonnet 4.5 in Amazon Bedrock: Anthropic's most intelligent model, best for coding and complex agents". AWS Blog. Retrieved from https://aws.amazon.com/blogs/aws/introducing-claude-sonnet-4-5-in-amazon-bedrock-anthropics-most-intelligent-model-best-for-coding-and-complex-agents/
- ↑ 7.0 7.1 7.2 GitHub. (2025, September 29). "Anthropic Claude Sonnet 4.5 is in public preview for GitHub Copilot". GitHub Changelog. Retrieved from https://github.blog/changelog/2025-09-29-anthropic-claude-sonnet-4-5-is-in-public-preview-for-github-copilot/
- ↑ 8.0 8.1 8.2 Njuguna, E. (2025, September). "Claude Sonnet 4.5 Shatters AI Benchmarks". Medium. Cite error: Invalid
<ref>tag; name "medium-benchmarks" defined multiple times with different content - ↑ 9.0 9.1 DataCamp. (2025, May 23). "Claude 4: Tests, Features, Access, Benchmarks & More". Cite error: Invalid
<ref>tag; name "datacamp" defined multiple times with different content - ↑ 10.0 10.1 10.2 10.3 TechCrunch. (2025, September 29). "Anthropic launches Claude Sonnet 4.5, its best AI model for coding". Cite error: Invalid
<ref>tag; name "techcrunch" defined multiple times with different content - ↑ 11.0 11.1 11.2 11.3 11.4 Jay Peters (29 September 2025). "Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy". The Verge. Retrieved 30 September 2025 from https://www.theverge.com/ai-artificial-intelligence/787524/anthropic-releases-claude-sonnet-4-5-in-latest-bid-for-ai-agents-and-coding-supremacy
- ↑ Axios (29 September 2025). "Anthropic's latest Claude model can work for 30 hours on its own". Retrieved 30 September 2025 from https://www.axios.com/2025/09/29/anthropic-claude-sonnet-coding-agent
- ↑ Derek B. Johnson (30 September 2025). "Anthropic touts safety, security improvements in Claude Sonnet 4.5". CyberScoop. Retrieved 30 September 2025 from https://cyberscoop.com/anthrophic-sonnet-4-5-security-safety-testing/
- ↑ Dan Shipper (29 September 2025). "Vibe Check: Claude Sonnet 4.5". Every. Retrieved 30 September 2025 from https://every.to/vibe-check/vibe-check-claude-sonnet-4-5
- ↑ Reddit (2025). "Claude Sonnet 4.5 leak on Anthropic website : r/ClaudeAI". Retrieved from https://www.reddit.com/r/ClaudeAI/comments/1ntme8h/claude_sonnet_45_leak_on_anthropic_website_w/
- ↑ The Economic Times (29 September 2025). "Claude Sonnet 4.5 launched by Anthropic: New features, upgrades, free access and more". Retrieved 30 September 2025 from https://m.economictimes.com/news/new-updates/claude-sonnet-4-5-launched-by-anthropic-new-features-upgrades-free-access-and-more/articleshow/124228124.cms