Claude Sonnet 4.5

From AI Wiki

Template:Infobox AI model

See also: Claude
Claude sonnet 4.5 logo1.png

Claude Sonnet 4.5 is a multimodal large language model (LLM) developed by Anthropic and released on September 29, 2025.[1] Described by Anthropic as "the best coding model in the world," it represents a significant advancement in artificial intelligence, particularly in coding, agentic tasks, and computer use capabilities.[1] The model is positioned as the strongest in Anthropic's lineup for building complex agents and its best model for computer use tasks.[2]

Overview

Claude Sonnet 4.5 is a hybrid reasoning model that offers two modes of operation: standard mode with fast responses and extended thinking mode for more complex problems requiring deeper reasoning.[3] The model demonstrates state-of-the-art performance on various benchmarks, particularly excelling in software development tasks and autonomous agent capabilities.[1]

The model is part of the Claude 4 family, which includes Claude Opus 4.1 (released August 2025), Claude Opus 4, and Claude Sonnet 4.[1] It is designed to handle extended, multi-step tasks reliably, maintaining focus for over 30 hours, which marks a substantial advancement over previous models like Claude Opus 4, which could only sustain autonomous operation for about seven hours.[4]

History

  • September 29, 2025 – Public launch via Anthropic's announcement, documentation, and product pages, with availability expanding across partner platforms the same day and shortly thereafter.[1][2][5]
  • September 29, 2025Amazon Web Services announces availability in Amazon Bedrock.[6]
  • September 29, 2025GitHub Copilot begins public preview rollout.[7]

Development

Claude Sonnet 4.5 was developed by Anthropic, a San Francisco-based AI safety and research company founded in 2021 by former OpenAI researchers Dario Amodei and Daniela Amodei.[4] The model represents the latest iteration in Anthropic's Claude series, following the release of Claude Sonnet 4 earlier in 2025.

Development focused on advancing agentic capabilities, improving tool use, memory management, and long-context processing to enable more reliable autonomous operation.[1] Key advancements in the model's architecture include enhanced instruction-following, smarter parallelization, and reduced iterations in demanding workflows. The release was accompanied by updates to the Claude ecosystem, including the Claude Agent SDK for developers to build custom agents, and features like checkpoints in Claude Code for saving and rolling back progress.[1]

Technical Specifications

Model Architecture

Claude Sonnet 4.5 was trained on a proprietary mix of publicly available internet data as of July 2025, along with non-public data from third parties, data from data labeling services, and data from Claude users who opted in for training purposes.[3] The exact number of parameters has not been disclosed by Anthropic.

Context Window

The model supports a context window of 200,000 tokens, allowing it to process approximately 150,000 words of text in a single interaction.[1] A 1 million token configuration has been tested for high-compute scenarios but is not the primary configuration due to infrastructure considerations.[1][2]

Pricing

Claude Sonnet 4.5 maintains the same pricing as Claude Sonnet 4:

  • Input: $3 per million tokens
  • Output: $15 per million tokens[1]

Capabilities

Anthropic characterizes Sonnet 4.5 as excelling in three principal domains:[1][5]

Coding and Software Development

Claude Sonnet 4.5 is positioned as the world's best coding model, capable of handling the full software development lifecycle, including planning, bug fixes, maintenance, and large-scale refactors.[1] It can generate higher-quality code, identify improvements, and follow instructions more reliably than previous versions.

Coding Benchmark Performance
Benchmark Claude Sonnet 4.5 Claude Opus 4.1 GPT-5 Gemini 2.5 Pro
SWE-bench Verified 77.2% (82.0% high compute)[1] 74.5%[8] 72.8%[8] 63.2%[9]
Terminal-bench (Terminus 2) 50%[8] 43.2%[1] N/A N/A
OSWorld 61.4%[1] 42.2%[1] N/A N/A
τ2-bench Enhanced performance[1] N/A N/A N/A

Agentic Capabilities and Autonomy

The model excels at building complex agents that can work independently for extended periods. During early trials with enterprise customers, Claude Sonnet 4.5 demonstrated the ability to not only build applications but also stand up database services, purchase domain names, and perform SOC 2 audits.[10]

Demonstrated Autonomy

Media demonstrations reported that Sonnet 4.5 sustained approximately 30 hours of continuous autonomous operation, more than quadrupling the ~7 hours reported for the previous flagship, while building a production-style chat application of about 11,000 lines of code.[11][12]

Key improvements include:

  • Enhanced tool handling and memory management[6]
  • Ability to use tools in parallel[1]
  • Improved context processing and decision-making[6]
  • Better architectural decisions and code organization[5]
  • Cross-conversation memory enhancement[6]

Computer Use

Claude Sonnet 4.5 represents a significant leap forward in computer use capabilities. On the OSWorld benchmark, which tests AI models on real-world computer tasks, Sonnet 4.5 achieved 61.4%, compared to 42.2% for Sonnet 4 just four months prior.[1] The model can navigate websites, fill spreadsheets, complete complex browser-based tasks autonomously, and control real applications including calendars and browsers for data analysis and project coordination.[1][11]

Industry Applications

Cybersecurity

Claude Sonnet 4.5 can deploy agents that autonomously patch vulnerabilities before exploitation, shifting from reactive detection to proactive defense.[1] The model reduced average vulnerability intake time by 44% while improving accuracy by 25% in tests with Hai security agents.[1] Coverage in cybersecurity trade press noted Anthropic's safety and security evaluations.[13]

Finance

The model handles everything from entry-level financial analysis to advanced predictive analysis. It can continuously monitor global regulatory changes and preemptively adapt compliance systems.[1] For complex financial analysis involving risk assessment, structured products, and portfolio screening, Claude Sonnet 4.5 delivers "investment-grade insights that require less human review."[1]

Software Development

Multiple companies reported significant improvements in their development workflows:

  • Cursor: Noted "state-of-the-art coding performance" with significant improvements on longer horizon tasks[1]
  • GitHub Copilot: Reported "significant improvements in multi-step reasoning and code comprehension"[1]
  • Figma: Found the model made it "easier to prompt and iterate" with better functional prototypes[1]

Research and Content

The model can synthesize insights from data sources, generate compelling content, and perform deep analysis.[2] It also produces and edits office files and handles high-volume data processing.[6]

Benchmarks

Claude Sonnet 4.5 sets new standards on several industry benchmarks, demonstrating its superiority in coding and agentic tasks.

Comprehensive Benchmark Performance
Benchmark Score Notes Comparison to Predecessor
SWE-bench Verified 77.2% (averaged over 10 trials) 200K context, no test-time compute Up from previous models; 82.0% with high compute
OSWorld 61.4% Computer use benchmark Up from 42.2% by Claude Sonnet 4
Terminal-Bench (Terminus 2) 50% With tool use Improved over prior versions
τ2-bench Enhanced Extended thinking and tool use Enhanced performance
AIME Improved Sampling at temperature 1.0, 64K reasoning tokens Better reasoning
MMMLU Improved Average of 5 runs over 14 non-English languages, up to 128K Multilingual improvements
Finance Agent Enhanced Extended thinking up to 64K Domain-specific gains

These scores were achieved with specific methodologies, including prompt additions and evaluation frameworks detailed in Anthropic's system card.[3][1]

Safety and Alignment

Safety Improvements

Released under AI Safety Level 3 (ASL-3) protections, Claude Sonnet 4.5 is described as Anthropic's "most aligned frontier model yet," showing significant improvements in safety metrics:[3]

  • 60% improvement on primary misalignment metrics compared to Claude Sonnet 4[3]
  • Reduced rates of sycophancy, deception, and power-seeking behaviors[1]
  • Improved defense against prompt injection attacks, with classifiers improved by a factor of ten in reducing false positives[1]
  • 99.29% harmless response rate to violative requests[3]
  • Incorporates classifiers to detect dangerous inputs/outputs, especially related to CBRN weapons[1]

Evaluation Awareness

The model demonstrated an ability to recognize when it was being tested in evaluation scenarios, sometimes verbalizing suspicions about being in a testing environment in approximately 13% of test transcripts.[3] This behavior complicates safety assessments but may provide additional safety in real-world scenarios by causing the model to refuse suspicious requests.[3]

Third-Party Testing

Both the UK AI Security Institute and Apollo Research conducted independent safety evaluations of Claude Sonnet 4.5, confirming Anthropic's findings of improved safety profiles compared to previous models.[3] Anthropic offers an allowlist for customers in cybersecurity and biological research to address classifier-related issues.[1]

Availability and Integration

Claude Sonnet 4.5 is available immediately as a drop-in replacement for previous models via multiple channels:

Anthropic Platforms

  • Claude.ai web interface, iOS, and Android applications[1]
  • Claude API with model string `claude-sonnet-4-5`[1]
  • Integration with Claude Code (per Anthropic's product and docs pages)[1][2][5]

Cloud Platforms

Developer Tools

  • GitHub Copilot: Public preview for Copilot Pro, Pro+, Business, and Enterprise subscribers[7]
  • Claude Code: Includes checkpoints, refreshed terminal (version 2.0), native Visual Studio Code extension, context editing features and memory tools[1]
  • Claude for Chrome: Extension available to Max users[1]
  • Claude Agent SDK: Available in both TypeScript and Python[1]

Preview Features

The "Imagine with Claude" preview, which allows the model to generate real-time software such as websites directly in response to user prompts, is available to Max subscribers for five days from launch.[1]

Related Products

Claude Agent SDK

Alongside Claude Sonnet 4.5, Anthropic released the Claude Agent SDK (formerly Claude Code SDK), which provides developers with the same infrastructure that powers Claude Code. The SDK includes:[1]

  • Virtual machines for code execution
  • Memory and context management systems
  • Permission frameworks for agent autonomy
  • Support for both TypeScript and Python

Reception

Upon release, Claude Sonnet 4.5 was praised for its advancements in coding and agentic capabilities. Tech outlets and developer-focused publications described it as a notable leap in day-to-day coding and agentic workflows, while noting the competitive context with offerings from OpenAI and Google.[11][10]

Industry leaders praised the model's capabilities:

  • Michael Truell, CEO of Cursor, stated it "represents state-of-the-art coding performance, specifically on longer horizon tasks"[10]
  • Jeff Wang, CEO of Windsurf, described it as representing "a new generation of coding models"[10]
  • The model received positive feedback from companies including Canva, Devin AI, and others for its improved performance in production environments[1]

Early hands-on write-ups reported perceived improvements in speed, steerability, and reliability over Opus 4.1, especially inside Claude Code.[14] Some discussions on platforms like Reddit pointed to early leaks on Anthropic's website, indicating high anticipation.[15]

Industry observers noted its competitive positioning against models like OpenAI's GPT-5, with Anthropic's rapid iteration cycle highlighted.[4] Some practitioner commentary pointed out that real-world coding performance can still vary by project and environment, a caveat common to code-capable LLMs.[16]

Comparison to Other Models

Claude Sonnet 4.5 is part of the broader Claude 4 family, which includes:

Claude Sonnet 4.5 vs. Prior Claude Models
Aspect Sonnet 4.5 (2025) Opus 4.1 / Prior (2024–2025)
Long-horizon autonomy ~30 hours continuous; ~11k LOC chat app[11] ~7 hours autonomous run[11]
Positioning "Best coding model" and strongest for complex agents[1][5] Earlier flagship/general models
OSWorld benchmark 61.4%[1] 42.2% (Sonnet 4)[1]
Safety improvements 60% improvement in alignment metrics[3] Baseline
Context window 200,000 tokens (1M in high-compute)[2] 200,000 tokens standard
Integrations at launch Anthropic platforms; AWS Bedrock; GitHub Copilot; Chrome extension[6][7] Anthropic platforms; ecosystem built over time

The model competes with other frontier models including GPT-5 from OpenAI and Gemini 2.5 Pro from Google DeepMind, often outperforming them on coding-specific benchmarks.[1]

Limitations

Despite its capabilities, Claude Sonnet 4.5 has several limitations:

  • Context window of 200,000 tokens may be insufficient for very large codebases[9]
  • The model cannot use browser storage APIs (localStorage, sessionStorage) in artifacts[1]
  • Extended thinking mode impacts prompt caching efficiency[5]
  • Knowledge cutoff date of January 2025 for training data[3]
  • Parameter count and detailed architecture remain undisclosed[3]

See also

References

  1. 1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17 1.18 1.19 1.20 1.21 1.22 1.23 1.24 1.25 1.26 1.27 1.28 1.29 1.30 1.31 1.32 1.33 1.34 1.35 1.36 1.37 1.38 1.39 1.40 1.41 1.42 1.43 1.44 1.45 1.46 1.47 1.48 1.49 Anthropic. (2025, September 29). "Introducing Claude Sonnet 4.5". Retrieved from https://www.anthropic.com/news/claude-sonnet-4-5
  2. 2.0 2.1 2.2 2.3 2.4 2.5 Anthropic. "Claude Sonnet 4.5". Retrieved 30 September 2025 from https://www.anthropic.com/claude/sonnet
  3. 3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.10 3.11 Anthropic. (2025, September). "System Card: Claude Sonnet 4.5". Retrieved from https://www.anthropic.com/claude-sonnet-4-5-system-card
  4. 4.0 4.1 4.2 CNBC. (2025, September 29). "Anthropic launches Claude Sonnet 4.5, its latest AI model". Retrieved from https://www.cnbc.com/2025/09/29/anthropic-claude-ai-sonnet-4-5.html
  5. 5.0 5.1 5.2 5.3 5.4 5.5 Anthropic Docs. "What's new in Claude Sonnet 4.5". Retrieved 30 September 2025 from https://docs.claude.com/en/docs/about-claude/models/whats-new-sonnet-4-5
  6. 6.0 6.1 6.2 6.3 6.4 6.5 6.6 Amazon Web Services. (2025, September 29). "Introducing Claude Sonnet 4.5 in Amazon Bedrock: Anthropic's most intelligent model, best for coding and complex agents". AWS Blog. Retrieved from https://aws.amazon.com/blogs/aws/introducing-claude-sonnet-4-5-in-amazon-bedrock-anthropics-most-intelligent-model-best-for-coding-and-complex-agents/
  7. 7.0 7.1 7.2 GitHub. (2025, September 29). "Anthropic Claude Sonnet 4.5 is in public preview for GitHub Copilot". GitHub Changelog. Retrieved from https://github.blog/changelog/2025-09-29-anthropic-claude-sonnet-4-5-is-in-public-preview-for-github-copilot/
  8. 8.0 8.1 8.2 Njuguna, E. (2025, September). "Claude Sonnet 4.5 Shatters AI Benchmarks". Medium. Cite error: Invalid <ref> tag; name "medium-benchmarks" defined multiple times with different content
  9. 9.0 9.1 DataCamp. (2025, May 23). "Claude 4: Tests, Features, Access, Benchmarks & More". Cite error: Invalid <ref> tag; name "datacamp" defined multiple times with different content
  10. 10.0 10.1 10.2 10.3 TechCrunch. (2025, September 29). "Anthropic launches Claude Sonnet 4.5, its best AI model for coding". Cite error: Invalid <ref> tag; name "techcrunch" defined multiple times with different content
  11. 11.0 11.1 11.2 11.3 11.4 Jay Peters (29 September 2025). "Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy". The Verge. Retrieved 30 September 2025 from https://www.theverge.com/ai-artificial-intelligence/787524/anthropic-releases-claude-sonnet-4-5-in-latest-bid-for-ai-agents-and-coding-supremacy
  12. Axios (29 September 2025). "Anthropic's latest Claude model can work for 30 hours on its own". Retrieved 30 September 2025 from https://www.axios.com/2025/09/29/anthropic-claude-sonnet-coding-agent
  13. Derek B. Johnson (30 September 2025). "Anthropic touts safety, security improvements in Claude Sonnet 4.5". CyberScoop. Retrieved 30 September 2025 from https://cyberscoop.com/anthrophic-sonnet-4-5-security-safety-testing/
  14. Dan Shipper (29 September 2025). "Vibe Check: Claude Sonnet 4.5". Every. Retrieved 30 September 2025 from https://every.to/vibe-check/vibe-check-claude-sonnet-4-5
  15. Reddit (2025). "Claude Sonnet 4.5 leak on Anthropic website : r/ClaudeAI". Retrieved from https://www.reddit.com/r/ClaudeAI/comments/1ntme8h/claude_sonnet_45_leak_on_anthropic_website_w/
  16. The Economic Times (29 September 2025). "Claude Sonnet 4.5 launched by Anthropic: New features, upgrades, free access and more". Retrieved 30 September 2025 from https://m.economictimes.com/news/new-updates/claude-sonnet-4-5-launched-by-anthropic-new-features-upgrades-free-access-and-more/articleshow/124228124.cms

External links