Claude Opus 4.6: 1M Context Window Goes GA — What Developers Need to Know
Claude Opus 4.6 and Sonnet 4.6 now include the full 1M context window at standard pricing — no premium. 5x more context, 6x more media, 128K output, and it works on Pro, Max, Team, and Enterprise plans.
Yesterday, Anthropic quietly dropped one of the biggest upgrades in AI development tooling this year. Claude Opus 4.6 and Sonnet 4.6 now ship with a full 1 million token context window — at standard pricing, with no long-context premium. That's 5x the previous 200K limit, and it changes how developers build with AI.
Here's everything you need to know, from pricing to practical implementation.
TL;DR
- Claude Opus 4.6 and Sonnet 4.6 now have a 1M token context window (5x the previous 200K) at standard pricing — no premium.
- Pricing unchanged: Opus $5/$25 per million tokens, Sonnet $3/$15. A 900K request costs the same per-token rate as 9K.
- Works on Pro, Max, Team, and Enterprise subscriptions in Claude Code — no extra usage needed.
- 600 images/PDFs per request (6x increase), 128K max output (2x), 4 adaptive thinking levels.
- 78.3% retrieval accuracy at 1M tokens on MRCR v2 — best among all frontier models.
The Announcement: 5x Context, Same Price
On March 13, 2026, Anthropic announced that the 1M context window is now generally available for both Claude Opus 4.6 and Sonnet 4.6. The key detail that matters: there's no price multiplier.
Pricing breakdown:
- Opus 4.6: $5 input / $25 output per million tokens
- Sonnet 4.6: $3 input / $15 output per million tokens
A 900K-token request costs the same per-token rate as a 9K one. No tiered pricing, no "long context surcharge." This is a significant shift from how most providers handle extended context — typically charging 2-4x more for longer windows.
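The linear pricing can be sanity-checked with a few lines of arithmetic. This is a minimal sketch using the rates listed above; the dictionary keys are just labels, not official model identifiers.

```python
# Per-request cost at the article's listed rates -- no long-context multiplier.
# Rates are in dollars per million tokens.

RATES = {
    "opus-4.6":   {"input": 5.00,  "output": 25.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD. Pricing is linear in token count, so a 900K request
    costs exactly 100x a 9K request -- same per-token rate."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

assert request_cost("opus-4.6", 900_000, 0) == 100 * request_cost("opus-4.6", 9_000, 0)
print(request_cost("opus-4.6", 900_000, 8_000))  # prints 4.7
```

Because there is no tier boundary, cost forecasting for long-context workloads stays a straight multiplication, with no surcharge step to model.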
The 1M context window is available across:
- Claude Platform (API)
- Amazon Bedrock
- Google Cloud Vertex AI
- Microsoft Azure Foundry
No beta header is required anymore. If you were sending one, it's simply ignored. Zero code changes needed.
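In code, "zero changes" looks like this: a long-context request is just an ordinary request. The sketch below builds request parameters only (no network call); the model identifier `claude-opus-4-6` is assumed from the announcement, so check the models endpoint for the exact string.

```python
# A >200K-token request needs no special headers anymore.

def build_request(big_document: str) -> dict:
    """Request parameters for a long-context call. Note the absence of any
    'anthropic-beta' long-context header -- it is no longer required, and
    is simply ignored if you still send it."""
    return {
        "model": "claude-opus-4-6",  # assumed identifier
        "max_tokens": 8_192,
        "messages": [
            {"role": "user",
             "content": f"Summarize this repository:\n\n{big_document}"},
        ],
        # No extra_headers entry needed for >200K contexts.
    }

params = build_request("...entire codebase here...")
assert "extra_headers" not in params
```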
What 1M Tokens Actually Means
Numbers are abstract. Here's what a million tokens translates to in practice:
- ~750,000 words of text (roughly 10-15 full-length novels)
- An entire large codebase loaded in one session
- Thousands of pages of contracts, documentation, or research papers
- Up to 600 images or PDF pages per request (up from 100 — a 6x increase)
- Maximum output: 128K tokens (doubled from the previous 64K limit)
For developers, this means you can load an entire repository, all its documentation, test files, and configuration — and have the model reason across all of it simultaneously. No more chunking strategies, no more lossy summarization, no more "I lost context on that file from earlier."
Works with Claude Pro and Max Subscriptions
This isn't just an API-only feature. Claude Code users on Pro, Max, Team, and Enterprise plans get the full 1M context window automatically when using Opus 4.6.
What this means in practice:
- Fewer compactions: Claude Code previously had to compress conversation history when it hit the 200K limit. With 1M, your sessions run 5x longer before any context compression kicks in.
- Full conversation integrity: Every tool call, code search, file read, and intermediate reasoning step stays in the window. No more "I forgot what I found earlier."
- No extra usage required: 1M context previously required extra usage credits. Now it's included in your standard plan allocation.
If you're on the Max plan ($100/month for 5x usage) or even the Pro plan ($20/month), you're already getting this. Just select Opus 4.6 as your model in Claude Code.
Adaptive Thinking: 4 Effort Levels
Opus 4.6 replaces the old binary thinking toggle (on/off) with four granular effort levels:
- Low: Simple lookups, formatting, quick answers — minimal thinking tokens
- Medium: Standard coding, writing, analysis — balanced cost/quality
- High (default): Complex debugging, architecture, multi-step reasoning — more thinking tokens
- Max: Hardest problems — novel algorithms, deep research — highest thinking token usage
Why this matters: Thinking tokens are billed as output tokens at $25 per million for Opus 4.6. In agentic systems making dozens of API calls per task, the thinking effort level becomes your primary cost control lever.
Practical tip: Use "medium" for straightforward tasks. The quality difference between medium and high is minimal for simple operations, but the cost savings add up fast in automated pipelines.
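A sketch of effort as a cost lever follows. The request field names and the per-level token budgets here are assumptions for illustration (the article only names "thinking.budget_tokens and effort level"), so verify the exact shape against the API reference before relying on it.

```python
# Picking a thinking effort level per task to control cost.

EFFORT_BUDGETS = {  # illustrative thinking-token budgets, not official values
    "low": 1_024, "medium": 4_096, "high": 16_384, "max": 65_536,
}

def thinking_config(effort: str) -> dict:
    """Build a thinking block for a request at the given effort level."""
    return {"type": "enabled",
            "effort": effort,                       # assumed field name
            "budget_tokens": EFFORT_BUDGETS[effort]}

# Thinking tokens bill as output ($25/M for Opus 4.6), so in a pipeline
# making 50 calls, the medium-vs-high gap compounds quickly:
per_call_savings = (EFFORT_BUDGETS["high"] - EFFORT_BUDGETS["medium"]) * 25 / 1_000_000
print(f"~${per_call_savings * 50:.2f} saved per 50-call task at medium effort")
```

The exact dollar figure depends on your real budgets, but the structure of the saving is the point: it scales with call count, which is why effort level matters most in agentic pipelines.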
Context Compaction: Solving "Context Rot"
Having 1M tokens available doesn't help if the model's performance degrades as the window fills up. Anthropic calls this problem "context rot" — and Opus 4.6 addresses it with automatic context compaction.
How it works:
- As a conversation approaches the context limit, the API automatically identifies earlier portions that can be summarized
- Those sections are compressed into a condensed state while preserving key information
- The model continues with the compacted context, maintaining accuracy on recent and critical details
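The API performs this compaction automatically, but the idea can be mimicked client-side to build intuition. In this sketch, `summarize()` is a hypothetical stand-in; in practice it would itself be a model call that condenses the older turns.

```python
# Illustration of the compaction idea: condense old turns, keep recent ones verbatim.

def summarize(messages: list[dict]) -> str:
    # Hypothetical helper -- a real version would call the model.
    return f"[Summary of {len(messages)} earlier messages]"

def compact(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Replace everything but the last `keep_recent` messages with a single
    condensed summary, preserving recent and critical detail verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "user", "content": summarize(old)}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary message + the 4 most recent turns
```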
Benchmark results:
- MRCR v2 at 1M tokens: Opus 4.6 scores 78.3% — the highest among all frontier models
- For comparison: Sonnet 4.5 scored just 18.5% on the same benchmark
- This represents a 4x improvement in long-context accuracy
Real-World Impact: What Developers Are Saying
The 1M context window isn't theoretical — teams are already reporting tangible improvements:
- Debugging complex systems: Engineers report that Claude Code can now search through Datadog logs, databases, and source code in a single session without losing context.
- Legal and contract review: Attorneys can bring multiple rounds of a 100-page agreement into one session, seeing the full arc of a negotiation.
- Scientific research: Research teams are synthesizing hundreds of papers, proofs, and codebases in a single pass.
- Production incidents: With every signal and working theory in view from first alert to remediation, teams report faster resolution.
One particularly interesting finding: some teams report that raising context from 200K to 500K+ actually reduced total token usage. With fewer lost details, the model wastes fewer tokens re-reading files and re-requesting information it had forgotten, so more of its budget goes to the actual task.
Developer Checklist: How to Use This Today
API Users
- Remove the beta header — no special headers needed for >200K tokens
- Update your max_tokens — you can now request up to 128K output tokens
- Adjust your media limits — up to 600 images/PDFs per request
- Set thinking effort — add thinking.budget_tokens and effort level to your requests
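The API checklist above can be pulled together into one request sketch. Limit values come from the article; field names follow the Messages API shape, except the model identifier and thinking configuration, which are assumptions to verify against current docs.

```python
# Assembling a request that uses the new ceilings.

MAX_OUTPUT_TOKENS = 128_000   # doubled output ceiling
MAX_MEDIA_PER_REQUEST = 600   # new image/PDF ceiling

def long_context_request(text: str, image_blocks: list[dict]) -> dict:
    if len(image_blocks) > MAX_MEDIA_PER_REQUEST:
        raise ValueError(f"{len(image_blocks)} media items exceeds {MAX_MEDIA_PER_REQUEST}")
    return {
        "model": "claude-opus-4-6",  # assumed identifier
        "max_tokens": MAX_OUTPUT_TOKENS,
        "thinking": {"type": "enabled", "budget_tokens": 16_384},
        "messages": [{"role": "user",
                      "content": [{"type": "text", "text": text}, *image_blocks]}],
    }

req = long_context_request("Review these scans.", [])
assert req["max_tokens"] == 128_000
```

Validating the media count client-side is optional (the API will reject oversized requests anyway), but it gives a clearer error before you upload hundreds of pages.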
Claude Code Users
- Select Opus 4.6 as your model (/model command)
- You're done — 1M context is automatic for Pro, Max, Team, and Enterprise plans
- Monitor compactions — you should see significantly fewer context resets
Cost Optimization
- Use Sonnet 4.6 ($3/$15) for routine tasks — it also gets 1M context
- Use Opus 4.6 ($5/$25) for complex reasoning where accuracy matters
- Set thinking to medium for standard tasks, high only when needed
- The cost difference between a 200K and 900K request is purely linear — no multiplier penalty
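The routing rules above can be sketched as a small dispatch function. The task labels here are illustrative inventions; how you classify a task is up to your application.

```python
# Route routine work to Sonnet 4.6, deep reasoning to Opus 4.6, and keep
# thinking effort at medium unless the task genuinely needs more.

def route(task_kind: str) -> dict:
    """Map a task category to a model and thinking effort level."""
    if task_kind in ("formatting", "lookup", "routine-edit"):
        return {"model": "claude-sonnet-4-6", "effort": "medium"}
    if task_kind in ("architecture", "novel-algorithm", "deep-debug"):
        return {"model": "claude-opus-4-6", "effort": "high"}
    # Default: the stronger model at balanced effort.
    return {"model": "claude-opus-4-6", "effort": "medium"}

assert route("lookup")["model"] == "claude-sonnet-4-6"
assert route("architecture")["effort"] == "high"
```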
What This Means for the AI Development Landscape
This move puts pressure on every other AI provider. OpenAI's GPT-5.4 offers 128K context; Google's Gemini 3 Pro offers 2M but with significant accuracy degradation at higher lengths. Anthropic's play is clear: reliable long context at commodity pricing.
For developers building agentic systems — AI tools that run autonomously, make tool calls, and work through multi-step problems — context window size has been the bottleneck. When your agent loses context mid-task, it either fails or produces degraded results. A reliable 1M window with 78.3% retrieval accuracy fundamentally changes what autonomous AI agents can accomplish.
The implication: the era of context management hacks is ending. RAG pipelines, context window optimization, rolling summarization — these engineering patterns were necessary workarounds for limited context. They're not going away overnight, but the threshold for when you need them just moved dramatically upward.
Frequently Asked Questions
Does the 1M context window work on the Claude Pro plan?
Yes. Claude Code users on Pro ($20/month), Max ($100/month), Team, and Enterprise plans all get the full 1M context window automatically when using Opus 4.6. No extra usage credits are required — it's included in your standard plan allocation.
How much does Claude Opus 4.6 cost per token?
Opus 4.6 costs $5 per million input tokens and $25 per million output tokens. Thinking tokens (used during reasoning) are billed as output tokens at the same $25 rate. Sonnet 4.6 is cheaper at $3/$15 per million tokens and also includes the 1M context window.
What is context compaction and how does it work?
Context compaction is Anthropic's solution to "context rot" — the problem where model performance degrades as the context window fills up. When a conversation approaches the limit, the API automatically summarizes earlier portions into a compressed state while preserving key information. This lets the model continue accurately without losing critical context from earlier in the session.
How many images can I send per request to Claude Opus 4.6?
Up to 600 images or PDF pages per request, a 6x increase from the previous limit of 100. This is available on Claude Platform natively, Microsoft Azure Foundry, and Google Cloud's Vertex AI.
Do I need to change my API code to use the 1M context window?
No. Requests over 200K tokens now work automatically — no beta header required. If you're already sending the beta header, it's ignored. The only changes you might want to make are updating your max_tokens (now up to 128K output) and adjusting media limits to take advantage of the 600 image/PDF cap.
How does Claude's 1M context compare to Gemini's 2M context window?
While Google's Gemini 3 Pro offers a larger 2M token window, the key difference is retrieval accuracy. Claude Opus 4.6 scores 78.3% on the MRCR v2 benchmark at 1M tokens — the highest among frontier models. Gemini's accuracy degrades significantly at higher context lengths. A smaller window with reliable recall often outperforms a larger window where the model loses track of information.
The Bottom Line
Claude Opus 4.6's 1M context GA isn't just a bigger number. It's a structural upgrade that eliminates one of the most frustrating constraints in AI-assisted development:
- 5x more context at the same per-token price
- 6x more media (600 images/PDFs per request)
- 2x more output (128K tokens)
- 4 thinking levels for cost optimization
- Works on Pro, Max, Team, and Enterprise subscriptions
- 78.3% retrieval accuracy at 1M tokens (best in class)
If you're building with Claude — whether through the API, Claude Code, or any of the cloud platforms — this is available right now. No waitlist, no beta, no premium pricing. Your move.