Claude Opus 4.6: 1M Context Window Goes GA — What Developers Need to Know
Claude Opus 4.6 and Sonnet 4.6 now include the full 1M context window at standard pricing — no premium. 5x more context, 6x more media, 128K output, and it works on Pro, Max, Team, and Enterprise plans.
Yesterday, Anthropic quietly dropped one of the biggest upgrades in AI development tooling this year. Claude Opus 4.6 and Sonnet 4.6 now ship with a full 1 million token context window — at standard pricing, with no long-context premium. That's 5x the previous 200K limit, and it changes how developers build with AI.
Here's everything you need to know, from pricing to practical implementation.
TL;DR
- Claude Opus 4.6 and Sonnet 4.6 now have a 1M token context window (5x the previous 200K) at standard pricing — no premium.
- Pricing unchanged: Opus $5/$25 per million tokens, Sonnet $3/$15. A 900K request costs the same per-token rate as 9K.
- Works on Pro, Max, Team, and Enterprise subscriptions in Claude Code — no extra usage needed.
- 600 images/PDFs per request (6x increase), 128K max output (2x), 4 adaptive thinking levels.
- 78.3% retrieval accuracy at 1M tokens on MRCR v2 — best among all frontier models.
The Announcement: 5x Context, Same Price
On March 13, 2026, Anthropic announced that the 1M context window is now generally available for both Claude Opus 4.6 and Sonnet 4.6. The key detail that matters: there's no price multiplier.
Pricing breakdown:
- Opus 4.6: $5 input / $25 output per million tokens
- Sonnet 4.6: $3 input / $15 output per million tokens
A 900K-token request costs the same per-token rate as a 9K one. No tiered pricing, no "long context surcharge." This is a significant shift from how most providers handle extended context — typically charging 2-4x more for longer windows.
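The linear pricing can be sanity-checked with a few lines of arithmetic. This is a minimal sketch using the rates listed above; the dictionary keys are just labels, not official model identifiers.

```python
# Per-request cost at the article's listed rates -- no long-context multiplier.
# Rates are in dollars per million tokens.

RATES = {
    "opus-4.6":   {"input": 5.00,  "output": 25.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD. Pricing is linear in token count, so a 900K request
    costs exactly 100x a 9K request -- same per-token rate."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

assert request_cost("opus-4.6", 900_000, 0) == 100 * request_cost("opus-4.6", 9_000, 0)
print(request_cost("opus-4.6", 900_000, 8_000))  # prints 4.7
```

Because there is no tier boundary, cost forecasting for long-context workloads stays a straight multiplication, with no surcharge step to model.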
The 1M context window is available across:
- Claude Platform (API)
- Amazon Bedrock
- Google Cloud Vertex AI
- Microsoft Azure Foundry
No beta header is required anymore. If you were sending one, it's simply ignored. Zero code changes needed.
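In code, "zero changes" looks like this: a long-context request is just an ordinary request. The sketch below builds request parameters only (no network call); the model identifier `claude-opus-4-6` is assumed from the announcement, so check the models endpoint for the exact string.

```python
# A >200K-token request needs no special headers anymore.

def build_request(big_document: str) -> dict:
    """Request parameters for a long-context call. Note the absence of any
    'anthropic-beta' long-context header -- it is no longer required, and
    is simply ignored if you still send it."""
    return {
        "model": "claude-opus-4-6",  # assumed identifier
        "max_tokens": 8_192,
        "messages": [
            {"role": "user",
             "content": f"Summarize this repository:\n\n{big_document}"},
        ],
        # No extra_headers entry needed for >200K contexts.
    }

params = build_request("...entire codebase here...")
assert "extra_headers" not in params
```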
What 1M Tokens Actually Means
Numbers are abstract. Here's what a million tokens translates to in practice:
- ~750,000 words of text (roughly 10-15 full-length novels)
- An entire large codebase loaded in one session
- Thousands of pages of contracts, documentation, or research papers
- Up to 600 images or PDF pages per request (up from 100 — a 6x increase)
- Maximum output: 128K tokens (doubled from the previous 64K limit)
For developers, this means you can load an entire repository, all its documentation, test files, and configuration — and have the model reason across all of it simultaneously. No more chunking strategies, no more lossy summarization, no more "I lost context on that file from earlier."
Works with Claude Pro and Max Subscriptions
This isn't just an API-only feature. Claude Code users on Pro, Max, Team, and Enterprise plans get the full 1M context window automatically when using Opus 4.6.
What this means in practice:
- Fewer compactions: Claude Code previously had to compress conversation history when it hit the 200K limit. With 1M, your sessions run 5x longer before any context compression kicks in.
- Full conversation integrity: Every tool call, code search, file read, and intermediate reasoning step stays in the window. No more "I forgot what I found earlier."
- No extra usage required: 1M context previously required extra usage credits. Now it's included in your standard plan allocation.
If you're on the Max plan ($100/month for 5x usage) or even the Pro plan ($20/month), you're already getting this. Just select Opus 4.6 as your model in Claude Code.
Adaptive Thinking: 4 Effort Levels
Opus 4.6 replaces the old binary thinking toggle (on/off) with four granular effort levels:
- Low: Simple lookups, formatting, quick answers — minimal thinking tokens
- Medium: Standard coding, writing, analysis — balanced cost/quality
- High (default): Complex debugging, architecture, multi-step reasoning — more thinking tokens
- Max: Hardest problems — novel algorithms, deep research — highest thinking token usage
Why this matters: Thinking tokens are billed as output tokens at $25 per million for Opus 4.6. In agentic systems making dozens of API calls per task, the thinking effort level becomes your primary cost control lever.
Practical tip: Use "medium" for straightforward tasks. The quality difference between medium and high is minimal for simple operations, but the cost savings add up fast in automated pipelines.
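A sketch of effort as a cost lever follows. The request field names and the per-level token budgets here are assumptions for illustration (the article only names "thinking.budget_tokens and effort level"), so verify the exact shape against the API reference before relying on it.

```python
# Picking a thinking effort level per task to control cost.

EFFORT_BUDGETS = {  # illustrative thinking-token budgets, not official values
    "low": 1_024, "medium": 4_096, "high": 16_384, "max": 65_536,
}

def thinking_config(effort: str) -> dict:
    """Build a thinking block for a request at the given effort level."""
    return {"type": "enabled",
            "effort": effort,                       # assumed field name
            "budget_tokens": EFFORT_BUDGETS[effort]}

# Thinking tokens bill as output ($25/M for Opus 4.6), so in a pipeline
# making 50 calls, the medium-vs-high gap compounds quickly:
per_call_savings = (EFFORT_BUDGETS["high"] - EFFORT_BUDGETS["medium"]) * 25 / 1_000_000
print(f"~${per_call_savings * 50:.2f} saved per 50-call task at medium effort")
```

The exact dollar figure depends on your real budgets, but the structure of the saving is the point: it scales with call count, which is why effort level matters most in agentic pipelines.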
Context Compaction: Solving "Context Rot"
Having 1M tokens available doesn't help if the model's performance degrades as the window fills up. Anthropic calls this problem "context rot" — and Opus 4.6 addresses it with automatic context compaction.
How it works:
- As a conversation approaches the context limit, the API automatically identifies earlier portions that can be summarized
- Those sections are compressed into a condensed state while preserving key information
- The model continues with the compacted context, maintaining accuracy on recent and critical details
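The API performs this compaction automatically, but the idea can be mimicked client-side to build intuition. In this sketch, `summarize()` is a hypothetical stand-in; in practice it would itself be a model call that condenses the older turns.

```python
# Illustration of the compaction idea: condense old turns, keep recent ones verbatim.

def summarize(messages: list[dict]) -> str:
    # Hypothetical helper -- a real version would call the model.
    return f"[Summary of {len(messages)} earlier messages]"

def compact(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Replace everything but the last `keep_recent` messages with a single
    condensed summary, preserving recent and critical detail verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "user", "content": summarize(old)}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary message + the 4 most recent turns
```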
Benchmark results:
- MRCR v2 at 1M tokens: Opus 4.6 scores 78.3% — the highest among all frontier models
- For comparison: Sonnet 4.5 scored just 18.5% on the same benchmark
- This represents a 4x improvement in long-context accuracy
Real-World Impact: What Developers Are Saying
The 1M context window isn't theoretical — teams are already reporting tangible improvements:
- Debugging complex systems: Engineers report that Claude Code can now search through Datadog logs, databases, and source code in a single session without losing context.
- Legal and contract review: Attorneys can bring multiple rounds of a 100-page agreement into one session, seeing the full arc of a negotiation.
- Scientific research: Research teams are synthesizing hundreds of papers, proofs, and codebases in a single pass.
- Production incidents: With every signal and working theory in view from first alert to remediation, teams report faster resolution.
One particularly interesting finding: some teams report that raising context from 200K to 500K+ actually reduced total token usage. With fewer lost details, the model wastes fewer tokens re-reading files and re-requesting information it had forgotten, so more of its budget goes to the actual task.
Developer Checklist: How to Use This Today
API Users
- Remove the beta header — no special headers needed for >200K tokens
- Update your max_tokens — you can now request up to 128K output tokens
- Adjust your media limits — up to 600 images/PDFs per request
- Set thinking effort — add thinking.budget_tokens and effort level to your requests
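The API checklist above can be pulled together into one request sketch. Limit values come from the article; field names follow the Messages API shape, except the model identifier and thinking configuration, which are assumptions to verify against current docs.

```python
# Assembling a request that uses the new ceilings.

MAX_OUTPUT_TOKENS = 128_000   # doubled output ceiling
MAX_MEDIA_PER_REQUEST = 600   # new image/PDF ceiling

def long_context_request(text: str, image_blocks: list[dict]) -> dict:
    if len(image_blocks) > MAX_MEDIA_PER_REQUEST:
        raise ValueError(f"{len(image_blocks)} media items exceeds {MAX_MEDIA_PER_REQUEST}")
    return {
        "model": "claude-opus-4-6",  # assumed identifier
        "max_tokens": MAX_OUTPUT_TOKENS,
        "thinking": {"type": "enabled", "budget_tokens": 16_384},
        "messages": [{"role": "user",
                      "content": [{"type": "text", "text": text}, *image_blocks]}],
    }

req = long_context_request("Review these scans.", [])
assert req["max_tokens"] == 128_000
```

Validating the media count client-side is optional (the API will reject oversized requests anyway), but it gives a clearer error before you upload hundreds of pages.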
Claude Code Users
- Select Opus 4.6 as your model (/model command)
- You're done — 1M context is automatic for Pro, Max, Team, and Enterprise plans
- Monitor compactions — you should see significantly fewer context resets
Cost Optimization
- Use Sonnet 4.6 ($3/$15) for routine tasks — it also gets 1M context
- Use Opus 4.6 ($5/$25) for complex reasoning where accuracy matters
- Set thinking to medium for standard tasks, high only when needed
- The cost difference between a 200K and 900K request is purely linear — no multiplier penalty
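The routing rules above can be sketched as a small dispatch function. The task labels here are illustrative inventions; how you classify a task is up to your application.

```python
# Route routine work to Sonnet 4.6, deep reasoning to Opus 4.6, and keep
# thinking effort at medium unless the task genuinely needs more.

def route(task_kind: str) -> dict:
    """Map a task category to a model and thinking effort level."""
    if task_kind in ("formatting", "lookup", "routine-edit"):
        return {"model": "claude-sonnet-4-6", "effort": "medium"}
    if task_kind in ("architecture", "novel-algorithm", "deep-debug"):
        return {"model": "claude-opus-4-6", "effort": "high"}
    # Default: the stronger model at balanced effort.
    return {"model": "claude-opus-4-6", "effort": "medium"}

assert route("lookup")["model"] == "claude-sonnet-4-6"
assert route("architecture")["effort"] == "high"
```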
What This Means for the AI Development Landscape
This move puts pressure on every other AI provider. OpenAI's GPT-5.4 offers 128K context; Google's Gemini 3 Pro offers 2M but with significant accuracy degradation at higher lengths. Anthropic's play is clear: reliable long context at commodity pricing.
For developers building agentic systems — AI tools that run autonomously, make tool calls, and work through multi-step problems — context window size has been the bottleneck. When your agent loses context mid-task, it either fails or produces degraded results. A reliable 1M window with 78.3% retrieval accuracy fundamentally changes what autonomous AI agents can accomplish.
The implication: the era of context management hacks is ending. RAG pipelines, context window optimization, rolling summarization — these engineering patterns were necessary workarounds for limited context. They're not going away overnight, but the threshold for when you need them just moved dramatically upward.
Frequently Asked Questions
Does the 1M context window work on the Claude Pro plan?
Yes. Claude Code users on Pro ($20/month), Max ($100/month), Team, and Enterprise plans all get the full 1M context window automatically when using Opus 4.6. No extra usage credits are required — it's included in your standard plan allocation.
How much does Claude Opus 4.6 cost per token?
Opus 4.6 costs $5 per million input tokens and $25 per million output tokens. Thinking tokens (used during reasoning) are billed as output tokens at the same $25 rate. Sonnet 4.6 is cheaper at $3/$15 per million tokens and also includes the 1M context window.
What is context compaction and how does it work?
Context compaction is Anthropic's solution to "context rot" — the problem where model performance degrades as the context window fills up. When a conversation approaches the limit, the API automatically summarizes earlier portions into a compressed state while preserving key information. This lets the model continue accurately without losing critical context from earlier in the session.
How many images can I send per request to Claude Opus 4.6?
Up to 600 images or PDF pages per request, a 6x increase from the previous limit of 100. This is available on Claude Platform natively, Microsoft Azure Foundry, and Google Cloud's Vertex AI.
Do I need to change my API code to use the 1M context window?
No. Requests over 200K tokens now work automatically — no beta header required. If you're already sending the beta header, it's ignored. The only changes you might want to make are updating your max_tokens (now up to 128K output) and adjusting media limits to take advantage of the 600 image/PDF cap.
How does Claude's 1M context compare to Gemini's 2M context window?
While Google's Gemini 3 Pro offers a larger 2M token window, the key difference is retrieval accuracy. Claude Opus 4.6 scores 78.3% on the MRCR v2 benchmark at 1M tokens — the highest among frontier models. Gemini's accuracy degrades significantly at higher context lengths. A smaller window with reliable recall often outperforms a larger window where the model loses track of information.
The Bottom Line
Claude Opus 4.6's 1M context GA isn't just a bigger number. It's a structural upgrade that eliminates one of the most frustrating constraints in AI-assisted development:
- 5x more context at the same per-token price
- 6x more media (600 images/PDFs per request)
- 2x more output (128K tokens)
- 4 thinking levels for cost optimization
- Works on Pro, Max, Team, and Enterprise subscriptions
- 78.3% retrieval accuracy at 1M tokens (best in class)
If you're building with Claude — whether through the API, Claude Code, or any of the cloud platforms — this is available right now. No waitlist, no beta, no premium pricing. Your move.