Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.
About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-4.5 or Opus for coding, all via API with a custom tool.
Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.
For production coding, I use:
- A code map strategy on a big repo. For each file: summary, when_to_use, public_types, public_functions, saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash: cheap, fast, and with very good results. A sketch of this step follows the list.)
- Then auto-context, based on code lensing: it takes some globs that narrow what the AI can see, and it uses the intersection with the code map to ask the AI which files belong in context. (Typically Flash: cheap, relatively fast, and very good.)
- Then a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k.
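To make the first step concrete, here is a minimal sketch of what a code-map pass like mine might look like; `complete` is a stand-in for whatever chat-completion call you use (not a real API), and the cache path and prompt are illustrative only:

```python
import asyncio, hashlib, json
from pathlib import Path

CACHE = Path(".codemap")      # one cached JSON entry per source file
SEM = asyncio.Semaphore(32)   # concurrency of 32, as described above

PROMPT = ("For the following file, return JSON with keys: "
          "summary, when_to_use, public_types, public_functions.\n\n{src}")

async def map_file(path: Path, complete) -> dict:
    """Summarize one file; reuse the cached entry until the file changes."""
    src = path.read_text(errors="replace")
    digest = hashlib.sha256(src.encode()).hexdigest()
    cached = CACHE / f"{digest}.json"
    if cached.exists():
        return json.loads(cached.read_text())
    async with SEM:
        raw = await complete(PROMPT.format(src=src))  # cheap/fast model, e.g. Flash
    entry = {"file": str(path), "digest": digest, "map": raw}
    CACHE.mkdir(exist_ok=True)
    cached.write_text(json.dumps(entry))
    return entry

async def map_repo(root: str, complete) -> list:
    files = Path(root).rglob("*.rs")  # Rust-heavy repo, per the split mentioned below
    return await asyncio.gather(*(map_file(p, complete) for p in files))
```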
What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.
Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.
For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.
Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.
dimitri-vs
The big change here is:
> Standard pricing now applies across the full 1M window for both models, with no long-context premium. Media limits expand to 600 images or PDF pages.
For Claude Code users this is huge - assuming coherence remains strong past 200k tok.
syntaxing
It’s interesting because my career went from a higher-level language (Python) to lower-level languages (C++ and C). Opus and the like are amazing at Python, honestly sometimes better than me, though they occasionally make some really stupid architectural decisions. But when it comes to embedded stuff, it’s still like a junior engineer. Unsure if that will ever change, but I wonder if it’s just the quality and availability of training data. This is why I find it hard to believe LLMs will replace hardware engineers anytime soon (I was a MechE for a decade).
convenwis
Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?
minimaxir
Claude Code 2.1.75 no longer delineates between base Opus and 1M Opus: it's the same model. Oddly, I have Pro, where the change is supposedly only for Max+, but I'm still seeing this to be the case.
EDIT: Don't think Pro has access to it, a typical prompt just hit the context limit.
The removal of extra pricing beyond 200k tokens may be Anthropic's salvo in the agent wars against GPT 5.4's 1M window and extra pricing for that.
wewewedxfgdf
The weirdest thing about Claude pricing is that their 5X plan is 5 times the cost of the previous plan.
Normally buying the bigger plan gives some sort of discount.
At Claude, it's just "5 times more usage 5 times more cost, there you go".
iandanforth
I'm very happy about this change. For long sessions with Claude it was always like a punch to the gut when a compaction came along. Codex/GPT-5.4 is better with compactions so I switched to that to avoid the pain of the model suddenly forgetting key aspects of the work and making the same dumb errors all over again. I'm excited to return to Claude as my daily driver!
anshumankmr
All while their usage limits are so excessively shitty that I paid them $50 just two days back because I ran out of usage, and they still blocked me from using it during a critical work week (and did not refund my $50 despite my emails and requests, instead routing me to a s*ty AI bot). Anyway, I am using Copilot and OpenCode a lot more these days, which is much better.
Frannky
Opus 4.6 is nuts. Everything I throw at it works. Frontend, backend, algorithms—it does not matter.
I start with a PRD, ask for a step-by-step plan, and just execute one step at a time. Sometimes the ideas are dumb, but checking and guiding step by step helps it ship working things in hours.
It was also the first AI that made me feel, "Damn, this thing is smarter than me."
The other crazy thing is that with today's tech, these things can be made to work at 1k tokens/sec with multiple agents working at the same time, each at that speed.
jeff_antseed
the coherence question is the one that matters here. 1M tokens is not the same as actually using 1M tokens well.
we've been testing long-context in prod across a few models and the degradation isn't linear — there's something like a cliff somewhere around 600-700k where instruction following starts getting flaky and the model starts ignoring things it clearly "saw" earlier. it's not about retrieval exactly, more like... it stops weighting distant context appropriately.
gemini's problems with loops and tool forgetting that someone mentioned are real. we see that too. whether claude actually handles the tail end of 1M coherently is the real question here, and "standard pricing with no long-context premium" doesn't answer it.
honestly the fact that they're shipping at standard pricing is more interesting to me than the window size itself. that suggests they've got the KV cache economics figured out, which is harder than it sounds.
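for what it's worth, the probe behind that observation is nothing fancy, roughly this shape (the `ask` call, filler text, and depths here are placeholders, not an actual harness):

```python
# toy probe: bury an instruction at increasing depths of filler and check
# whether the model still honors it. `ask(prompt)` stands in for any
# chat-completion call; `filler_chunk` is any long, task-irrelevant text.
def probe_depths(ask, filler_chunk: str, depths_k=(100, 300, 500, 700, 900)):
    instruction = "IMPORTANT: end your answer with the word PINEAPPLE."
    results = {}
    for depth in depths_k:
        target_chars = depth * 1000 * 4          # ~4 chars/token, crude but fine here
        reps = max(1, target_chars // max(1, len(filler_chunk)))
        prompt = (instruction + "\n\n" + filler_chunk * reps +
                  "\n\nSummarize the text above in one sentence.")
        results[depth] = ask(prompt).strip().endswith("PINEAPPLE")
    return results   # e.g. {100: True, 500: True, 700: False, ...}
```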
PeterStuer
The thing that would get me more excited is how far they could push context coherence before the model loses track. I'm hoping 250k.
tariky
This is amazing. I have to test it with my reverse engineering workflow. I don't know how many people use CC for RE but it is really good at it.
Also, it is really good for writing SketchUp plugins in Ruby. It one-shots plugins that are in some versions better than commercial ones you can buy online.
CC will change the development landscape so much in the next year. It is exciting and terrifying at the same time.
elophanto_agent
finally, enough context to fit my entire codebase AND my excuses for why it doesn't work
vessenes
This is super exciting. I've been poking at it today, and it definitely changes my workflow -- I feel like a full three or four hour parallel coding session with subagents is now generally fitting into a single master session.
The stats claim Opus at 1M is roughly comparable to 5.4 at 256k — these needle-in-a-haystack long-context tests don't always track reasoning quality, sadly — but this is still a significant improvement, and I haven't seen dramatic falloff in my tests, unlike q4 '25 models.
p.s. what's up with sonnet 4.5 getting comparatively better as context got longer?
jwilliams
I'm fairly sure that your best throughput is single-prompt single-shot runs with Claude (and that means no plan, no swarms, etc) -- just with a high degree of work in parallel.
So for me this is a pretty huge change as the ceiling on a single prompt just jumped considerably. I'm replaying some of my less effective prompts today to see the impact.
aragonite
Do long sessions also burn through token budgets much faster?
If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
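Back-of-the-envelope, with made-up numbers and ignoring prompt caching (which changes the economics considerably):

```python
# If every turn resends the full history, billed input tokens grow roughly
# quadratically with the number of turns.
def cumulative_input_tokens(turn_sizes):
    total, history = 0, 0
    for t in turn_sizes:
        history += t        # prior context carried into this request
        total += history    # each request pays for everything sent so far
    return total

print(cumulative_input_tokens([2_000] * 35))  # 35 turns of ~2k tokens -> 1,260,000
```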
thebigspacefuck
I used this for a bit and I felt like it was slower and generally worse than using 200K with context compaction. Context compaction does lose some things though.
sailfast
This is great news. The 1M context is much easier to work with than compacting all the time and seems to perform and remember quite well despite the insane amount of data.
bob1029
I've been avoiding context beyond 100k tokens in general. The performance is simply terrible. There's no training data for a megabyte of your very particular context.
If you are really interested in deep NIAH tasks, external symbolic recursion and self-similar prompts+tools are a much bigger unlock than more context window. Recursion and (most) tools tend to be fairly deterministic processes.
I generally prohibit tool calling in the first stack frame of complex agents in order to preserve context window for the overall task and human interaction. Most of the nasty token consumption happens in brief, nested conversations that pass summaries back up the call stack.
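As a rough sketch of that pattern (names like `chat` and `chat_with_tools` are placeholders for whatever client you use, not a real API):

```python
# Top-level "stack frame": no tool calls, so the main context window stays
# reserved for the overall task and human interaction. Tool-heavy work runs
# in nested conversations whose transcripts are discarded; only short
# summaries come back up the call stack.
def top_level_agent(task: str, chat, chat_with_tools) -> str:
    plan = chat(f"Break this task into independent sub-tasks, one per line:\n{task}")
    summaries = []
    for sub in filter(str.strip, plan.splitlines()):
        transcript = chat_with_tools(f"Complete this sub-task using tools:\n{sub}")
        summaries.append(chat(f"Summarize the outcome in three sentences or fewer:\n{transcript}"))
    return chat(f"Task: {task}\nSub-task outcomes:\n" + "\n".join(summaries) +
                "\nWrite the final result.")
```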
jFriedensreich
My testing was extremely disappointing; this is not a context window that magically extends your breathing room for a conversation. I can tell blindly at this point when 150-200k tokens are reached because the coding quality and coherence just drop by one or two generations. It's great for cases where you really need a giant context for a specific task, but it changes nothing about needing to compact or hand over at 200k.
pixelpoet
Compared to yesterday, my Claude Max subscription burns usage like absolutely crazy (13% of weekly usage from a fresh reset today with just a handful of prompts on two new C++ projects, no deps) and has become unbearably slow (as in 1hr for a prompt response). GGWP Anthropic, it was great while it lasted, but this isn't worth the hundreds of dollars.
k__
I heard the middle of the context is often ignored.
Do long context windows make much sense then or is this just a way of getting people to use more tokens?
jmkozko
Do subscription users still need to tap into "extra usage" spending to go above 200K tokens?
AbstractH24
Am I crazy or wasn’t this announced like 2 weeks ago?
Or was that a different company or not GA. It’s all becoming a blur.
yubainu
1M is truly amazing. However, what is the incidence of hallucination? I haven't found a benchmark, but I feel that maintaining context at 1M would likely increase hallucination. Is there some kind of mechanism to suppress hallucination?
suheilaaita
This blew my mind the first time I saw it. Another leap in AI that just swooshes by. In a couple of months, every model will be the same. Can't wait for IDEs like Cursor and VS Code to update their tooling to adapt to this massive change in Claude models.
LarsDu88
The stuff I built with Opus 4.6 in the past 2.5 weeks:
Full clone of Panel de Pon/Tetris attack with full P2P rollback online multiplayer:
https://panel-panic.com
An emulator of the MOS 6502 CPU with visual display of the voltage going into the DIP package of the physical CPU:
https://larsdu.github.io/Dippy6502/
I'm impressed as fuck, but a part of me deep down knows that I know fuck all about the 6502 or its assembly language and architecture, and now I'll probably never be motivated to do this project in a way where I would've learned all the things I wanted to learn.
aenis
Sample of one and all that, but it's way, way more sloppy than it used to be for me.
To the extent that I have started making manual fixes in the code - I haven't had to stoop to that in 2 months.
Max subscription, 100k LOC codebases more or less (frontend and backend - same observations).
margorczynski
What about response coherence with longer context? Usually in other models with such big windows, I see the quality rapidly drop past a certain point.
chaboud
Awesome.... With Sonnet 4.5, I had Cline soft trigger compaction at 400k (it wandered off into the weeds at 500k). But the stability of the 4.6 models is notable. I still think it pays to structure systems to be comprehensible in smaller contexts (smaller files, concise plans), but this is great.
(And, yeah, I'm all Claude Code these days...)
causalzap
I've been using Opus 4.5 for programmatic SEO and localizing game descriptions. If 4.6 truly improves context compaction, it could significantly lower the API costs for large-scale content generation. Has anyone tested its logic consistency on JSON output compared to 4.5?
mvrckhckr
I never get to more than 20% of the 1M context window, and it’s working great.
(Have the same experience in Codex with 5.4.)
ionwake
Have we reached the point where it's "normal" to mostly use AI to code? I'm just wondering because I'm sure it was less than a month ago when I said I hadn't coded manually for over 6 months and I got several comments about how my code must be terrible.
I'm not butt hurt, I'm just wondering if the Overton window has shifted yet.
vicchenai
The no-degradation-at-scale claim is the interesting part. Context rot has been the main thing limiting how useful long context actually is in practice — curious to see what independent evals show on retrieval consistency across the full 1M window.
ofisboy
i think it's buggy. i keep getting "compacting conversation" even though i restarted the cli. and i'm for sure not using 5 times more.
heraldgeezer
I feel like I'm the only one here using AI as just a chatbot for research, shopping, advice etc and for one off regex or bash/ps scripts... then again not a programmer so.
aarmenante
Hot take... the 1MM context degrades performance drastically.
arjie
This is fantastic. I keep having to save to memory with instructions and then tell it to restore to get anywhere on long running tasks.
fittingopposite
I don't get the announcement. Is this included in the standard 5 or 20x Max plans?
aliljet
Are there evals showing how this improves outputs?
johnwheeler
This is incredible. I just blew through $200 last night in a few hours on 1M context. This is like the best news I've heard all year in regards to my business.
What is OpenAI's response to this? Do they even have a 1M context window, or is it still opaque and "depends on the time of day"?
8note
im guessing this is why the compacts have started sucking? i just finished getting me some nicer tools for manipulating the graph so i could compact less frequently, and fish out context from the prior session.
maybe itll still be useful, though i only have opus at 1M, not sonnet yet
thunkle
Just have to ask. Will I be spending way more money since my context window is getting so much bigger?
alienchow
If this is a skill issue, feel free to let me know. In general Claude Code is decent for tooling. Onduty fullstack tooling features that used to sit ignored in the on-caller ticket queue for months can now be easily built in 20 minutes with unit tests and integration tests. The code quality isn't always the best (although what's good code for humans may not be good code for agents) but that's another specific and directed prompt away to refactor.
However, I can't seem to get Opus 4.6 to wire up proper infrastructure. This is especially so if OSS forks are used. It trips up on arguments from the fork source, invents args that don't exist in either, and has a habit of tearing down entire clusters just to fix a Helm chart for "testing purposes". I've tried modifying the CLAUDE.md and SPEC.md with specific instructions on how to do things but it just goes off on a tangent and starts to negotiate on the specs. "I know you asked for help with figuring out the CNI configurations across 2 clusters but it's too complex. Can we just do single cluster?"
The entire repository gets littered with random MD files everywhere for directory specific memories, context, action plans, deprecated action plans, pre-compaction memories etc. I don't quite know which to prune either. It has taken most of the fun out of software engineering and I'm now just an Obsidian janitor for what I can best describe as a "clueless junior engineer that never learns". When the auto compaction kicks in it's like an episode of 50 first dates.
Right now this is where I assume is the limitation because the literature for real-world infrastructure requiring large contexts and integration is very limited. If anyone has any idea if Claude Opus is suitable for such tasks, do give some suggestions.
throw03172019
The Pentagon may switch to Claude, knowing OpenAI charges premium rates for 1M context.
8cvor6j844qw_d6
Oh nice, does it mean less game of /compact, /clear, and updating CLAUDE.md with Claude Code?
swader999
I notice Claude steadily consuming fewer tokens every week too, especially with tool calling.
efeecllk
finally. before 1M, I had to spend 60k of context just recapping the past chat and the project.
dkpk
Is this also applicable for usage in Claude web / mobile apps for chat?
zmmmmm
Noticed this just now - all of a sudden I have a 1M context window (!!!) without changing anything. It's actually slightly disturbing because this IS a behavior change. Don't get me wrong, I like having longer context, but we really need to pin down the behaviour for how things are deployed.
vips7L
Friends, just write the code. It’s not that hard.
cubefox
> Standard pricing now applies across the full 1M window for both models, with no long-context premium.
Does that mean it's likely not a Transformer with quadratic attention, but some other kind of architecture, with linear time complexity in sequence length? That would be pretty interesting.
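For reference, the reason flat pricing invites the question: a vanilla attention layer builds an n-by-n score matrix, so going from 200k to 1M tokens multiplies that term by 25x. A toy single-head version of the step in question (numpy, no optimizations); flat pricing could also just mean sparse/windowed attention, aggressive KV-cache reuse, or absorbed cost, so this doesn't settle anything:

```python
import numpy as np

def attention(Q, K, V):
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)  # shape (n, n): the quadratic-in-n term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V             # (n, d)
```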
drcongo
Could be pure coincidence, but my Claude Code session last night was an absolute nightmare. It kept forgetting things it had done earlier in the session and why it had done them, messed up a git merge so badly that it lost the CLAUDE.md file along with a lot of other stuff, and then started running commands on the host machine instead of inside the container because it no longer had a CLAUDE.md to tell it not to. Last night was the first time I've ever sworn at it.
shanjai_raj7
Are the costs the same as the 200k-context Opus 4.6?
Compaction has been really good in Claude; we don't even notice the switch.
holoduke
I am currently mass translating millions of records with short descriptions. Somehow tokens are consumed extremely fast. I have 3 max memberships. And all 3 of them are hitting the 5 hour limit in about 5 to 10 minutes. Still don't understand why this is happening.
LoganDark
Finally, I don't have to constantly reload my Extra Usage balance when I already pay $200/mo for their most expensive plan. I can't believe they even did that. I couldn't use 1M context at all because I already pay $200/mo and it was going to ask me for even more.
Next step should be to allow fast mode to draw from the $200/mo usage balance. Again, I pay $200/mo, I should at least be able to send a single message without being asked to cough up more. (One message in fast mode costs a few dollars each) One would think $200/mo would give me any measure of ability to use their more expensive capabilities but it seems it's bucketed to only the capabilities that are offered to even free users.
dominotw
Can someone tell me how to make this instruction work in Claude Code:
"put high level description of the change you are making in log.md after every change"
It works perfectly in Codex, but I just can't get Claude to do it automatically. I always have to ask, "did you update the log?"
gaigalas
I'm getting close to my goal of fitting an entire bootstrappable-from-source system source code as context and just telling Claude "go ahead, make it better".
sergiotapia
maybe i'm thinking too small, or maybe it's because i've been using these ai systems since they were first launched, but it feels wrong to just saturate the hell out of the context, even if it can take 1 million tokens.
maybe i need to unlearn this habit?
alienbaby
Is this the market being played out in front of our eyes, slice by slice? OK, maybe not, but watching these entities duke it out is kinda amusing. There will be consequences, but we may as well sit back for the ride; who knows where we are going?
nemo44x
Has anyone started a project to replace Linux yet?
jf___
There is a parallel between managing context windows and hard real-time system engineering.
A context window is a fixed-size memory region. It is allocated once, at conversation start, and cannot grow. Every token consumed — prompt, response, digression — advances a pointer through this region. There is no garbage collector. There is no virtual memory. When the space is exhausted, the system does not degrade gracefully: it faults.
This is not metaphor by loose resemblance. The structural constraints are isomorphic:
No dynamic allocation. In a hard realtime system, malloc() at runtime is forbidden — it fragments the heap and destroys predictability. In a conversation, raising an orthogonal topic mid-task is dynamic allocation. It fragments the semantic space. The transformer's attention mechanism must now maintain coherence across non-contiguous blocks of meaning, precisely analogous to cache misses over scattered memory.
No recursion. Recursion risks stack overflow and makes WCET analysis intractable. In a conversation, recursion is re-derivation: returning to re-explain, re-justify, or re-negotiate decisions already made. Each re-entry consumes tokens to reconstruct state that was already resolved. In realtime systems, loops are unrolled at compile time. In LLM work, dependencies should be resolved before the main execution phase.
Linear allocation only. The correct strategy in both domains is the bump allocator: advance monotonically through the available region. Never backtrack. Never interleave. The "brainstorm" pattern — a focused, single-pass traversal of a problem space — works precisely because it is a linear allocation discipline imposed on a conversation.
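One way to make the discipline concrete is a token-budget bump allocator; the class name, numbers, and labels below are illustrative only:

```python
# Toy "bump allocator" over a context budget: reserve monotonically, never
# free, and fault loudly when the window is exhausted.
class ContextBudget:
    def __init__(self, window_tokens: int = 1_000_000, reserve_for_output: int = 64_000):
        self.limit = window_tokens - reserve_for_output
        self.used = 0

    def alloc(self, label: str, tokens: int) -> int:
        if self.used + tokens > self.limit:
            raise MemoryError(f"context exhausted while allocating {tokens} tokens for {label!r}")
        self.used += tokens        # the pointer only ever advances
        return self.used

budget = ContextBudget()
budget.alloc("system prompt", 2_000)
budget.alloc("code map", 40_000)
budget.alloc("task files", 60_000)
```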