pxtail

Recently, after noticing how quickly limits are consumed and reading others' complaints about the same issue on Reddit, I was wondering how much of this is a real error or bug hidden somewhere, and how much is about testing what threshold of tightened limits will be tolerated without people cancelling their accounts. In the end, if the shit hits the fan, it can always be dismissed by waving hands and apologizing (or not) for some abstract "bug".

The lack of transparency and accountability behind all of this is incredible to me.

p1necone

I burn through the entire 5 hour limit in one or two "implement the feature outlined in this doc" requests with claude pro in a not even huge codebase (low tens of thousands of loc). If there were any reasonable alternatives I wouldn't even consider using it, but sonnet 4.6 (and presumably opus 4.6 - I don't use it as sonnet is faster and more than good enough) is the only model I've used that actually makes good decisions in complex codebases - anything else just gets stuck in the weeds and produces either non working code or tech debt (after churning for a long time).

I have seen more than one comment on this thread mentioning kimi though - I'll have to test it out.

qwen3-coder-next has been surprisingly capable as a local model too - needs to be used to make small changes where you know exactly what the final code should look like rather than implementing whole features, but it is free (except for the power bill).

dinakernel

This turned out to be a bug. https://x.com/om_patel5/status/2038754906715066444?s=20

One reddit user reverse engineered the binary and found that it was a cache invalidation issue.

They are doing some hidden string replacement if the claude code conversation talks about billing or tokens. Looks like that invalidates the cache at that point.

If that string appears anywhere in the conversation history, I think the starting text is replaced and your entire cache rebuilds from scratch.

So, nothing devious, just a bug.
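
A minimal sketch of why a change near the start of a conversation would be so expensive, assuming the cache is keyed on exact prefixes of the conversation. The function names and the hashing scheme are hypothetical illustrations, not Claude Code's actual implementation.

```python
import hashlib

def prefix_keys(messages):
    """Return one cache key per prefix of the conversation (hypothetical scheme)."""
    keys = []
    h = hashlib.sha256()
    for msg in messages:
        h.update(msg.encode("utf-8"))
        h.update(b"\x00")  # separator so message boundaries affect the key
        keys.append(h.copy().hexdigest())
    return keys

def cached_prefix_len(old, new):
    """Count how many leading messages of `new` can reuse cache entries built for `old`."""
    n = 0
    for a, b in zip(prefix_keys(old), prefix_keys(new)):
        if a != b:
            break
        n += 1
    return n

history = ["system: header v1", "user: fix the parser", "assistant: done"]
same_start = history + ["user: now add tests"]
# Same conversation, but the starting text was silently rewritten.
changed_start = ["system: header v2"] + history[1:] + ["user: now add tests"]

print(cached_prefix_len(history, same_start))     # 3
print(cached_prefix_len(history, changed_start))  # 0
```

Under this assumption, editing the first message changes every later key, so nothing after it can be reused.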

midnightdiesel

It seems like Anthropic is constantly changing the rules and pulling out rugs, and always entirely by surprise. I’m not sure if they’re incompetent or just careless, but I stopped paying them because of this a while ago, and my days are much more interesting and enjoyable using my own brain instead.

p2hari

I cancelled my Pro plan last month. I was using Claude as my daily driver. In fact I also had the API plan and topped it up with $20 more, so it was around $40 each month. Starting from December last year it has been like this. Sessions used to last a couple of hours, ranging from deep boilerplate and DB queries to architecture discussion and tool selection. Slowly, over the last two months, the limit just runs out: one prompt and a few discussions as to why this and not that, and it is done.

aliljet

There's a weird 'token anxiety' you get on these platforms. And you basically don't know how much of this 'limit' you may consume at any time. And you actually don't even know what the 'limit' is or how it's calculated. So far, people have just assumed Anthropic will do the kind thing and give you more than you could ever use...

elephanlemon

Yesterday (pro plan) I ran one small conversation in which Claude did one set of three web searches, a very small conversation with no web search, and I added a single prompt to an existing long conversation. I was shocked to see after the last prompt that I had somehow hit my limit until 5:00pm. This account is not connected to an IDE or Code, super confusing.

robviren

I find Claude code to be a token hog. No matter how confidently the papers say context rot is not an issue I find curating context to be highly important to output quality. Manually managing this in the Claude Webui has helped with my use cases more than freely tossing Claude code at it. Likely I am using both "wrong" but the way I use it is easier for me to reason about and minimize context rot.

0xbadcafebee

I've found a lot of people are almost belligerently pro-Claude. They refuse to consider other providers or agents, and won't consider using any model other than the latest Opus. The most common reasons I hear are 1) they don't want to use anything other than the greatest model, afraid that anything else would waste their time, 2) they believe their experience is that it's far better than anything else.

Even if you show them benchmarks that show another model equally as good if not better, they refuse to use it. My suspicion is they've convinced themselves that Opus must be the best, because of reputation and price. They might've used a different model and didn't have a good experience, making them double down.

I hope a research institution will perform an experiment. My hypothesis is that if you swapped out a couple similar state-of-the-art models, even changing the "class" of model (Sonnet <-> Opus, GPT 5.4 <-> Sonnet), the user won't be able to tell which is which. This would show that the experience is subjective, and that bias is informing their decision, rather than rationality.

It's like wine tasting experiments. People rate a $100 bottle of wine higher than a $10 bottle. But if they actually taste the same, you should be buying the $10 bottle. But people don't, because they believe the $100 bottle is better. In the AI case, the problem is people won't stop buying the expensive bottle, because they've convinced themselves they must use the more expensive bottle.

garrickvanburen

Considering:

- Anthropic decides how much a token is worth.

- Users have no visibility into, or ability to control, how many tokens a given response will burn.

This is the only expected answer. https://forstarters.substack.com/p/for-starters-59-on-credit...

1970-01-01

This has been verified as a bug. Naturally, people should see some refunds or discounts, but I expect there won't be anything for you unless you make a stink.

https://old.reddit.com/r/ClaudeCode/comments/1s7zg7h/investi...

kneel

I asked it to complete ONE task:

You've hit your limit · resets 2am (America/Los_Angeles)

I waited until the next day to ask it to do it again, and then:

You've hit your limit · resets 1pm (America/Los_Angeles)

At which point I just gave up

ZeroCool2u

I'm finishing my annual paid Pro Gemini plan, so I'm on the free plan for Claude and I asked one (1) single question, which admittedly was about a research plan, using the Sonnet 4.6 Extended thinking model and instantly hit my limit until 2 PM (it was around 8 or 9 AM).

Just a shockingly constrained service tier right now.

edbern

Yesterday asked claude to write up a simple plan adding very basic features to a project I'm working on and it took 20% of 5-hour pro plan limit. Then somehow Codex seems to be infinite. Is OpenAI just burning through way more cash or are they more efficient?

torginus

I dunno, CC might give away tokens for cheaper, but when I used Opus standalone in Cursor, I definitely got way more mileage out of a token.

Considering how much progress I made vs how much I paid, I couldn't make a scientific assessment, but it felt pretty close.

techgnosis

* Hardware will manage models more efficiently

* Models will manage tokens more efficiently

* Agents will manage models more efficiently

* Users will manage agents more efficiently

Why are we acting like technology is on pause?

sibtain1997

Faced this too. Tried https://github.com/rtk-ai/rtk to compress cli output but some commands started failing and the savings were minimal. Ended up just being more deliberate about context size instead of adding more tooling on top

reenorap

The only way AI will be profitable to companies like Anthropic or OpenAI is to make the cost $1000-2000/month or more for coding. Every programmer will be forced to pay for it because it's only a fraction of their salary (in the US anyway) and it's the only way the programmer will be competitive. Whether the company pays for it, or they pay for it themselves, it will need to be paid.

There's no other way that these companies can compete against the likes of Google, and Facebook unless they sell themselves to these companies. With AWS and GCP spending hundreds of billions of dollars per year, there's no way that Anthropic or OpenAI can continue competing unless they make an absurd amount of money and throw that at resources like their own datacenters, etc and they can't do that at $20/month.

jditu

Still on 2.1.87, exclusively Opus for coding — haven't hit this yet. Wondering if the bug is personal vs team plan specific?

I'm sure it's more complex, but why not improve internal implicit caching and pass the savings on? Presumably Anthropic already benefits from caching repeated prompt prefixes internally — just do that better, extend the TTL window, and let users benefit. Explicit caching stays for production use cases with semi-static prompts where you want control.

The current 5-min default TTL + 2x penalty for 1-hour cache feels punitive for an interactive coding tool.
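
A rough sketch of that trade-off, in units of base input tokens. The multipliers are assumptions (1.25x to write a 5-minute cache entry, 2x to write a 1-hour entry, 0.1x to read either) and should be checked against current pricing; the gaps between turns are made up, and the model is simplified so that each turn either hits the cache or pays for a full rewrite.

```python
def session_cost(prefix_tokens, gaps_min, ttl_min, write_mult, read_mult=0.1):
    """Cost of re-sending a fixed prefix once per turn, in base-input-token units.

    gaps_min: minutes elapsed before each follow-up turn.
    A turn reads the cache if it arrives within the TTL, otherwise it
    pays for a fresh cache write. All multipliers are assumed values.
    """
    cost = prefix_tokens * write_mult  # the first turn always writes the cache
    for gap in gaps_min:
        hit = gap <= ttl_min
        cost += prefix_tokens * (read_mult if hit else write_mult)
    return cost

gaps = [2, 12, 3, 20]  # hypothetical pauses (minutes) in an interactive session
short_ttl = session_cost(100_000, gaps, ttl_min=5, write_mult=1.25)
long_ttl = session_cost(100_000, gaps, ttl_min=60, write_mult=2.0)

print(f"5-min TTL:  {short_ttl:,.0f} token-units")
print(f"1-hour TTL: {long_ttl:,.0f} token-units")
```

With pauses like these, where a human stops to read and think, the short TTL misses twice and ends up costing more despite the cheaper write; with rapid back-to-back turns the 5-minute cache would win instead.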

pagecalm

Hit this myself recently, along with a bunch of overloaded errors. I think it's growing pains for where we are with AI right now.

As the tooling matures I think we'll see better support for mixing models — local and cloud, picking the right one for the task. Run the cheap stuff locally, use the expensive cloud models only when you actually need them. That would go a long way toward managing costs.

There's also the dependency risk people aren't talking about enough. These providers can change pricing whenever they want. A tool you've built your entire workflow around can become inaccessible overnight just because the economics shifted. It's the vendor lock-in problem all over again but with less predictability.

canada_dry

I hit my limit on the project I've been working on (after I let "MAX" run out and moved to "PRO") after about only 2 hours!

TIP (YMMV): I've found that moving the current code base into a new 'project' after a dozen or so turns helps as I suspect the regurgitation of the old conversations chews up tokens.

mszczodrak

I've been hitting the API limit errors over Claude CLI, yet the total usage was 0% on the claude.ai website. Changing the model fixed the problem.

stavros

Anthropic went about this in a really dishonest way. They had increased demand, fine, but their response was to ban third-party clients (clients they were fine with before), and to semi-quietly reduce limits while keeping the price the same.

Unilaterally changing the deal to give customers less for the same price should not be legal, but companies have slowly boiled the frog in such a way that now we just go "welp, it's corporations, what can you do", and forget that we actually used to have some semblance of justice in the olden days.

paulbjensen

I have found that:

- If I ask Claude to go and build a product idea out for me from scratch, it can get quite far, but then I will hit quota limits on the pro plan ($20pm).

- I have not drunk the Kool-aid and tried to indulge in ClaudeMaxxing (Max plan at $200pm). I need to sleep and touch grass from time to time.

- I don't bother with a Claude.md in my projects. I just raw-dog context.

- If I have a big codebase, and I'm very clear about what code changes I want to make Claude do, I can easily get a lot of changes made without getting near my quota. It's like Mr Miyagi making precision edits to that Bonsai Tree in Karate Kid.

My last bit of advice - use the tool, but don't let the tool use you.

delphic-frog

The token usage differs day to day - that's the most frustrating part. You can't effectively plan a development session if you aren't sure how far you'll likely get into a feature.

nitekode

This could also be because of the recently introduced 1 million token buffer. I also saw my tokens drain away quickly; then I noticed I was pushing 750k tokens through for every prompt :) Sometimes it's hard to get into the habit of clearing
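
A back-of-the-envelope sketch of how that adds up, assuming the whole history is re-sent as input on every prompt and ignoring any caching discount. The 750k figure is from the comment above; the other numbers are made up for illustration.

```python
def total_input_tokens(turns, context_tokens, per_turn_tokens):
    """Sum the input tokens sent across `turns` prompts when history is never cleared."""
    total = 0
    context = context_tokens
    for _ in range(turns):
        context += per_turn_tokens  # each prompt and reply grows the history
        total += context            # and the whole history is sent again
    return total

bloated = total_input_tokens(turns=10, context_tokens=750_000, per_turn_tokens=2_000)
fresh = total_input_tokens(turns=10, context_tokens=20_000, per_turn_tokens=2_000)

print(f"never cleared: {bloated:,} input tokens")
print(f"after a clear: {fresh:,} input tokens")
```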

giancarlostoro

I'm guessing their newer models are taking way more compute than they can afford to give away. The biggest challenge of AI will eventually be how to bring down the compute a powerful model takes. I hope Claude puts more emphasis into making Haiku and Sonnet better; when I use them via JetBrains AI it feels like only Opus is good enough, for whatever odd reason.

wellthisisgreat

yeah this is crazy hitting limits on a non-constant usage of a Max plan?

anon7000

I think I ran into this yesterday, with Claude Code taking FOREVER on a lot of tasks. But using Claude within Cursor seems way faster

lukewarm707

please tell me if i'm crazy.

i just refuse to use openai/google/anthropic subscriptions, i only use open source models with ZDR tokens.

- i like privacy in my work, and i share when i wish. somehow we accepted that our prompts and work may be read and moderated by employees. would you accept people moderating what you write in excel, google docs, apple pages?

- i want a consistent tool, not something that is quantised one day, slow one day, a different harness one day, stops randomly.

- unless i am missing something, the closed source models are too slow for me to watch what they are doing. i feel comfortable with monitoring something, usually at about 200-300tps on GLM 5. above that it might even be too fast!

zackify

After using it all week on pro plan it worked fine for me. Hit limits a couple times.

But if I was doing deep coding on pro plan it would have sucked.

You can't expect to use massive context windows for $20

ryan42

claude automatically enabled "extra usage" on my pro account for me (I had it disabled) and the total got to $49 extra before I noticed. I sent an email asking wtf but I don't expect much.

Asmod4n

When asking it to write an HTTP library which can decode/parse/encode all three versions of the protocol, the usage limit of the day gets hit with one sentence. On the Pro plan. Even when you hand it a library which does HPACK/Huffman.

aperture_hq

There are no transparent metrics on token usage; they just compare their plans with their other plans.

sudo_and_pray

I gave claude code a try at home ($20 sub), since we use it at work without any limits and I wanted to see how I can use it on some of my projects.

It was a big disappointment and it just burned through tokens so fast that I hit first limit after 30 minutes while it was gathering info on my project and doing websearches.

My experience was that when I wanted to use it, maybe 2-3 days per week, Pro sub was not enough. On some days I did not use it at all. The daily or weekly token limit was really restrictive.

arvid-lind

well, they just had a promo with two weeks of double quota for everyone 18 hours of the day, even free users. of course it feels like we're getting rugpulled.

nprateem

I literally ran out of tokens on the antigravity top plan after 4 new questions the other day (opus). Total scam. Not impressed.

spongebobstoes

try codex, it's really good and doesn't have the same limits issues

jdefr89

Over reliance on LLMs is going to become such a disaster in a way no one would have thought possible. Not sure exactly what, who, when, or where.. Just that having your entire product or repo dependent on a single entity is going to lead to some bad times…

firebot

The first hit is free.

shafyy

What is the best way to get started with open weight models? And are they a good alternative to Claude Code?

raincole

Opus 4.6 price:

Input $5 / M tokens, Output $25 / M tokens

GPT Codex 5.3:

Input $1.75 / M tokens, Output $14 / M tokens

> Claude Code users hitting usage limits 'way faster than expected'

No shit, Sherlock.
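
To put the quoted list prices side by side, here is a small sketch that prices one hypothetical request under both. The per-million-token prices are the ones quoted in the comment above; the 200k input / 10k output request size is an assumption, and subscription limits are not necessarily derived from API prices.

```python
# USD per million tokens, as quoted in the comment above.
PRICES = {
    "Opus 4.6": {"input": 5.00, "output": 25.00},
    "GPT Codex 5.3": {"input": 1.75, "output": 14.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Return the list-price cost in USD of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical agentic request: large context in, modest diff out.
opus = request_cost("Opus 4.6", 200_000, 10_000)
codex = request_cost("GPT Codex 5.3", 200_000, 10_000)

print(f"Opus 4.6:      ${opus:.2f}")
print(f"GPT Codex 5.3: ${codex:.2f}")
print(f"ratio:         {opus / codex:.2f}x")
```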