hirako2000

> Qwen 3.5 397B-A17B is a good comparison

It is not; it's a terrible comparison. Qwen, DeepSeek, and other Chinese models are known for 10x or even better efficiency compared to Anthropic's.

That's why the difference between OpenRouter prices and those official providers' prices isn't that large. Plus, who knows what the OpenRouter providers do in terms of quantization. They may be getting 100x better efficiency, hence the competitive price.

That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.

overrun11

A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss, despite the fact that there's no evidence this is true and a lot of evidence that it isn't. It's just become a meme that gets uncritically regurgitated.

This sloppy Forbes article has polluted the epistemic environment, because now there's a source to point to as "evidence."

So yes, this post author's estimate isn't perfect, but it is far more rigorous than the original Forbes article, which doesn't appear to even understand the difference between Anthropic's API prices and its compute costs.

anonzzzies

Just last weekend I calculated that my team would cost around $200k/mo if we ran Claude Code at retail API prices. We pay $1,400/month in Max subscriptions. So that's ~$50k/user... But that's based on the tokens CC reports in its JSON; a lot of this must be cached etc., so I doubt it's anywhere near $50k in actual cost. I'm not sure how to figure out what it would really cost, and I'm sure as hell not going to try.
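
As a rough sanity check on those figures (a sketch; the numbers are the commenter's own estimates, not anything official):

```python
# Back-of-envelope from the figures above (the commenter's numbers, not Anthropic's)
retail_api_estimate = 200_000  # USD/month if the team ran Claude Code at retail API prices
subscription_spend = 1_400     # USD/month actually paid in Max subscriptions

ratio = retail_api_estimate / subscription_spend
print(f"retail API would cost ~{ratio:.0f}x the subscription spend")
```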

readthemanual

I think the main issue I have with the article is that the author's whole argument rests on "Qwen wouldn't run at a loss." But why wouldn't they? Despite being a business, there are a number of reasons they might decide to run without profit for now, from trying to expand the user base to the Chinese government sponsoring Chinese AI businesses.

osener

> Cost remains an ever present challenge. Cursor’s larger rivals are willing to subsidize aggressively. According to a person familiar with the company’s internal analysis, Cursor estimated last year that a $200-per-month Claude Code subscription could use up to $2,000 in compute, suggesting significant subsidization by Anthropic. Today, that subsidization appears to be even more aggressive, with that $200 plan able to consume about $5,000 in compute, according to a different person who has seen analyses on the company’s compute spend patterns.

This is the relevant quote from the original article.

eaglelamp

If Anthropic's compute is fully saturated, then the Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than to $500.

Anthropic's models may be similar in parameter count to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.

The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.

maxdo

I'm using the API directly for software development. I'm on track to pay ~$5k this month per user, some less, some more, and with daily use it just keeps growing.

ymaws

How confident are you in the Opus 4.6 model size? I've always assumed it was a beefier model with more active params than Qwen 397B (17B active on the forward pass).

zurfer

tl;dr: the author argues it's closer to costing 500 USD per month IF a user hits their weekly rate limits every week.

Which is probably a lot more correct than other claims. However, it's also true that anybody who has to use the API might pay that much, creating a real cost-per-token moat for Anthropic's Claude Code vs. other models, as long as it stays so far ahead in terms of productivity.

faangguyindia

A Claude subscription is the equivalent of a spot instance,

and the API is the on-demand equivalent.

Priority goes to the API, and leftover compute serves the subscription plans.

When there is no capacity, subscription traffic is routed to heavily quantized, cheaper models behind the scenes.

Selling subscriptions makes it cheaper to run inference at scale; otherwise much of your capacity just sits there idle.

Also, these subscriptions help them train their model further on predictable workflows (because the model creators also control the client, like Qwen Code, Claude Code, Antigravity, etc.).

This is probably why they will ban you for violating the ToS if you use their subscription plan with other tools.

They aren't just selling a subscription; the subscription also helps them get better at the thing they are selling, which for coding models like Qwen and Claude is coding.

I've used qwen code, codex and claude.

Codex is 2x better than Qwen code and Claude is 2x better than Codex.

So I'd hope Claude Opus is at least 4-5x more expensive to run than the flagship Qwen coding model hosted by Alibaba.

himata4113

What people don't realize is that cache is *free*. Well, not free, but compared to the compute required to recompute it? Relatively free.

If you remove the cached-token cost from the pricing, the overall API usage drops from around $5,000 to $800 (or $200 per week) on the $200 Max subscription. The subscription is still 4x cheaper than the API, but it isn't losing money either; if I had to guess, it's roughly break-even, since that compute would most likely be going idle otherwise.
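
The cache-adjusted arithmetic above, written out (all figures are the commenter's estimates, not Anthropic's):

```python
# Commenter's estimates for a heavy $200 Max user, USD per month
api_cost_cache_billed = 5_000  # usage priced at retail API rates, cached tokens billed in full
api_cost_cache_free = 800      # same usage if cached tokens are treated as ~free
subscription_price = 200       # Max plan price

print(api_cost_cache_free / subscription_price)     # API is still ~4x the subscription price
print(api_cost_cache_billed / api_cost_cache_free)  # caching accounts for a ~6x reduction
```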

A7OM

The article is right to separate compute cost from retail price, but the retail price baseline itself is arbitrary depending on where you run the model. The same capability (e.g. Llama 3.3 70B with tool calling and 128K context) runs $3.00/1M tokens at the model developer's list price and $0.22/1M at Fireworks AI: a 93% gap for identical specs. That spread makes any "it costs Anthropic X" estimate depend entirely on which reference price you anchor to. We track this live across 1,625 SKUs and 40+ vendors at a7om.com; the variance across the market is larger than most people realise when they back-calculate provider economics.
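
The "93% gap" quoted above checks out arithmetically (per-1M-token prices as given in the comment):

```python
list_price = 3.00       # USD per 1M tokens at the model developer's list price
fireworks_price = 0.22  # USD per 1M tokens at Fireworks AI, per the comment

gap = 1 - fireworks_price / list_price  # fractional discount vs. the list-price anchor
print(f"{gap:.0%}")  # -> 93%
```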

z3ugma

This is such a well-written essay. Every line answered the immediate question I had just thought of.

jeff_antseed

The OpenRouter comparison is interesting because it shows what happens when you have actual supply-side competition: multiple providers, different quantizations, price competition. The spread between the cheapest and priciest host for the same model can be 3-5x.

Anthropic doesn't have that. Single provider, single pricing decision. Whether or not $5k is accurate, the more interesting question is what happens to inference pricing when the supply side is genuinely open. We're seeing hints of it with OpenRouter, but it's still intermediated.

Not saying this solves Anthropic's cost problem, just that the "what does inference actually cost" question gets a lot more interesting when providers are competing directly.

n_u

Good article! Small suggestions:

1. It would be nice to define terms like RSI or at least link to a definition.

2. I found the graph difficult to read. It's a computer font made to look hand-drawn, and it's a bit low-resolution. With some googling, I'm guessing the words in parentheses are the clouds the model runs on; you could make that a bit clearer.

brianjeong

These margins are far greater than the ones Dario has indicated during many of his recent podcast appearances.

ineedaj0b

What CC costs internally is not public. How efficient it is, is not public.

…You could take the efficiency-improvement rates from previous model releases (from x -> y) and assume they have already made similar "improvements" internally. That is likely closer to what their real costs are.

functionmouse

Was anyone under the impression that it does? Serious question. I've never heard that, personally.

aurareturn

By the way, one of the charts in the article shows that Opus 4.6 is 10x costlier than Kimi K2.5.

I thought there was no moat in AI? Even being 10x costlier, Anthropic still doesn't have enough compute to meet demand.

Those "AI has no moat" opinions are going to be so wrong so soon.

hattmall

Is it fair to say the OpenRouter models aren't subsidized, though? The article makes the case that the companies on there are running a business, but there are free models, and there are companies with huge AI budgets that want to gather training data and show usage.

vbezhenar

Why does Anthropic charge ~10x more for the API than for subscriptions? They're not a monopoly, so one would expect margins to be thinner.

vmykyt

I have a very naive question:

People in the comments assume that Anthropic's model is 10 times bigger than the Chinese models, so the calculated cost is 10 times higher.

But from a big-O perspective, only a few algorithms give you O(N); most highly optimized things are O(N log N).

So what is the big O for any open model, for a single request?

behehebd

Did Anthropic pull the oldest sales trick in the 2010s SaaS playbook? ;)

akhrail1996

The comparison with Qwen/Kimi by "comparable architecture size" is doing a lot of heavy lifting. Parameter count doesn't tell you much when the models aren't in the same league quality-wise.

I wonder if a better proxy would be comparing by capability level rather than size. The cost to go from "good" to "frontier" is probably exponential, not linear - so estimating Anthropic's real cost from what it takes to serve Qwen 397B seems off.

gmerc

Nobody gets RSI typing “iterate until tests pass”

scuff3d

This article is hilariously flawed, and it takes all of 5 seconds of research to see why.

Alibaba is the author's primary comparison point, but it's a completely unsuitable one. Alibaba is closer to AWS than to Anthropic in terms of business model: they make money selling infrastructure, not inference. It's entirely possible they see inference as a loss leader and are willing to offer it at or below cost to drive people onto the platform.

We also have absolutely no idea whether their model is anywhere near comparable to Opus 4.6. The author is guessing.

So the article's primary argument is based on a comparison to a company with an entirely different business model, running a model that the author is just making wild guesses about.

darkwater

Well, IDK. I have used CC with API billing pretty extensively and managed to spend ~$1,000 in one month, more or less. I moved to a Max 20x subscription and use it a bit less (I'm still scared), but not THAT much less, and I'm at around 10% weekly usage. I'm not counting the tokens, though.

d--b

And on top of that, Anthropic doesn't run their own compute clusters, do they? They probably get completely ripped off by whoever is renting them the processors.

$200 worth of actual computation is an awful lot of computation.

lyu07282

What this doesn't mention is the "cost" to the public: the inevitable bailouts after it all comes crashing down again, the massive subsidies that datacenters get from taxpayers, the fresh water they consume, the electricity price hikes for everyone else, the noise, air, and water pollution and the health impact on the population surrounding every datacenter, the jobs it destroys, and the innocent people it kills through use of the technology in military targeting and autonomous weapons.

amelius

Tl;dr, their guesstimate:

> Anthropic is looking at approximately $500 in real compute cost for the heaviest users.

beepbooptheory

OK, but so it does cost Cursor $5k per Cursor power user?? Still seems pretty rough...

fnord77

> I'm fairly confident the Forbes sources are confusing retail API prices with actual compute costs

Aren't they losing money on the retail API pricing, too?

> ... comparisons to artificially low priced Chinese providers...

Yeah, no, this article does not pass the sniff test.

dr_dshiv

I easily go through two Max $200/mo accounts, and yesterday got a third Pro account when I ran out.

It’s worth it, but I know they aren’t making money on me. But, of course I’m marketing them constantly so…