DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.
kamranjon
The Hugging Face models are already up and seem to be the original models with the speculative decoding module built in, which is very cool:
Excited to see if this makes it into DwarfStar for local inference. I have been using the flash model extensively since the 2-bit quants were made available by antirez.
StizzurpXDD
DeepSeek is, as I feel currently, the sole AI company which is actually trying to innovate rather than top mere benchmarks. Others like OpenAI, Anthropic and Google are mostly just competing with each other rather than innovating around the clock.
piterrro
I’ve been using DeepSeek v4 pro for a month now in Kilo Code and it’s great. Fast, reliable, large context window and cheap as… Did 1.5B tokens this month and it cost me 40 USD (majority cached, but still).
Guessing the timing isn't accidental. Demonstrated openness vs harsh regulation
articlepan
Title is bad, it's the first line of the abstract instead of the paper title. Speculative decoding for LLM inference was published in 2022: https://arxiv.org/abs/2211.17192
This paper seems to be an improvement to speculative decoding but I haven't read it yet.
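For anyone who hasn't seen the basic idea, here is a minimal sketch of the draft-then-verify loop. To be clear about the assumptions: this is the simple greedy variant (the 2022 paper uses a rejection-sampling acceptance rule for sampled outputs), it is not the method from the new paper, and `target_model` / `draft_model` are made-up toy functions standing in for a large and a small LLM. It also skips the extra "bonus" token a real implementation gets when every drafted token is accepted.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Toy stand-in for the large, slow model (hypothetical)."""
    return (3 * ctx[-1] + len(ctx)) % 17


def draft_model(ctx: List[int]) -> int:
    """Toy stand-in for the small, fast model: agrees with the target
    except when the context length is a multiple of 4 (hypothetical)."""
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 17


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds (each would be one batched target pass in practice)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        # 1. The draft model cheaply proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. A real system scores
        #    all positions in one forward pass; this toy loops instead.
        rounds += 1
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # Keep the agreed prefix, then take the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out, rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, rounds = speculative_decode(target_model, draft_model, prompt, 12)
    print(spec == greedy_decode(target_model, prompt, 12), rounds)
```

The point of the design is that the output is identical to what the target model would have produced on its own; the draft only changes how many target passes are needed, so the speedup depends entirely on how often the draft guesses right.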
ricardobeat
Presumably this has been in production for a while, and is one of the reasons they were able to dramatically lower prices a month ago?
Jackobrien
I see a world soon where there’s an extremely wide variety of small models for speculative decoding, unique to use cases, companies, and even individuals.
segmondy
As we can see again, this has nothing to do with distillation, yet for every gain Chinese labs make, the US labs will accuse them of theft. Yet they are constantly innovating.
pokot0
I am wondering if this is why they can offer their pro model at ~1/4th of the price compared to the other providers offering the same model, and if other providers will be able to do the same in a short timeframe.
zftnb666
AI making AI faster. Next up: AI writing papers about how AI makes AI faster
lelanthran
These companies providing tokens, whether SOTA or not, that want to IPO are so fucked as time goes on.
Can't sell their SOTA models, only slightly better than the open source models for the models they can sell, cost 20x to 50x for good models, a TAM that consists almost solely of developers, with no customer of theirs actually boasting increased profits as a result of AI...
I fear their time to IPO may have passed.
danielabinav160
Would love to see these numbers reproduced on consumer GPUs, not just A100s.
rvz
This is just one of many papers DeepSeek has released on how it serves models at extremely cheap prices, unlike the others taking on over $100B of debt to build data centers for the same thing.
> As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.
Reminds me of the flawed 2017-era approach to scaling servers built on memory-intensive technologies: adding even more servers to solve the problem. (It just increases costs.)
Rather than doing that, think about which critical parts of your app can be written in a more performant technology.
Fast forward to 2026: now you can see who is just throwing more money at the problem and creating even more problems, whereas DeepSeek is giving us optimized solutions.
I know exactly who I would pay attention to, and it is absolutely not Anthropic.
wg0
That's why I pay them. Regularly. Without fail. Even though my token usage isn't that much.
But I vote for these heroes with my wallet. Just did again yesterday.
bflesch
At this point why can't someone produce a fridge or container-sized AI appliance based on legacy chips (12nm)? I imagine this would cover 80% of corporate use cases where you need "google-in-a-box" functionality.
State-of-the-art nanometer nodes are impossible to achieve, but if you have infinite solar energy during business hours does it really matter? Every company has a parking spot, so this ASIC-like appliance could be as big as a shipping container.
If it could just run recent open models for a handful of users it would be such a no-brainer to buy.
lightedman
Anyone want to bet that much like speculative execution, speculative decoding is going to introduce a whole slew of vulnerabilities in the ways LLMs work?
2838383838
Must be wonderful to be on the board of OpenAI et al & their PE investors whilst China keeps blowing up these mines under their feet lmao.
Luckily Korean pension funds will buy all the trash as usual but goddamn you gotta start moving quick or you are gonna need some serious AGI to show you how to offload those bonds
preetham_rangu
do they use their OCR, or someone else?
einrealist
Yet another band aid.
playorizaya
Meanwhile OpenAI is drafting an “open letter” to Congress /s
OpenAI and Anthropic are doing nothing interesting.
Basically forgot about them 2 years ago.
I don’t use DeepSeek either but at least they do interesting stuff - they were the first to do “thinking” iirc
Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark
Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
Is this newer/better than the speculative decoding from 2022? https://arxiv.org/abs/2211.17192
Nice.