For those who haven't tried it: this lets DeepSeek understand a picture (instead of just extracting text from it) and describe what's in it. It is not an image generation system, though, so you can't ask it to modify an image.
Personally, I'm a bit surprised the DS chat app still doesn't offer its own text-to-speech and speech-to-text features (I know DS doesn't have an ASR model of its own, for example, but there are quite a few open ones).
exabrial
The product I want most is the ability to return to the late January 2026 version of Anthropic models.
Could go nicely with https://auge.franzai.com/ (CLI on Apple Vision frameworks): do the first pass locally. If needed, call their API for a more detailed analysis, and then _finally_ we produce meaningful alt texts for images in HTML at a reasonable price ;)
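A minimal sketch of that local-first idea, for illustration only. Everything named here is hypothetical: `Caption`, `alt_text`, `img_tag`, and the two captioner callables are made up for the example, and the confidence score is an assumption (neither tool is known to report one). The local callable would wrap whatever runs on-device; the remote one would wrap a vision API, if and when one is available.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable


@dataclass
class Caption:
    text: str
    confidence: float  # assumed 0.0 to 1.0 score; hypothetical, not a real tool's output


# A captioner takes an image path and returns a Caption (hypothetical interface).
Captioner = Callable[[str], Caption]


def alt_text(image_path: str, local: Captioner, remote: Captioner,
             min_confidence: float = 0.8) -> str:
    """Use the local caption when it is good enough, otherwise pay for the remote one."""
    first = local(image_path)
    if first.text.strip() and first.confidence >= min_confidence:
        return first.text.strip()
    # Local pass was empty or too uncertain: fall back to the remote analysis.
    return remote(image_path).text.strip()


def img_tag(src: str, alt: str) -> str:
    """Build an <img> element with the attribute values HTML-escaped."""
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'
```

The remote call only happens when the local result falls below the threshold, which is where the cost saving would come from; the threshold value itself is a guess that would need tuning.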
bjoli
What has been going on with DeepSeek recently? I have gotten lots of replies in Chinese and, even more frequently, reasoning in Chinese as well.
Is it a new silent update?
mid90sahsan
The main thing here is, they are doing it really cheaply!
Bnjoroge
I hope they bring it to their APIs, especially v4flash. I find myself using mimo 2.5 more since it supports vision, which makes it cheap for doing e2e tests with Playwright or similar.
innis226
Nice, is this available in the API now as well?
bhanu786
Direct competition to American companies like OpenAI and Anthropic, proving China can also launch great models.
tornikeo
I really need this as an API.
Turns out, to use Claude Agents SDK, you need a vision-enabled API. If the DeepSeek API could see, it could fully drive Claude Code and Claude Agents SDK. A project I'm working on relies on a Claude-in-CloudflareWorker setup, and I've been relying on Qwen and Gemini Flash Lite, both more expensive than DeepSeek.
Can't wait to have it available on DeepSeek.
holoduke
A bit off topic, but what would the US do if, for example, the rest of the world subscribed to Chinese AI services? I think the US would show some really nasty behavior.
throwaw12
I wish they had published a post where we could read about capabilities, quality, accuracy and other parameters.
arjie
If they'd do one of those little extraneous additions like Qwen does, so that I can have DS4 Flash with Vision, that would be great. I've got to run a separate model entirely to get vision, and I'd prefer to just put it all in one place.
insumanth
Multimodal is the way to go.
DeepMind nailed this a long time ago.
k_138z
I wonder what it has to say about the Tank Man image.
earth2mars
And it's really good and fast. I have tested it with a bunch of odd photos, asking what is happening in them. Overall the training set seems large enough for it to know what's what and where.
crvdgc
Vision has been in A/B testing for a while now (at least in China). Is there an official announcement that this will be available for everyone?
vitorgrs
I've already had it for months? What's the news here?
alexwwang
Does the API support vision yet?
tw1984
What is more interesting to me is why it took them so long to support vision.
Does it imply that Liang believes vision/voice is less important on the way to AGI?
thiago_fm
Just wait until they release their coding model. Once they ship an Opus-level coding model, the sandcastle of the AI economy in the US will fall.
andrewstuart
OpenAI and Anthropic need to get this free foreign competition banned.
The link points to https://chat.deepseek.com/sign_in for me, and that's just a login screen. Is there any page with some info?