The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5.5! Congrats to the team
briga
I was getting dangerously close to my weekly Claude Code limit last night so I had Claude set up Qwen3.6 with llama.cpp and OpenCode. Honestly it's a great (free!) alternative to Claude Code--certainly more than good enough for a lot of smaller less complex tasks. I'm excited to try this new version. The fact that open-source models are so close to the frontier is very impressive.
tekacs
As they start to release more proprietary models, I so wish that they partnered with one of the major US hyperscalers to allow using these models through something US-domiciled.
Totally understand why it may not be reasonable or in their best interest (and that the US is _absolutely_ not doing the same reflexively). But it would be lovely to be able to try these out on production workloads in earnest.
slicktux
I just started messing with local LLMs and honestly I’m pretty impressed. I have a workstation laptop with an NVIDIA A1000 (6GB VRAM) and 96GB of RAM. I rarely used my GPU before, just for occasional CAD design or machine learning with OpenCV.
I ran llama3:latest and it ran pretty fast! I’m curious to see how Qwen would run on my system.
maxdo
No Opus 4.7, GPT5.5, or Gemini Flash 3.5 in the benchmarks.
goyozi
These are very good numbers. I still don’t get why they don’t compare against latest competitor versions in these posts, it’s not like we’re all not going to notice.
tarruda
Looking forward to more open weight releases from Qwen, especially 122B and 397B.
flakiness
I'm using pi agent and would love to try Qwen models (hosted). What are the good options? The official provider list doesn't include Alibaba. Is OpenRouter etc. fast enough?
(As a reference, DeepSeek v4 is severely throttled on these proxy services.)
ndom91
Is this one of those ones where they'll drop the huggingface release a week later? Or do we know for sure that this is staying proprietary?
eddyaipt
The pattern I trust most is adding a small verification artifact after every external action. Agents usually fail from silent state drift faster than from lack of reasoning depth.
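One way to read "verification artifact" is a small receipt written at the time of the action and checked against the real state later. The sketch below is only an illustration under that assumption: the external action is a plain file write, and the names `write_with_receipt` and `verify_receipt` are hypothetical, not taken from any agent framework.

```python
import hashlib
import tempfile
from pathlib import Path


def write_with_receipt(path: Path, content: str) -> dict:
    """Do the external action (here: a file write) and return a receipt.

    The receipt records what the agent believes the state is afterwards.
    """
    data = content.encode("utf-8")
    path.write_bytes(data)
    return {
        "action": "write_file",
        "path": str(path),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def verify_receipt(receipt: dict) -> bool:
    """Re-read the real state and compare it with the receipt."""
    path = Path(receipt["path"])
    if not path.exists():
        return False  # state drifted: the file is gone
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == receipt["sha256"]


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "notes.txt"
    receipt = write_with_receipt(target, "step 1 done")
    print(verify_receipt(receipt))   # state matches the receipt
    target.write_bytes(b"changed by something else")
    print(verify_receipt(receipt))   # silent drift is now detectable
```

Checking the receipt before the next step turns a silent mismatch into an explicit failure the agent can react to.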
jdw64
Qwen really hits the sweet spot: it's cheap, fast, and actually good.
eleventen
Checking OpenRouter (it's not available yet) and, uh, what's up with the spike in Qwen usage from early April here? https://openrouter.ai/qwen
Is this normal humans kicking the tires on a new model, or a few whales doing serious benchmarks?
bratao
It is super strange that in all of the last (3?) releases they keep comparing against older models such as Opus-4.6.
bsenftner
Any reports from people using their coding agent(s)?
XCSme
Any info on pricing and latency?
aliljet
Where can a user reasonably host this in an affordable way to access the local LLM revolution?
LAC-Tech
Trying to buy Qwen credits and get an API key is a challenge all in itself. So many site redirects.
hmaddipatla
The tokenomics and the value for capability, context and latency look like they could deliver a super competitive offer. What would it take for you to switch?
xiaoluolyg
Congrats to the Qwen team, remarkable.
cft
Downloading this and cancelling Google Antigravity Pro at the same time:
I had a Google Pro account that I inherited from buying a Pixel 9 XL - it's free for a year after a flagship Pixel phone purchase. After a year they started charging for it, and I tolerated it, because Flash was usable in Antigravity for dumb auxiliary tasks that I did not want to waste GPT/Opus on. It had a separate, generous quota from Gemini 3.1 Pro. Now with Flash 3.5 they combined the quotas with Pro, such that on a Google Pro account you can work 4-5 hours per week in Flash. And by the way, 3.1 Pro is useless for programming compared to Codex/Opus.
indigodaddy
Is it multimodal/vision?
joshjob42
I really like what Qwen are doing, and a lot of these Chinese labs, but until I can ask their models what happened during the student protests in 1989 or why human rights groups are upset about the Uighurs and the model gives me a straight answer I'm just not able to trust these models with anything of substance.
esafak
Does anyone have experience with the Alibaba Cloud Model Studio that serves these qwen models?
howmayiannoyyou
I can't bring myself to use any model that trains or sends telemetry back to my country's primary competitor/adversary. I don't care how much money is saved.
dfansteel
Can anyone check its knowledge base for me? I’m honestly not able to run it and the Qwen models I can run censor information critical towards the Chinese government.
Tiananmen Square is the first place to start.