I have taken another look on these open models after the fiasco of Fable and GPT 5.6 this weekend and... GLM-5.2 truly is a good workhorse model for daily programming. I consider myself a heavy user of LLMs and a seasoned developer. A typical session for me with GPT is usually over a hundred dollars...
This weekend I programmed a matrix bot with encryption and a Rust agent with some tools. Because I need one and OpenClaw just felt... not what I wanted. Two days later and 20 dollars poorer I have what I need: a multimodal agent written in rust that has access to my homelab.
Nothing felt off with GLM. It did what I wanted, was fast, had a decent not very annoying personality and was much cheaper than Opus or GPT.
I used it unquantized through Fireworks, but there are multiple other providers too.
show comments
bArray
Apparently GLM 5.2 is 753B parameters [1], what kind of hardware are people using to run this locally?
> [...] beating Claude Code (32%) at roughly $0.17 per vulnerability found
Claude Code is an agent harness, not an LLM.
Claude is a brand (or group of LLMs), not an LLM.
show comments
himata4113
These numbers are seem pretty low compared to what I was able to achieve specifically around windows kernel, win32k<->win32u to be exact. It honestly wouldn't surprise me anymore if china started surpassing models that US makes public, at least in specific categories such as cyber.
GLM 5.2 is already capable enough to assist in self-training which is similar to what we saw happen with frontier models and they appear to be getting there at a significantly lower cost than openai/anthropic.
solenoid0937
GLM export controls incoming? I predict Commerce will force OpenRouter, HuggingFace to take some open models down within the next few months.
Not that it would make any sense.
show comments
g42gregory
If only the "cybersecurity" crowd were focused on patching the vulnerabilities.
Instead of shilling for the LLM providers.
show comments
theteapot
> Constant: the IDOR dataset (the same real, open-source applications we've used in prior research) ...
What we're they? Also, wouldn't one expect a more recently released coding agent (with a more recent knowledge cut off) to perform better because they have access to more knowledge about vulns in these OSS projects, and even possibly have knowledge of your own "prior research"?
veselin
Here, it appears they compare a single prompt "find IDOR", against a multi-agent system. However, one can also start far more sophisticated skills that spin up subagents and mostly do the same in Claude Code, Codex, OpenCode, Pi, etc.
Which I guess makes what semgrep sells obsolete. Unless they have built a pareto-optimal point in terms of capabilities and token usage maybe?
show comments
admax88qqq
> beats Claude in our Cyber Benchmarks
Beats which model in Claude? Whenever a "benchmark" doesn't put precise model numbers in their headlines I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).
It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.
I have taken another look on these open models after the fiasco of Fable and GPT 5.6 this weekend and... GLM-5.2 truly is a good workhorse model for daily programming. I consider myself a heavy user of LLMs and a seasoned developer. A typical session for me with GPT is usually over a hundred dollars...
This weekend I programmed a matrix bot with encryption and a Rust agent with some tools. Because I need one and OpenClaw just felt... not what I wanted. Two days later and 20 dollars poorer I have what I need: a multimodal agent written in rust that has access to my homelab.
Nothing felt off with GLM. It did what I wanted, was fast, had a decent not very annoying personality and was much cheaper than Opus or GPT.
I used it unquantized through Fireworks, but there are multiple other providers too.
Apparently GLM 5.2 is 753B parameters [1], what kind of hardware are people using to run this locally?
[1] https://huggingface.co/zai-org/GLM-5.2
> [...] beating Claude Code (32%) at roughly $0.17 per vulnerability found
Claude Code is an agent harness, not an LLM.
Claude is a brand (or group of LLMs), not an LLM.
These numbers are seem pretty low compared to what I was able to achieve specifically around windows kernel, win32k<->win32u to be exact. It honestly wouldn't surprise me anymore if china started surpassing models that US makes public, at least in specific categories such as cyber.
GLM 5.2 is already capable enough to assist in self-training which is similar to what we saw happen with frontier models and they appear to be getting there at a significantly lower cost than openai/anthropic.
GLM export controls incoming? I predict Commerce will force OpenRouter, HuggingFace to take some open models down within the next few months.
Not that it would make any sense.
If only the "cybersecurity" crowd were focused on patching the vulnerabilities.
Instead of shilling for the LLM providers.
> Constant: the IDOR dataset (the same real, open-source applications we've used in prior research) ...
What we're they? Also, wouldn't one expect a more recently released coding agent (with a more recent knowledge cut off) to perform better because they have access to more knowledge about vulns in these OSS projects, and even possibly have knowledge of your own "prior research"?
Here, it appears they compare a single prompt "find IDOR", against a multi-agent system. However, one can also start far more sophisticated skills that spin up subagents and mostly do the same in Claude Code, Codex, OpenCode, Pi, etc.
Which I guess makes what semgrep sells obsolete. Unless they have built a pareto-optimal point in terms of capabilities and token usage maybe?
> beats Claude in our Cyber Benchmarks
Beats which model in Claude? Whenever a "benchmark" doesn't put precise model numbers in their headlines I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).
It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.
You can launch GLM-5.2 in Opencode using Nemesis8: https://github.com/DeepBlueDynamics/nemesis8#nemesis-8
After installing, do a `n8 build` to build the image, then `n8 --danger --provider opencode interactive` to launch it in a container.
Signup for GLM-5.2 here: https://z.ai
It reads like an ad.
Secondly these are "just" IDORs, arguably the easiest class of vulnerabilities.
Thirdly it compares to GPT 5.5 and Opus 4.8.
No, we don't have Mythos at home.
Hopefully i get a macbook pro soon enough to run some small or medium sized LLMs
This is a joke right? I wouldn't install this in a sandbox.