Pasting a big batch of new code and asking Claude "what have I forgotten? Where are the bugs?" is a very persuasive on-ramp for developers new to AI. It spots threading & distributed system bugs that would have taken hours to uncover before, and where there isn't any other easy tooling.
I bet there's loads of cryptocurrency implementations being pored over right now - actual money on the table.
userbinator
Not "hidden", but probably more like "no one bothered to look".
> declares a 1024-byte owner ID, which is an unusually long but legal value for the owner ID.
When I'm designing protocols or writing code with variable-length elements, "what is the valid range of lengths?" is always at the front of my mind.
> it uses a memory buffer that’s only 112 bytes. The denial message includes the owner ID, which can be up to 1024 bytes, bringing the total size of the message to 1056 bytes. The kernel writes 1056 bytes into a 112-byte buffer
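The bug class described above is easy to sketch. The function names, message layout, and the `"DENIED:"` header below are illustrative, not the actual nfsd code; the point is the pattern: a fixed-size buffer sized for the common case, and a copy whose length is driven by an attacker-supplied field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSG_BUF_SIZE 112   /* fixed-size buffer, sized for typical messages */
#define MAX_OWNER_ID 1024  /* the protocol permits owner IDs up to 1024 bytes */

/* Buggy pattern: the copy length is driven by the attacker-supplied
 * owner ID, so a maximum-length ID writes far past a 112-byte buffer. */
static void build_denial_buggy(uint8_t *dst, const uint8_t *owner, size_t owner_len)
{
    memcpy(dst, "DENIED:", 7);
    memcpy(dst + 7, owner, owner_len); /* 7 + 1024 bytes into 112: overflow */
}

/* Fixed pattern: validate the attacker-controlled length against the
 * destination size before copying. */
static int build_denial_checked(uint8_t *dst, size_t dst_len,
                                const uint8_t *owner, size_t owner_len)
{
    if (owner_len > MAX_OWNER_ID || owner_len + 7 > dst_len)
        return -1; /* reject, or fall back to a larger heap allocation */
    memcpy(dst, "DENIED:", 7);
    memcpy(dst + 7, owner, owner_len);
    return 0;
}
```

This is exactly the "what is the valid range of lengths?" question: the buffer was sized for typical owner IDs, while the protocol's legal maximum was never checked against it.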
This is something a lot of static analysers can easily find. Of course asking an LLM to "inspect all fixed-size buffers" may give you a bunch of hallucinations too, but could be a good starting point for further inspection.
altern8
Every time I read these titles, I wonder if people are for some reason pushing the narrative that Claude is way smarter than it really is, or if I'm using it wrong.
They want me to code AI-first, and the amount of hallucinations and weird bugs and inconsistencies that Claude produces is massive.
Lots of code that it pushes would NOT have passed a human/human code review 6 months ago.
DGAP
I replicated this experiment on several production codebases and got several crits. Lots of dupes, lots of false positives, lots of bugs that weren't actually exploitable, lots of accepted/known risks. But also, crits!
Did he write an exploit for the NFS bug that runs via network over USB? Seems to be plugging in a SoC over USB...?
misiek08
Do not expect so many more reports. Expect so many more attacks ;)
dist-epoch
> "given enough eyeballs, all bugs are shallow"
Time to update that:
"given 1 million tokens context window, all bugs are shallow"
cesaref
I'm interested in the implications for the open source movement, specifically about security concerns. Anyone know if there has been a study of how well Claude Code works on closed-source (but decompiled) code?
eichin
An explanation of the Claude Opus 4.6 Linux kernel security findings, as presented by Nicholas Carlini at unpromptedcon.
rixrax
I hope performance and bloat are next up for the LLMs to try and improve.
Especially on the perf side, I would wager LLMs can go from the meat sacks' "whatever works" to "how do I solve this with the best available algorithm and architecture (one that also follows some best practices)".
jazz9k
This does sound great, but the cost of tokens will prevent most companies from using agents to secure their code.
skeeter2020
And with AI generating vulnerabilities at an accelerated pace this business is only getting bigger. Welcome to the new antivirus!
alsanan2
Making public that AI is capable of finding this kind of vulnerability is a big problem. In this case it's nice that the vulnerability was closed before publication, but if a cracker had found it first, the result would have been extremely different. This kind of news only opens the crackers' eyes.
jason1cho
This isn't surprising. What is not mentioned is that Claude Code also found a thousand false-positive bugs, which developers spent three months ruling out.
cookiengineer
> Nicholas has found hundreds more potential bugs in the Linux kernel, but the bottleneck to fixing them is the manual step of humans sorting through all of Claude’s findings
No, the problem is sorting out thousands of false positives from Claude Code's reports. 5 valid reports out of 1000+ is statistically worse than running a fuzzer on the codebase.
Just sayin'
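Taking the commenter's figures at face value (roughly 5 valid findings out of 1000+ reports; these are the commenter's numbers, not official ones), the triage arithmetic is easy to make concrete:

```c
#include <assert.h>

/* Fraction of reports that turn out to be real bugs. */
static double triage_precision(int valid, int total)
{
    return (double)valid / (double)total; /* 5/1000 -> 0.5% */
}

/* Average number of reports a human must read per real bug found. */
static int reports_per_valid(int valid, int total)
{
    return total / valid; /* 1000/5 -> 200 reports per bug */
}
```

At 0.5% precision, a reviewer reads about 200 reports for every genuine vulnerability, which is the bottleneck both comments are pointing at.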
up2isomorphism
But on the other hand, Claude might introduce more vulnerabilities than it discovers.
desireco42
A developer using Claude Code found this bug. Claude is a tool, used by developers. It should not sign commits. Neither Neovim nor Zed ever tried to sign commits for me.
_pdp_
The title is a little misleading.
It was Opus 4.6 (the model). You could discover this with some other coding agent harness.
The other thing that bugs me (and frankly I don't have the time to try it out myself) is that they did not compare to see whether the same bug would have been found with GPT 5.4, or perhaps even an open-source model.
Without that, and for the reasons I posted above, while I am sure this is not the intention, the post reads like an ad for claude code.
Interestingly, I think 3 or 4 out of the 5 bugs would have been prevented / mitigated quite well using https://github.com/anthraxx/linux-hardened patches...
(disabled io_uring, would have crashed the kernel on UAF, and made exploitation of the heap overflow very unreliable)
Those 3 letter agencies are going to see their stash of 0-days dwindle so hard.
Related work from our security lab:
Stream of vulnerabilities discovered using security agents (23 so far this year): https://securitylab.github.com/ai-agents/
Taskflow harness to run (on your own terms): https://github.blog/security/how-to-scan-for-vulnerabilities...
I wonder about the "video running in the background" during the Q&A of the talk:
https://youtu.be/1sd26pWhfmg?is=XLJX9gg0Zm1BKl_5
no hecking wayyy!!!! claude chud code!!!