Company creates a benchmark. Same company is best in that benchmark.
Story as old as time.
mattvv
Some feedback for the team: I looked at the pricing page and saw it's more expensive ($30/dev/mo) and highly limiting (20 PRs per month per user). We have devs putting up that many PRs in a single day. With this kind of plan there's pretty much no way we would even try this product.
esafak
I'm not as cynical as the others here; if there are no popular code review benchmarks, why should they not design one?
Apparently this is in support of their 2.0 release: https://www.qodo.ai/blog/introducing-qodo-2-0-agentic-code-r...
> We believe that code review is not a narrow task; it encompasses many distinct responsibilities that happen at once. [...]
> Qodo 2.0 addresses this with a multi-agent expert review architecture. Instead of treating code review as a single, broad task, Qodo breaks it into focused responsibilities handled by specialized agents. Each agent is optimized for a specific type of analysis and operates with its own dedicated context, rather than competing for attention in a single pass. This allows Qodo to go deeper in each area without slowing reviews down.
> To keep feedback focused, Qodo includes a judge agent that evaluates findings across agents. The judge agent resolves conflicts, removes duplicates, and filters out low-signal results. Only issues that meet a high confidence and relevance threshold make it into the final review.
> Qodo’s agentic PR review extends context beyond the codebase by incorporating pull request history as a first-class signal.
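Roughly how I'd imagine that judge stage working, as a sketch (my own guess at the shape of it, not their code; the field names and threshold are made up):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Finding:
        agent: str         # which specialized reviewer produced it
        file: str
        line: int
        message: str
        confidence: float  # 0..1, self-reported by the agent

    def judge(findings: list[Finding], threshold: float = 0.8) -> list[Finding]:
        # Deduplicate findings that point at the same location and message,
        # keeping the highest-confidence copy, then drop low-signal results.
        best: dict[tuple, Finding] = {}
        for f in findings:
            key = (f.file, f.line, f.message.lower())
            if key not in best or f.confidence > best[key].confidence:
                best[key] = f
        return [f for f in best.values() if f.confidence >= threshold]

    # Each specialized agent (security, correctness, performance, ...) appends
    # its Findings to one list; only what survives judge() reaches the review.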
mbesto
Cmd+F - "Overfitting"...nothing.
Nope, no mention of how they do anything to alleviate overfitting. These benchmarks are getting tiresome.
zhubert
I'm trying to bring a slightly different take on pricing with ShipItAI (https://shipitai.dev, brazen plug): $5/mo per active dev plus a Bring Your Own Key option for those who want better price control.
It's still early in development and has a much simpler goal, but I like simple things that work well.
mohsen1
> Qodo takes a different approach by starting with real, merged PRs
Merged PRs being considered good code?
CuriouslyC
I don't think LLMs are the right tool for pattern enforcement in general; it's better to get them to create custom lint rules (a toy example of what I mean is sketched below).
Agents are pretty good at suggesting ways to improve a piece of code, though: if you get a bunch of agents to wear different hats and debate improvements to a piece of software, it can produce some very useful insights.
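For example, rather than asking the model to re-spot the same anti-pattern on every PR, have it emit a small checker once. A toy, framework-free example of the kind of rule I mean (not from TFA, just an illustration):

    import ast
    import sys

    class NoBareExcept(ast.NodeVisitor):
        """Flag `except:` clauses with no exception type, the kind of team
        convention worth enforcing mechanically instead of re-arguing in review."""

        def __init__(self, filename):
            self.filename = filename
            self.violations = []

        def visit_ExceptHandler(self, node):
            if node.type is None:
                self.violations.append((self.filename, node.lineno, "bare except"))
            self.generic_visit(node)

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            with open(path) as fh:
                tree = ast.parse(fh.read(), filename=path)
            checker = NoBareExcept(path)
            checker.visit(tree)
            for fname, lineno, msg in checker.violations:
                print(f"{fname}:{lineno}: {msg}")

Once a rule like this exists, it runs deterministically in CI and the agent can move on to the judgment calls.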
mdeeks
I feel like pricing needs to be included here. I kind of don't care about 10 percentage points if the cost is dramatically higher. Cursor Bugbot is about the same cost but gives 10x the monthly quota of Qodo.
I know this is focused solely on performance, but cost is a major factor here.
logicx24
Where's the code for this? I'd love to run our tool, https://tachyon.so/, against it.
kachapopopow
coderabbit being the worst while (presumably) advertising the most seems to check out at least, but I wouldn't believe the recall %; it seems bogus.
aetherspawn
Your pricing page has a bug on it, the annual price is higher than the monthly price.