But the user experience is tricky: if we aim for very few false positives, the run time for this kind of workflow gets too long, and it's then hard to justify blocking PRs.
singingtoday
I'm interested in trying this.
We have our own internal automated review which has shown positive results, but I would love to drop it if I find something better.
Code review is currently our bottleneck, so any possibility of better automating it is welcome.
elpakal
At a kill s@@s hackathon at work, I was able to build something that:
- uses a node image
- installs claude code
- runs a /review-like command
- posts inline comments on the PR
- deletes old comments when rerunning
OCR seems cool, but overkill, and I'm definitely not using Code Rabbit after their CEO was on here acting snobbish a while back.
Point being AI code review in Git** itself isn't hard to do and can add a lot of value quickly.
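The "deletes old comments when rerunning" step can be sketched as below. This is only one assumed way to do it, not the commenter's code: the bot tags each comment it posts with a hidden marker, then on the next run selects the tagged comments for deletion. `BOT_MARKER`, `tag`, `stale_comment_ids`, and the comment dict shape are all hypothetical; fetching and deleting comments through the code host's API is left out.

```python
# Hypothetical hidden marker the bot appends to every comment it posts.
BOT_MARKER = "<!-- ai-review-bot -->"


def tag(body: str) -> str:
    """Append the marker so a later run can recognise this comment."""
    return f"{body}\n\n{BOT_MARKER}"


def stale_comment_ids(comments: list[dict]) -> list[int]:
    """Return IDs of comments left by earlier bot runs (those with the marker)."""
    return [c["id"] for c in comments if BOT_MARKER in c.get("body", "")]


# Illustrative data only: two bot comments and one human comment.
existing = [
    {"id": 1, "body": tag("Possible null dereference here.")},
    {"id": 2, "body": "LGTM, thanks!"},  # human comment, must be kept
    {"id": 3, "body": tag("Unused import.")},
]
print(stale_comment_ids(existing))  # [1, 3]
```

Matching on a marker rather than on the author keeps human comments safe even when the bot posts under a shared account.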
> After installation, the ocr command is available globally.
Wish they chose a different acronym...
sfortis
It doesn't work with gpt5.x models, because the hardcoded `max_tokens` parameter is rejected: "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead." I don't know why this is on the front page. My review-with-codex skill is working just fine, consuming my usage and not API tokens.
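For illustration, a client could choose the token-limit parameter per model instead of hardcoding it. The sketch below is an assumption, not how the tool is implemented: `token_limit_param` and `COMPLETION_TOKEN_PREFIXES` are hypothetical names, the model strings are examples, and only the gpt-5 case is backed by the error message quoted above.

```python
# Assumption: model-name prefixes that reject `max_tokens`. Only the gpt-5
# case is supported by the error message above; extend as needed.
COMPLETION_TOKEN_PREFIXES = ("gpt-5",)


def token_limit_param(model: str, limit: int) -> dict[str, int]:
    """Return the token-limit keyword argument suited to `model`."""
    if model.lower().startswith(COMPLETION_TOKEN_PREFIXES):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}


print(token_limit_param("gpt-5.1", 4096))  # {'max_completion_tokens': 4096}
print(token_limit_param("gpt-4o", 4096))   # {'max_tokens': 4096}
```

The returned dict can be merged into the request payload, so the rest of the request-building code stays unchanged.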
atestu
We've been using Coderabbit; it's a great deal ($30/mo/dev flat) and finds a lot.
I also built a skill I call `/meta-review` that asks Codex, Cursor, and Gemini to review the code (I use Claude Code). It always finds little things claude & I missed.
Did it review the landing page for it? Because it looks broken on iOS.
nutifafa
This is a great tool, until you try reading the rule files; I had to find a translator to make heads or tails of them. Given that it is a CLI tool, it's great that devs can tinker with it at no additional cost.
causal
I recently moved off Cursor's BugBot because it's no longer a flat $40, and I feel a little lost trying to find a viable alternative because there are so many and the pricing kind of sucks for all of them. Curious if anyone has a recommendation.
eranation
I wonder how they do against this benchmark (not that I vetted this benchmark... but still interesting to know...): https://codereview.withmartian.com
Thank you all for the interest in Open Code Review!
This project was incubated from an AI code review tool that has been widely used by developers inside Alibaba at scale. The reason we decided to open-source it is simple — we noticed that many developers in the community are either paying for similar tools or using skills to perform AI code reviews.
As someone who has done deep research in this space, I think skills are actually a great approach, and running them as sub-agents is an elegant way to reduce context pollution. That said, skills do come with inherent limitations from general-purpose agents — they can be hard to debug, hard to evaluate, and difficult to tune. That's why we rewrote our internal tool in Go as a CLI and open-sourced it. Our goal is simple: free, token-efficient, and better results — while being easy to integrate into agent frameworks like Claude Code and Codex.
Our Design Philosophy: Deterministic Engineering × Agent Hybrid
We believe the best code review system combines the reliability of engineering with the flexibility of AI.
Deterministic Engineering — for hard constraints
We use engineering logic (not LLMs) to handle the parts of code review that simply cannot go wrong:
Precise file filtering — Clearly defines which files need review and which should be excluded, ensuring no critical change is ever missed.
Intelligent file bundling — Groups related files into the same review unit (e.g., message_en.properties and message_zh.properties are packed together). Each bundle is handled as an independent sub-agent with isolated context — this divide-and-conquer strategy performs exceptionally well on large changesets and naturally supports concurrent review.
Fine-grained rule matching — Matches review rules based on file characteristics, keeping the model's attention focused and eliminating information noise from the start. Compared to pure LLM-driven rule guidance, template-engine-based rule matching produces more stable and predictable behavior.
Standalone location & reflection components — Independent comment localization and comment reflection modules systematically improve both the positional accuracy and content quality of AI feedback.
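A minimal sketch of how the first three deterministic steps could fit together, assuming glob-style patterns; it is not taken from the project's Go source. The exclude patterns, rule names, locale-suffix regex, and the `keep`, `bundle_key`, and `plan` helpers are all hypothetical.

```python
import fnmatch
import re
from collections import defaultdict

# All patterns and rule names below are hypothetical examples.
EXCLUDE = ["*.lock", "vendor/*", "*.min.js"]
RULES = {"*.properties": ["i18n-consistency"], "*.go": ["error-handling"]}
# Matches a two-letter locale suffix such as "_en" before ".properties".
LOCALE_SUFFIX = re.compile(r"_[a-z]{2}(?=\.properties$)")


def keep(path: str) -> bool:
    """File filtering: drop paths that match any exclude pattern."""
    return not any(fnmatch.fnmatchcase(path, pat) for pat in EXCLUDE)


def bundle_key(path: str) -> str:
    """File bundling: locale variants of one resource share a key."""
    return LOCALE_SUFFIX.sub("", path)


def plan(changed: list[str]) -> list[dict]:
    """Build review units, each with its files and the rules that match them."""
    bundles = defaultdict(list)
    for path in filter(keep, changed):
        bundles[bundle_key(path)].append(path)
    units = []
    for files in bundles.values():
        rules = sorted({r for f in files for pat, rs in RULES.items()
                        if fnmatch.fnmatchcase(f, pat) for r in rs})
        units.append({"files": files, "rules": rules})
    return units


changed = [
    "i18n/message_en.properties",
    "i18n/message_zh.properties",
    "cmd/main.go",
    "yarn.lock",
]
for unit in plan(changed):
    print(unit)
```

Each resulting unit could then be handed to its own sub-agent with only the matched rules in context, which is the isolation the list above describes.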
Agent — for dynamic decision making
We let the Agent shine where it truly excels — dynamic reasoning and context retrieval:
Scenario-optimized prompts — Deeply tuned prompt templates for code review scenarios, improving output quality while significantly reducing token consumption.
Curated scenario-specific toolset — Based on in-depth analysis of tool call traces from large-scale production data — including call frequency distribution, repeated invocation rates per tool, and the impact of adding new tools on overall call chains — we carefully selected and restructured the general-purpose agent toolset into a specialized toolkit that is more stable and predictable in code review scenarios.
Due to some internal dependencies and compliance requirements, a few features haven't been released publicly yet. But I believe as more external developers show interest in this tool, we'll accelerate the alignment between our internal and external versions.
Finally, a huge thank you to everyone following this project. We want it to keep getting better, and we hope to see more free, high-quality tools like this emerge from the community.
Ran it on a subset of 10 of the 50 PRs in this benchmark https://codereview.withmartian.com
- very good recall (~74%, i.e. it found a lot of the golden issues)
- not so good precision (~12%, i.e. lots of false positives)
- the precision causes the F1 to tank (~20%; if this stays the same on the full 50-PR sample, it would put it almost last, even below Kilo+Grok)
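As a quick sanity check on these figures, F1 is the harmonic mean of precision and recall. A minimal sketch using the approximate numbers quoted above (the function name is just for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate values quoted above for the 10-PR subset.
precision = 0.12
recall = 0.74
print(f"F1 = {f1_score(precision, recall):.3f}")  # F1 = 0.207
```

Because the harmonic mean is dominated by the smaller of the two values, the low precision holds F1 near 20% despite the high recall.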
If you have Codex, what does it add over Codex's default app? I am confused. Can't you simply ask Codex in another tab to do a code review?
Rule files are in https://github.com/alibaba/open-code-review/tree/main/intern... (in Chinese)
I like the pattern of making a dedicated CLI/harness and just building a skill to teach coding agents to use it.
At $work we built a thorough workflow to do security reviews, which is a pure skill to simplify adoption: https://www.synthesia.io/post/automating-code-security-revie...
But the user experience is tricky: if we aim for very few false positives, the run time for this kind of workflow gets too long, and it's then hard to justify blocking PRs.
A repo with the English translation of each of the rules files, using Google Translate: https://github.com/pramodbiligiri/open-code-review-rules.
The original rules files (in Chinese): https://github.com/alibaba/open-code-review/tree/main/intern...
How does it compare to the Red Hat AI code review?
https://gitlab.com/redhat/edge/ci-cd/ai-code-review
Does anyone have experience with that one?
I did something like this, but somewhat in reverse: you are the one who reviews the code, and you instruct the AI what to do through code review comments: https://parley.cloudflavor.io.
Thinking about it, it would be funny to first run Alibaba's tool and then run parley after.
I posted it here a few days ago: https://news.ycombinator.com/item?id=48369782. I guess with AI there are too many Show HNs now, and I never got any feedback.
Coderabbit just came out with their own PR review UI that's great for big PRs; it groups files together, etc. https://www.coderabbit.ai/blog/introducing-atlas-the-first-a...
Not to be confused with Opencode the harness