If you’re just looking for the TDD part - https://github.com/nizos/tdd-guard - is the only project I’ve come across that actually enforces it with hooks and blocks edits rather than relying on a prompt that gets context rotted away.
Deeds67
To be honest, the official superpowers/brainstorming skill already does TDD so well, I don't see that much of a need for this. TDD is definitely the way to go with agentic development.
show comments
shruubi
Two questions
1) Do you not feel self-conscious or weird about calling this "EvanFlow"? Seems like a lot of people these days are naming their AI tools/skills/whatever after themselves which seems self-absorbed. Either that or they hope that if their thing takes off like OpenClaw did then they'll grab the fame that comes along with it.
2) Why does your TDD flow miss the refactor step of TDD?
show comments
s20n
EvanFlow - thoughts arrive like butterflies?
show comments
dmitry_dv
The refactor step is the silent casualty in AI-assisted TDD. Once the test is green, Claude optimizes for moving to the next test, not for cleaning up the impl that just passed. An "iterate-until-clean" pass at the end is a different thing: you're refactoring cold code, not refactoring with a freshly-written test as the safety net.
evanklem2004
Built this as an opinionated Claude Code development flow based on evidence based practices and what has been working for me while developing professional code.
EvanFlow is a single TDD-driven loop. Say "let's evanflow this" and it walks brainstorm → plan → execute → tdd → iterate → STOP. Real checkpoints at design and plan approval. Never auto-commits, never auto-stages, never proposes integration - every git op is your call.
The three things that actually changed how I work:
1. Vertical-slice TDD. One failing test → minimal impl → next test. Watch each test fail before writing the impl that passes it. (Sounds obvious. Almost no agent does it by default. ~62% of LLM-generated test assertions are wrong per HumanEval research, so testing TDD discipline matters more than the impl discipline.)
2. Embedded grilling at decision points. Before locking a plan: what breaks if a user does X? What's the rollback? What's explicitly out of scope? Catches design flaws while they're still cheap.
3. Iterate-until-clean (hard cap of 5 rounds). Re-read the diff against dead code, naming, the deletion test, assertion correctness, and a Five Failure Modes pass (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). For UI: screenshot via headless Chromium.
For bigger plans with 3+ independent units sharing types, it forks into a parallel coder/overseer orchestration. Integration tests at touchpoints ARE the cohesion contract.
Three install paths: Claude Code plugin marketplace, npx skills add, manual copy. MIT.
show comments
nghnam
superpowers/brainstorming is doing TDD as well.
jtfrench
How does this handle “dumb zone” evasion while looping?
TDD in 2026? Besides, TDDs main benefit is to come up with a decent architecture for your system… LLMs can already do that if instructed. I don’t see the point of TDD
If you’re just looking for the TDD part - https://github.com/nizos/tdd-guard - is the only project I’ve come across that actually enforces it with hooks and blocks edits rather than relying on a prompt that gets context rotted away.
To be honest, the official superpowers/brainstorming skill already does TDD so well, I don't see that much of a need for this. TDD is definitely the way to go with agentic development.
Two questions
1) Do you not feel self-conscious or weird about calling this "EvanFlow"? Seems like a lot of people these days are naming their AI tools/skills/whatever after themselves which seems self-absorbed. Either that or they hope that if their thing takes off like OpenClaw did then they'll grab the fame that comes along with it.
2) Why does your TDD flow miss the refactor step of TDD?
EvanFlow - thoughts arrive like butterflies?
The refactor step is the silent casualty in AI-assisted TDD. Once the test is green, Claude optimizes for moving to the next test, not for cleaning up the impl that just passed. An "iterate-until-clean" pass at the end is a different thing: you're refactoring cold code, not refactoring with a freshly-written test as the safety net.
Built this as an opinionated Claude Code development flow based on evidence based practices and what has been working for me while developing professional code.
EvanFlow is a single TDD-driven loop. Say "let's evanflow this" and it walks brainstorm → plan → execute → tdd → iterate → STOP. Real checkpoints at design and plan approval. Never auto-commits, never auto-stages, never proposes integration - every git op is your call.
The three things that actually changed how I work:
1. Vertical-slice TDD. One failing test → minimal impl → next test. Watch each test fail before writing the impl that passes it. (Sounds obvious. Almost no agent does it by default. ~62% of LLM-generated test assertions are wrong per HumanEval research, so testing TDD discipline matters more than the impl discipline.)
2. Embedded grilling at decision points. Before locking a plan: what breaks if a user does X? What's the rollback? What's explicitly out of scope? Catches design flaws while they're still cheap.
3. Iterate-until-clean (hard cap of 5 rounds). Re-read the diff against dead code, naming, the deletion test, assertion correctness, and a Five Failure Modes pass (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). For UI: screenshot via headless Chromium.
For bigger plans with 3+ independent units sharing types, it forks into a parallel coder/overseer orchestration. Integration tests at touchpoints ARE the cohesion contract.
Three install paths: Claude Code plugin marketplace, npx skills add, manual copy. MIT.
superpowers/brainstorming is doing TDD as well.
How does this handle “dumb zone” evasion while looping?
https://www.evenflo.com/
TDD in 2026? Besides, TDDs main benefit is to come up with a decent architecture for your system… LLMs can already do that if instructed. I don’t see the point of TDD