Parallel coding agents with tmux and Markdown specs

gas9S9zw3P9c

I'd love to see what is being achieved by these massive parallel agent approaches. If it's so much more productive, where is all the great software that's being built with it? What is the OP building?

Most of what I'm seeing is AI influencers promoting their shovels.

jasonjmcghee

I certainly don't run 6 at a time, but even with just 1, if it's doing anything visual, how are folks hooking up screenshots so it can self-verify? And how do you keep an eye on it?

The only solution I've seen on a Mac is doing it on a separate monitor.

I couldn't find a solution here and have built similar things in the past so I took a crack at it using CGVirtualDisplay.

Ended up adding a lot of productivity features and polished until it felt good.

Curious if there are similar solutions out there I just haven't seen.

https://github.com/jasonjmcghee/orcv

ramoz

I did a sort of bell curve with this type of workflow over the summer.

- Base Claude Code (released)

- Extensive, self-orchestrated, local specs & documentation, i.e. waterfall for many features/longer-term project goals (summer)

- Base Claude Code (today)

Claude Code is getting better at orchestrating its own subagents for divide-and-conquer type work.

My problem with these extensive self-orchestrated multi-agent / spec modes is the drift and rot that builds up across all the changes and the integrated parts of an application, which a lot of the time end up in merge conflicts. Aside from the load on my own decision-making, it's also just a lot to orchestrate and review. I spent a ton of time forcing Claude to use the system I put in place, including documentation updates and continuous logging of work.

I feel extremely productive with a single Claude Code for a project. Maybe for minor features, I'll launch Claude Code in the web so that it can operate in an isolated space to knock them out and create a PR.

I will plan and annotate extensively for large features, but not for many features or broad project specs all at the same time. Annotation and better planning UX are, I think, going to be increasingly important for now. The only augmentation I've added to Claude Code is a hook for plan mode review: https://github.com/backnotprop/plannotator

servercobra

This is a really cool design, pretty similar to what I've built for implementation planning. I like how iterative it is and that the whole system lives just in markdown. The verify step is a great idea I hadn't made a command for yet, thank you!

This seems like it'd be great for solo projects but starts to fall apart for a team with a lot more PRs and distributed state. Heck, I run almost everything in a worktree, so even there the state is distributed. Maybe moving some of the state/plans/etc to Linear et al solves that though.
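
One common way to get the per-agent isolation described above is a git worktree per agent, each opened in its own tmux window. The sketch below only builds the command strings (nothing is executed). The session name `agents`, the `agent/<task>` branch prefix, the sibling-directory layout, and starting the agent with `claude` are all illustrative assumptions, not anything the OP's setup prescribes.

```python
import shlex
from pathlib import Path


def plan_agent_sessions(tasks, repo_root=".", session="agents", agent_cmd="claude"):
    """Return shell commands giving each task its own git worktree and tmux window.

    Nothing is executed here; review the list, then run it in a shell.
    Assumptions (illustrative only): branches are named agent/<task>,
    worktrees sit next to the repo as <repo>-<task>, and the agent CLI
    is started with `agent_cmd`.
    """
    root = Path(repo_root).absolute()
    q = shlex.quote
    # One detached tmux session to hold every agent window.
    cmds = [f"tmux new-session -d -s {q(session)} -c {q(str(root))}"]
    for task in tasks:
        worktree = root.parent / f"{root.name}-{task}"
        # A separate checkout per agent, so agents never share a working tree.
        cmds.append(f"git -C {q(str(root))} worktree add -b {q('agent/' + task)} {q(str(worktree))}")
        # A named window rooted in that worktree...
        cmds.append(f"tmux new-window -t {q(session)} -n {q(task)} -c {q(str(worktree))}")
        # ...with the agent started inside it.
        cmds.append(f"tmux send-keys -t {q(session + ':' + task)} {q(agent_cmd)} Enter")
    return cmds


if __name__ == "__main__":
    for cmd in plan_agent_sessions(["auth", "billing"], repo_root="/tmp/proj"):
        print(cmd)
```

Each agent ends up on its own branch, so the distributed state is at least visible to git, and merging back is an ordinary PR per task.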

CloakHQ

We ran something similar for a browser automation project - multiple agents working on different modules in parallel with shared markdown specs. The bottleneck wasn't the agents, it was keeping their context from drifting. Each tmux pane has its own session state, so you end up with agents that "know" different versions of reality by the second hour.

The spec file helps, but we found we also needed a short shared "ground truth" file the agents could read before taking any action - basically a live snapshot of what's actually done vs what the spec says. Without it, two agents would sometimes solve the same problem in incompatible ways.
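
A minimal sketch of such a shared "ground truth" file, assuming the spec tracks work as Markdown checkboxes (`- [ ]` / `- [x]`). The `verified_done` set (e.g. from tests passing on main) and the `claims` map of task to agent are hypothetical inputs; the point is that the file is regenerated from those signals rather than hand-edited by the agents, and it flags where the spec and reality disagree.

```python
import re

# Matches Markdown checklist items such as "- [ ] add login form" or "* [x] add login form".
CHECKBOX = re.compile(r"^\s*[-*] \[([ xX])\] (.+?)\s*$")


def parse_spec(spec_md):
    """Map each checklist item in a Markdown spec to whether it is ticked."""
    items = {}
    for line in spec_md.splitlines():
        m = CHECKBOX.match(line)
        if m:
            items[m.group(2)] = m.group(1).lower() == "x"
    return items


def render_ground_truth(spec_md, verified_done, claims):
    """Build a short status file for agents to read before acting.

    verified_done: tasks confirmed finished by some external check
        (hypothetical input, e.g. tests passing on the main branch).
    claims: task -> name of the agent currently working on it (also hypothetical).
    """
    lines = ["# Ground truth (generated, do not hand-edit)", ""]
    for task, ticked in parse_spec(spec_md).items():
        if task in verified_done:
            status = "DONE"
        elif task in claims:
            status = f"IN PROGRESS ({claims[task]})"
        else:
            status = "TODO"
        # Flag drift between what the spec claims and what was verified.
        if ticked and task not in verified_done:
            status += " [spec ticked, not verified]"
        elif not ticked and task in verified_done:
            status += " [verified, spec not ticked]"
        lines.append(f"- {task}: {status}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = "# Spec\n- [x] login form\n- [ ] password reset\n- [ ] audit log\n"
    print(render_ground_truth(spec, {"password reset"}, {"audit log": "agent-2"}))
```

The `IN PROGRESS (agent)` entries also give each agent a cheap way to notice that another agent has already claimed a task before solving it a second, incompatible way.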

Has anyone found a clean way to sync context across parallel sessions without just dumping everything into one massive file?

aceelric

I’ve been experimenting with a similar pattern but wrapping it in a “factory mode” abstraction (we’re building this at CAS[1]), where you define the spec once, after careful planning with a supervisor agent, and then let it go and spin up parallel workers against it automatically. It handles task decomposition + orchestration, so you’re not manually juggling tmux panes.

[1] https://cas.dev
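
For readers who want to try the decompose-then-dispatch idea without a product, here is a generic sketch (not CAS's implementation). It assumes each task is hand-declared with the files it expects to touch, greedily groups tasks into waves whose file sets are disjoint (to reduce the merge conflicts mentioned upthread), and runs each wave on a thread pool. The `worker` callable is a hypothetical stand-in for whatever actually launches an agent; real agents may touch files they never declared, so this only lowers the odds of collisions.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_waves(tasks):
    """Greedy first-fit grouping of tasks into waves with non-overlapping file sets.

    tasks: list of (name, set_of_paths) pairs; the paths are an assumed, hand-declared estimate.
    """
    waves = []  # each entry: [task_names, union_of_paths]
    for name, paths in tasks:
        for wave in waves:
            if wave[1].isdisjoint(paths):
                wave[0].append(name)
                wave[1] |= paths
                break
        else:
            # No existing wave is conflict-free for this task, so start a new one.
            waves.append([[name], set(paths)])
    return [names for names, _ in waves]


def run_factory(tasks, worker, max_workers=4):
    """Run each wave in parallel, finishing one wave before starting the next.

    worker: callable(task_name) -> result; a placeholder for launching an agent.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in plan_waves(tasks):
            # pool.map yields results in input order, so zip pairs them correctly.
            for name, result in zip(wave, pool.map(worker, wave)):
                results[name] = result
    return results


if __name__ == "__main__":
    tasks = [
        ("api", {"src/api.py"}),
        ("ui", {"src/ui.tsx"}),
        ("api-tests", {"src/api.py", "tests/test_api.py"}),
        ("docs", {"README.md"}),
    ]
    print(plan_waves(tasks))
    print(run_factory(tasks, lambda t: f"finished {t}"))
```

Serialising only the conflicting tasks keeps most of the parallelism while pushing likely merge conflicts into separate, ordered waves.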

zwilderrr

I just can’t get over the fact that your Anglicized name sounds like manual shipper.

sluongng

Yeah, the 8-agent limit aligns well with my conversations with folks at the leading labs.

https://open.substack.com/pub/sluongng/p/stages-of-coding-ag...

I think we need very different tooling to go beyond a ratio of 1 human to 10 agents, and much, much different tooling to push the ratio higher still.

nferraz

I liked the way you bootstrap the agent from a single markdown file.

hinkley

These setups pretty much require the top tier subscription, right?

kledru

I think you should have a reviewer as well.

philipp-gayret

Is there a place where people like you go to share ideas around these new ways of working, other than HN? I'm very curious how these new ways of working will develop. In my system, I use voice memos to capture thoughts, and they become more or less what you have as feature designs. I notice I have a lot of ideas throughout the day (Claude chews through them some time later, and when they are worked out I review its plans in Notion; I use Notion because I can upload memos into it from my phone, so it's more or less what you call the index). But ideas... I can only capture them as they come, otherwise they are lost, and I don't want to spend time typing them out.
