billiam

I sure hope the community is working on these APIs now for Linux, with pressure to come on M$ and Apple.

josalhor

We need LLM query routing at the OS level like Mobile data. I know it will sound crazy but hear me out. I think about this AI inference as infrastructure. I do not want to pay for it on every app I use it on. I do not think "I have to pay the mobile data of youtube, and the mobile data of whatsapp etc.". I pay Mobile data infrastructure and let my device route it appropiately. In fact, if we ever go the local llm route, you could have LLM capabilities without having access to the internet (or local LAN), and your OS/computer is the only one capable of doing that routing for you.

show comments
dd8601fn

It's funny how much that first paragraph is Claude's voice. I don't know how it got trained so hard to use, "the shape of" for everything.

show comments
hmokiguess

There's a hidden tax with routing this way, the original model loses context of what was done and either performs a regression or hallucinates.

I think this sort of behaviour started happening more frequently as agentic/ai programming became more often.

Back in the days (lol, reads like a long time ago but that's probably a few months?), you would not say "edit this typo", you would just open the file and not be lazy, and the harness would detect a user change and ground itself.

I feel like now, when I edit outside the AI flow, it goes and introduces a regression or gets lots thinking it didn't do that and something must have gone wrong.

kstenerud

I'm not sure I understand what this is trying to solve?

If a prompt I give routes to one model, and then another prompt to another model, how does one tie the context together such that the next model knows what's going on?

Otherwise this would only be useful for one-off prompts as far as I can tell.

And if it did keep a context to be passed around, it would always land hot (not in the cache).

show comments
nok22kon

we could use some composability.

today any kind of routing requires implementing an http proxy to put in the middle

ideally harnesses would support a routing plugin which receives the new whole context and returns just where to send it, and the harness does that. no http proxy. obviously some complications if you want to route from codex to anthropic or openrouter.

but we need to decouple the context building and routing decision from the actual http requests sending, we need to be able to insert "context/routing plugins" in the chain

show comments
darepublic

Some kind of routing prompts to different models does make sense. But the usecase of saving money on simple prompt.. I think that has only a slight benefit. Fix my typo doesn't use many tokens anyhow.. also model switching still requires carrying over context so it does have some overhead right.

_pdp_

There are so many proxies like this now but I can tell you from first hand experience this is not going to work. You cannot just route away from a situation at such a high level especially when we are talking about models that are quite different in behaviour, with different context windows and tuned to different tool uses. The harness is doing all kind of funky things to compensate for issues (like tool call truncation) that a proxy that routes dynamically like this will work against the very same strategies that make the harness work.

Interesting concept, work in theory, but I cannot see this being part of larger system.

show comments
try-working

Love to see local/cloud routing explicitly supported.

I'm building another router for routing between local and remote models, ShowHN coming up later today. Here's a sneak preview of the github: https://github.com/try-works/role-model

show comments
stanpinte

We are developing many applications in my company, some of them safety critical. A natural routing way could happen for certain phases of development, and interfaces via git. One agent works on branch a and is responsible for brainstorm planning specs, and the other is responsible code and tests. The first agent creates tickets for the second one and the second one consumes these. This works with today’s standard harness.

JSR_FDED

Slight tangent, but “Wayfinder sits behind whatever OpenAI-compatible client you already use” reminds me that descriptions of where proxies sit in the information flow always seem so arbitrary to me:

  - “after the client”
  - “reverse proxy” (in front  of servers)
  - “proxy” (in front of client)
I always have to look this up, surely there must be a standardized way to describe this?
show comments
mrkn1

Has anyone tried the others listed? Any feedback?

throwawayk7h

It'd be nice to just have a command prefix e.g.

/local fix my typo

show comments
ListeningPie

can you send to multiple LLMs to compare responses? From that create a heuristic of which LLM gets what.

api

I do this manually with a desktop app called BoltAI that lets you continue the whole conversation at your LLM of choice.

quijoteuniv

This is the way!

show comments
harvardhan1

axa

harvardhan1

wfdwcZz