Agent Skills

280 points | 137 comments | 14 hours ago
wg0

Snake oil. Good to read for sure. Seems all plausible too. But snake oil nevertheless.

Here's why: the slot machine can drop any hard requirement that you specify in your AGENTS.md, memory.md, or your dozens of skill markdowns. Pretty much guaranteed.

These harness approaches pretend that LLMs are strict and perfect rule followers, and that the only problem is not being able to specify enough rules clearly enough. That's a fundamental lapse in understanding how LLMs operate.

That leaves only one option, not reliable but more reliable nevertheless: human review and oversight. Possibly two rounds of it, one after the other.

Everything else is snake oil, but at that point you also realize that the promised productivity gains are snake oil too, because reading code and building a mental model is way harder than having a mental model and writing it into code.

ai_fry_ur_brain

Can't wait for everyone to realize they've wasted a year-plus messing with agents while experiencing a feeling of pseudo-productivity.

stellalo

> A skill is a markdown file with frontmatter that gets injected into the agent’s context when the situation calls for it.

When the LLM decides that the situation calls for it

> It is a workflow: a sequence of steps the agent follows, with checkpoints that produce evidence, ending in a defined exit criterion.

A sequence of steps the LLM can decide to follow
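For readers who haven't seen the format: a skill is a markdown file whose YAML frontmatter `description` is what the model matches against when deciding to load it. A minimal sketch, loosely following the SKILL.md convention (field names and content here are illustrative):

```markdown
---
name: review-pr
description: Use when the user asks for a review of a pull request or diff
---

# Review PR

1. Read the diff and note any behavioral changes.
2. Check that tests cover the changed paths; flag gaps.
3. Summarize findings with evidence (file and line references).
4. Exit criterion: every flagged item has a suggested fix or an open question.
```

Whether those numbered steps actually get followed is, as the comment notes, up to the model.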

dmix

I've tried these larger agent skillsets in the past and felt it was a waste of time, because they were just doing too much. Just like with vim, it's often better to pick and choose from the community than to install skills as if they were an IDE. Skills are way too personal, because every dev and dev team is different. So it's better to treat these as a reference for your own config rather than bulk-installing someone else's.

CharlesW

From an SEO/LLMO perspective, the discoverability of these skills will be difficult without a rename: https://agentskills.io/

If Addy reads this, how do you pitch this vs. Superpowers? https://github.com/obra/superpowers

thatmf

Why are people so excited to put themselves out of a job?

Not that these or any "skills" will do that, but just, in principle. This is like alienation from labor at scale.

cortesoft

What makes this better/different than spec-kit? It seems to have a very similar philosophy. I wonder if they could work together? Or would they just be duplicative?

https://github.com/github/spec-kit

Lio

“A senior engineer’s job is mostly the parts that don’t show up in the diff.”

Agent Skills is Addy’s attempt to kill that job too. Cheers Addy. :P

hansmayer

Why does it feel like it was AI-written?

zmmmmm

I was surprised by how long some of these skills are. They are pages and pages long, with tables and checkbox lists and code examples, etc.

Curious how normal that is; it would only take a couple of these to really fill up the context.

Trusteando

Design a test to verify that the harness keeps the rider on the horse. Parameterize it by context size.

koliber

Lately I keep hearing the same thing over and over: the things that are good for managing a team of devs are good for LLMs.

Good test cases.

Clear and concise documentation.

CI/CD.

Best practices and onboarding docs.

Managing LLMs is becoming more and more similar to managing teams of people.

ElijahLynn

I've been using Agent Skills on a new side project and I'm really impressed so far! It holds my hand along the way and lets me focus on developing a product instead of figuring out how to build it. I get to spend much more energy on high-level architecture and product design.

Very grateful for this repository and everyone who contributed to it!

SudheerTammini

Recently I got access (enterprise) to the latest ChatGPT model, with the ability to write skills to automate repeatable tasks. Without any prior knowledge I just started tinkering, and now, after creating and testing multiple skills in a real business environment, I can confidently say that writing a good skill is a skill in itself. As the author mentioned, it's not an essay but a specific set of instructions, organised in steps and in a concise manner.

ColinEberhardt

Agent Skills is built upon "Five design decisions [that] are the load-bearing ones"

And Open Design (HN front page yesterday) is supported by “Six load-bearing ideas”

The similarities in the way these prompt libraries are documented don't feel coincidental.

codemog

Everyone who writes this kind of stuff skips the boring parts: science and engineering.

Yep: benchmarks, with/without comparisons, samples of generated code with and without. This kind of stuff matters, and without real analysis you may be making your agent stupider or getting worse results.

Also this prose reads like the author has drunk the Google kool-aid and not much else.

karinakarina3

Another example of agent skills that give AI agents access to bitdrift's mobile observability platform for full-fidelity agentic investigations -- https://bitdrift.ai/

senko

> This isn’t a coincidence. It’s the same SDLC every functioning engineering organisation runs, just in different vocabulary. [...] Amazon calls it the working-backwards memo and the bar raiser. Every healthy team has some version of this loop.

This (SDLC == working backwards & bar raiser) is so horribly wrong that I hope it was an LLM hallucination.

In general, I'm starting to see these agent scaffolding systems as an anti-pattern: people obsess over systems for guiding agents and construct elaborate Rube Goldberg machines, and then others cargo-cult them wholesale, in an effort to optimize and control a random process and minimize human involvement.

tariky

What is difference between superpowers and this?

I've been using Superpowers for several months now and it really does help. But the 90/10 rule still applies: 10% of the time it will produce a stupid decision. So always check the spec.

turlockmike

The best way to prompt an LLM is to describe the outcome you want, that's it. They are trained as task completers. A clear outcome is way better than a process.

If the LLM fails, either you didn't describe your outcome sufficiently, or it misinterpreted what you said, or it couldn't do it (rare).

Common errors should be encoded as context for future similar tasks, don't bloat skills with stuff that isn't shown to be necessary.
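To make the outcome-vs-process distinction concrete, here is a contrived pair of prompts (both examples are mine, not the commenter's):

```text
Process-oriented (over-specified):
  "Open utils.py, find parse_date, wrap the strptime call in
   try/except, then add a pytest.raises test in test_utils.py..."

Outcome-oriented:
  "parse_date should return None instead of raising on malformed
   input, with existing behavior for valid dates unchanged; both
   cases should be covered by tests."
```

The second leaves the model free to choose the mechanism while pinning down what "done" means.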

konaraddi

There are so many ways, many of them redundant, to set up agents for software development that, beyond personal/team/org needs and tastes, I need to look into setting up some benchmarks to evaluate which setup is optimal, or whether the differences are even worth it.

y-curious

Thanks for this, going to steal a lot of this. I would install your plugin, but I worry about being able to delete it later. I also think that each one of these is better served customized to a developer. That said, I'm still going to grab some of these, thanks!

theahura

I really wish he wouldn't use AI to write his posts. It would be faster to just post the prompt he used to write the article

gavmor

Naming things is such a hard problem that many devs don't even bother trying.

That being said, this post is full of reasonable assertions, so I'm looking forward to experimenting with this... whatever it is.

hansmayer

What skills, mate? These are simply text files attempting to narrow down the specs, in the hope that this will help the "AI" make fewer mistakes. But it is still crap because, <drum-rolls>, it still depends on how this fits into the overall statistical model, which changes with every prompt, etc... Please stop peddling this bullshit, it does not work!

shruubi

Am I the only one who looks at guys like Addy Osmani and Steve Yegge, who had good reputations before LLMs, and gets the feeling they are cashing that reputation in to ride the LLM hype cycle? Or is it just a matter of professional tech talking heads moving from writing books and giving conference talks about good engineering practices to talking about the new hot topic that sells books and conference tickets?

rafaelmn

> It’s people accepting plausible-sounding justifications for skipping the parts they don’t feel like doing.

WTF? Almost always, this was "skipping the parts because the deadline was two weeks ago". The "I don't feel like it" rationalizations are maybe 20%. Unless deadlines are rationalizations too?

simianwords

The fundamental problem with agent skills is that they don't have a hook for one-time installation. An agent can't just be a prompt; it also has to have some way to do initial setup work.

If I have an agent skill to look up stock prices, maybe I need to set up some tools and authentication first. There's no way to express this!
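One way a setup hook could be expressed is an extra frontmatter field; a purely hypothetical sketch (no current skill format defines a `setup` field, and the names below are invented):

```markdown
---
name: stock-prices
description: Use when the user asks for current stock prices
# Hypothetical one-time install hook, not part of any existing spec:
setup: |
  pip install yfinance
  test -n "$STOCKS_API_KEY" || echo "Set STOCKS_API_KEY before first use"
---

# Stock Prices

1. Fetch the quote for the requested ticker symbol.
2. Report price, change, and timestamp.
```

As things stand, that install step has to live in a README or be improvised by the agent at run time.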

gosukiwi

I wonder how does this compare to superpowers

scotty79

> Workflows are agent-actionable; essays are not. The same is true for human teams. If your team handbook is 200 pages, no one reads it under time pressure.

Agents do read that. And they actually remember it, because it's tiny compared with the other things you are cramming into their context.

AndyNemmity

This is why I created the /do router, to route to all skills. I also have anti-rationalization, progressive context discovery, etc.

I only make it for me, so it's a bit complex and targeted towards me, and what I do, but it's pretty easy to adjust things.

https://github.com/notque/vexjoy-agent

I'm working on reading through Agent Skills; it seems we've converged on a lot of the same points. I'd never seen it before, so I'm trying to get an understanding of it.

Edit 1: I don't like all the commands. I just rely on a single router to automatically decide what I want, and that feels like the most reasonable way to me to communicate with it.

I don't want to remember things. And that's the way for me to scale the number of skills and activities. I don't have to think about them.
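The routing idea can be sketched mechanically: score each skill's description against the request and inject the best match. A toy illustration of the concept, not the vexjoy-agent implementation (which delegates the choice to a model); the skill names and descriptions are invented:

```python
# Toy sketch of a /do-style skill router: pick the skill whose
# description shares the most words with the user's request.
# Skill names and descriptions here are invented for illustration.

SKILLS = {
    "review-pr": "review a pull request diff for bugs and test gaps",
    "write-tests": "write unit tests for new or changed code",
    "debug": "investigate a failing test or runtime error",
}

def route(request: str) -> str:
    """Return the skill whose description overlaps the request the most."""
    req_words = set(request.lower().split())

    def overlap(item: tuple[str, str]) -> int:
        _name, desc = item
        return len(req_words & set(desc.split()))

    return max(SKILLS.items(), key=overlap)[0]

print(route("please review this pull request"))  # -> review-pr
```

A real router would use embeddings or a cheap model call instead of word overlap, but the shape is the same: one entry point, many skills, selection by description.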

Edit 2: We have very different routers.

https://github.com/addyosmani/agent-skills/blob/f504276d8e07...

vs

https://github.com/notque/vexjoy-agent/blob/main/skills/do/S...

I personally wouldn't call theirs an intelligent router. They are dancing between a few different skills. We have extremely different setups there.

But of course, I'm using way more context to get it done. I'm even sending it out to Haiku to build the route choices.

I choose to use tokens to make things better for myself, not everyone would make the same choice, so I certainly see why they are using a few skills, and composing them.

Edit 3: This is much easier for a user to wrap their head around because there's much less.

I am only focused on the best improvements I can make that show value for my use cases. This is straightforward to reason about.

This seems like a nice way to get the best concepts for people trying to understand them. I commend them for a clean, simple approach.

Edit 4: Yeah, I think there are some things I can learn from them which is always good.

I especially like simple decisions like collapsing the install details for each harness in the readme.

I'm going to read over the entire thing and look for opportunities to improve my stuff.

We are all working together, learning, testing, building, trying to find the best way to implement things.

encoderer

I adopted a couple of these; the API design and UI testing ones have been particularly helpful.