Sandboxing is a great security step for agents. Just like using guardrails is a great security step. I can't help but feel like it's all soft defense though. The real danger comes from the agent being able to read 3rd party data, be prompt injected, and then change or exfiltrate sensitive data. A sandbox does not prevent an email-reading agent from reading a malicious email, being prompt injected, and then sending an email to a malicious email address with the contents of your inbox. It does help in implementing network-layer controls though, like applying a policy that says this Linux-based sandbox is only allowed to visit [whitelisted] URLs. This kind of architectural whitelisting is the only hard defense we have for agents at the moment. Unfortunately it will also hamper their utility if used to the greatest extent possible.
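To make the whitelisting idea concrete, here is a minimal sketch of the decision such a policy has to make for each outbound request. The host names and the `is_allowed` helper are hypothetical, and in practice the check would need to be enforced outside the agent's process (an egress proxy or firewall in front of the sandbox), not inside code the agent can modify.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; these hosts are placeholders, not a recommendation.
ALLOWED_HOSTS = {"api.example.com", "pypi.org"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are rejected outright.
        return False
    if parts.scheme != "https":
        return False
    # `hostname` strips userinfo and port, so "allowed.host@evil.test"
    # resolves to "evil.test" and is rejected.
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed_hosts
```

Exact host matching is deliberately strict: suffix or substring matching is where lookalike domains tend to slip through.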
ajb
We definitely need a vendor-independent tool like this. Have been reviewing the Claude setup and, despite initially being hopeful since it uses bubblewrap, it's quite problematic:
* The definitions of security config in the documentation of settings.json are unclear. Since it's not open source, you can't check the ground truth.
* The built in constructs are insufficient to do fully whitelist based access control (It might be possible with a custom hook).
* Security related issues go unanswered in the repo, and are automatically closed.
Haven't looked into copilot as much but didn't look great either. Seems like the vendors don't have the incentives to do this properly.
So I'm on the lookout for a better way, and matchlock seems like a contender.
indigodaddy
This is great. Wish this was around when I started working on vibebin ( https://github.com/jgbrwn/vibebin ), probably would have leveraged matchlock instead of Incus/LXC. I guess I could fork/branch and give it a go! Although for vibebin use case I actually need them to not be ephemeral. Edit, ooooh i see `--rm=false` nice
Where do the images come from? What are our options around that and also using custom images etc?
insuranceguru
sandboxing is really the only way to make agentic workflows auditable for enterprise risk. we can't underwrite trust in the model's output, but we can underwrite the isolation layer. if you can prove the agent literally cannot access the host network or sensitive volumes regardless of its instructions, that's a much cleaner compliance story than just relying on system prompts.
clarity_hacker
This is the confused deputy problem at the application layer. Sandboxing secures the environment, but if the agent has legitimate access to sensitive operations (email, database writes, API calls), prompt injection attacks work through approved channels. The only hard defense is explicit user confirmation for each action, which defeats the point of autonomy.
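A minimal sketch of what per-action confirmation could look like, assuming a framework that dispatches tool calls by name. `SENSITIVE_TOOLS`, `run_tool`, and the `confirm` callback are all hypothetical names for illustration, not part of any particular agent framework.

```python
from typing import Callable

# Hypothetical set of tool names that need a human in the loop.
SENSITIVE_TOOLS = {"send_email", "db_write"}


class ActionDenied(Exception):
    """Raised when the user declines a sensitive action."""


def run_tool(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., object]],
    confirm: Callable[[str], bool],
) -> object:
    """Run a tool, asking `confirm` first if the tool is marked sensitive."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    # Read-only tools pass straight through; sensitive ones block on the user.
    if name in SENSITIVE_TOOLS and not confirm(f"Allow {name} with {args!r}?"):
        raise ActionDenied(name)
    return tools[name](**args)
```

Gating only the sensitive subset keeps some autonomy, but every prompt still costs the user attention, which is the trade-off described above.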
raphinou
I've been happily using a container to run my agents [1]. I tried to make it evolve with more advanced features, but it quickly became harder to use and I went back to a basic container which I just start with a run.sh script. Is a similar simple use possible with matchlock?
I think for the first time ever, we are facing a paradigm shift in containment/sandboxing.
Just as Docker became the de facto standard for cloud containerization, we are seeing a lot of solutions attempting to sandbox AI agents. But imo there is a fundamental difference: previously, we sandboxed static processes. Now, we are attempting to sandbox something that potentially has the agency and reasoning capabilities to try and get itself out.
It’s going to be super interesting (and frankly exciting) to see how the security landscape evolves this time around.
ssd532
What are the advantages of using this over an lxd system container or, if we want VM isolation, lxd VMs? Is it the developer experience, or is there some agent-specific feature that is the key thing here?
throwaw12
This is very cool, is it possible to mount NFS as a storage layer?
the_harpia_io
containers are fine for basic isolation but the attack surface is way bigger than people think. you're still trusting the container runtime, the kernel, and the whole syscall interface. if the agent can call arbitrary syscalls inside the container, you're one kernel bug away from a breakout.
what I'm curious about with matchlock - does it use seccomp-bpf to restrict syscalls, or is it more like a minimal rootfs with carefully chosen binaries? because the landlock LSM stuff is cool but it's mainly for filesystem access control. network access, process spawning, that's where agents get dangerous.
also how do you handle the agent needing to install dependencies at runtime? like if claude decides it needs to pip install something mid-task. do you pre-populate the sandbox or allow package manager access?
__alexs
Why would secrets ever need to be available to the agent directly rather than hidden inside the tool calling framework?
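One way to picture keeping secrets inside the tool-calling framework is sketched below, assuming the framework runs on the host side and the agent only ever sees the tool's return value. `make_api_tool` and the `send` transport are hypothetical names; this is not how any specific tool implements it.

```python
import os
from typing import Callable


def make_api_tool(
    secret_env: str, send: Callable[[str, dict], str]
) -> Callable[[str], str]:
    """Build a tool that attaches a credential the agent never sees.

    `send` is a hypothetical transport (e.g. an HTTP client wrapper) supplied
    by the framework; the agent only ever calls the returned function.
    """

    def tool(path: str) -> str:
        # Read the secret at call time, on the framework side only.
        token = os.environ[secret_env]
        result = send(path, {"Authorization": f"Bearer {token}"})
        # Scrub the token in case the remote side echoes it back.
        return result.replace(token, "[redacted]")

    return tool
```

The agent can still misuse the capability the tool grants, but it has no credential to leak.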
stogot
Is this just a copycat of the Deno sandbox announcement from a few days ago?
pjio
If I'm already on Linux, how does it compare to using bubblewrap?
[1] https://github.com/asfaload/agents_container
See also:
https://github.com/obra/packnplay
https://github.com/strongdm/leash
https://github.com/lynaghk/vibe
(I've been collecting different tools for sandboxing coding agents)
very cool, if you want cross-platform microvms, there's an interesting project called libkrun that powers projects like Podman and Colima.
here's a Go binding: https://github.com/mishushakov/libkrun-go
demo (on Mac): https://x.com/mishushakov/status/2020236380572643720
Have I told you about our lord and savior: `useradd`