You Can’t Whitelist Your Way to an Unattended Agent

Why we built pipe-down, an open-source Claude Code skill that keeps a coding agent inside its own permission allowlist.

Start a coding agent on a real task, walk away, and come back twenty minutes later. There is a good chance you will find it exactly where you left it — not failed, not finished, just stopped. Waiting. A permission prompt is on the screen asking you to approve a command, and nothing has moved since it appeared.

The command is almost never dangerous. It is usually something like cd packages/api && npm run build && npm run test. The agent could have run it. It simply could not run it without asking, and you were not there to answer.

This is the gap between an agent that assists you and an agent that runs unattended. Most teams try to close that gap by expanding the permission allowlist. In our experience that approach quietly fails, and it fails for a structural reason worth being precise about.

The allowlist is not where the problem is

Claude Code’s permission system works on an allowlist. You grant rules like Bash(git status:*) or Bash(npm run test:*), and a command that matches one of those rules runs without interruption. The natural instinct, after the third or fourth time the agent stops on you, is to add more rules. Approve more. Widen the list.

It does not work, and the reason is not that you have been lazy about it.

A permission rule matches a command. It does not match an arbitrary composition of commands. The moment the agent writes cd api && npm run build && npm test | tee out.log, the allowlist is no longer looking at npm run build. It is looking at one long opaque string that happens to contain npm run build somewhere inside it. A prefix rule has nothing clean to match against, so the agent asks.
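To make the mismatch concrete, here is a deliberately simplified model of prefix-rule matching. It is not Claude Code's actual matcher. The function names, the operator list, and the choice to treat any composed command as unmatched are assumptions made for illustration.

```python
# Simplified, hypothetical model of prefix-rule matching (not Claude Code's real code).
SHELL_OPERATORS = ("&&", "||", ";", "|", "$(", "`")


def parse_rule(rule: str) -> str:
    """Extract the command prefix from a rule shaped like 'Bash(npm run build:*)'."""
    if not (rule.startswith("Bash(") and rule.endswith(":*)")):
        raise ValueError(f"unsupported rule shape: {rule!r}")
    return rule[len("Bash("):-len(":*)")]


def is_allowed(command: str, rules: list[str]) -> bool:
    """Return True if a single, uncomposed command matches one of the prefix rules."""
    # Assumption: a command containing a shell operator counts as a composition
    # and is never auto-approved. This naive substring check would also flag
    # operators that appear inside quoted arguments.
    if any(op in command for op in SHELL_OPERATORS):
        return False
    for rule in rules:
        prefix = parse_rule(rule)
        # Match the prefix exactly, or the prefix followed by further arguments.
        if command == prefix or command.startswith(prefix + " "):
            return True
    return False


rules = ["Bash(cd:*)", "Bash(npm run build:*)", "Bash(npm test:*)"]
print(is_allowed("npm run build", rules))
print(is_allowed("cd api && npm run build && npm test | tee out.log", rules))
```

In this model every piece of the chain is individually covered by a rule, yet the chain as a whole is not.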

You could, in principle, add a rule for that exact string. But the next compound command will be different. It will chain a different three tools, or pipe to grep instead of tee, or wrap a substitution in the middle. Every unique combination is a new string the allowlist has never seen.

This is the part that matters. The set of individual commands a project uses is small and finite — a few dozen tools, realistically. The set of compound commands you can build from those tools is not. It is combinatorial, and for practical purposes it is unbounded. You are trying to cover an open-ended generator with a fixed list. No amount of list maintenance closes that gap, because the gap is not a missing entry. It is a category mismatch.
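A rough count shows how quickly the space grows. The sketch below assumes 30 tools ("a few dozen") and 4 chaining operators, and it ignores arguments entirely, so it understates the real number. The figures are illustrative, not measurements.

```python
def compound_count(tools: int, operators: int, length: int) -> int:
    """Number of distinct chains of exactly `length` commands.

    Each of the `length` slots can hold any tool, and each of the
    `length - 1` joints can hold any operator.
    """
    return tools ** length * operators ** (length - 1)


TOOLS = 30      # assumed size of a project's atomic command set
OPERATORS = 4   # assumed: &&, ||, ;, |

for length in range(1, 5):
    print(length, compound_count(TOOLS, OPERATORS, length))
```

Under these assumptions there are 30 atomic commands but 432,000 distinct three-step chains, and the count keeps multiplying with every additional step.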

This is the same problem we always have with AI systems

We build production systems where AI sits inside a workflow that has to hold up under real constraints, and the lesson repeats in domain after domain. The model, or the agent, is a generator of open-ended output. Everything around it — the schema, the validation boundary, the permission set — is a fixed, deterministic structure. Systems fail at the seam between the two.

The instinct that fails is always the same: make the fixed structure bigger to catch more of the open-ended output. Add more schema branches. Add more allow rules. You are enlarging a finite set to chase an infinite one.

The instinct that works is the opposite. You constrain the generator so its output lands inside the fixed structure by construction. You do not widen the schema to accept any JSON the model might emit; you constrain the model to emit the schema. You do not widen the allowlist to accept any command the agent might compose; you constrain the agent to compose commands the allowlist can actually match.

That second move is what pipe-down does.

What we built

pipe-down is a small open-source skill for Claude Code. It has one rule, and the rule is deliberately narrow: run exactly one command per Bash call.

No &&. No ||. No ;. No pipes. No command substitution, no subshells, no process substitution. When the agent would have written a chain, it instead issues the steps as separate calls and reads the result of each one itself. It becomes the && rather than delegating that to the shell. Where it would have piped output to grep or head, it uses the tool’s own flags (git log --oneline -n 5 instead of git log --oneline | head -5, for example) or just reads the full output and decides what matters.
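The control flow of "becoming the &&" can be sketched as a loop that runs one command at a time and stops at the first failure. This illustrates the idea only. It is not how pipe-down is implemented, and run_steps is a hypothetical helper. The demo uses the Python interpreter as a stand-in for real build and test commands so the sketch runs anywhere.

```python
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> list[subprocess.CompletedProcess]:
    """Run each command on its own and stop after the first non-zero exit.

    This mirrors the semantics of `a && b && c`, except that the caller,
    not the shell, inspects each result before deciding to continue.
    """
    results = []
    for argv in steps:
        # One command per call: no shell, no operators, just an argv list.
        result = subprocess.run(argv, capture_output=True, text=True)
        results.append(result)
        if result.returncode != 0:
            break
    return results


if __name__ == "__main__":
    py = sys.executable
    steps = [
        [py, "-c", "print('build ok')"],           # stand-in for a build step
        [py, "-c", "import sys; sys.exit(3)"],     # stand-in for a failing test step
        [py, "-c", "print('never runs')"],         # skipped, as && would skip it
    ]
    for r in run_steps(steps):
        print(r.returncode, r.stdout.strip())
```

Because each step is its own invocation, each one is also its own string for the allowlist to match.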

The effect on the allowlist is the entire point. cd packages/api && npm run build && npm run test is one string the allowlist will almost certainly never match. The same work, expressed as cd packages/api, then npm run build, then npm run test, is three strings, each of which matches a simple prefix rule you already have. The command surface collapses from an unbounded space of compositions to a small, enumerable set of atomic invocations.

A small, enumerable set is something an allowlist can actually cover.

What this buys you

The practical result is the workflow you wanted in the first place. You approve a handful of atomic commands at the start of a run — the tools this particular task will touch, granted once as Bash(npm run test:*) and the like. From that point forward the agent stays inside the lines, because it is no longer generating commands that fall outside them. The run proceeds without you.
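As a sketch of what "granted once" might look like, the snippet below turns a short list of atomic command prefixes into rules of the Bash(...:*) shape used above. The prefix list is hypothetical, and the permissions.allow layout of the settings fragment is an assumption about the settings file that you should check against the Claude Code documentation before relying on it.

```python
import json


def allow_rules(prefixes: list[str]) -> list[str]:
    """Wrap each atomic command prefix in the Bash(<prefix>:*) rule shape."""
    return [f"Bash({prefix}:*)" for prefix in prefixes]


def settings_fragment(prefixes: list[str]) -> dict:
    """Build a settings fragment; the key names here are an assumed layout."""
    return {"permissions": {"allow": allow_rules(prefixes)}}


# Hypothetical set of atomic commands for one task.
task_commands = ["git status", "npm run build", "npm run test"]
print(json.dumps(settings_fragment(task_commands), indent=2))
```

The list stays short because it enumerates tools, not the combinations of them.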

The difference is not only that the agent asks for permission less often. It is that the approvals you give at the start are now sufficient. Under the default behavior, nearly every compound command is novel, so an “always allow” never accumulates into real coverage, and you keep approving forever. Under pipe-down, the same dozen approvals keep matching for the rest of the run, because the agent keeps producing commands from the same small set.
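A toy simulation makes the accumulation argument visible. It assumes an approval remembers the exact command string that was approved, which is cruder than real prefix rules, and it uses a made-up run in which the agent chains every ordered pair of four tools.

```python
import itertools


def count_prompts(commands: list[str]) -> int:
    """Count how many commands in a run would trigger a prompt.

    Toy model: approving a command means that exact string never prompts again.
    """
    approved: set[str] = set()
    prompts = 0
    for command in commands:
        if command not in approved:
            prompts += 1
            approved.add(command)
    return prompts


tools = ["git status", "npm run lint", "npm run build", "npm run test"]
pairs = list(itertools.permutations(tools, 2))

# Default behavior: each pair is issued as one chained string.
compound_run = [f"{first} && {second}" for first, second in pairs]

# One command per call: the same work, issued step by step.
atomic_run = [step for pair in pairs for step in pair]

print(count_prompts(compound_run), "prompts for", len(compound_run), "compound calls")
print(count_prompts(atomic_run), "prompts for", len(atomic_run), "atomic calls")
```

In this toy run every compound call is a new string and prompts, while the atomic run stops prompting once each of the four tools has been approved once.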

That is the line between an agent you supervise and an agent you can leave alone.

What it is, and what it is not

pipe-down is not a security boundary. It makes an agent’s behavior more predictable; it does not make an unattended agent safe to ignore. The permission system is still doing the real work of deciding what is allowed to run, and a thoughtful allowlist still matters — pipe-down only ensures the commands reaching that allowlist are shaped so it can do its job. Granting Bash(rm:*) is still your decision and still carries the same weight it always did.

It also has a real cost. One command per call means more tool calls, and a transcript that is longer and more granular. We think that is a good trade for an agent that finishes its run, and in practice the granularity makes the run easier to audit afterward, not harder — every step is its own discrete, inspectable entry. But it is a trade, and you should make it knowingly.

And it is not a substitute for judgment about what the agent should be doing at all. It keeps a run from stalling. It does not decide whether the run was a good idea.

Why we are sharing it

We build systems where AI operates inside real constraints, and the recurring problem is rarely the model. It is the workflow around it — the place where an open-ended generator meets a fixed structure and someone assumed the two would simply line up. pipe-down is a small, sharp example of the pattern we apply to that seam: do not enlarge the boundary to chase the generator, constrain the generator to respect the boundary.

It is published under the MIT License. If you run coding agents unattended and you are tired of finding them stopped, it is worth the one-line install.

The repository is here: https://github.com/invariantengineering/skills/tree/main/pipe-down

If you use it, tell us where it gets in the way. The behavior is intentionally strict, and the edges — the genuinely atomic commands that only look compound, the rare pipe with no flag-based alternative — are exactly where field reports make it better.

Invariant Engineering Group builds production AI systems for regulated and enterprise workflows — engineered for correctness, traceability, and failure-aware operation. If you need an agentic system that holds up under real constraints, talk to an engineer.