Sandboxing LLM CLI Suggestions Before They Hit Bash



Hook
You ask your terminal copilot, "how do I nuke Docker images?" It suggests sudo rm -rf /var/lib/docker. You paste it, the command succeeds, and every image, container, and volume on your workstation is gone, not just the dangling images you wanted to clear. If the assistant had recommended rm -rf /, you might have bricked prod nodes. We cannot trust vibe-based CLI recipes without a harness.
The Problem Deep Dive
LLMs excel at approximating shell commands but ignore context: OS, permissions, directory structure, and safety nets. Pasting from chat straight into a terminal leads to:
- Running destructive commands with sudo.
- Executing on the wrong host (prod vs dev) because the ssh context is implicit.
- Hidden Unicode characters (such as a zero-width space) altering commands.
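The hidden-character problem is the easiest of the three to check mechanically. The following is a minimal sketch in Python, not part of any existing tool: the function name find_hidden_chars is hypothetical, and the choice of Unicode categories to flag (format, control, and non-ASCII space separators) is an assumption about what counts as suspicious in a pasted command.

```python
import unicodedata

# Assumption: these Unicode general categories are worth flagging in a pasted command.
# "Cf" covers ZERO WIDTH SPACE and bidirectional overrides, "Cc" covers control
# characters, and "Zs" covers look-alike spaces such as NO-BREAK SPACE.
SUSPICIOUS_CATEGORIES = {"Cf", "Cc", "Zs"}
HARMLESS = {" ", "\n", "\t"}


def find_hidden_chars(command: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each invisible or look-alike character."""
    hits = []
    for index, char in enumerate(command):
        if char in HARMLESS:
            continue
        if unicodedata.category(char) in SUSPICIOUS_CATEGORIES:
            # Control characters have no Unicode name, so fall back to the code point.
            name = unicodedata.name(char, f"U+{ord(char):04X}")
            hits.append((index, name))
    return hits
```

A wrapper could refuse to run any command for which this returns a non-empty list and print the offending positions instead.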
Technical Solutions
Quick Patch: Paste Proxy
Pipe commands through a wrapper:
alias vibe='~/.local/bin/vibe-run'
vibe-run reads stdin, runs shellcheck, highlights risky patterns (wildcards, rm -rf), and prompts before execution.
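As a rough illustration of the "highlights risky patterns" step, here is a sketch in Python. It assumes a single simple command per line (no pipes or ; chains), and the function name flag_risks and the warning strings are hypothetical. The ShellCheck pass and the confirmation prompt are left out.

```python
import shlex

GLOB_CHARS = set("*?[")


def flag_risks(command: str) -> list[str]:
    """Return human-readable warnings for one simple command line."""
    # shlex.split raises ValueError on unbalanced quotes; a caller may want to
    # treat that as a reason to reject the command outright.
    tokens = shlex.split(command)
    warnings = []
    if tokens and tokens[0] == "sudo":
        warnings.append("runs as root via sudo")
        tokens = tokens[1:]
    if not tokens:
        return warnings
    tool, args = tokens[0], tokens[1:]
    # Collect single-letter flags so that "-rf" and "-r -f" are treated alike.
    short_flags = "".join(
        a[1:] for a in args if a.startswith("-") and not a.startswith("--")
    )
    if tool == "rm" and "f" in short_flags and ("r" in short_flags or "R" in short_flags):
        warnings.append("recursive forced delete (rm -rf)")
    # Conservative: shlex drops quotes, so a quoted wildcard is flagged as well.
    if any(GLOB_CHARS & set(a) for a in args):
        warnings.append("contains a wildcard")
    return warnings
```

A wrapper built on this would print the warnings and ask for confirmation whenever the list is non-empty.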
Durable Fix: Policy + Sandbox Pipeline
- LLM output -> JSON plan. Instead of a raw command, ask the assistant for structured output:
{
  "description": "Remove dangling Docker images",
  "commands": [
    "docker image prune -f"
  ]
}
- Policy engine. Evaluate each command with Rego or a bash AST parser. Add deny rules for patterns like rm -rf /, chmod 777 -R, and network calls outside the allow list.
- Dry-run shell. Execute commands inside a toolbox container or distrobox with read-only bind mounts:
podman run --rm -it \
  -v "$PWD:/workspace:ro" \
  -v /tmp/vibe:/scratch \
  localhost/vibe-shell:latest bash -lc "${CMD}"
- Approval gating. For high-risk actions (package install, cluster mutate), require y/n with context showing the diff.
- Context stamping. Tag commands with HOST, PWD, git branch, and ticket ID. Log decisions.
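To make the policy engine step concrete, here is a small sketch of the three deny rules named above. It is written in plain Python rather than Rego, and it tokenizes with shlex instead of using a real bash AST parser, so it only handles one simple command at a time. The names evaluate, ALLOWED_HOSTS, and NETWORK_TOOLS, and the host registry.example.internal, are hypothetical.

```python
import shlex
from urllib.parse import urlparse

ALLOWED_HOSTS = {"registry.example.internal"}  # hypothetical allow list
NETWORK_TOOLS = {"curl", "wget"}  # assumption: the only network clients checked


def evaluate(command: str) -> list[str]:
    """Return the deny rules a command violates; an empty list means allowed."""
    argv = shlex.split(command)
    if argv and argv[0] == "sudo":
        argv = argv[1:]
    if not argv:
        return []
    tool, args = argv[0], argv[1:]
    # Single-letter flags, so "-rf", "-fr", and "-r -f" all look the same.
    flags = set(
        "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
    )
    operands = [a for a in args if not a.startswith("-")]
    violations = []
    if tool == "rm" and flags & {"r", "R"} and "f" in flags and "/" in operands:
        violations.append("rm -rf on filesystem root")
    if tool == "chmod" and "R" in flags and "777" in operands:
        violations.append("recursive chmod 777")
    if tool in NETWORK_TOOLS:
        for operand in operands:
            # Limitation: URLs without a scheme have no parsed hostname and are missed.
            host = urlparse(operand).hostname
            if host and host not in ALLOWED_HOSTS:
                violations.append(f"network call to non-allowed host {host}")
    return violations
```

Keeping the result as a list of violated rules, rather than a single boolean, gives the approval prompt and the decision log something specific to show.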
LLM prompt example:
Return JSON describing safe commands. Fields: commands[], requiresRoot, destructive.
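Assistant output should be validated before anything downstream trusts it. Below is a minimal sketch of that check using only the Python standard library. It assumes the field names from the prompt above (commands, requiresRoot, destructive) plus the description field from the earlier JSON example. The Plan class and parse_plan function are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    description: str
    commands: list[str]
    requires_root: bool
    destructive: bool


def parse_plan(raw: str) -> Plan:
    """Parse assistant output, raising ValueError if the shape is wrong."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("plan must be a JSON object")
    commands = data.get("commands")
    if (
        not isinstance(commands, list)
        or not commands
        or not all(isinstance(c, str) and c.strip() for c in commands)
    ):
        raise ValueError("commands must be a non-empty list of non-empty strings")
    for key in ("requiresRoot", "destructive"):
        # Require explicit booleans; a missing flag is treated as an invalid plan.
        if not isinstance(data.get(key), bool):
            raise ValueError(f"{key} must be true or false")
    return Plan(
        description=str(data.get("description", "")),
        commands=list(commands),
        requires_root=data["requiresRoot"],
        destructive=data["destructive"],
    )
```

Rejecting a plan that omits requiresRoot or destructive is a deliberate choice in this sketch: defaulting either flag to false would let an incomplete answer skip approval gating.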
Testing & Verification
- Unit-test the policy engine with fixtures of dangerous commands.
- Add integration tests that run sample commands inside the sandbox to ensure mount rules hold.
- Hook into CI: run vibe-run --verify to ensure saved scripts comply with policies.
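The mount rules can also be checked without starting a container, by building the podman argument list in code and asserting on it. The sketch below mirrors the podman invocation shown earlier and assumes the same image name and mount points. It drops -it because CI has no TTY, and the helper names sandbox_argv and writable_mounts are hypothetical.

```python
def sandbox_argv(
    command: str,
    workspace: str,
    scratch: str = "/tmp/vibe",
    image: str = "localhost/vibe-shell:latest",
) -> list[str]:
    """Build the podman argv that runs one command inside the sandbox."""
    return [
        "podman", "run", "--rm",
        "-v", f"{workspace}:/workspace:ro",  # project tree is read-only
        "-v", f"{scratch}:/scratch",         # the only writable mount
        image, "bash", "-lc", command,
    ]


def writable_mounts(argv: list[str]) -> list[str]:
    """Return the -v volume specs in argv that lack the 'ro' option."""
    specs = [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "-v"]
    writable = []
    for spec in specs:
        parts = spec.split(":")
        options = parts[2].split(",") if len(parts) > 2 else []
        if "ro" not in options:
            writable.append(spec)
    return writable
```

Passing the list to subprocess.run without shell=True keeps the host shell from interpreting the command string. An integration test can still run a write attempt such as touch /workspace/probe in the real container and expect a non-zero exit status.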
Common Questions
Does sandboxing slow me down? Slightly, but caching containers or using lightweight namespaces keeps latency low.
What about interactive tools? Allow only explicitly allow-listed commands (e.g., vim) to run outside the sandbox.
Can't we just trust ShellCheck? ShellCheck catches syntax, not intent. Use both.
How do we handle remote contexts? Prefix commands with an explicit target such as ssh devbox, and require explicit host allow lists.
Conclusion
AI CLI vibes are cool until they rm the wrong directory. Force structure, enforce policy, and run commands in sandboxes before they touch your shell. Confidence beats chaos.