

Guard Rails for AI Agents: Tooling Contracts Developers Can Trust

Alprina Security Team

Hook: The Agent That Caused a High-Severity PagerDuty Incident

You build an LLM agent that triages alerts. It has tools to query Prometheus, tail logs, create Jira tickets, and manage PagerDuty alerts. During a late-night incident, the agent interprets "silence this alert" broadly and calls the PagerDuty API to disable notifications for the whole service. PagerDuty dutifully silences everything, masking a real outage for 45 minutes. The postmortem reveals that the tooling contract was a docstring and that the agent had write access to production APIs with no review step.

Developers love AI agents because they automate toil, but every tool you expose is a new attack surface. This article dives into building safe tooling contracts: schema validation, approval workflows, sandboxes, and audit logs. We show code in TypeScript and Python for popular frameworks (LangChain, custom orchestrators) and include tests.

The Problem Deep Dive

LLM tooling pitfalls:

  • Ambiguous tool descriptions. Agents infer capabilities incorrectly.
  • Lack of parameter validation. Agents pass malformed JSON; tools fail unpredictably.
  • Over-scoped credentials. Tools hold admin tokens.
  • No human-in-the-loop. Agents execute destructive actions automatically.
  • Weak auditing. Logs lack context for investigations.

Example anti-pattern:

tools.push({
  name: "pagerduty",
  description: "Silence PagerDuty incidents",
  func: async (input) => {
    await pagerDutyClient.silenceService(input.serviceId);
  },
});

No parameter validation, no dry run, no audit.

Technical Solutions

Quick Patch: JSON Schema Validation

Define a JSON Schema for each tool's inputs and validate before execution.

import Ajv from "ajv";

const silenceSchema = {
  type: "object",
  properties: {
    serviceId: { type: "string" },
    durationMinutes: { type: "integer", minimum: 5, maximum: 120 },
    reason: { type: "string" },
  },
  required: ["serviceId", "durationMinutes"],
  additionalProperties: false,
};

const validate = new Ajv().compile(silenceSchema);

async function silenceTool(rawInput: string) {
  const input = JSON.parse(rawInput);
  if (!validate(input)) {
    throw new Error("Invalid tool input: " + JSON.stringify(validate.errors));
  }
  // Validation passed; hand off to the approval flow instead of executing directly.
  return requestApproval({
    action: "silence",
    payload: input,
  });
}
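A quick check with hypothetical inputs shows the schema failing closed before any API call is made:

// Rejected: durationMinutes exceeds the schema maximum of 120.
await silenceTool('{"serviceId":"svc-42","durationMinutes":500}').catch(console.error);
// Rejected: unknown field, caught by additionalProperties: false.
await silenceTool('{"serviceId":"svc-42","durationMinutes":30,"force":true}').catch(console.error);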

Durable Fix: Tool Contracts with Approval and Sandboxing

  1. Command builder: tools return a command description instead of executing immediately.
  2. Policy engine: evaluate commands against policy (scope, time window); a minimal sketch follows this list.
  3. Approval flow: require human approval for high-risk actions (e.g., a Slack interactive message).
  4. Execution sandbox: run approved commands in an isolated worker with narrowly scoped tokens.
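A policy engine does not need to be elaborate: a pure function over the command keeps it testable and outside the LLM's reach. A minimal TypeScript sketch, where the specific rules and service names are illustrative rather than prescriptive:

type Command = {
  tool: string;
  serviceId: string;
  durationMinutes: number;
  requestedAt?: Date;
};

// Illustrative rules only: cap silence duration, protect known-critical
// services, and refuse unattended overnight changes.
const CRITICAL_SERVICES = new Set(["payments", "auth"]);

export const policyEngine = {
  allow(cmd: Command): boolean {
    if (cmd.durationMinutes > 120) return false;
    if (CRITICAL_SERVICES.has(cmd.serviceId)) return false;
    const hour = (cmd.requestedAt ?? new Date()).getUTCHours();
    if (hour < 6) return false; // off-hours changes need a human
    return true;
  },
};

Because `allow` is a pure function, every rule is unit-testable, and the agent has no code path that skips it.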

Python example with LangChain:

from langchain_core.tools import BaseTool, ToolException
from pydantic import BaseModel, Field

class SilenceModel(BaseModel):
    service_id: str
    duration_minutes: int = Field(ge=5, le=120)
    reason: str = ""

class PagerDutyTool(BaseTool):
    name: str = "pagerduty_silence"
    description: str = "Request to silence a service for a bounded number of minutes"
    args_schema: type[BaseModel] = SilenceModel

    def _run(self, service_id: str, duration_minutes: int, reason: str = "") -> str:
        # Build a command description; execution happens elsewhere, after approval.
        cmd = {
            "tool": self.name,
            "service_id": service_id,
            "duration_minutes": duration_minutes,
            "reason": reason,
        }
        if not policy_engine.allow(cmd):
            raise ToolException("policy denied")
        ticket = approvals.submit(cmd)
        return f"Pending approval ticket {ticket.id}"

Execution service (Node):

app.post("/execute", requireSignedCommand, async (req, res) => {
  const { command } = req.body; // signature already verified by requireSignedCommand
  if (!policyEngine.allow(command)) return res.status(403).send("denied");
  // Mint a short-lived token scoped to this one service (see token broker below).
  const token = await tokenBroker.issue("pagerduty", {
    scope: `service:${command.serviceId}`,
  });
  await pagerDutyClient.silenceService(command.serviceId, {
    duration: command.durationMinutes,
    token,
  });
  audit.log(command, req.user);
  res.json({ status: "ok" });
});

Commands require payloads signed by the approval service.
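One way to implement the requireSignedCommand middleware is an HMAC over the command body, with a key shared only between the approval service and the executor. A sketch assuming Express and simplified key handling; a production system should also canonicalize the JSON deterministically before signing:

import crypto from "node:crypto";
import type { Request, Response, NextFunction } from "express";

// Key shared with the approval service only; the agent never sees it.
const APPROVAL_KEY = process.env.APPROVAL_SIGNING_KEY!;

export function requireSignedCommand(req: Request, res: Response, next: NextFunction) {
  const { command, signature } = req.body ?? {};
  if (!command || !signature) return res.status(400).send("missing signature");
  const expected = crypto
    .createHmac("sha256", APPROVAL_KEY)
    .update(JSON.stringify(command))
    .digest("hex");
  // timingSafeEqual prevents timing attacks on the comparison.
  const ok =
    expected.length === signature.length &&
    crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
  if (!ok) return res.status(401).send("bad signature");
  next();
}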

Token Broker and Least Privilege

  • Use STS or a custom token broker to mint short-lived tokens with narrow scopes; a sketch follows below.
  • Never keep API keys in the agent's process memory beyond execution.
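A broker can be a thin wrapper around whatever issuer you already run. In this sketch, upstreamIssuer is a hypothetical client for that issuer (AWS STS, Vault, or an internal OAuth client-credentials flow); the broker's job is narrowing scope and capping TTL:

// Hypothetical upstream issuer client; swap in your STS/Vault/OAuth integration.
declare const upstreamIssuer: {
  mint(provider: string, scope: string, ttlSeconds: number): Promise<string>;
};

export const tokenBroker = {
  async issue(provider: string, opts: { scope: string; ttlSeconds?: number }): Promise<string> {
    // Hard-cap every credential at 15 minutes regardless of what callers ask for.
    const ttl = Math.min(opts.ttlSeconds ?? 300, 900);
    return upstreamIssuer.mint(provider, opts.scope, ttl);
  },
};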

Output Monitoring

  • Log every tool invocation with inputs, outputs, and latency; a sketch follows below.
  • Emit metrics for tool.denied, tool.approved, and tool.executed.
  • Alert on failed policy checks or repeated denials.
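As a concrete shape, each invocation can emit one structured log entry plus a counter, so investigations and dashboards share the same fields. The logger and metrics clients below are stand-ins for your existing stack (e.g., pino plus a StatsD or Prometheus client):

import { randomUUID } from "node:crypto";

// Stand-ins for your logging/metrics stack.
declare const logger: { info(obj: object, msg: string): void };
declare const metrics: { increment(name: string, tags?: Record<string, string>): void };

export function recordInvocation(params: {
  tool: string;
  input: unknown;
  outcome: "denied" | "approved" | "executed";
  latencyMs: number;
  actor: string; // agent run ID or approving human
}) {
  const invocationId = randomUUID();
  logger.info({ invocationId, ...params }, "tool invocation");
  metrics.increment(`tool.${params.outcome}`, { tool: params.tool });
}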

Red Teaming

  • Feed the agent prompts that request disallowed actions. Verify that the policy engine denies them and logs the events.
  • Test injection attacks (e.g., overlapping tool names or malicious JSON in tool arguments).

Alprina Policies

Alprina scans repositories for tool definitions that lack schema validation or policy checks, and verifies that approval flows exist for high-risk tools.

Testing & Verification

  • Unit-test schema validation, policy decisions, and the approval flow. For example:

test("policy denies long silence", () => {
  expect(
    policyEngine.allow({ tool: "pagerduty_silence", serviceId: "svc-1", durationMinutes: 500 })
  ).toBe(false);
});

  • Integration tests hit /execute with signed commands; use an ephemeral PagerDuty sandbox account.
  • Fuzz tool inputs with jsfuzz or property-based tests to ensure validation rejects unexpected fields; see the sketch after this list.
  • Simulate a compromised agent by skipping approval and ensure the backend rejects unsigned commands.
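For the property-based option, a fast-check test (one choice among several libraries) can assert that any extra field is rejected by the compiled Ajv validator from the quick patch:

import fc from "fast-check";

test("schema rejects unexpected fields", () => {
  fc.assert(
    fc.property(
      // Any key that is not part of the schema...
      fc.string({ minLength: 1 }).filter(
        (k) => !["serviceId", "durationMinutes", "reason"].includes(k)
      ),
      fc.anything(),
      (key, value) => {
        // ...attached to an otherwise valid input must fail validation.
        const input = { serviceId: "svc-1", durationMinutes: 30, [key]: value };
        expect(validate(input)).toBe(false);
      }
    )
  );
});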

Common Questions & Edge Cases

Do we really need approvals for everything? No. Categorize tools by risk. Read-only tools (metrics queries) auto-execute. Write operations require approval or multi-factor triggers.

What about latency? Cache approvals for low-risk actions or implement break-glass procedures with an audit trail.

How to support mobile on-call? Surface approval prompts in Slack/Teams with context payloads.

Can agents learn to bypass policies? If policies are enforced outside the LLM, in server-side code, the agent cannot talk its way past them. Never rely solely on prompts.

Does policy authoring slow developers down? Provide scaffolding and tests. Policies should live alongside tools so owners update both.

Conclusion

AI agents unlock productivity when their tools are safe by construction. Treat tooling like any other privileged API: validate inputs, enforce policies, require approval, and sandbox credentials. With these guardrails, agents can triage incidents without creating new ones.