Alprina Blog

Webhook Replay Shields: Building Idempotent Handlers That Do Not Blink

Alprina Security Team

Hook: The Retry Storm That Paid the Same Invoice Twice

Your billing system consumes Stripe webhooks. A transient network blip causes Stripe to retry the invoice.paid event five times. Your handler validates the signature, but the processing logic is not idempotent: it credits the customer account and triggers an email each time. Later that month, an attacker replays older events with valid signatures, causing duplicate payouts. You discover there is no replay guard, no nonce table, and no visibility into which events were processed.

In this article we build webhook handlers that absorb replays safely. We cover signature verification, nonce storage, rate limiting, and tests you can run locally with real payloads. Examples use Express (Node.js) and Rails, but the patterns apply widely.

The Problem Deep Dive

Webhooks are attractive targets because:

  • Signatures validate authenticity but not uniqueness. Attackers can replay signed payloads within allowed windows.
  • Handlers often mutate state. Idempotency is an afterthought.
  • Infrastructure retries. Load balancers or API gateways duplicate requests.
  • Clock skew. Timestamp validation fails when system clocks drift.
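
To make the first point concrete, here is a minimal sketch of a Stripe-style signature check built on Node's crypto module. The `t=<timestamp>,v1=<hex HMAC>` header layout and the `timestamp.payload` signed string follow Stripe's documented scheme. The helper names `sign` and `verify`, the secret, and the event body are illustrative assumptions, and production code should call the vendor SDK instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helpers for illustration; use the vendor SDK in production.
function sign(payload: string, timestampSec: number, secret: string): string {
  const mac = createHmac("sha256", secret)
    .update(`${timestampSec}.${payload}`)
    .digest("hex");
  return `t=${timestampSec},v1=${mac}`;
}

function verify(
  payload: string,
  header: string,
  secret: string,
  nowSec: number,
  toleranceSec: number
): boolean {
  const t = /(?:^|,)t=(\d+)/.exec(header);
  const v1 = /(?:^|,)v1=([0-9a-f]+)/.exec(header);
  if (!t || !v1) return false;

  const timestampSec = Number(t[1]);
  const expected = createHmac("sha256", secret)
    .update(`${timestampSec}.${payload}`)
    .digest();
  const provided = Buffer.from(v1[1], "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (expected.length !== provided.length || !timingSafeEqual(expected, provided)) {
    return false;
  }
  // The timestamp check narrows the replay window but does not close it.
  return Math.abs(nowSec - timestampSec) <= toleranceSec;
}

const secret = "whsec_example";
const body = JSON.stringify({ id: "evt_123", type: "invoice.paid" });
const header = sign(body, 1_700_000_000, secret);

const first = verify(body, header, secret, 1_700_000_010, 300);  // true
const replay = verify(body, header, secret, 1_700_000_200, 300); // true: same bytes, still inside the window
const stale = verify(body, header, secret, 1_700_001_000, 300);  // false: outside the window
```

The first delivery and the replay are indistinguishable to the signature check, which is why the handler also needs a record of the event IDs it has already processed.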

Example anti-pattern in Node:

app.post("/stripe/webhook", bodyParser.raw({ type: "application/json" }), (req, res) => {
  const event = stripe.webhooks.constructEvent(req.body, req.headers["stripe-signature"], secret);
  if (event.type === "invoice.paid") {
    creditAccount(event.data.object.customer);
  }
  res.sendStatus(200);
});

No replay detection, no logging, and creditAccount is not idempotent.

Technical Solutions

Quick Patch: Idempotency Store

Record each event ID in a durable store (Redis, Postgres) before processing, and skip any event whose ID is already present.

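const processedKey = `stripe:event:${event.id}`;
// SET with NX succeeds only for the first writer; the TTL should cover the vendor's retry window.
const inserted = await redis.set(processedKey, "1", { NX: true, EX: 3 * 24 * 3600 });
if (!inserted) {
  return res.sendStatus(200); // already processed: acknowledge and skip
}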

Rails example:

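# Assumes a unique index on processed_events.event_id, so the insert itself is the atomic check.
begin
  ProcessedEvent.create!(event_id: event.id)
rescue ActiveRecord::RecordNotUnique
  return head :ok # duplicate delivery, already recorded
end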

Durable Fix: Signature + Timestamp + Nonce

  1. Verify the signature using the vendor SDK.
  2. Validate that the timestamp falls within a small window (5 minutes).
  3. Store the event ID with a TTL.
  4. For state-changing operations, wrap business logic in transactions.

Node (Express):

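const tolerance = 5 * 60; // seconds; signatures older than this are rejected
const signature = req.headers["stripe-signature"] as string;

let event;
try {
  event = stripe.webhooks.constructEvent(req.body, signature, secret, tolerance);
} catch (err) {
  // Bad signature or stale timestamp: reject without touching state.
  return res.sendStatus(400);
}

const processedKey = `stripe:event:${event.id}`;
const now = Date.now();
const inserted = await redis.set(processedKey, now.toString(), { NX: true, EX: 7 * 24 * 3600 });
if (!inserted) {
  logger.info({ eventId: event.id }, "duplicate webhook ignored");
  return res.sendStatus(200);
}

await processInvoice(event.data.object);
res.sendStatus(200);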

Event Schema Validation

Validate that the payload matches the structure your handler expects before acting on it. Use zod in Node or ActiveModel::Type in Rails to coerce types, and reject unexpected fields so that malformed or hostile input never reaches business logic.
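
As a dependency-free illustration of the same idea (zod would express it more compactly), the sketch below checks a heavily trimmed invoice shape by hand. The `PaidInvoice` interface, its three fields, and the `parsePaidInvoice` helper are assumptions made for this example and not Stripe's full schema. A real handler would typically pick out the fields it needs first and validate that projection.

```typescript
// Hypothetical trimmed shape; a real Stripe invoice carries many more fields.
interface PaidInvoice {
  id: string;
  customer: string;
  amount_paid: number;
}

const ALLOWED_KEYS = ["id", "customer", "amount_paid"];

// Returns a typed invoice, or null when the input is malformed
// or carries keys outside the allow-list.
function parsePaidInvoice(input: unknown): PaidInvoice | null {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return null;
  }
  const record = input as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    if (ALLOWED_KEYS.indexOf(key) === -1) return null;
  }
  const { id, customer, amount_paid } = record;
  if (typeof id !== "string" || typeof customer !== "string") return null;
  // Amounts are expected as non-negative integers (minor currency units).
  if (typeof amount_paid !== "number" || amount_paid < 0 || amount_paid % 1 !== 0) {
    return null;
  }
  return { id, customer, amount_paid };
}
```

Returning null instead of throwing lets the caller decide how to respond, for example by logging the rejected event ID and answering with a 400.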

Side Effects in Transactions

Wrap state mutations in transactions:

ActiveRecord::Base.transaction do
  account = Account.lock.find_by!(stripe_customer: customer_id)
  account.credit!(amount)
  WebhookLog.create!(event_id: event.id, payload: payload)
end

If processing fails, delete the processed marker, or set it with NX and a short TTL, so that the vendor's next retry can redo the work.
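
The claim-then-release flow can be sketched as follows. The `MarkerStore` interface, the in-memory implementation, and `handleOnce` are hypothetical names introduced for illustration. A Redis-backed store would map `claim` to SET with NX and EX, and `release` to DEL.

```typescript
// Hypothetical marker-store interface, not a real library API.
interface MarkerStore {
  claim(eventId: string): Promise<boolean>;
  release(eventId: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without external services.
class InMemoryMarkerStore implements MarkerStore {
  private seen = new Set<string>();

  async claim(eventId: string): Promise<boolean> {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }

  async release(eventId: string): Promise<void> {
    this.seen.delete(eventId);
  }
}

type Outcome = "processed" | "duplicate";

async function handleOnce(
  store: MarkerStore,
  eventId: string,
  work: () => Promise<void>
): Promise<Outcome> {
  if (!(await store.claim(eventId))) return "duplicate";
  try {
    await work();
    return "processed";
  } catch (err) {
    // Free the marker so the vendor's next retry can redo the work.
    await store.release(eventId);
    throw err;
  }
}
```

Releasing the marker is only safe when the failed work left no partial side effects behind, which is the reason for keeping state changes inside a transaction.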

Monitoring and Alerting

  • Log event IDs, timestamps, client IPs.
  • Expose metrics for webhook.duplicates, webhook.signature_failures.
  • Alert when duplicates spike.
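
One simple way to detect a spike is a fixed-window counter, sketched below. The `DuplicateMonitor` class, its window length, and its threshold are placeholders chosen for illustration. In practice the count would usually be exported to an existing metrics system, with the alert rule defined there.

```typescript
// Hypothetical fixed-window counter; window and threshold are placeholders.
class DuplicateMonitor {
  private windowStartMs = 0;
  private count = 0;
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Records one duplicate and returns true once the current
  // window holds more duplicates than the threshold allows.
  record(nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= this.windowMs) {
      this.windowStartMs = nowMs;
      this.count = 0;
    }
    this.count += 1;
    return this.count > this.threshold;
  }
}
```

A handler would call `record(Date.now())` on every ignored duplicate and raise an alert when it returns true.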

Alprina Policies

Detect missing replay guards by scanning for webhook controllers without ProcessedEvent checks. Ensure timestamp tolerance configs exist.

Testing & Verification

  • Use vendor CLI (stripe trigger) to send test events. Run integration tests verifying duplicates are ignored.
  • Write unit tests for the nonce store, including TTL expiry.
  • Simulate clock skew by adjusting system time or mocking Date.now().
  • Run load tests with k6 or autocannon to confirm that Redis or Postgres can absorb bursts.
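
A TTL-expiry test is easiest when the store takes its clock as a parameter. The sketch below uses a hypothetical in-memory `NonceStore` that imitates the SET NX EX behavior of the Redis marker. It is a test double under that assumption, not a replacement for the durable store.

```typescript
// Hypothetical in-memory stand-in for the Redis marker, with an injectable
// clock so tests can move time forward without sleeping.
class NonceStore {
  private expiries = new Map<string, number>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Imitates SET key value NX EX ttl: true only for the first claim within the TTL.
  claim(eventId: string, ttlSeconds: number): boolean {
    const nowMs = this.now();
    const expiresAt = this.expiries.get(eventId);
    if (expiresAt !== undefined && expiresAt > nowMs) return false;
    this.expiries.set(eventId, nowMs + ttlSeconds * 1000);
    return true;
  }
}

let fakeNowMs = 1_700_000_000_000;
const store = new NonceStore(() => fakeNowMs);

const firstClaim = store.claim("evt_123", 3600);  // true: first delivery
const replayClaim = store.claim("evt_123", 3600); // false: still inside the TTL
fakeNowMs += 3600 * 1000;                         // advance exactly one TTL
const afterExpiry = store.claim("evt_123", 3600); // true: the marker has expired
```

The same injected clock can be shifted by a few minutes in either direction to exercise clock-skew handling in the timestamp check.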

Common Questions & Edge Cases

What if Redis goes down? Fall back to the database or fail closed (return 500 so the vendor retries later). Monitor cache availability.
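
A minimal sketch of that decision, assuming a hypothetical `ClaimStore` interface whose `claim` call rejects when the backing service is unreachable. The `decide` function and the `Decision` type are illustrative names, not part of any library.

```typescript
// Hypothetical interface: claim resolves true for a first-seen event,
// false for a duplicate, and rejects when the store is unavailable.
interface ClaimStore {
  claim(eventId: string): Promise<boolean>;
}

type Decision =
  | { status: 200; action: "process" | "skip" }
  | { status: 500; action: "retry-later" };

async function decide(
  eventId: string,
  primary: ClaimStore,
  fallback: ClaimStore
): Promise<Decision> {
  for (const store of [primary, fallback]) {
    try {
      const fresh = await store.claim(eventId);
      return { status: 200, action: fresh ? "process" : "skip" };
    } catch (err) {
      // This store is unavailable; try the next one.
    }
  }
  // No store could confirm uniqueness: fail closed so the vendor retries.
  return { status: 500, action: "retry-later" };
}
```

Note that a fallback store only knows about events claimed through it, so an event recorded in the primary before an outage looks new to the fallback. A uniqueness constraint on the database write that performs the side effect covers that gap.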

How long should I store event IDs? At least as long as the vendor might retry. Stripe, for example, retries live-mode deliveries for up to three days. Many teams keep 7-30 days for forensics.

Can we rely on vendor idempotency keys? Some APIs provide idempotency keys per request. Use them, but still record events locally.

What about multi-tenant environments? Namespace keys by tenant ID to avoid collisions and cross-tenant leakage.
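
A namespaced key builder might look like the sketch below. The `processedKey` function, the `webhook` prefix, and the segment order are assumptions for illustration. The point is that every segment is validated, so two different tenant and event pairs can never map to the same key.

```typescript
// Hypothetical key builder; the "webhook" prefix and segment order are assumptions.
function processedKey(tenantId: string, vendor: string, eventId: string): string {
  for (const segment of [tenantId, vendor, eventId]) {
    // Refuse empty segments and the delimiter itself, so that
    // ("a:b", "c") and ("a", "b:c") can never produce the same key.
    if (segment.length === 0 || segment.includes(":")) {
      throw new Error(`invalid key segment: ${JSON.stringify(segment)}`);
    }
  }
  return `webhook:${tenantId}:${vendor}:${eventId}`;
}

const keyA = processedKey("tenant_a", "stripe", "evt_123"); // "webhook:tenant_a:stripe:evt_123"
const keyB = processedKey("tenant_b", "stripe", "evt_123"); // "webhook:tenant_b:stripe:evt_123"
```

The same event ID arriving for two tenants now produces two distinct markers, so one tenant's delivery cannot suppress the other's.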

Do we need WAF rules? Rate limiting and IP allow lists help, but signatures and idempotency are primary defenses.

Conclusion

A secure webhook handler treats every request as potentially repeated. Verify signatures, limit time windows, store processed IDs, and wrap side effects in transactions. With these guardrails in place, retry storms and replay attacks become routine events, not incidents.