Webhook Replay Shields: Building Idempotent Handlers That Do Not Blink



Hook: The Retry Storm That Paid the Same Invoice Twice
Your billing system consumes Stripe webhooks. A transient network blip causes Stripe to retry the invoice.paid event five times. Your handler validates the signature, but the processing logic is not idempotent: it credits the customer account and triggers an email each time. Later that month, an attacker replays older events with valid signatures, causing duplicate payouts. You discover there is no replay guard, no nonce table, and no visibility into which events were processed.
In this article we build webhook handlers that absorb replays safely. We cover signature verification, nonce storage, rate limiting, and tests you can run locally with real payloads. Examples use Express (Node.js) and Rails, but the patterns apply widely.
The Problem Deep Dive
Webhooks are attractive targets because:
- Signatures validate authenticity but not uniqueness. Attackers can replay signed payloads within allowed windows.
- Handlers often mutate state. Idempotency is an afterthought.
- Infrastructure retries. Load balancers or API gateways duplicate requests.
- Clock skew. Timestamp validation fails when system clocks drift.
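To make the first point concrete, here is a minimal sketch using Node's built-in crypto module. The helper names (sign, verify) and the secret are hypothetical, and the scheme is a plain HMAC over the body rather than any vendor's exact header format. It illustrates that a captured request verifies just as well the second time:

```javascript
const crypto = require("node:crypto");

// Hypothetical shared secret, for illustration only.
const SECRET = "whsec_example_only";

// Compute a hex HMAC-SHA256 over the raw request body.
function sign(body, secret) {
  return crypto.createHmac("sha256", secret).update(body).digest("hex");
}

// Compare the expected and received signatures in constant time.
function verify(body, signature, secret) {
  const expected = Buffer.from(sign(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

const body = JSON.stringify({ id: "evt_123", type: "invoice.paid" });
const signature = sign(body, SECRET);

// The first delivery and a replayed copy look identical to the verifier.
const firstDelivery = verify(body, signature, SECRET);
const replayedDelivery = verify(body, signature, SECRET);
console.log(firstDelivery, replayedDelivery); // prints: true true
```

The verifier has no memory, so uniqueness has to come from somewhere else, such as a store of processed event IDs.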
Example anti-pattern in Node:
app.post("/stripe/webhook", bodyParser.raw({ type: "application/json" }), (req, res) => {
  const event = stripe.webhooks.constructEvent(req.body, req.headers["stripe-signature"], secret);
  if (event.type === "invoice.paid") {
    creditAccount(event.data.object.customer);
  }
  res.sendStatus(200);
});
No replay detection, no logging, and creditAccount is not idempotent.
Technical Solutions
Quick Patch: Idempotency Store
Store event IDs in a durable store (Redis, Postgres).
const processedKey = `stripe:event:${event.id}`;
const inserted = await redis.set(processedKey, "1", { NX: true, EX: 24 * 3600 });
if (!inserted) {
  return res.sendStatus(200);
}
Rails example:
# Relies on a unique index on processed_events.event_id so concurrent inserts cannot both succeed.
begin
  ProcessedEvent.create!(event_id: event.id)
rescue ActiveRecord::RecordNotUnique
  return head :ok
end
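Both snippets depend on one atomic "claim if absent" operation. The following is a minimal in-memory sketch of those semantics, not a replacement for Redis or Postgres: the class name MemoryNonceStore and its claim method are hypothetical, and the store is neither durable nor shared between processes. The injectable clock exists only to make expiry easy to test.

```javascript
// In-memory stand-in for "SET key value NX EX ttl". Illustration only.
class MemoryNonceStore {
  constructor(now = () => Date.now()) {
    this.now = now;          // injectable clock returning milliseconds
    this.expiry = new Map(); // key -> expiry time in milliseconds
  }

  // Returns true if this call claimed the key, false if a live claim exists.
  claim(key, ttlSeconds) {
    const current = this.now();
    const expiresAt = this.expiry.get(key);
    if (expiresAt !== undefined && expiresAt > current) {
      return false; // already claimed and not yet expired
    }
    this.expiry.set(key, current + ttlSeconds * 1000);
    return true;
  }
}

const store = new MemoryNonceStore();
const firstClaim = store.claim("stripe:event:evt_123", 24 * 3600);
const secondClaim = store.claim("stripe:event:evt_123", 24 * 3600);
console.log(firstClaim, secondClaim); // prints: true false
```

Only the caller that receives true should run the business logic; every other caller can acknowledge the event and return.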
Durable Fix: Signature + Timestamp + Nonce
- Verify signature using vendor SDK.
- Validate timestamp within a small window (5 minutes).
- Store event ID with TTL.
- For mutable operations, wrap business logic in transactions.
Node (Express):
const tolerance = 5 * 60; // seconds
const signature = req.headers["stripe-signature"];
let event;
try {
  // Throws if the signature is invalid or the timestamp is outside the tolerance.
  event = stripe.webhooks.constructEvent(req.body, signature, secret, tolerance);
} catch (err) {
  return res.sendStatus(400);
}
const processedKey = `stripe:event:${event.id}`;
const inserted = await redis.set(processedKey, Date.now().toString(), { NX: true, EX: 7 * 24 * 3600 });
if (!inserted) {
  logger.info({ eventId: event.id }, "duplicate webhook ignored");
  return res.sendStatus(200);
}
await processInvoice(event.data.object);
res.sendStatus(200);
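Conceptually, the tolerance bounds how old a signed timestamp may be. The sketch below is a simplified, dependency-free approximation of that check, not the Stripe SDK's implementation. It assumes a header of the form t=<unix seconds>,v1=<hex HMAC of "t.body">, reduced to a single v1 signature; the function names and the secret are hypothetical.

```javascript
const crypto = require("node:crypto");

// Hex HMAC-SHA256 over "<timestamp>.<body>".
function hmacHex(body, secret, timestamp) {
  return crypto.createHmac("sha256", secret)
    .update(`${timestamp}.${body}`)
    .digest("hex");
}

// Build a header of the form "t=<unix seconds>,v1=<hex hmac>".
function signHeader(body, secret, timestamp) {
  return `t=${timestamp},v1=${hmacHex(body, secret, timestamp)}`;
}

// Returns "ok", "bad_signature", or "stale".
function checkHeader(body, header, secret, toleranceSeconds, nowSeconds) {
  const parts = Object.fromEntries(
    header.split(",").map((pair) => pair.split("=", 2))
  );
  const timestamp = Number(parts.t);
  if (!Number.isInteger(timestamp) || typeof parts.v1 !== "string") {
    return "bad_signature";
  }
  const expected = Buffer.from(hmacHex(body, secret, timestamp), "hex");
  const received = Buffer.from(parts.v1, "hex");
  if (expected.length !== received.length ||
      !crypto.timingSafeEqual(expected, received)) {
    return "bad_signature";
  }
  // The signature covers the timestamp, so an attacker cannot refresh it.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) {
    return "stale";
  }
  return "ok";
}

const secret = "whsec_example_only"; // hypothetical secret
const body = '{"id":"evt_123","type":"invoice.paid"}';
const header = signHeader(body, secret, 1700000000);

const fresh = checkHeader(body, header, secret, 300, 1700000060);
const replayedLater = checkHeader(body, header, secret, 300, 1700000900);
console.log(fresh, replayedLater); // prints: ok stale
```

A replay inside the window still passes this check, which is why the event ID store remains necessary alongside it.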
Event Schema Validation
Ensure the payload matches the structure your handler expects before acting on it. Use zod in Node or ActiveModel::Type in Rails to coerce types. Reject unexpected fields to avoid deserialization exploits.
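As a rough, dependency-free illustration of strict validation, the sketch below checks a deliberately tiny, hypothetical shape (id, customer, amount_paid). A real invoice object carries many more fields, so in practice you would apply this to the subset you extract, or express the same rules with a schema library. The names INVOICE_PAID_SHAPE and parseInvoicePaid are assumptions made for this example.

```javascript
// Hypothetical minimal shape: field name -> validator.
const INVOICE_PAID_SHAPE = {
  id: (v) => typeof v === "string" && v.length > 0,
  customer: (v) => typeof v === "string" && v.length > 0,
  amount_paid: (v) => Number.isInteger(v) && v >= 0,
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns { ok: true, value } or { ok: false, error }.
function parseInvoicePaid(input) {
  if (input === null || typeof input !== "object" || Array.isArray(input)) {
    return { ok: false, error: "payload must be an object" };
  }
  // Reject anything the shape does not declare.
  for (const key of Object.keys(input)) {
    if (!hasOwn(INVOICE_PAID_SHAPE, key)) {
      return { ok: false, error: `unexpected field: ${key}` };
    }
  }
  // Require every declared field and copy it into a fresh object.
  const value = {};
  for (const [key, isValid] of Object.entries(INVOICE_PAID_SHAPE)) {
    if (!hasOwn(input, key) || !isValid(input[key])) {
      return { ok: false, error: `invalid or missing field: ${key}` };
    }
    value[key] = input[key];
  }
  return { ok: true, value };
}

const accepted = parseInvoicePaid({ id: "in_1", customer: "cus_1", amount_paid: 1200 });
const rejected = parseInvoicePaid({ id: "in_1", customer: "cus_1", amount_paid: 1200, is_admin: true });
console.log(accepted.ok, rejected.ok, rejected.error);
```

Returning a result object instead of throwing keeps the handler's control flow explicit: a failed parse maps to a 400 response and a log entry.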
Side Effects in Transactions
Wrap state mutations in transactions:
ActiveRecord::Base.transaction do
  account = Account.lock.find_by!(stripe_customer: customer_id)
  account.credit!(amount)
  WebhookLog.create!(event_id: event.id, payload: payload)
end
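If processing fails, delete the processed marker so the vendor's next retry can redo the work. Alternatively, claim the ID with SETNX and a short TTL, then extend the TTL once processing succeeds, so a marker left behind by a crashed handler expires on its own.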
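The claim-then-release idea can be sketched as follows. This is an illustration under simplifying assumptions: the marker lives in an in-memory Set instead of Redis, the work is synchronous for brevity (a real handler would await it), and handleOnce and flakyCredit are hypothetical names.

```javascript
// In-memory stand-in for the processed-event marker; a Set has no TTL.
const processedIds = new Set();

// Runs work(eventId) unless the event was already handled.
// Returns "processed" or "duplicate"; rethrows if the work fails.
function handleOnce(eventId, work) {
  if (processedIds.has(eventId)) {
    return "duplicate";
  }
  processedIds.add(eventId); // claim before doing the work
  try {
    work(eventId);
  } catch (err) {
    processedIds.delete(eventId); // release so a retry can redo the work
    throw err;
  }
  return "processed";
}

// Simulated business logic that fails on its first attempt only.
let attempts = 0;
let credits = 0;
function flakyCredit() {
  attempts += 1;
  if (attempts === 1) {
    throw new Error("database unavailable");
  }
  credits += 1;
}

// Three deliveries of the same event: a failure, a retry, and a replay.
const outcomes = [];
for (let i = 0; i < 3; i += 1) {
  try {
    outcomes.push(handleOnce("evt_123", flakyCredit));
  } catch (err) {
    outcomes.push("failed");
  }
}
console.log(outcomes.join(","), credits); // prints: failed,processed,duplicate 1
```

The account is credited exactly once even though the event arrived three times and the first attempt failed.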
Monitoring and Alerting
- Log event IDs, timestamps, client IPs.
- Expose metrics for webhook.duplicates and webhook.signature_failures.
- Alert when duplicates spike.
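One way to picture the counters and the spike alert is the in-process sketch below. It is an assumption-heavy illustration: the WebhookMetrics class, its window and threshold parameters, and the sliding-window rule are all hypothetical, and a production setup would normally export counters to a metrics system and alert there.

```javascript
// Counts webhook outcomes and flags a spike of duplicates in a sliding window.
class WebhookMetrics {
  constructor({ windowMs, duplicateThreshold, now = () => Date.now() }) {
    this.windowMs = windowMs;
    this.duplicateThreshold = duplicateThreshold;
    this.now = now; // injectable clock returning milliseconds
    this.counters = new Map([
      ["webhook.duplicates", 0],
      ["webhook.signature_failures", 0],
    ]);
    this.recentDuplicates = []; // timestamps of recent duplicates
  }

  increment(name) {
    if (!this.counters.has(name)) {
      throw new Error(`unknown metric: ${name}`);
    }
    this.counters.set(name, this.counters.get(name) + 1);
    if (name === "webhook.duplicates") {
      this.recentDuplicates.push(this.now());
    }
  }

  // True when the number of duplicates inside the window reaches the threshold.
  duplicatesSpiking() {
    const cutoff = this.now() - this.windowMs;
    this.recentDuplicates = this.recentDuplicates.filter((t) => t > cutoff);
    return this.recentDuplicates.length >= this.duplicateThreshold;
  }
}

const metrics = new WebhookMetrics({ windowMs: 60000, duplicateThreshold: 3 });
metrics.increment("webhook.duplicates");
console.log(metrics.counters.get("webhook.duplicates"), metrics.duplicatesSpiking());
```

A handler would call increment in the duplicate branch and in the signature-failure branch, and a periodic job would check duplicatesSpiking.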
Alprina Policies
Detect missing replay guards by scanning for webhook controllers without ProcessedEvent checks. Ensure timestamp tolerance configs exist.
Testing & Verification
- Use vendor CLI (stripe trigger) to send test events. Run integration tests verifying duplicates are ignored.
- Write unit tests for the nonce store, including TTL expiry.
- Simulate clock skew by adjusting system time or mocking Date.now().
- Run load tests with k6 or autocannon to ensure Redis/Postgres scaling handles bursts.
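For the clock-skew item, mocking Date.now() can be as small as the sketch below. The helpers isFresh and withMockedNow are hypothetical names for this example; test frameworks with fake timers offer the same capability with less ceremony.

```javascript
// Hypothetical freshness check that reads the system clock directly.
function isFresh(eventTimestampSeconds, toleranceSeconds) {
  const nowSeconds = Math.floor(Date.now() / 1000);
  return Math.abs(nowSeconds - eventTimestampSeconds) <= toleranceSeconds;
}

// Temporarily replace Date.now, always restoring the original afterwards.
function withMockedNow(fakeMs, fn) {
  const realNow = Date.now;
  Date.now = () => fakeMs;
  try {
    return fn();
  } finally {
    Date.now = realNow;
  }
}

const eventTime = 1700000000; // unix seconds

// Server clock agrees with the event timestamp.
const onTime = withMockedNow(eventTime * 1000, () => isFresh(eventTime, 300));

// Server clock runs ten minutes ahead of the event timestamp.
const skewed = withMockedNow((eventTime + 600) * 1000, () => isFresh(eventTime, 300));

console.log(onTime, skewed); // prints: true false
```

Restoring Date.now in a finally block keeps one failing test from leaking a frozen clock into the rest of the suite.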
Common Questions & Edge Cases
What if Redis goes down? Fall back to the database or fail closed (return 500 so the vendor retries later). Monitor cache availability.
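How long should I store event IDs? At least as long as the vendor might retry. Stripe's documentation describes live-mode retries continuing for up to three days, so treat 24 hours as too short for Stripe. Many teams keep 7-30 days for forensics.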
Can we rely on vendor idempotency keys? Some APIs provide idempotency keys per request. Use them, but still record events locally.
What about multi-tenant environments? Namespace keys by tenant ID to avoid collisions and cross-tenant leakage.
Do we need WAF rules? Rate limiting and IP allow lists help, but signatures and idempotency are primary defenses.
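Two of the answers above lend themselves to a short sketch: namespacing keys by tenant, and failing closed when the store is unavailable. The key layout and the function names eventKey and statusForClaim are assumptions made for illustration, and the claim callback is synchronous for brevity where a real Redis call would be awaited.

```javascript
// Build a per-tenant key so identical event IDs from different tenants never collide.
function eventKey(tenantId, provider, eventId) {
  for (const part of [tenantId, provider, eventId]) {
    if (typeof part !== "string" || part === "" || part.includes(":")) {
      throw new Error("key parts must be non-empty strings without ':'");
    }
  }
  return `tenant:${tenantId}:${provider}:event:${eventId}`;
}

// Decide the HTTP status for a delivery:
//   null -> the claim succeeded, so the caller should process the event
//   200  -> duplicate, acknowledge without doing any work
//   500  -> store unavailable, fail closed so the vendor retries later
function statusForClaim(claim, key) {
  try {
    return claim(key) ? null : 200;
  } catch (err) {
    return 500;
  }
}

// Simulated stores: one healthy, one that is down.
const seen = new Set();
const healthyClaim = (key) => {
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
};
const brokenClaim = () => {
  throw new Error("connection refused");
};

const key = eventKey("acme", "stripe", "evt_123");
console.log(key); // prints: tenant:acme:stripe:event:evt_123
console.log(statusForClaim(healthyClaim, key)); // null: process the event
console.log(statusForClaim(healthyClaim, key)); // 200: duplicate
console.log(statusForClaim(brokenClaim, key));  // 500: fail closed
```

Rejecting the separator character inside key parts prevents one tenant's identifier from being crafted to overlap another tenant's namespace.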
Conclusion
A secure webhook handler treats every request as potentially repeated. Verify signatures, limit time windows, store processed IDs, and wrap side effects in transactions. With these guardrails in place, retry storms and replay attacks become routine events, not incidents.