// concept

Webhook retries and idempotency

Updated 2026-05-10

The retry storm nobody warns you about

Most serious webhook providers retry on failure. Stripe retries with exponential backoff for ~3 days. Discord retries for a few minutes. Shopify retries 19 times over 48h. GitHub is the exception: it does not retry automatically, but failed deliveries can be redelivered by hand or through its API, so duplicates still happen.

This is good: your handler can be down and you don't lose events.

This is also bad: your handler can be slow and you'll get the same event 5 times. Worse, your handler can succeed but respond so slowly that the provider's request timeout fires before your 200 lands. The provider sees a timeout and retries. You see a duplicate.

If your handler does anything with side effects (charge a card, send an email, kick off a refund), retries become a correctness bug.

What "idempotency" means in this context

A handler is idempotent if processing the same event twice produces the same outcome as processing it once. Concretely: a second delivery is a no-op.

The standard technique:

  1. Each event carries a unique id (Stripe puts it in event.id, GitHub uses the X-GitHub-Delivery header, Shopify uses X-Shopify-Webhook-Id).
  2. Your handler checks "have I seen this id before?" by looking it up in a database (or a Redis set with TTL).
  3. If yes: return 200 immediately, do nothing else.
  4. If no: record the id, then process the event, then commit the transaction so both the side effect and the dedup record land atomically.

// Assumes processedEvents.id has a unique constraint, so a concurrent duplicate fails the insert.
async function handle(req, res) {
  const event = verify(req) // check the provider signature before trusting the payload
  const seen = await db.processedEvents.findOne({ id: event.id })
  if (seen) return res.status(200).send() // already handled
  // The dedup record and the side effect commit together, or not at all.
  await db.transaction(async tx => {
    await tx.processedEvents.insert({ id: event.id, at: Date.now() })
    await applyEvent(tx, event)
  })
  res.status(200).send()
}

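The transaction is the load-bearing part. Without it you can crash between the dedup-record insert and the side effect, leaving the event processed-on-paper but not in fact. Note that it only covers side effects written through the same database; a call to an external API can't be rolled back and needs its own idempotency protection.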
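
The lookup-then-insert pattern still leaves a small window: two deliveries of the same event that arrive almost together can both pass the lookup. One way to close it, sketched here on the assumption that the dedup store rejects duplicate ids, is to insert first and treat a duplicate-key error as "already handled". The in-memory `createStore`, `DuplicateKeyError`, and `handleOnce` below are hypothetical stand-ins for illustration, not a real database client.

```javascript
// Hypothetical in-memory stand-in for a table with a unique constraint on id.
class DuplicateKeyError extends Error {}

function createStore() {
  const ids = new Set()
  return {
    // Runs fn; ids claimed inside it are released again if fn throws (rollback).
    // A real database would make a concurrent insert of the same id wait for the
    // first transaction to finish. This stand-in rejects it immediately instead.
    async transaction(fn) {
      const claimed = []
      const tx = {
        insertProcessed(id) {
          if (ids.has(id)) throw new DuplicateKeyError(id)
          ids.add(id)
          claimed.push(id)
        },
      }
      try {
        return await fn(tx)
      } catch (err) {
        for (const id of claimed) ids.delete(id)
        throw err
      }
    },
  }
}

// Returns the HTTP status the webhook endpoint would send back.
async function handleOnce(store, event, applyEvent) {
  try {
    await store.transaction(async tx => {
      tx.insertProcessed(event.id) // fails if another delivery claimed this id
      await applyEvent(tx, event)
    })
  } catch (err) {
    if (err instanceof DuplicateKeyError) return 200 // duplicate: acknowledge, do nothing
    throw err // genuine failure: surface it so the provider retries
  }
  return 200
}
```

With this shape the separate lookup becomes an optimization rather than the safety mechanism: the unique constraint is what guarantees the side effect runs once.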

TTLs on the dedup table

You don't need to keep dedup records forever. Stripe's longest retry window is ~3 days. Shopify's is 48h. Setting a TTL of 7-14 days covers everyone with margin.

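Postgres: an index on created_at plus a daily job that runs DELETE ... WHERE created_at < now() - interval '14 days'. (A partial index can't do the expiry for you, because index predicates may only use immutable functions and now() is not one.) Redis: SETEX dedup:$id 1209600 1 (14 days in seconds). DynamoDB: TTL attribute on the item.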
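
To make the TTL behaviour concrete, here is a minimal in-memory sketch of "mark this id as seen, unless it is already there, and forget it after a while". It mirrors the semantics of Redis `SET key value NX EX seconds`, but `createDedupCache`, `markIfNew`, and `sweep` are hypothetical names invented for this example. Unlike the transactional approach, a cache-only check is not atomic with your side effect, so treat it as an illustration of expiry rather than a full solution.

```javascript
const FOURTEEN_DAYS_MS = 14 * 24 * 60 * 60 * 1000

// `now` is injectable so the expiry logic can be exercised with a fake clock.
function createDedupCache(ttlMs = FOURTEEN_DAYS_MS, now = Date.now) {
  const expiresAt = new Map() // event id -> expiry timestamp in ms
  return {
    // Returns true the first time an id is seen within the TTL, false for duplicates.
    markIfNew(id) {
      const t = now()
      const expiry = expiresAt.get(id)
      if (expiry !== undefined && expiry > t) return false
      expiresAt.set(id, t + ttlMs)
      return true
    },
    // Drops expired entries; plays the role of the daily DELETE job.
    sweep() {
      const t = now()
      for (const [id, expiry] of expiresAt) {
        if (expiry <= t) expiresAt.delete(id)
      }
    },
    size: () => expiresAt.size,
  }
}
```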

Idempotency keys (the API client side)

When you call third-party APIs from your webhook handler — for example, when you receive a Stripe payment_intent.succeeded event and need to call your shipping provider's "create label" API — you also need idempotency on the outbound side.

Stripe and many other API providers accept an Idempotency-Key header. Set it to a deterministic value (often the inbound webhook's event id). If you have to retry the request (you crash, your network blips), the provider deduplicates server-side. Use it. It's free correctness.

await fetch('https://api.shippingco.com/labels', {
  method: 'POST',
  headers: {
    'Idempotency-Key': event.id, // assigned by Stripe, so identical on every retry
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + KEY,
  },
  body: JSON.stringify({ orderId: event.data.object.metadata.orderId }),
})
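
If one inbound event triggers more than one outbound call, reusing the bare event id for all of them could make the provider treat distinct requests as duplicates. A possible refinement, sketched below, is to derive the key from the event id plus a label for the action, and to compute it once so every retry sends the same value. `idempotencyKey`, `sendWithRetry`, and the `send` callback (a function that performs the HTTP request with the given headers) are hypothetical helpers, not part of any provider SDK.

```javascript
// Builds a stable key from the inbound event id and a label for the outbound action.
function idempotencyKey(eventId, action) {
  if (!eventId || !action) throw new Error('eventId and action are required')
  return `${eventId}:${action}`
}

// Retries `send` up to maxAttempts times, always with the same Idempotency-Key.
async function sendWithRetry(send, event, action, maxAttempts = 3) {
  const key = idempotencyKey(event.id, action) // computed once, reused on every attempt
  let lastError
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send({ 'Idempotency-Key': key })
    } catch (err) {
      lastError = err // a real client would also back off between attempts
    }
  }
  throw lastError
}
```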

Out-of-order delivery

The other ugly truth: webhook events are not strictly ordered. Stripe explicitly says so. If a customer signs up and immediately upgrades, you might get the customer.subscription.updated event before the customer.subscription.created event. Don't assume order. Instead, fetch the current state from the API when you handle the event, or check whether your local model is stale and refresh it.
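
The staleness check can be sketched as an upsert that ignores anything older than what is already stored, so "created" and "updated" events are handled the same way regardless of arrival order. This assumes each snapshot carries a comparable `updatedAt` value, which is a hypothetical field here; in practice the snapshot might be the object you just fetched from the provider's API. `upsertSubscription` and the `Map`-based store are illustrative only.

```javascript
// store: Map of subscription id -> { id, status, updatedAt }
// Returns true if the snapshot was applied, false if it was stale and ignored.
function upsertSubscription(store, snapshot) {
  const current = store.get(snapshot.id)
  if (current && current.updatedAt >= snapshot.updatedAt) return false // older or same: keep local
  store.set(snapshot.id, snapshot)
  return true
}
```

Because the function never assumes the record already exists, an "updated" event that arrives first simply creates the row, and the late "created" event is dropped as stale.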

Replay during development

Most major webhook providers have a "redeliver" or "resend" button in their dashboard. Use it instead of re-creating test transactions. Combined with a stable dev URL and the request inspector, the loop is: trigger event → see it land → fix handler → click redeliver → see the fix work. Seconds per iteration.

What lrok adds

The request inspector shows every webhook your dev server received with the raw body, headers, response code, and response time. If you're investigating whether a 200 actually landed in time (vs. a timeout-then-retry), the latency is right there.

$ lrok http 3000 --hint stripe-dev
Forwarding https://stripe-dev.lrok.io  ->  http://127.0.0.1:3000

Open lrok.io/dashboard, watch every retry land, and confirm your dedup logic returns 200 the second time without doing the side effect again.

// shipping?

lrok gives your localhost a public HTTPS URL with a reserved subdomain on the free plan. $9/mo flat for unlimited.
