Migrating Redis-backed webhook handling to Hookdeck Event Gateway
The point you're at
You already had Redis. So when webhooks needed somewhere to land, you used it. That decision was correct at the time, and for the first year or two it kept being correct. The system worked. After all, Redis is a great cache; it's just a bad webhook queue.
What changed is not Redis. What changed is the volume of events flowing through it, the number of providers calling you, and the on-call cost of debugging "why didn't this webhook process?" at 2am.
The symptoms usually cluster around the same few things. A failover loses jobs that were in flight, memory bills creep up, and replaying events becomes tricky. None of these are Redis failing as Redis. They're symptoms of the wrong layer carrying the wrong responsibility. The webhook ingest layer wants durability semantics, replay, source-aware retry, signature verification, and observability that an in-memory data store was never trying to provide.
This guide is the migration path off Redis for webhooks and onto Hookdeck Event Gateway, in phases small enough that you don't have to schedule a maintenance window.
The shapes of Redis-backed webhook ingest
BullMQ, Sidekiq, RQ, Celery
An HTTP endpoint in your application receives the webhook, verifies the signature, runs a dedup check, pushes a job onto a Redis-backed queue, and returns 200. Workers pick up jobs and process them. The endpoint, the signature verification, the dedup, the retry config, and the observability all live in application code.
// Node + BullMQ
app.post('/webhooks/stripe', async (req, res) => {
  const sig = req.headers['stripe-signature'];
  let event;
  try {
    // constructEvent throws if the signature does not match the raw body.
    event = stripe.webhooks.constructEvent(req.rawBody, sig, secret);
  } catch (err) {
    return res.sendStatus(400);
  }
  // SET NX returns null when the key already exists, i.e. a duplicate delivery.
  const dedupKey = `wh:stripe:${event.id}`;
  const fresh = await redis.set(dedupKey, '1', 'NX', 'EX', 86400);
  if (!fresh) return res.sendStatus(200);
  await stripeQueue.add('process', event, { attempts: 5, backoff: { type: 'exponential', delay: 1000 } });
  res.sendStatus(200);
});
This is the most common setup. Sidekiq in Rails looks the same, with perform_async instead of queue.add. RQ and Celery look the same again.
Redis Streams with consumer groups
Closer to a real durable log: XADD, XREADGROUP, XACK, and pending entry lists. Better than a list-based queue for some semantics, but still in-memory-first, still bounded by MAXLEN, and still missing HTTP ingestion, signature verification at the queue layer, and source-aware retry.
The migration step that looks different here is Phase 1's payload comparison: stream entries are flat field-value pairs, so the diff against Hookdeck's JSON delivery needs a small transform.
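A minimal sketch of that transform, assuming the producer stored the webhook body as a JSON string under a field named payload (a hypothetical field name; use whatever your XADD call wrote). Clients such as ioredis hand back each entry's fields as a flat array of alternating names and values.

```javascript
// Convert the flat [field, value, field, value, ...] array of a Redis Streams
// entry into an object, so it can be diffed against a JSON delivery.
// Assumption: the webhook body was stored as a JSON string under `payload`.
function streamEntryToObject(flatFields, jsonFields = ['payload']) {
  if (flatFields.length % 2 !== 0) {
    throw new Error('expected an even number of field/value items');
  }
  const entry = {};
  for (let i = 0; i < flatFields.length; i += 2) {
    const name = flatFields[i];
    const value = flatFields[i + 1];
    // Parse only the fields known to hold JSON; keep the rest as strings.
    entry[name] = jsonFields.includes(name) ? JSON.parse(value) : value;
  }
  return entry;
}

// Example entry in the shape a client would return it from XREADGROUP.
const entry = streamEntryToObject([
  'source', 'stripe',
  'payload', '{"id":"evt_123","type":"invoice.paid"}',
]);
console.log(entry.payload.id);
```

Comparing parsed objects rather than raw strings sidesteps key-order and whitespace differences that come from how the stream producer serialized the body.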
Hand-rolled LPUSH / BRPOP
Same shape as BullMQ but without the library. Usually the oldest of the three. Usually has the most tribal knowledge buried in retry logic — a ZADD for delayed retries, a sidecar process that moves entries from the dead-letter list back into the live list on a cron. The migration step that looks different here is decommissioning: there's no library to uninstall, just code to delete, but it's probably integrated in non-obvious ways.
What you're keeping vs replacing
Replacing:
- The HTTP webhook receiver in your app — the endpoint Stripe, Shopify, GitHub, and the rest currently call.
- Signature verification, wherever it currently lives.
- Redis-based dedup: SET NX EX, the idempotency table, whichever pattern you settled on.
- Retry orchestration: BullMQ retry options, Sidekiq sidekiq_retry_in, Celery autoretry_for, the cron that requeues your DLQ.
- Bespoke replay scripts.
Keeping:
- Redis itself, for caching, rate limiting, session state, internal job queues for work that isn't webhook ingestion — Redis is good at all of these. The migration is about the webhook ingest layer, not about Redis as a whole.
- Your downstream worker code. Hookdeck Event Gateway delivers to an HTTP endpoint. That endpoint can be a much thinner handler in your app that calls the same worker functions you already have.
Before:
flowchart TB
A[Stripe / Shopify] --> B[Your /webhooks endpoint<br/>verify signature · SET NX dedup in Redis]
B --> C[BullMQ queue in Redis]
C --> D[Workers]
D --> E[App logic]
After:
flowchart TB
A[Stripe / Shopify] --> B[Hookdeck Event Gateway]
B --> C[Your /events endpoint]
C --> D[App logic]
The shape of "what runs in your app" gets smaller. The shape of "what you have to operate" gets smaller. Redis stays, doing the work Redis is good at.
Migration in phases
Plan for 1–3 weeks of calendar time for a small team. The work isn't intense; the calendar time is there so each phase can have a clean observation window.
Phase 1 — Shadow ingest a low-stakes source
Pick a source where a few hours of dropped events wouldn't hurt anyone. Internal tooling webhooks, notification-only integrations, a staging-tier provider — something where the blast radius is small.
Create the source in Hookdeck Event Gateway. Configure a destination that points at a new shadow endpoint in your app (POST /events/shadow/:source) that logs the payload and headers and does nothing else. For local development, run the Hookdeck CLI to forward events to localhost while you wire it up. Push to a staging environment once the shape is right.
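A shadow endpoint along these lines could be sketched with Node's built-in http module. The record shape and the function names are illustrative assumptions, not anything Hookdeck prescribes; the body is hashed so the later payload comparison doesn't require storing every payload twice.

```javascript
const http = require('node:http');
const crypto = require('node:crypto');

// Build the log record for one shadow delivery (hypothetical record shape).
function buildShadowRecord(source, headers, rawBody) {
  return {
    source,
    receivedAt: new Date().toISOString(),
    headers,
    bodySha256: crypto.createHash('sha256').update(rawBody).digest('hex'),
    bodyBytes: rawBody.length,
  };
}

// Extract :source from POST /events/shadow/:source, or null if it doesn't match.
function shadowSource(method, url) {
  const match = /^\/events\/shadow\/([^/?]+)/.exec(url);
  return method === 'POST' && match ? match[1] : null;
}

function createShadowServer(log = console.log) {
  return http.createServer((req, res) => {
    const source = shadowSource(req.method, req.url);
    if (!source) {
      res.statusCode = 404;
      res.end();
      return;
    }
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      // Log and do nothing else: no parsing, no queueing, no side effects.
      const record = buildShadowRecord(source, req.headers, Buffer.concat(chunks));
      log(JSON.stringify(record));
      res.statusCode = 200;
      res.end();
    });
  });
}

// To try it locally: createShadowServer().listen(3000);
```

Reading the body as raw bytes, rather than through a JSON body parser, keeps the hash comparable with whatever your old endpoint saw in req.rawBody.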
Run for 48 hours. Compare against your existing pipeline:
- Event counts per hour, per source. They should match within a small delta — Event Gateway will catch a few your old endpoint dropped during deploys.
- Payload bodies. They should be byte-identical for the body itself.
- Headers. Event Gateway normalizes some headers and adds its own (x-hookdeck-*). Note which of your downstream code reads specific request headers (most won't, but the parts that do need updating).
- Signature verification. Confirm the source is configured with the correct secret and that Hookdeck Event Gateway is rejecting tampered payloads. Send a test request with a bad signature and confirm it doesn't reach your shadow endpoint.
- Latency distribution. Event Gateway adds a small hop. Measure it. It should not change anything material for asynchronous webhook handling.
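The count and payload checks above can be scripted. This is a sketch under the assumption that both pipelines' logs have been reduced to records of the hypothetical shape { eventId, receivedAt, bodySha256 }, where eventId is the provider's event ID.

```javascript
// Compare deliveries logged by the old endpoint and the shadow endpoint.
// Assumption: records look like { eventId, receivedAt, bodySha256 }.
function compareDeliveries(oldRecords, shadowRecords) {
  const oldById = new Map(oldRecords.map((r) => [r.eventId, r]));
  const shadowById = new Map(shadowRecords.map((r) => [r.eventId, r]));
  const report = { missingInShadow: [], missingInOld: [], bodyMismatches: [] };

  for (const [id, rec] of oldById) {
    const shadow = shadowById.get(id);
    if (!shadow) report.missingInShadow.push(id);
    else if (shadow.bodySha256 !== rec.bodySha256) report.bodyMismatches.push(id);
  }
  for (const id of shadowById.keys()) {
    if (!oldById.has(id)) report.missingInOld.push(id);
  }
  return report;
}

// Count events per UTC hour, keyed like '2024-05-01T10'.
function countsPerHour(records) {
  const counts = {};
  for (const r of records) {
    const hour = new Date(r.receivedAt).toISOString().slice(0, 13);
    counts[hour] = (counts[hour] || 0) + 1;
  }
  return counts;
}
```

Entries in missingInOld are the interesting ones: they are candidates for events the old endpoint dropped during deploys. Entries in missingInShadow or bodyMismatches point at a source configuration problem worth fixing before Phase 2.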
Watch for: header-reading code that breaks silently, payload encoding differences if your old endpoint relied on req.rawBody, sources where the provider sends from multiple regions and your event counts diverge because of geographic routing.
Rollback: nothing to roll back. The shadow endpoint is additive. The provider is still calling your old endpoint.
Phase 2 — Mirror to a production destination behind a flag
Repoint the Hookdeck Event Gateway destination from the shadow endpoint to a real production endpoint (POST /events/:source) that calls your existing worker logic. Gate the actual work behind a feature flag, default off.
You now have two ingestion paths into production for this source. Flip the flag on. The provider is still calling your old /webhooks/:source endpoint, so both paths are firing. Your dedup logic (wherever it lives) needs to handle this. If dedup is in Redis at the application layer, it will. If dedup was at the queue layer, you'll need to add an idempotency check in the new endpoint, keyed on the provider's event ID. Hookdeck Event Gateway also dedupes on its side, but the overlap window means you'll briefly see each event twice from two different paths.
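One way the flag-gated, idempotent handler could look, as a sketch only. The names flags.isEnabled, processEvent, and the flag key are hypothetical, and redis is assumed to be an ioredis-style client. The dedup key reuses the wh:&lt;source&gt;:&lt;event id&gt; format from the old endpoint so both ingestion paths share one record per event.

```javascript
// Handle one delivery on the new POST /events/:source path during the overlap.
// Assumptions: `flags.isEnabled` is your feature-flag lookup, `redis` is an
// ioredis-style client, `processEvent` is the existing worker function.
async function handleEvent({ source, event, flags, redis, processEvent }) {
  // Flag off (the default): acknowledge the delivery but do no work.
  if (!flags.isEnabled(`events-${source}`)) return 'skipped';

  // SET ... NX succeeds only for the first writer of this event ID.
  const key = `wh:${source}:${event.id}`;
  const fresh = await redis.set(key, '1', 'NX', 'EX', 86400);
  if (!fresh) return 'duplicate';

  try {
    await processEvent(event);
  } catch (err) {
    // Release the key so a redelivery is not mistaken for a duplicate.
    await redis.del(key);
    throw err;
  }
  return 'processed';
}

// In-memory stand-in for the Redis client, only so the sketch runs on its own.
// It ignores the NX/EX arguments and never expires keys.
function createFakeRedis() {
  const keys = new Set();
  return {
    async set(key) {
      if (keys.has(key)) return null;
      keys.add(key);
      return 'OK';
    },
    async del(key) {
      return keys.delete(key) ? 1 : 0;
    },
  };
}
```

The handler claims the key before doing the work, as the old endpoint did. The difference is that retries now arrive as fresh HTTP deliveries rather than as queue-internal attempts, which is why the sketch releases the key when processing fails.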
Configure the Hookdeck Event Gateway retry policy to match what BullMQ or Sidekiq was doing. If your old config was 5 attempts with exponential backoff starting at 1s, set the same on the Hookdeck Event Gateway destination. Source-aware retry means you can be more aggressive for sources that genuinely benefit (idempotent payment confirmations) and less aggressive for sources that don't (one-shot notifications).
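To know what you are matching, it helps to write out the schedule the old config actually produced. The sketch below assumes BullMQ's built-in exponential backoff waits delay × 2^(n − 1) milliseconds before retry n, which is how BullMQ documents it; confirm against the version you run.

```javascript
// Retry delays produced by BullMQ's built-in exponential backoff.
// Assumption: the wait before retry n (1-based) is delay * 2^(n - 1).
function bullmqExponentialSchedule({ attempts, delay }) {
  const delays = [];
  // `attempts` counts the first try, so there are attempts - 1 retries.
  for (let retry = 1; retry < attempts; retry += 1) {
    delays.push(Math.round(delay * 2 ** (retry - 1)));
  }
  return delays;
}

const schedule = bullmqExponentialSchedule({ attempts: 5, delay: 1000 });
console.log(schedule); // [ 1000, 2000, 4000, 8000 ]
console.log(schedule.reduce((sum, ms) => sum + ms, 0)); // 15000
```

Under that assumption, the example config gives up about 15 seconds after the first failure. That total window, more than the individual delays, is the number to carry over when you set the retry policy on the destination.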
Watch for: double-processing if your dedup was queue-layer only, signature re-verification logic in the app that fails because Hookdeck Event Gateway has already stripped or transformed the signature header, retry storms if the Hookdeck retry config is more aggressive than your old one.
Rollback: flip the feature flag off. The provider is still calling your old endpoint; nothing user-visible changes.
Phase 3 — Flip the source
Update the webhook URL at the provider to point at Hookdeck Event Gateway. Your old /webhooks/:source goes quiet for this source. Hookdeck Event Gateway is now the only ingestion path.
Run for 24 hours of clean operation. "Clean" means: zero delivery failures in Hookdeck Event Gateway that weren't caused by your endpoint returning an error, zero events you'd have expected to see that didn't arrive, zero on-call pages related to this source.
Decommission the source-specific code: the signature verification for this source, the source-specific dedup, the source-specific retry config, the queue-name routing. Keep the worker logic.
Watch for: providers that cache the old webhook URL for longer than documented (rare but real — give it 24 hours before declaring the old endpoint dead), test events from the provider's dashboard still going to the old URL because the test feature has a separate config.
Rollback: change the webhook URL at the provider back. Re-enable the old code path if you've already deleted it (use a branch, not a force-push).
Phase 4 — Repeat per source, lowest-stakes first
Work through every source in increasing order of business impact. The internal tools first, then the low-traffic third-party integrations, then the high-traffic ones, then Stripe last. Each migration goes faster than the previous. The playbook is set, the team has done it before, the rollback is rehearsed.
Don't batch sources into a single phase. One source per cycle keeps the blast radius small and the diagnosis simple.
Phase 5 — Decommission
Once all sources are off the old pipeline:
- Remove the /webhooks/:source HTTP endpoint.
- Remove the BullMQ queue, the Sidekiq queue, or the hand-rolled list. Drain it first; confirm it's empty.
- Let the dedup keys TTL out of Redis, or DEL them in a maintenance window.
- Update the runbook (more on this below).
- Keep Redis for everything it was good at to begin with.
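If you choose to delete the dedup keys rather than wait for their TTLs, a batched SCAN is gentler on a shared Redis than a single KEYS call. A sketch, assuming an ioredis-style client whose scan() resolves to [nextCursor, keys] and whose del() accepts several keys; the batch size is an arbitrary starting point.

```javascript
// Delete keys matching a pattern in batches. SCAN iterates incrementally,
// whereas KEYS walks the whole keyspace in one blocking call.
// Assumption: `redis` is an ioredis-style client.
async function deleteByPattern(redis, pattern, batchSize = 500) {
  let cursor = '0';
  let deleted = 0;
  do {
    const [next, keys] = await redis.scan(cursor, 'MATCH', pattern, 'COUNT', batchSize);
    // A SCAN page can be empty even when more pages follow.
    if (keys.length > 0) deleted += await redis.del(...keys);
    cursor = next;
  } while (cursor !== '0');
  return deleted;
}

// Usage (not run here): await deleteByPattern(redis, 'wh:stripe:*');
```

Run it per source with that source's key prefix, after the source has been through Phase 3, so a mistaken pattern can only touch keys that are already unused.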
Watch for: downstream services that were reading from the old queue directly — analytics workers, audit log builders, anything subscribed to the BullMQ events. Those need to be repointed at the new destination flow, or at an Event Gateway connection that fans out to them.
The things people get wrong
Doing it all in one weekend. The phased per-source rollout is slow on purpose. The slow path is the one that doesn't wake you up at 3am. If your CTO is asking why this is taking three weeks, the answer is "because we want to still have a payment pipeline at the end of it."
Skipping the shadow phase because "we can just diff Redis records." You can't, reliably. Half the records have TTL'd. The other half are missing the headers you didn't think to store. Side-by-side observation against a live shadow endpoint is the cheap insurance.
Re-verifying signatures in the application after Hookdeck Event Gateway has verified them. Unnecessary unless your security review specifically requires double-verification. Event Gateway verifies at the edge; you can configure pass-through of the original signature if you want to, but most teams don't need it.
Forgetting the downstream services that depend on the old queue shape. The analytics worker that was subscribed to BullMQ completion events. The audit log builder reading from a Redis stream. The Slack alert that fired on dead-letter list growth. These are not the webhook path, but they touch it. Inventory them in Phase 1, repoint them in Phase 5.
Not updating the runbook. The on-call still has the old debugging steps: SSH to the Redis host, KEYS wh:stripe:*, grep through worker logs. The new debugging step is "open the event in Hookdeck and read the trace." Update the runbook before you flip the last source, not after.
What doesn't change
Hookdeck Event Gateway is provider-agnostic and stack-agnostic. Your downstream destination is an HTTP endpoint — the worker code behind it is yours, in your language, in your deployment, unchanged. Stripe still sends what Stripe sends. Your Rails worker still does what your Rails worker did. The integration surface is one HTTP handler per source, or one shared handler that routes on a header.
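The shared-handler variant might look like the sketch below. The header name x-hookdeck-source-name is an assumption to verify against the headers you logged in Phase 1, and the worker names in the usage comment are hypothetical.

```javascript
// Route a delivery to the matching worker function based on a request header.
// Assumption: the gateway identifies the source in a header; the default name
// here should be checked against real deliveries. Node lowercases header names.
function createRouter(workers, headerName = 'x-hookdeck-source-name') {
  return async function route(headers, event) {
    const source = headers[headerName];
    // A Map lookup avoids matching inherited object properties by accident.
    const worker = workers.get(source);
    if (!worker) return { status: 404, source };
    await worker(event);
    return { status: 200, source };
  };
}

// Usage (not run here), with your existing worker functions:
// const route = createRouter(new Map([
//   ['stripe', processStripeEvent],
//   ['github', processGithubEvent],
// ]));
// const result = await route(req.headers, JSON.parse(rawBody));
```

Returning a non-2xx status for an unknown source makes a misrouted delivery show up as a failed delivery in the gateway instead of disappearing silently.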
Redis stays in your stack. Caching, rate limiting, session state, internal async work that isn't webhook ingest — Redis is the right tool for all of these. You're not migrating off Redis. You're migrating webhook ingestion to a layer designed for it, and leaving Redis to do the things it was always good at.
The migration is bounded. One HTTP endpoint goes away. One queue goes away. A handful of source-specific verification and dedup paths go away. Nothing else needs to move.
Start a migration. Pricing details at hookdeck.com/pricing — free tier covers most spikes, paid tiers start at $39/mo.
FAQs
How long does a Redis-to-Hookdeck webhook migration take?
Plan for one to three weeks of calendar time for a small team. The work isn't intense; the waiting between phases is the point. Each phase ends with a clean observation window, and the migration is bounded to the webhook ingestion layer.
Do I have to migrate off Redis entirely?
No. You keep Redis for caching, rate limiting, session state, and internal job queues for non-webhook work. The migration moves only the webhook ingestion layer (one HTTP endpoint and one queue) to a layer designed for it.
What changes in my application code?
Very little. Hookdeck Event Gateway delivers to an HTTP endpoint, and the worker code behind it stays in your language and deployment. The integration surface becomes one HTTP handler per source, or one shared handler that routes on a header, calling the same worker functions you already have.