Gareth Wilson

Migrating from a RabbitMQ-backed webhook pipeline to Hookdeck

If you're reading this, your webhook stack probably started as a hundred lines of Express or FastAPI in front of a RabbitMQ exchange. It worked. Then Stripe got added. Then Shopify. Then the partner integrations team showed up with four more sources, each with its own signature scheme and retry behaviour. The ingestion edge that was supposed to be a thin shim is now a service with its own on-call rotation.

This piece is about getting off that treadmill without burning a quarter on a rewrite. It covers what to keep, what to throw away, and how to move sources one at a time so you can roll back any single step without paging anyone. The destination is Hookdeck Event Gateway — managed webhook infrastructure that takes over the ingestion edge.

Why teams hit the wall

The RabbitMQ part of the system is usually fine. RabbitMQ is good at what it does: durable queues, fan-out, consumer groups, well-understood operational properties. That's almost never what breaks.

What breaks is the scaffolding. The HTTP edge service that started as forty lines of signature verification has grown into a small platform: per-source verification logic, a Redis dedup layer that someone added after a duplicate-charge incident, retry orchestration code that branches on which provider sent the event because Stripe and Shopify disagree about what "retry" means, and a replay tool that's really just a Python script someone wrote during an outage.

Then the support tickets start. "Why didn't event X process?" becomes a thirty-minute investigation across application logs, the edge service's logs, RabbitMQ's management UI, and a Grafana board that someone hand-built last year. Replay requests from customers turn into engineering tasks because the replay tool only handles the happy path. The retry config drifts into an internal DSL that nobody documented because the person who wrote it is now on the platform team.

None of these are RabbitMQ's fault. They're the cost of owning the webhook ingestion edge as a custom service. The job is small enough that building it feels reasonable on day one, and large enough that maintaining it eats a senior engineer's afternoon every week by year two. That's the wall.

What you're actually keeping vs replacing

You are replacing:

  • The HTTP ingestion service in front of RabbitMQ.
  • The per-source signature verification layer.
  • The dedup layer (almost always Redis-backed, almost always written in a hurry).
  • The retry orchestration code, including the source-specific branches.
  • The replay tooling — the script, the runbook, and the Slack channel where requests pile up.
  • The patchwork observability across logs, RabbitMQ's UI, and whatever dashboards have accreted.
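Here is a minimal sketch of two typical branches, to show what the per-source verification layer contains. The Stripe branch follows Stripe's documented `Stripe-Signature` scheme: a `t=` timestamp plus one or more `v1=` hex HMAC-SHA256 signatures computed over `timestamp.body`. The Shopify branch follows the `X-Shopify-Hmac-Sha256` scheme, which is a base64 HMAC-SHA256 of the raw body. The function names, the 300-second tolerance, and the assumption that header names arrive already normalized are illustrative choices. They are not taken from any particular codebase.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional


def verify_stripe(payload: bytes, sig_header: str, secret: str,
                  tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Stripe-style scheme: 't=<unix ts>,v1=<hex HMAC-SHA256 of "<ts>.<raw body>">'."""
    parts: dict = {}
    for item in sig_header.split(","):
        key, _, value = item.partition("=")
        parts.setdefault(key.strip(), []).append(value.strip())
    try:
        raw_ts = parts["t"][0]
        timestamp = int(raw_ts)
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False  # stale timestamp: reject to limit replayed deliveries
    signed = raw_ts.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(hmac.compare_digest(expected.encode(), sig.encode())
               for sig in parts.get("v1", []))


def verify_shopify(payload: bytes, hmac_header: str, secret: str) -> bool:
    """Shopify-style scheme: base64 HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).digest()
    return hmac.compare_digest(base64.b64encode(digest), hmac_header.encode())


def verify(source: str, body: bytes, headers: Mapping[str, str], secret: str) -> bool:
    # The dispatch that keeps growing: one branch (and one secret) per provider.
    # Assumes header names are already normalized; real frameworks differ.
    if source == "stripe":
        return verify_stripe(body, headers.get("Stripe-Signature", ""), secret)
    if source == "shopify":
        return verify_shopify(body, headers.get("X-Shopify-Hmac-Sha256", ""), secret)
    raise ValueError(f"no verifier registered for source {source!r}")
```

Each provider added to the edge service brings another branch like these and another secret to store and rotate. Selecting a pre-configured source in Event Gateway replaces this code.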

You are not necessarily replacing:

  • RabbitMQ itself. If you use it for internal job orchestration downstream of the webhook destination (fan-out to workers, priority queues, retries of internal jobs that aren't webhook-shaped), keep it. Hookdeck Event Gateway terminates the webhook lifecycle: it receives, verifies, deduplicates, queues, retries, and delivers the event to a destination. That destination can be the same HTTP endpoint in your app that currently consumes off RabbitMQ, or it can be a new endpoint that fans work back into RabbitMQ for internal processing.
  • Your application code. The handler that does the actual business logic doesn't care whether the request came from your old edge service or from Hookdeck Event Gateway.

The architecture shift is narrow: Hookdeck Event Gateway handles the webhook edge; RabbitMQ stays for internal async work if it still earns its keep.

Before:

```mermaid
flowchart TB
    A[Stripe / Shopify / etc] --> B[Your HTTP edge service<br/>signature verify · Redis dedup · retry orchestration]
    B --> C[RabbitMQ exchange]
    C --> D[Consumers]
    D --> E[App]
```

After:

```mermaid
flowchart TB
    A[Stripe / Shopify / etc] --> B[Hookdeck Event Gateway]
    B --> C[Your app endpoint]
    C --> D[RabbitMQ<br/>optional — internal jobs]
    D --> E[Internal workers]
```

The blast radius of the migration is bounded to the box labelled "your HTTP edge service" and the things bolted to it. Everything past your app endpoint is untouched.
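If RabbitMQ stays for internal work, the app endpoint in the "after" diagram can stay thin. It parses the delivery, publishes it for the workers, and acknowledges quickly so the gateway doesn't wait on business logic. The minimal sketch below assumes JSON payloads with provider-supplied `id` and `type` fields. The `webhooks.<type>` routing-key convention, the `x-source-event-id` header, and the injected `publish` callable are all hypothetical stand-ins for your own broker client.

```python
import json
from typing import Callable, Mapping, Tuple

# Hypothetical publisher: (routing_key, body, headers) -> None. In production this
# might wrap a RabbitMQ client call (for example pika's channel.basic_publish);
# here it is injected so the handler can be exercised without a broker.
Publish = Callable[[str, bytes, Mapping[str, str]], None]


def handle_delivery(body: bytes, publish: Publish) -> Tuple[int, str]:
    """App endpoint behind Event Gateway: validate, hand off to RabbitMQ, ack fast."""
    try:
        event = json.loads(body)
    except ValueError:  # json.JSONDecodeError and bad UTF-8 both subclass ValueError
        return 400, "invalid JSON"
    if not isinstance(event, dict):
        return 400, "expected a JSON object"
    # Hypothetical convention: one routing key per provider event type.
    routing_key = f"webhooks.{event.get('type', 'unknown')}"
    # Carry the provider's event ID so internal workers can stay idempotent.
    publish(routing_key, body, {"x-source-event-id": str(event.get("id", ""))})
    # A 2xx marks the delivery as succeeded; the real work happens in the workers.
    return 202, "queued"
```

Injecting `publish` keeps the handler testable without a running broker. Passing the provider's event ID along lets internal workers stay idempotent when RabbitMQ redelivers a message.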

Migration in five steps

Plan on one to three weeks of calendar time for a small team. Most of that is waiting between phases to let real traffic prove each step out, not engineering hours.

Step 1 — Shadow ingest

Pick the lowest-stakes source you have: a vendor sending non-critical notifications, such as calendar pings or marketing event hooks.

Create a source in the Event Gateway dashboard for that provider. Hookdeck Event Gateway has pre-configured sources for 120+ providers, so signature verification is handled by selecting the provider rather than writing code. Add a destination that points at a non-production endpoint — a small worker that logs the verified payload and acknowledges, nothing else.
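The shadow destination needs almost no code. The minimal sketch below uses only Python's standard library. It logs the request path, any `X-Hookdeck-*` headers, and the first 2,000 bytes of the body, then returns 200. The port, the loopback bind address, and the body cap are illustrative. A real shadow worker has to be reachable from Hookdeck, or you can run it on a dev machine and forward events to it with the Hookdeck CLI.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow-worker")

MAX_LOGGED_BODY = 2000  # illustrative cap so huge payloads don't flood the logs


class ShadowHandler(BaseHTTPRequestHandler):
    """Shadow destination: log what arrived, acknowledge, do nothing else."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        gateway_headers = {
            name: value for name, value in self.headers.items()
            if name.lower().startswith("x-hookdeck-")
        }
        log.info(
            "path=%s headers=%s body=%s",
            self.path,
            json.dumps(gateway_headers),
            body[:MAX_LOGGED_BODY].decode("utf-8", errors="replace"),
        )
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request access line; do_POST logs instead


def make_server(port: int = 8080) -> HTTPServer:
    return HTTPServer(("127.0.0.1", port), ShadowHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```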

Then, in the provider's dashboard, add Hookdeck's source URL as a second webhook destination alongside your existing edge service URL. Most providers allow multiple webhook endpoints; for the ones that don't, you can use a CLI-based local relay to inspect traffic without touching the provider config. The Hookdeck CLI works well for this — it lets you tail events as they arrive, replay them locally, and forward to your dev machine, so you can validate behavior against the same traffic that's going to production.

Let it run for 48 hours. Watch for:

  • Event count parity between Event Gateway and your existing pipeline. Small differences (one or two events) usually come from timing at the edges of the comparison window; large differences mean something is wrong.
  • Payload diffs on a sample of events. Hookdeck delivers the original payload; if your app sees something different, find out why before moving on.
  • Verification failures on the Hookdeck side. These should be zero. If they're not, the source configuration is wrong.
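The first two checks are easy to script once you can export events from both sides. The minimal sketch below assumes you can collect a map of provider event ID to raw body from both the old pipeline and the shadow worker's logs, covering the same time window. It also assumes the payloads are JSON. The two-event tolerance mirrors the rule of thumb above and is not a hard threshold.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Set

# Both maps: provider event ID -> raw body bytes, captured over the same time window.
Captured = Dict[str, bytes]


@dataclass
class ParityReport:
    only_old: Set[str] = field(default_factory=set)
    only_new: Set[str] = field(default_factory=set)
    payload_mismatches: Set[str] = field(default_factory=set)


def compare_pipelines(old: Captured, new: Captured) -> ParityReport:
    report = ParityReport(
        only_old=set(old) - set(new),
        only_new=set(new) - set(old),
    )
    for event_id in set(old) & set(new):
        # Compare parsed JSON so key order or whitespace introduced by the old edge
        # service doesn't show up as a false mismatch.
        if json.loads(old[event_id]) != json.loads(new[event_id]):
            report.payload_mismatches.add(event_id)
    return report


def looks_healthy(report: ParityReport, tolerance: int = 2) -> bool:
    # A couple of events at the window edges is normal; payload drift never is.
    count_gap = len(report.only_old) + len(report.only_new)
    return count_gap <= tolerance and not report.payload_mismatches
```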

Rollback: there is nothing to roll back. The shadow path is additive.

Step 2 — Mirror to production destination

Change the Hookdeck Event Gateway destination from the logging worker to your real production endpoint, behind a feature flag or a write-side mirror. The flag should be readable per-request so you can flip it without redeploying.
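One way to make the flag per-request is to gate only the deliveries that arrive through Event Gateway. The minimal sketch below assumes that any `X-Hookdeck-*` header marks such a delivery, since the old edge service adds none. `flag_enabled` is a hypothetical lookup into whatever flag store you already run, and the `hookdeck-mirror.<source>` flag name is also illustrative.

```python
from typing import Callable, Mapping

FlagLookup = Callable[[str], bool]  # hypothetical: wraps whatever flag store you use


def came_via_gateway(headers: Mapping[str, str]) -> bool:
    # Event Gateway adds X-Hookdeck-* headers; the old edge service does not.
    return any(name.lower().startswith("x-hookdeck-") for name in headers)


def should_process(headers: Mapping[str, str], source: str, flag_enabled: FlagLookup) -> bool:
    """Evaluated on every request, so flipping the flag needs no redeploy."""
    if not came_via_gateway(headers):
        return True  # old path: behavior unchanged
    return flag_enabled(f"hookdeck-mirror.{source}")


def handle(headers: Mapping[str, str], body: bytes, source: str,
           flag_enabled: FlagLookup, process: Callable[[bytes], None]) -> int:
    if should_process(headers, source, flag_enabled):
        process(body)
    # Acknowledge either way: a mirrored event that is skipped is still handled by
    # the old path, so it should not show up as a failed delivery on the gateway.
    return 200
```

Returning 200 when the flag is off is deliberate. The old path is still processing that event, so a skipped mirror copy shouldn't be reported as a failed delivery.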

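You now have two paths into production for the same events. Deduplication in your app, where both paths converge (the Redis check, if your handler performs it, or a database constraint), should catch the overlap. A dedup check that lives only in the old edge service never sees events arriving via Hookdeck. If your app can't catch the overlap, that's a problem worth knowing about before you flip anything.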
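A database constraint is the easiest overlap guard to reason about. The minimal sketch below uses SQLite from the standard library; the table and function names are hypothetical. The detail that matters is the key. Deduplicate on the provider's own event ID (for example, Stripe's event `id`), because that is the only identifier both copies of the event share.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The primary key is the constraint that rejects the second copy of an event.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        " source TEXT NOT NULL,"
        " event_id TEXT NOT NULL,"
        " PRIMARY KEY (source, event_id))"
    )
    return conn


def claim_event(conn: sqlite3.Connection, source: str, event_id: str) -> bool:
    """Return True the first time (source, provider event ID) is seen, False for duplicates."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (source, event_id) VALUES (?, ?)",
            (source, event_id),
        )
    # rowcount is 1 when the row was inserted and 0 when the constraint ignored it.
    return cur.rowcount == 1
```

Claiming the ID before doing the work has a cost. If the handler crashes midway, the event stays marked as processed and any retry is silently dropped. Where you can, write the claim in the same transaction as the side effect.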

Watch for:

  • Duplicate side effects (charges, emails, notifications). If you see any, your idempotency layer in the app has gaps that the old edge service was hiding.
  • Header propagation. Event Gateway forwards the original event with a set of X-Hookdeck-* headers; if your app relies on a header your old edge service was injecting (a trace ID, a tenant tag), add that as a transformation in Hookdeck.
  • Re-verification logic in your app. If your handler re-verifies the signature, you'll need to either pass through the original signature header (Hookdeck can do this) or remove the re-verification.

Rollback: turn off the feature flag. Traffic continues through the old edge service exclusively.

Step 3 — Flip the source

Update the webhook URL in the provider's dashboard to point at Hookdeck Event Gateway only. The old edge service stops receiving that source's events. Your app continues to receive them, now exclusively via Hookdeck.

There will be a short overlap window (a few seconds to a few minutes) where in-flight events from the provider might land at the old endpoint and Hookdeck Event Gateway might also deliver them. The app's dedup catches it.

Watch for:

  • A drop to zero on the old edge service's metrics for that source. If it's not zero after ten minutes, the provider didn't accept the URL change.
  • Delivery success rate on the Event Gateway side. Should be at or above what the old pipeline was doing.
  • Issues raised in the Hookdeck Event Gateway dashboard. Hookdeck surfaces failed deliveries, transformation errors, and source-side problems as Issues, which is the replacement for the Grafana-and-logs investigation pattern.
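The first two checks reduce to a go/no-go rule you can run once the flip has settled. The minimal sketch below assumes you can pull these four numbers for the migrated source from the old edge service's metrics and from Event Gateway. The field names are hypothetical. The rule simply encodes the list above and is not a built-in Hookdeck check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CutoverMetrics:
    old_edge_events: int          # this source only, counted from ten minutes after the flip
    gateway_delivered: int        # successful deliveries on the Event Gateway side
    gateway_attempted: int
    baseline_success_rate: float  # the old pipeline's rate for this source, e.g. 0.995


def cutover_problems(m: CutoverMetrics) -> List[str]:
    problems = []
    if m.old_edge_events > 0:
        problems.append("old edge still receiving events; provider may not have accepted the URL change")
    if m.gateway_attempted == 0:
        problems.append("no deliveries attempted via Event Gateway yet")
    elif m.gateway_delivered / m.gateway_attempted < m.baseline_success_rate:
        problems.append("gateway success rate is below the old pipeline's baseline")
    return problems
```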

After 24 hours of clean ingestion, decommission the parallel path for that source. Don't decommission anything else yet.

Rollback: change the provider's webhook URL back to your old edge service. The old code path is still live; this is a single config change in the provider's dashboard.

Step 4 — Repeat per source, in increasing order of stakes

Do the lowest-stakes source first, then the next, and so on. Stripe payment events go last. The order matters because the playbook tightens with each source — by the time you're on number four, you've already seen the failure modes and the rollback paths are muscle memory.

Two practical notes. First, sources with high event volume (Shopify orders during a sale) are worth running shadow ingest for longer than 48 hours so you cover at least one daily peak. Second, sources with low event volume (a billing provider that fires twice a day) need a manual replay during shadow to confirm the path works at all — Hookdeck's replay function does this in two clicks.
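The ordering and the two volume caveats can be written down as a small planning rule. In the minimal sketch below, the 1-to-5 stakes scale, the `peak_interval_hours` field, and the 20-event replay threshold are hypothetical knobs to tune for your own sources. They are not Hookdeck settings.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_SHADOW_HOURS = 48          # the Step 1 baseline
MIN_EVENTS_DURING_SHADOW = 20  # hypothetical: below this, confirm the path with a manual replay


@dataclass
class Source:
    name: str
    stakes: int                     # hypothetical scale: 1 = non-critical pings ... 5 = payments
    events_per_day: float
    peak_interval_hours: int = 24   # hypothetical: how often the traffic peak you care about recurs


def plan(sources: List[Source]) -> List[Dict[str, object]]:
    """Order sources lowest-stakes first and flag the ones that need extra care."""
    steps = []
    for src in sorted(sources, key=lambda s: s.stakes):
        # Shadow long enough to see at least one peak, and never less than the baseline.
        shadow_hours = max(MIN_SHADOW_HOURS, src.peak_interval_hours)
        expected_events = src.events_per_day * shadow_hours / 24
        steps.append({
            "source": src.name,
            "shadow_hours": shadow_hours,
            "needs_manual_replay": expected_events < MIN_EVENTS_DURING_SHADOW,
        })
    return steps
```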

Rollback per source: same as Step 3. Flip the provider URL back. Each source migration is independent.

Step 5 — Decommission what you don't need

Once every source is on Hookdeck Event Gateway, the old ingestion service has no traffic. Take it down. Remove the signature verification code, the Redis dedup layer (unless something else in the app uses it), and the retry orchestration. Delete the replay script.

RabbitMQ stays if it's still doing useful work downstream. If RabbitMQ was only there to absorb webhook bursts, and your app endpoint now handles Hookdeck's delivery directly, you can take it down too. Most teams keep it.

The patchwork observability gets replaced with Hookdeck's lifecycle view (every event's full trace from receipt to final delivery, searchable by payload content, plus alerting on delivery failures) combined with your existing application logs for what happens after the handler.

Watch for: anything in your runbook that referenced the old edge service. Grep the runbook repo. Update or delete.

The things people get wrong

A few patterns show up across migrations:

  • Trying to do it all in one weekend. The phased per-source rollout is slower but it never produces the 3am "is the new system silently dropping events?" panic. The whole point of the shape above is that any single step is independently reversible.

  • Re-implementing signature verification in the application after Hookdeck Event Gateway has already verified the event. If your app insists on re-verifying, Hookdeck can pass through the original signature header so this still works, but most apps don't actually need to. The verification was put there because the edge service was untrusted; Hookdeck verifies before delivery, so the app handler is receiving traffic that's already been authenticated against the source.

  • Forgetting about retry history. If Stripe has retried event X twelve times against your old endpoint and you flip the URL, the thirteenth delivery is still the thirteenth from Stripe's perspective. Hookdeck handles its own retry budget separately, but the source's view of "this event has been failing all morning" doesn't reset. Log this in the runbook before flipping any high-volume source.

  • Underestimating the tribal knowledge in the old ingestion service. There is always something — a special case for a provider that sends malformed JSON on Tuesdays, a header rewrite that nobody documented, a timeout value that was tuned after a specific incident. Pair the migration engineer with whoever wrote the original. If that person has left, schedule a week to read the code and the git blame carefully before starting Step 1.

Where to start

If you want to spike on this without committing, start on the free tier and run shadow ingest against a single low-stakes source. The free tier is enough to validate that the path works for your stack; paid plans (from $39/mo, full breakdown at hookdeck.com/pricing) add the 99.999% uptime SLA, longer retention, and the volume you'll need once you migrate real sources.

FAQs

How long does a RabbitMQ-to-Hookdeck webhook migration take?

Plan on one to three weeks of calendar time for a small team. Most of that is waiting between phases to let real traffic prove each step out, not engineering hours.

Do I have to remove RabbitMQ to use Hookdeck?

No. Hookdeck terminates the webhook lifecycle at the destination. If RabbitMQ is doing internal job orchestration downstream of the webhook destination (fan-out to workers, priority queues, retries of internal jobs), keep it. You are replacing the HTTP ingestion edge, not RabbitMQ itself.

Can I roll back a RabbitMQ-to-Hookdeck migration?

Yes. The migration is phased per source, and any single step is independently reversible. Until you flip the provider's webhook URL, rollback is turning off a feature flag. After flipping, rollback is changing the provider's webhook URL back to your old edge service.


Gareth Wilson

Product Marketing

Multi-time founding marketer, Gareth is PMM at Hookdeck and author of the newsletter, Community Inc.