Gareth Wilson

What RabbitMQ doesn't give you for webhooks (and what you'll end up building)


RabbitMQ is a serious piece of infrastructure. It has been in production at scale for over a decade, the AMQP semantics are well understood, and quorum queues finally settled the long-running debate about HA. If you already run it for internal jobs or as the spine of an event-driven architecture, reaching for it when webhooks need a durable buffer is a reasonable instinct. A queue is a queue.

The trouble isn't that RabbitMQ is the wrong primitive at the webhook edge. It's that the queue is roughly 20% of what a webhook system actually needs. The other 80% is what you end up building around it — and that's the part worth thinking about before you commit another quarter of engineering time to it. This piece walks through what that 80% looks like.

The queue is not the hard part

Webhook ingress has a particular shape. Producers (like Stripe, Shopify, Twilio, GitHub, or your customer's bespoke ERP) open an HTTP connection to your endpoint and expect a 2xx response inside a tight window. Stripe times out at 10 seconds and treats anything other than a timely 2xx as a retry trigger. Shopify retries on any 5xx according to its own backoff schedule. Some providers cap their retry budget at 19 attempts and then drop the event on the floor.

That contract has implications. The ingestion tier has to ack fast, even when whatever is behind it is slow. It has to absorb spiky bursts (a Black Friday storm, a provider replaying a backlog after their own outage, a customer flipping a feature flag that triggers thousands of customer.updated events). And it has to do this without giving the producer a reason to retry, because producer retries multiply your load at exactly the wrong moment.

RabbitMQ does the durable-buffer part of this well. Publish a message, get a confirm, sleep at night. That's the 20%. Now let's look at the rest.

The HTTP ingestion tier you'll build

RabbitMQ is not an HTTP server. So you put something in front of it: a fleet of small services that terminate TLS, parse the request, publish to an exchange, and return 200. This is fine until it isn't.

The ingestion tier has its own scaling story (CPU-bound on TLS and JSON parsing, not IO-bound like the broker), its own deployment cadence, and its own failure modes. When RabbitMQ is briefly unreachable (a rolling restart, a network partition between AZs), the ingestion tier needs a local durable spool or it drops events. That's another component. When traffic spikes 50x during a sale, the ingestion tier needs to autoscale faster than the broker's publisher-confirm latency degrades, or you start answering producers with timeouts and 5xx responses. Those producers will then retry, compounding the spike.
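To make the spool requirement concrete, here is a minimal sketch of the publish-or-spool decision. Everything in it is an assumption for illustration: `publish` is a hypothetical callable that hands the body to the broker and raises `ConnectionError` when the broker is unreachable, and the spool is a plain append-only file rather than whatever durable store you would actually pick.

```python
import base64
import os
from typing import Callable


def ingest(body: bytes, publish: Callable[[bytes], None], spool_path: str) -> int:
    """Return the HTTP status to send back to the producer."""
    try:
        publish(body)  # hypothetical: publishes and waits for the broker confirm
    except ConnectionError:
        try:
            # Broker unreachable: persist locally, then still ack the producer
            # so it has no reason to retry during the outage.
            with open(spool_path, "a", encoding="ascii") as spool:
                spool.write(base64.b64encode(body).decode("ascii") + "\n")
                spool.flush()
                os.fsync(spool.fileno())
        except OSError:
            return 503  # nothing durable accepted the event, so let the producer retry
    return 200


def drain_spool(publish: Callable[[bytes], None], spool_path: str) -> int:
    """Republish spooled events once the broker is back; returns the count sent.

    If publish fails midway the file is left in place, so the next drain may
    resend events. That is at-least-once delivery; dedupe handles the rest.
    """
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, "r", encoding="ascii") as spool:
        lines = spool.read().splitlines()
    for line in lines:
        publish(base64.b64decode(line))
    os.remove(spool_path)
    return len(lines)
```

Even this toy version raises the real questions: who drains the spool, what happens to it when the instance is terminated by the autoscaler, and how ordering is affected when spooled events land after newer ones.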

None of this is unsolvable. It's just that you now own an HTTP edge tier in addition to a message broker, and the failure mode of webhooks is almost never "the broker fell over." It's almost always something at the edge.

Signature verification, and where you do it

Every major producer signs payloads, and they all do it differently. Stripe uses HMAC-SHA256 over a timestamp-prefixed body with a tolerance window. Shopify uses HMAC-SHA256 base64-encoded in a header. GitHub uses HMAC-SHA256 hex-encoded with a sha256= prefix. Slack uses a versioned signing scheme with its own timestamp check. A handful of enterprise providers use JWS.
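As a rough illustration of how much the schemes differ even when they all use HMAC-SHA256, here is a sketch of the Stripe, Shopify, and GitHub checks described above. The function names are hypothetical, the 300-second tolerance is an assumed default rather than something this article specifies, and the header values are expected to be passed in already extracted from the request.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def _digest(secret: bytes, message: bytes) -> bytes:
    return hmac.new(secret, message, hashlib.sha256).digest()


def _equal(expected: str, received: str) -> bool:
    # Constant-time comparison; bytes avoid errors on non-ASCII header values.
    return hmac.compare_digest(expected.encode(), received.encode())


def verify_github(secret: bytes, body: bytes, header: str) -> bool:
    """Hex digest of the raw body, carried with a `sha256=` prefix."""
    return _equal("sha256=" + _digest(secret, body).hex(), header)


def verify_shopify(secret: bytes, body: bytes, header: str) -> bool:
    """Base64-encoded digest of the raw body."""
    return _equal(base64.b64encode(_digest(secret, body)).decode("ascii"), header)


def verify_stripe(secret: bytes, body: bytes, header: str,
                  tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Header looks like `t=<unix ts>,v1=<hex>`; the signed payload is `<ts>.<body>`."""
    pairs = [item.split("=", 1) for item in header.split(",") if "=" in item]
    timestamps = [v for k, v in pairs if k.strip() == "t"]
    signatures = [v for k, v in pairs if k.strip() == "v1"]
    if not timestamps or not timestamps[0].isdigit() or not signatures:
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamps[0])) > tolerance_s:
        return False  # outside the tolerance window: treat as a possible replay
    expected = _digest(secret, timestamps[0].encode() + b"." + body).hex()
    return any(_equal(expected, candidate) for candidate in signatures)
```

All three must run against the raw request bytes, before any JSON parsing or re-serialization, which is one more reason the check tends to end up in the tier that terminates HTTP.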

You have two options. Verify in the application consumer, after RabbitMQ has already accepted the message (which means unauthenticated traffic is sitting in your durable queue, taking up disk, and any consumer that forgets to verify is a CVE). Or verify at the edge, in the ingestion tier, before anything is published (which means the ingestion tier now has to know every signing scheme you accept, manage rotating signing secrets per source, and stay current with provider changes).

Most teams end up doing it at the edge, because the alternative is worse. That makes the ingestion tier a non-trivial component with security-sensitive code in it. It needs its own review cycle, its own secret management, its own per-source configuration. It is not a thin proxy in front of RabbitMQ anymore.

Source-aware retries

Different downstream destinations have different retry semantics. A flaky third-party API wants exponential backoff over 24 hours with jitter. An internal service wants three fast retries and then dead-letter. A billing reconciliation pipeline wants exactly one attempt, then human review. A webhook that triggers a customer-visible email wants to give up after two failures, not keep trying for a day.

RabbitMQ supports retry through DLX, TTL, and republish patterns. You can wire up a delay exchange with the rabbitmq-delayed-message-exchange plugin, or route failures to a parking queue with a TTL and have them dead-letter back to the main exchange when the TTL expires. It works.
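For the parking-queue variant, the wiring comes down to a handful of queue arguments. The sketch below only builds those arguments; `x-message-ttl`, `x-dead-letter-exchange`, and `x-dead-letter-routing-key` are RabbitMQ's own argument names, while the function names and the queue naming scheme are made up for illustration.

```python
from typing import Dict, Iterable


def parking_queue_arguments(delay_ms: int, main_exchange: str, routing_key: str) -> Dict[str, object]:
    """Arguments for a queue that holds messages for delay_ms, then dead-letters them back."""
    return {
        "x-message-ttl": delay_ms,                  # expire every message after the delay
        "x-dead-letter-exchange": main_exchange,    # where expired messages are sent
        "x-dead-letter-routing-key": routing_key,   # route them back to the same destination
    }


def retry_tiers(delays_ms: Iterable[int], main_exchange: str, routing_key: str) -> Dict[str, Dict[str, object]]:
    """One parking queue per delay tier (hypothetical naming scheme).

    A queue-level TTL per tier sidesteps the head-of-queue problem: with
    per-message TTLs, a long-delay message at the head holds up shorter ones.
    """
    return {
        f"{routing_key}.retry.{delay}ms": parking_queue_arguments(delay, main_exchange, routing_key)
        for delay in delays_ms
    }
```

Each dictionary would be passed as the `arguments` of a queue declaration (for example `channel.queue_declare(queue=name, durable=True, arguments=args)` in pika). Note how quickly the queue count grows: tiers multiplied by destinations.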

What you end up with, though, is a per-destination configuration that lives in code or in a homegrown config service:

destinations:
  billing_reconciler:
    max_attempts: 1
    dlx: human_review
  internal_search_index:
    max_attempts: 3
    backoff: fixed_30s
    dlx: parking_internal
  partner_api_v2:
    max_attempts: 12
    backoff: exponential
    base_ms: 1000
    max_ms: 3600000
    jitter: true

Plus the consumers that read this config, plus the operator tooling to pause a destination when it's in a bad state, plus the dashboards to know which destinations are currently retrying, plus the runbooks for "destination X has 40k messages in its parking queue, what now."
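The consumer side of that config is small but easy to get subtly wrong. Here is a sketch of the delay calculation, assuming the YAML above has been loaded into a hypothetical `RetryPolicy` (with `fixed_30s` mapped to `backoff="fixed"` and `base_ms=30000`) and that jitter means "full jitter", a uniform pick between zero and the computed delay. Neither mapping is defined by the config itself.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    backoff: str = "none"   # "none", "fixed", or "exponential"
    base_ms: int = 0
    max_ms: int = 0
    jitter: bool = False


def next_delay_ms(policy: RetryPolicy, attempt: int, rng=random) -> Optional[int]:
    """Delay before the next try after `attempt` failed attempts.

    Returns None when the attempt budget is spent and the message should be
    dead-lettered (or sent to human review).
    """
    if attempt >= policy.max_attempts:
        return None
    if policy.backoff == "fixed":
        delay = policy.base_ms
    elif policy.backoff == "exponential":
        # Doubles on each failure: base, 2x base, 4x base, ... capped at max_ms.
        delay = min(policy.base_ms * 2 ** (attempt - 1), policy.max_ms)
    else:
        delay = 0
    if policy.jitter:
        delay = rng.randint(0, delay)  # spread retries so they do not arrive in lockstep
    return delay
```

With the `partner_api_v2` values and jitter off, the eleventh failure waits 1,024,000 ms (about 17 minutes), so the 3,600,000 ms cap is never reached within 12 attempts. That kind of mismatch between intent and config is exactly what drifts unnoticed.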

Idempotency, keyed on the upstream event

Webhook producers retry. The same Stripe evt_1Nq... can arrive two or three times over the course of a week, especially if a previous delivery timed out. Your queue will faithfully accept all of them.

You need a deduplication layer in front, and it has to know which field carries the idempotency key for each producer. Stripe uses id. Shopify uses X-Shopify-Webhook-Id. GitHub uses X-GitHub-Delivery. Twilio doesn't really give you one and you have to construct it from message SID plus event type. Your customer's bespoke ERP gives you nothing and you have to hash the body.

RabbitMQ has no concept of "this message is logically the same as one I saw two minutes ago." So you build a dedupe layer (usually Redis with a TTL, sometimes Postgres if you want the audit trail) that the ingestion tier checks before publishing. Now you have three stateful components (broker, dedupe store, ingestion tier with secrets) and a fourth piece of state to maintain: the config that maps each producer to its idempotency-key field.
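A sketch of both halves, key extraction and the dedupe check, might look like the following. The in-memory `TtlDedupe` class is a stand-in for the Redis pattern (`SET key value NX EX ttl`), the `headers` mapping is assumed to use the exact header casing shown, and Twilio is left out because its key has to be constructed from fields this article does not spell out.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Mapping

# Per-producer extractors: (headers, raw body) -> upstream identifier.
KEY_EXTRACTORS: Dict[str, Callable[[Mapping[str, str], bytes], str]] = {
    "stripe": lambda headers, body: json.loads(body)["id"],
    "shopify": lambda headers, body: headers["X-Shopify-Webhook-Id"],
    "github": lambda headers, body: headers["X-GitHub-Delivery"],
}


def idempotency_key(source: str, headers: Mapping[str, str], body: bytes) -> str:
    """Producer-specific key, falling back to a body hash for unknown sources."""
    extract = KEY_EXTRACTORS.get(source)
    raw = extract(headers, body) if extract else hashlib.sha256(body).hexdigest()
    return f"{source}:{raw}"  # namespaced so IDs from different producers cannot collide


class TtlDedupe:
    """In-memory stand-in for a shared store with per-key expiry."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def first_seen(self, key: str) -> bool:
        """True if the key is new (or expired) and is now recorded; False for a duplicate."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + self._ttl_s
        return True
```

The TTL is the awkward parameter: it has to outlast the producer's full retry schedule, which differs per producer, so it usually ends up in the per-source config as well.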

Lifecycle observability

The question that arrives in your inbox at 3pm on a Tuesday: "Why didn't you process the webhook I sent at 14:32?"

Answering that requires the full lifecycle of one event. When did it hit the ingestion tier. Did signature verification pass. Did dedupe consider it a duplicate. When did it land in RabbitMQ. Which consumer picked it up. How many attempts. What HTTP status came back from each attempt. What was the eventual outcome.

RabbitMQ shows you queue depth, consumer counts, ack/nack rates, message rates. It does not show you "the journey of this one Stripe event." That view is something you assemble — typically a correlation ID stamped at ingestion, propagated through the publish headers, logged at every stage, and queried across three log sources (ingestion logs, broker logs, consumer logs) joined on the correlation ID. Plus a Grafana dashboard that two people in the company fully trust and one of them is on PTO.
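The correlation-ID approach can be sketched in a few lines. This assumes every stage emits one JSON log line per event and that the ID travels with the message (AMQP has a correlation-id message property that would serve); the function and field names are hypothetical.

```python
import json
import time
import uuid
from typing import Any, Dict, Iterable, List


def new_correlation_id() -> str:
    """Stamped once, at ingestion, and never regenerated downstream."""
    return uuid.uuid4().hex


def lifecycle_record(correlation_id: str, stage: str, **fields: Any) -> str:
    """One JSON log line per stage, always carrying the correlation ID."""
    record: Dict[str, Any] = {
        "ts": time.time(),
        "correlation_id": correlation_id,
        "stage": stage,
    }
    record.update(fields)  # e.g. source, attempt, status
    return json.dumps(record, sort_keys=True)


def journey(log_lines: Iterable[str], correlation_id: str) -> List[Dict[str, Any]]:
    """Join log lines from any number of sources on the correlation ID, oldest first."""
    records = [json.loads(line) for line in log_lines]
    matching = [r for r in records if r.get("correlation_id") == correlation_id]
    return sorted(matching, key=lambda r: r["ts"])
```

The code is the easy part. The hard part is that `journey` only works if every stage, including ones written by other teams, logs the ID in the same field and ships its logs somewhere queryable.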

This is buildable. It is also the thing engineers find themselves rebuilding every 18 months as the stack evolves.

Replay and backfill

"Replay every Shopify orders/create event between Tuesday 10:00 and 11:00 to the new fulfillment service." This is a normal request, but it is not a normal thing for AMQP to do.

Once a message is consumed and acked, it's gone. To replay, you need durable retention of the original event payloads beyond queue lifetime (typically S3 or Postgres), an index by source, event type, and timestamp, and an operator UI or CLI that lets someone select a window and a target and trigger the replay. None of that is in RabbitMQ. You build it.
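To show the shape of the retention-plus-index piece, here is a sketch that uses an in-memory SQLite database as a stand-in for Postgres or an S3 index. The table layout, function names, and the `deliver` callable are all assumptions, and timestamps are taken to be ISO-8601 UTC strings so that string comparison matches time order.

```python
import sqlite3
from typing import Callable


def open_store() -> sqlite3.Connection:
    """Retention store with the index a replay query needs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (source TEXT, event_type TEXT, received_at TEXT, payload BLOB)"
    )
    db.execute("CREATE INDEX idx_replay ON events (source, event_type, received_at)")
    return db


def retain(db: sqlite3.Connection, source: str, event_type: str,
           received_at: str, payload: bytes) -> None:
    """Called at ingestion, independently of whether the queue later drops the message."""
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (source, event_type, received_at, payload),
    )


def replay(db: sqlite3.Connection, source: str, event_type: str,
           start: str, end: str, deliver: Callable[[bytes], None]) -> int:
    """Send every matching payload in [start, end) to `deliver`; returns the count."""
    rows = db.execute(
        "SELECT payload FROM events "
        "WHERE source = ? AND event_type = ? AND received_at >= ? AND received_at < ? "
        "ORDER BY received_at",
        (source, event_type, start, end),
    ).fetchall()
    for (payload,) in rows:
        deliver(payload)
    return len(rows)
```

A real version also has to decide whether replayed events bypass dedupe, whether they are rate-limited so the target is not flooded, and who is allowed to run it.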

The version of this that exists in most companies is "we have the raw payloads in S3, and there's a Python script in ops/scripts/replay.py that someone wrote in 2022, and we run it carefully."

Transformation

Strip a PII field before it goes to the analytics consumer. Reshape Shopify's orders/create into your internal Order schema. Add a tenant ID derived from the API key the request came in with. Drop events where livemode: false in production.

RabbitMQ does not transform. Either your consumers do it (and every consumer of the same event re-implements it), or you put a transformation step in the pipeline (a sidecar, a stream processor, a Lambda) which is another component to operate.
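A shared transformation step, as opposed to per-consumer copies, could be as small as the sketch below. It covers three of the cases above (dropping test-mode events, stripping PII, adding a tenant ID); the function name, the `email` default, and the assumption that PII lives in top-level fields are illustrative only.

```python
import copy
from typing import Any, Dict, Iterable, Optional


def transform(event: Dict[str, Any], tenant_id: str, production: bool = True,
              pii_fields: Iterable[str] = ("email",)) -> Optional[Dict[str, Any]]:
    """Return the reshaped event, or None if it should be dropped."""
    if production and event.get("livemode") is False:
        return None  # test-mode event arriving at a production pipeline
    shaped = copy.deepcopy(event)  # never mutate the retained original
    for field in pii_fields:
        shaped.pop(field, None)  # top-level fields only in this sketch
    shaped["tenant_id"] = tenant_id  # assumed to be resolved upstream from the API key
    return shaped
```

Keeping the original untouched matters for replay: if the transform changes later, you want to re-run it against the raw payload, not against an already-transformed copy.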

The failure mode of a webhook system on RabbitMQ is almost never RabbitMQ itself — it's the dedupe Redis, the ingestion tier's TLS termination, the per-destination retry config that drifted, the replay script that no longer matches the current consumer schema.

What a managed webhook gateway actually is

A managed webhook gateway is the seven things above bundled with the queue. Not a replacement for RabbitMQ as a message broker (RabbitMQ is still a fine choice for internal job queues). A gateway sits at the webhook edge specifically.

Hookdeck Event Gateway is one option, but the argument holds regardless of which gateway you choose, or whether you decide to keep building yours. Concretely, what a gateway like Hookdeck gives you:

  • An ingestion edge that acks producers fast and absorbs bursts, with a durable queue behind it.
  • Signature verification configured per source, applied before anything enters the queue.
  • Source-aware retries with exponential backoff, configurable per destination, with manual retry from a UI.
  • Idempotency keyed on the producer's identifier, configurable per source.
  • Full-text searchable history of every event, request, attempt, and response.
  • A replay UI that operates on the retained history, scoped by source, destination, and time window.
  • Traces and metrics compatible with your existing observability stack, so the lifecycle view lives there.

The question isn't whether your team can build all of this around RabbitMQ. The question is whether the next two quarters of webhook gateway work is the highest-leverage thing those engineers could be doing, given that none of it differentiates your product.

Try it against your current setup

Try Hookdeck Event Gateway in front of your existing RabbitMQ consumer and see the difference end-to-end.

FAQs

Is RabbitMQ a good choice for webhook ingestion?

RabbitMQ handles the durable-queue job well, but that is roughly 20% of what a webhook system needs. The other 80% (HTTP ingestion, signature verification, source-aware retries, idempotency, lifecycle observability, replay, and transformation) is code and infrastructure you build and maintain around the broker.

What does a webhook system need beyond a queue?

An HTTP ingestion tier that acks producers fast and absorbs bursts, per-source signature verification, source-aware retry policy, idempotency keyed on the upstream event ID, lifecycle observability per event, replay and backfill, and in-flight transformation. RabbitMQ provides none of these natively.

Should I keep RabbitMQ if I adopt a webhook gateway?

Yes, if RabbitMQ is doing internal job-queue or event-driven work behind your own services. A managed webhook gateway sits at the webhook edge specifically; it is not a replacement for RabbitMQ as an internal message broker.


Gareth Wilson

Product Marketing

Multi-time founding marketer, Gareth is PMM at Hookdeck and author of the newsletter, Community Inc.