Gareth Wilson

BullMQ Alternatives for Webhook Retries: Hookdeck Event Gateway, pg-boss, Cloud Queues, and Convoy Compared


BullMQ is the default async-work tool in the Node ecosystem, so it becomes the default tool for webhook retries the moment an Express route starts seeing 5xx responses or you hit a rate limit. Add the job with queue.add(), point the route at it, and you have something that looks like webhook resilience.

It works fine at first. But then Redis becomes a critical-path service you operate. Signature verification ends up copy-pasted in every webhook route. Webhook storms hit your Express or NestJS process before they hit Redis, so your application becomes the throttle. And when a payload fails after the last attempt, you have a BullMQ job ID (not a webhook record) and you're left correlating Bull Board with Stripe's dashboard to figure out what happened.

In this article, we'll look at the pros and cons of using BullMQ for webhook retries and compare it with alternatives: Hookdeck Event Gateway, pg-boss, Bee-Queue, cloud queues like AWS SQS and Google Cloud Tasks, and Convoy. Working in a different stack? See the equivalent guides for Celery in Python and Sidekiq in Ruby.

How to evaluate a BullMQ alternative for webhook retries

Comparing BullMQ alternatives is tricky because they don't all sit in the same place in the stack. pg-boss and Bee-Queue are like-for-like Node task queues. Hookdeck and Convoy are webhook gateways that sit in front of your application. SQS and Cloud Tasks are cloud primitives you wire up yourself. So we'll compare them on the things that actually matter when retries are on the line:

Webhook semantics: Does the tool understand signatures, idempotency keys, and source identity, or does that logic live in your Node code?

Failed-payload visibility: When a retry exhausts, do you see the original webhook (headers, body, timestamps) or just a job ID?

Per-source retry policy: Can you configure "Stripe retries for 7 days, Shopify retries for 1 hour" without code changes?

Operational footprint: Do you run Redis, a worker process, an autoscaler, and the monitoring around all of it?

Node ecosystem fit: Does the alternative require a rewrite of your handler, or does it sit upstream and let your existing route stay as-is?

Serverless fit: Does it assume a long-running worker process, or does it work with Lambda, Vercel, and Cloud Run functions where there's nowhere to run a persistent BullMQ worker?

Cost: Including hidden Redis cost, worker compute, and the engineering hours spent maintaining the pipeline.

What BullMQ does (and why Node developers reach for it on webhooks)

BullMQ is a Redis-backed queue for Node.js, the modern successor to Bull. You define a queue, push jobs onto it, and run a worker process that pulls and executes them. It ships with retries, backoff, delayed jobs, repeatable jobs, rate limiting, and flow/parent-child jobs. Bull Board, Taskforce, and the BullMQ dashboard add monitoring.

BullMQ key features

  • Redis-backed queue: Jobs are serialized to Redis and workers pull and process them.
  • Retries and backoff: attempts plus backoff: { type: 'exponential' | 'fixed', delay } covers most basic cases. Custom backoff strategies are supported.
  • Delayed and repeatable jobs: Schedule work for later or on a cron.
  • Rate limiting and concurrency: Limit jobs per worker and per duration.
  • Flows: Parent-child job graphs for multi-step work.
  • Dashboards: Bull Board and similar give a web UI over queues, jobs, and failures.
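To make the attempts and backoff options above concrete, here is a minimal sketch of the delay schedule a fixed or exponential backoff produces. It does not call BullMQ. `backoffDelays` is a hypothetical helper, and the doubling formula (base delay times 2^(retry - 1)) is an assumption about how the built-in exponential strategy behaves, so check it against the version you run.

```typescript
// Hypothetical helper: mirrors the shape of the backoff option described above,
// but computes the schedule locally instead of calling the library.
type Backoff = { type: "exponential" | "fixed"; delay: number };

// Returns the wait (in ms) before each retry. With `attempts` total tries,
// there are `attempts - 1` retries after the first failure.
function backoffDelays(attempts: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(
      backoff.type === "fixed"
        ? backoff.delay
        : backoff.delay * 2 ** (retry - 1) // assumed doubling formula
    );
  }
  return delays;
}

const schedule = backoffDelays(5, { type: "exponential", delay: 1000 });
console.log(schedule); // [1000, 2000, 4000, 8000]
```

Under these assumptions, five attempts with a one-second base delay span about 15 seconds in total, so a multi-hour retry window needs many more attempts or a much larger base delay.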

Why BullMQ gets used for webhook retries

It's already in the stack. If your Node project has any background work, BullMQ (or Bull) is probably handling it. Adding webhook retries means pushing a job from the route handler, so no new service, dependency, or review meeting. The retry semantics map onto webhook needs at first glance: catch a network error, back off, try again.

Where BullMQ starts to hurt in a webhook context

The ack-before-queue gap. The standard Express pattern is: route validates the signature, calls queue.add(...), and returns 200. If the route acks without waiting for the enqueue to complete, or the process crashes or Redis becomes unreachable before the job is written, the webhook is gone. BullMQ cannot retry it because the job never arrived.
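One way to narrow this gap, sketched below, is to acknowledge only after the enqueue has completed and to answer with an error otherwise, so the provider's own retries redeliver the event. The `Enqueue` type is a hypothetical stand-in for an awaited queue.add call, and the status codes are illustrative. This does not help if the process is down before the request arrives.

```typescript
// Hypothetical types: `Enqueue` stands in for something like an awaited queue.add call.
type Enqueue = (payload: string) => Promise<void>;
type Ack = { status: number };

// Ack only after the enqueue promise resolves. If the broker is unreachable,
// answer 503 so the provider's own retry mechanism can redeliver the event.
async function receiveWebhook(payload: string, enqueue: Enqueue): Promise<Ack> {
  try {
    await enqueue(payload);
    return { status: 200 };
  } catch {
    return { status: 503 };
  }
}
```

The cost of this ordering is latency: the provider waits for the broker round trip, which matters on platforms with request timeouts.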

Signature verification is duplicated per provider. Stripe HMAC-SHA256 in one route, Shopify HMAC-SHA256 in another, GitHub HMAC-SHA256 (or the legacy HMAC-SHA1 header) in a third. Each implementation is a security surface and a source of subtle bugs (raw-body handling, timing-safe comparison, header parsing). See how to implement SHA256 webhook signature verification for what this involves per provider.
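As a rough illustration of what each of those routes has to get right, here is a generic HMAC-SHA256 check using Node's built-in crypto module. Header names, encodings, and the exact signed payload differ per provider, so the hex-encoded signature over the raw body used here is an assumption rather than any one provider's scheme, and the secret is a made-up value.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a hex-encoded HMAC-SHA256 signature over the raw request body.
// This generic shape is an assumption, not a specific provider's format.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

const secret = "whsec_example"; // hypothetical secret
const body = '{"type":"invoice.paid"}';
const signature = createHmac("sha256", secret).update(body, "utf8").digest("hex");

console.log(verifySignature(body, signature, secret)); // true
console.log(verifySignature(body + " ", signature, secret)); // false
```

The second call shows why raw-body handling matters: a single byte of difference, such as whitespace introduced by re-serializing parsed JSON, invalidates the signature.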

Webhook spikes hit Node, not Redis. When Stripe sends 5,000 events in 30 seconds during a promotion, your Node process is the first to feel it, and the queue only sees what makes it past the event loop. BullMQ doesn't smooth the spike; it receives whatever survived. See how to protect your server from crashing during webhook spikes for what this looks like in practice.
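When the application is the throttle, teams often end up writing in-process load shedding. The token bucket below is a minimal sketch of that idea, with an injected clock so its behaviour is deterministic. The `TokenBucket` class, its capacity, and its refill rate are illustrative assumptions, and shedding a request only works if the provider retries it later.

```typescript
// Minimal token bucket with an injected clock (timestamps in ms).
// Capacity and refill rate are illustrative numbers, not recommendations.
class TokenBucket {
  private capacity: number;
  private refillPerSec: number;
  private tokens: number;
  private last: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.last = nowMs;
  }

  // Returns true if the request may proceed, false if it should be shed
  // (for example by responding with a 429).
  tryTake(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(2, 1, 0);
console.log(bucket.tryTake(0), bucket.tryTake(0), bucket.tryTake(0)); // true true false
console.log(bucket.tryTake(1000)); // true (one token refilled after a second)
```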

Serverless doesn't fit. BullMQ workers are long-running processes. On Vercel, Netlify Functions, or Lambda there's nowhere to run them, so you end up bolting on a separate always-on container just to drain the queue, which undermines the reason you went serverless.

Heroku H12 timeouts. Requests that take longer than 30 seconds get killed with an H12. If a queue.add blocks on a slow Redis connection, the webhook 503s and the provider retries, duplicating work.

Visibility through a job lens, not a webhook one. Bull Board shows you jobs, not webhooks. When attempts are exhausted, you have a failed job with a stack trace but no record showing the original payload, the headers, the source provider, or the full retry trail.

Retry policy is global and code-defined. Changing "Stripe retries for 7 days, GitHub retries for 1 hour" means a code deploy. For what good per-source retry behaviour looks like, see webhook retry best practices.
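To show why per-source policy tends to end up code-defined, here is a sketch of the kind of lookup table a Node app might carry. The source names and windows echo the examples in the text. The attempt counts, base delays, and the `policyFor` and `shouldRetry` helpers are hypothetical.

```typescript
// Hypothetical policy table: changing any value here means a code deploy.
type RetryPolicy = { attempts: number; baseDelayMs: number; maxWindowMs: number };

const HOUR = 60 * 60 * 1000;

const retryPolicies: Record<string, RetryPolicy> = {
  stripe: { attempts: 20, baseDelayMs: 60_000, maxWindowMs: 7 * 24 * HOUR },
  github: { attempts: 5, baseDelayMs: 30_000, maxWindowMs: 1 * HOUR },
};

const defaultPolicy: RetryPolicy = { attempts: 3, baseDelayMs: 10_000, maxWindowMs: HOUR };

// Falls back to a default when a source has no explicit policy.
function policyFor(source: string): RetryPolicy {
  return retryPolicies[source] ?? defaultPolicy;
}

// Whether another retry is allowed, given attempts made so far and
// time elapsed since the webhook was first received.
function shouldRetry(source: string, attemptsMade: number, elapsedMs: number): boolean {
  const policy = policyFor(source);
  return attemptsMade < policy.attempts && elapsedMs < policy.maxWindowMs;
}

console.log(shouldRetry("github", 2, 2 * HOUR)); // false: outside the 1-hour window
console.log(shouldRetry("stripe", 2, 2 * HOUR)); // true: inside the 7-day window
```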

BullMQ alternatives for webhook retries

If you're rethinking webhook retries, here are the alternatives we'll cover:

  • Hookdeck Event Gateway: Managed webhook infrastructure that sits in front of your Express, NestJS, or Fastify handler, providing signature verification, durable queue, per-source retry policies, replay, and observability without a Redis instance to operate.
  • pg-boss and Bee-Queue: Other Node-native queues if you want to stay inside the task-queue paradigm but want Postgres-backed durability (pg-boss) or a lighter Redis option (Bee-Queue).
  • AWS SQS, Google Cloud Tasks: Cloud-native queues that trade Redis ops for IAM and queue config, and fit serverless better.
  • Convoy: Open-source self-hosted webhook gateway.

Hookdeck Event Gateway

Hookdeck Event Gateway takes a different architectural position to BullMQ: it sits in front of your Node app rather than behind it. The webhook reaches Event Gateway first, gets verified against the source's signing secret, gets deduplicated, and is held in a durable queue. Event Gateway then delivers it to your route at a rate the application can handle, and retries per source, with the policy you configured, when delivery fails. Your route becomes a thin endpoint that processes events Event Gateway has already handled the operational concerns for.

Hookdeck Event Gateway key features

  • 120+ pre-configured sources: Stripe, Shopify, GitHub, Twilio, and the rest ship with signature verification handled, so no HMAC code in your routes.
  • Per-connection retry policy: Configure exponential, linear, or custom intervals; choose max-attempts, until-success, or a time-bounded window. Tune per source, no deploy. See the retries documentation.
  • Failed-payload visibility: Issues capture the full original webhook (headers, body, query, timestamp) alongside the retry trail. Full-text search across event history.
  • JavaScript transformations: In-flight enrichment via the Transformations editor, written in JS, so familiar territory for Node developers.
  • Serverless-friendly: No worker process to keep alive. Event Gateway delivers over HTTP to any endpoint, including a Lambda, a Vercel function, or a Cloud Run service.
  • No Redis for you to operate: Durable queue, backpressure, and worker scaling are managed.
  • Local development with the CLI: hookdeck listen 3000 proxies to a local Express or NestJS server with no ngrok required, and the same source configuration works in CI and production. See the Hookdeck CLI documentation.
  • Free tier: Developer plan is $0/month with 10,000 events. Team plan starts at $39/month with a 99.999% uptime SLA.

How does Hookdeck Event Gateway compare to BullMQ?

Hookdeck Event Gateway moves the webhook concerns out of your application code and out of your operational surface. The handler stays idiomatic Node, but it does less. Signature verification, queueing, retries, and observability all happen upstream of your router.

The trade-off is that you're moving a responsibility from infrastructure you operate to a service you pay for. If your webhook volume is small and your team has bandwidth to maintain BullMQ and Redis, the savings may not justify the line item. If webhooks are causing pages, missed events, or hours of debugging, the calculation changes quickly.

| Capability | Hookdeck Event Gateway | BullMQ |
| --- | --- | --- |
| Webhook-aware ingestion (signature, idempotency) | ✅ 120+ sources pre-configured | ❌ Hand-rolled per source |
| Per-source retry policy without code changes | ✅ Per-connection config | ❌ Code-defined |
| Failed-payload visibility | ✅ Issues + full-text search | ℹ️ Bull Board + manual logs |
| Replay individual events | ✅ One-click | ℹ️ Manual via dashboard or shell |
| Serverless deployment | ✅ HTTP delivery to functions | ❌ Needs a long-running worker |
| Redis / worker operations | ✅ Managed | ❌ You run Redis + worker processes |
| Local development | ✅ Via hookdeck listen | ℹ️ Plus ngrok and Redis locally |
| Self-hostable | ❌ | ✅ |
| Suits non-webhook background work | ❌ | ✅ Cron, flows, general async jobs |

pg-boss and Bee-Queue

pg-boss is a Postgres-backed Node queue, attractive if you'd rather not add Redis and want transactional inserts alongside your business data. Bee-Queue is a lighter, faster Redis queue focused on short, high-throughput jobs.

For pure background work, both are reasonable. For webhook retries specifically, they don't change the underlying problem: you still need a broker (Postgres or Redis), still run worker processes, still hand-roll signature verification, still have an ack-before-queue gap, and still inspect failed work through a job lens rather than a webhook lens.

AWS SQS, Google Cloud Tasks

Pushing webhook retries to cloud-native queues, typically SQS with a Lambda consumer or Cloud Tasks with a Cloud Run handler, trades Redis operations for IAM, queue config, and visibility tooling. It also fits serverless far better than BullMQ, which is often the real reason Node teams look here.

This shifts the operational burden but doesn't close the webhook-specific gaps. Signature verification still lives in your function. Idempotency keys still need a deduplication store. Per-source retry policy still requires multiple queues with different DLQ configs. Replay still means writing tooling on top of CloudWatch and the SQS console. For a deeper look, see AWS SQS alternatives for webhooks.
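The deduplication store mentioned above can be sketched as a keyed set with expiry. The in-memory `DedupeStore` below is a hypothetical illustration with an injected clock. A real deployment would need a shared store with its own expiry mechanism, since function instances do not share memory.

```typescript
// In-memory stand-in for a deduplication store, for illustration only.
class DedupeStore {
  private seen = new Map<string, number>(); // idempotency key -> expiry timestamp (ms)
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time a key is seen within the TTL, false for duplicates.
  firstSeen(key: string, nowMs: number): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > nowMs) return false;
    this.seen.set(key, nowMs + this.ttlMs);
    return true;
  }
}

const store = new DedupeStore(60_000);
console.log(store.firstSeen("evt_123", 0)); // true: process the event
console.log(store.firstSeen("evt_123", 5_000)); // false: duplicate delivery, skip
console.log(store.firstSeen("evt_123", 61_000)); // true: the earlier entry has expired
```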

Convoy

Convoy is an open-source self-hosted webhook gateway. It's the closest like-for-like to Hookdeck Event Gateway: subscriptions instead of connections, signature verification, retry policies, and replay.

The trade-offs are a smaller pre-configured source library, a less mature observability stack (no full-text search across event history, no integrated alerting equivalent to Issues), and the fact that you operate it yourself: Postgres, Redis, the Convoy service, and any metrics tooling. It's a good option when self-hosting is a hard requirement. For a deeper look, see the comparison of webhook gateway solutions.

When to keep BullMQ

BullMQ is still the right answer in several scenarios:

  • Webhooks are a small fraction of an established BullMQ footprint. If most of your queue is non-webhook work (image processing, emails, flows, cron), the marginal savings of moving them out aren't worth fragmenting the stack.
  • Self-hosting is mandatory. Compliance, data residency, or air-gapped deployments rule out a managed service. BullMQ + Redis is well-understood.
  • You need BullMQ flows alongside webhook retries. If parent-child job graphs and webhook work share a worker pool by design, splitting them complicates ops more than it saves.
  • Webhook volume is low and stable. No spikes, no rate-limit pressure, no providers with aggressive retry policies. BullMQ with attempts and backoff is fine, and you have other things to work on.

The migration argument gets stronger when webhook volume grows, when more providers come online, when you move to serverless and have nowhere to run a worker, when on-call gets paged for webhook-related incidents, or when "what payload caused this?" becomes a recurring question.

Migrating Express webhook retries to Hookdeck Event Gateway

The shape of the migration is small. The architectural diff:

Before:

flowchart LR
    A[Stripe] --> B[Express route<br/>signature verification + retry policy in code]
    B --> C[queue.add webhook]
    C --> D[BullMQ worker]
    D --> E[Handler logic]

After:

flowchart LR
    A[Stripe] --> B[Hookdeck<br/>signature verification + retry policy as config]
    B --> C[Express route]
    C --> D[Handler logic]

The steps:

  1. Create a Hookdeck source for Stripe (or whichever provider). Hookdeck generates a webhook URL and handles signature verification automatically.
  2. Create a connection from the source to your Express endpoint. Configure the retry policy on the connection: exponential backoff, max attempts, retry window.
  3. Update the webhook URL in the Stripe dashboard to point at Hookdeck Event Gateway. Run in shadow alongside the existing BullMQ path for a few days.
  4. Remove the queue.add call from the webhook route. The route now synchronously processes events that Event Gateway has already verified, deduplicated, and queued.
  5. Keep BullMQ for everything else.
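After step 4, the route's job reduces to running the business logic and reporting success or failure through the status code. The framework-agnostic sketch below assumes that a non-2xx response tells the upstream gateway to retry under the policy configured on the connection. `handleVerifiedEvent` and its types are hypothetical names, not part of any SDK.

```typescript
type WebhookEvent = { id: string; type: string };
type HandlerResult = { status: number };

// Processes the event inline, with no queue.add call. A thrown error becomes
// a 500, which is assumed to trigger a retry by the upstream gateway.
function handleVerifiedEvent(
  event: WebhookEvent,
  applyEvent: (e: WebhookEvent) => void
): HandlerResult {
  try {
    applyEvent(event);
    return { status: 200 };
  } catch {
    return { status: 500 };
  }
}

const outcome = handleVerifiedEvent({ id: "evt_1", type: "invoice.paid" }, () => {
  // business logic would go here
});
console.log(outcome.status); // 200
```

In an Express route, the returned status would be passed to the response object. Retry timing then lives in connection configuration rather than in application code.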

For the broader pattern, see managed webhook gateway vs DIY queue-backed infrastructure.

Hookdeck Event Gateway is the managed answer for webhook retries

BullMQ isn't broken, but it wasn't purpose-built for webhooks. You feel that every time an event spike, a Heroku H12, a serverless deploy, or a missing signature lands in your incident channel. The Node ecosystem has been working around that gap for years with custom attempts configs, dedupe stores, raw-body middleware, and careful Redis tuning.

Hookdeck Event Gateway is managed webhook infrastructure that takes the work out of the Node application. Signature verification, durable queueing, per-source retry policies, replay, and observability happen upstream of your Express, NestJS, or Fastify code. Your route stays simple. BullMQ stays for what it's actually good at (image processing, emails, flows, non-webhook background work).

If you're testing this from a local dev server, run hookdeck listen 3000 to forward webhooks to localhost, no ngrok required, and the same source configuration works in CI and production.

FAQs

What is the best alternative to BullMQ for webhook retries?

It depends on your priorities. If you want to stop running Redis and a worker pool and want webhook-aware retries with per-source policies and full payload visibility, Hookdeck Event Gateway is the strongest fit. It sits in front of your Express or NestJS handler rather than behind it, and it works with serverless deployments where BullMQ workers don't fit. If you want to stay inside the Node task-queue paradigm but want Postgres-backed durability, pg-boss is a solid choice. If self-hosting is non-negotiable, Convoy is the open-source option.

Can I keep BullMQ for background jobs and use something else just for webhook retries?

Yes, and this is the most common pattern. BullMQ stays in the stack for image processing, emails, flows, and cron; webhook ingestion and retries move upstream to a dedicated webhook gateway like Hookdeck Event Gateway. Your Express route receives events that have already been verified, deduplicated, and queued, so the queue.add call for webhook work disappears.

Why not just use BullMQ's attempts and backoff for webhook retries?

It works, but it leaves three problems unsolved. First, your application has to ack the webhook before the job hits Redis, and events can be lost in that gap. Second, signature verification, idempotency, and source-specific retry policy live in your code, copy-pasted per provider. Third, when a retry fails, you have a BullMQ job ID, not a webhook record, so debugging means correlating Bull Board with provider logs. A webhook gateway moves all three concerns out of application code.

Does Hookdeck Event Gateway replace Redis if I move webhook retries off BullMQ?

For the webhook half of the workload, yes. Hookdeck Event Gateway provides durable queueing, retries, and dead-letter behavior as a managed service, so you don't need Redis for webhook ingestion. Most teams keep Redis and BullMQ for non-webhook background work (image processing, emails, flows) and let Hookdeck Event Gateway own the webhook boundary.


Gareth Wilson

Product Marketing

Multi-time founding marketer, Gareth is PMM at Hookdeck and author of the newsletter, Community Inc.