Gareth Wilson

Why most webhook-triggered agents don't need a workflow engine


Before you set up a durable workflow engine, ask yourself whether what you have is really a workflow. Or is it a webhook handler with an LLM call inside it?

Oftentimes, it's the second. And that doesn't need a workflow engine; it needs a reliable queue and a handler.

This article covers when to use a durable workflow versus a webhook gateway — so you don't end up overbuilding.

What a workflow engine is actually for

Workflow engines like Inngest, Trigger.dev, Hatchet, and Temporal are a real category of tool solving a real set of problems. They make code durable, which is a specific technical property: the execution state is persisted between steps, so your function can crash halfway through a seven-step process, come back up, and resume at step five instead of step one.

They do this so you can build things like:

  • A customer onboarding workflow that takes three days to complete, with four asynchronous waits for external systems.
  • A content moderation pipeline where a human has to approve or reject before the task proceeds.
  • A long-running training or scraping job that's measured in hours and must not lose progress to infrastructure churn.

These are real workflows. They have state, they have structure, they have phases that need to survive restarts. A workflow engine is the right tool.

What most "agent workflows" actually look like

Pull up the code you were about to port to Inngest. Describe what it does, step by step.

Does it look like this?

// Webhook arrives. We call an LLM. We write one row. We send one thing.
app.post('/webhook', async (req, res) => {
  const result = await agentSDK.run('classify this support ticket', req.body);
  await db.tickets.update({ id: req.body.id, category: result.category });
  await slack.notify('#support', `New ${result.category} ticket`);
  res.status(200).send('OK');
});

That's three discrete operations. A model call, a database write, a notification. The pattern is LLM-in-a-handler: do the complex thing, then do one or two simple things with the result.

You'll find variations of this, such as:

  • Classify inbound webhook → update a record → trigger a Zap/Make/n8n hop.
  • Summarise a GitHub PR → post the summary to Slack → tag a reviewer.
  • Triage a Stripe dispute webhook → call an LLM for a draft response → save to a queue for human review.

These are not workflows. They're webhook handlers that happen to call an LLM somewhere in the middle. The "steps" are function calls, not phases. They finish in seconds, not days. There's no human in the loop. There's no compensation. There's no state that needs to outlive the request.

If that's the shape of your code, a workflow engine could be over-tooling.

The real problem is more boring than durable execution

Zoom in on what's actually breaking in the naive pattern. It's almost always one of three things:

  1. The LLM call takes longer than your HTTP timeout. Load balancer or serverless runtime kills the request at 30 seconds. The webhook producer interprets that as a failure and retries. You end up running the agent twice.
  2. Your process crashes or restarts mid-handler. Deploy happens during execution, container gets killed. Work is lost. Webhook producer retries. Agent runs again.
  3. A downstream service fails transiently. Slack API 503s. DB connection times out. Handler returns 500. Webhook producer retries on an unforgiving curve or gives up entirely.

Every one of these is a delivery and retry problem, not an execution-state problem. The fix is:

  • Put the webhook on a durable queue immediately, outside your application.
  • Give your handler enough time to do its work (seconds to minutes).
  • Retry delivery with your curve, not the provider's.
  • Make the handler idempotent at the event level, so a retry is safe.
  • Give yourself observability into what was delivered, what succeeded, what failed.

That's a queue with good ergonomics, not a workflow engine.
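Of those fixes, event-level idempotency is the one piece your application code has to own, and it's small. A minimal sketch, with an in-memory Set standing in for a persistent store (in production this would be a database table with a unique constraint on the event ID):

```javascript
// Idempotency sketch: dedupe on the event ID so a redelivered
// webhook is acknowledged without re-running the agent.
// `processed` stands in for a persistent store, e.g. a DB table
// with a unique constraint on event_id.
const processed = new Set();

async function handleEvent(event, runAgent) {
  if (processed.has(event.id)) {
    // Retry of an event we already handled: safe to ack and skip.
    return { status: 'duplicate' };
  }
  const result = await runAgent(event);
  // Mark as done only after the side effects succeed, so a crash
  // mid-handler leaves the event eligible for redelivery.
  processed.add(event.id);
  return { status: 'ok', result };
}
```

With this in place, a webhook producer retrying on timeout becomes harmless: the second delivery hits the dedupe check and returns immediately.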

What a queue + handler gets you

Line up the features people quote as reasons to adopt a workflow engine, and check how many of them you actually need.

Durable execution state. A workflow engine checkpoints between steps. Useful if you have steps. If your handler is three calls, the checkpoint boundary is "did it finish or not." That's a handler return code, not durable state.

Per-step retries. Retrying a failed step instead of re-running the whole function is genuinely useful when your steps are expensive and independent. It's overkill if the actions are cheap or they're already idempotent at the handler level.

Retries in general. A decent webhook gateway gives you delivery retries with backoff, dead-letter handling, and manual replay. You don't need a workflow engine for this.
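And even if you do end up hand-rolling the retry side yourself, it's a small helper, not a runtime. A rough sketch of delivery retries with exponential backoff (the attempt count and delays are illustrative, not anyone's recommended defaults):

```javascript
// Retry a delivery with exponential backoff plus a little jitter.
// `attempts` and `baseDelayMs` are illustrative defaults.
async function deliverWithBackoff(deliver, attempts = 5, baseDelayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await deliver();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 1s, 2s, 4s, ... plus up to 100ms jitter.
      const delay = baseDelayMs * 2 ** i + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Exhausted: surface the failure (or hand off to a dead-letter queue).
  throw lastError;
}
```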

Observability. Workflow engines give you step-level traces. Useful when there are steps. For a handler, an event-level trace ("this webhook arrived, this is what your code did, here's the response") is often more useful and much easier to reason about.

Long-running work. A workflow engine lets your code run for hours. But if your code doesn't need to run for hours (say the LLM call is 45 seconds and the writes are milliseconds), you don't need that capability.

Scheduled and delayed tasks. A workflow engine can schedule a follow-up three days from now. If you don't have that requirement, you're not using the feature.

Human-in-the-loop / waitpoints. The killer feature of modern runtimes. If nothing in your handler waits for a person, it's a feature you won't use.

For the three-function handler shape, the honest feature list you actually need is: durable queue, retries with backoff, idempotency key, event-level observability, reasonable timeout. A webhook gateway can give you all five.

When you should reach for a workflow engine

Pick a durable runtime when two or more of the following are true:

  • Your workflow has five or more discrete steps. Real steps — things that can succeed or fail independently and whose success you want to remember across restarts.
  • Your workflow waits for external events: human approvals, other services, long-running background processes. If you have a waitpoint, you want a waitpoint primitive.
  • Your workflow runs for longer than a few minutes and you care about surviving deploys and container restarts without losing progress.
  • You need compensating transactions — saga patterns where failure of a late step has to undo earlier side effects in a principled way.
  • You have scheduled or time-delayed follow-ups (send a nudge three days after this event) that aren't a natural fit for a cron + queue.
  • You have expensive, non-idempotent per-step work where re-running is costly enough to justify checkpointing at step boundaries, like multi-dollar LLM calls or large data transforms.

If only one of those applies, you can probably still get there with a queue and some care. If two or more apply, the runtime is earning its keep.

Starting with a queue doesn't paint you into a corner

A common objection to this kind of YAGNI argument is: "but what if I need a runtime later?" Answer: you can add one. Webhook gateways and runtimes work well together.

A webhook gateway in front of your handler doesn't commit you to handler-only architecture. The moment a workflow actually emerges in your code (you start needing that third-day follow-up, or a human approval, or a proper saga) you put a durable runtime behind the gateway and migrate the handlers that need it. The gateway keeps doing its job unchanged. Your webhook producers don't notice. Your ingress URLs don't change. Your signature verification, dedup, replay, and observability stay where they are.

Starting with a queue is a cheap decision that keeps the expensive decision for later.

Wrapping up

The gap between "my webhook handler times out" and "I need a durable workflow engine" can be larger than you think. If you're at the "my handler timed out again" moment right now and want the smallest thing that works, Hookdeck's free tier handles ingress in a few minutes, and the Hookdeck CLI lets you debug the whole thing on your laptop before you ship. When — or if — the workflow shape emerges in your code, the runtime will still be there.


Gareth Wilson

Product Marketing

Multi-time founding marketer, Gareth is PMM at Hookdeck and author of the newsletter, Community Inc.