How Developers Connect Webhooks to AI Agents, MCP Servers, and LLM Tools

Most of the AI systems being shipped are not autonomous. They sit and wait for something to happen (a customer pays, a CI run fails, a Batch job finishes, another agent finishes its part of a task), and then they do something about it. The thing that wakes them up is almost always a webhook.

That makes webhooks one of the most essential parts of the agent stack, and also one of the least examined. Teams spend weeks choosing a model, a framework, and a vector store, then wire the whole thing to the outside world with a single app.post('/webhook', ...) handler and hope for the best.

This article maps the different ways developers actually connect webhooks to AI agents, MCP servers, and LLM-based tools like Claude. There are more of them than you'd expect, they fail in the same handful of ways, and knowing which pattern you're in tells you what infrastructure you actually need.

A map of where webhooks show up

It helps to stop thinking about "AI webhooks" as one thing. In a real system, webhooks show up at five distinct boundaries, and they point in two directions: events coming into your AI system, and events going out of it.

| Pattern | Direction | Who sends | Example |
| --- | --- | --- | --- |
| Provider callbacks | Inbound | OpenAI, Gemini, Anthropic | "Your Batch job finished" |
| Event-triggered agents | Inbound | Stripe, GitHub, Shopify, your own services | "A dispute was opened; draft a response" |
| MCP async results | Inbound | Third-party APIs behind an MCP server | "The payment the agent started just succeeded" |
| Live-session pushes | Inbound | Anything, into a running agent | "CI just went red while you're at the terminal" |
| Agent and platform notifications | Outbound | Your agent or AI platform | "The generation you asked for is ready" |
| Agent-to-agent updates | Outbound | One agent to another | "Task delegated to me is done" |

The AI framing changes the engineering very little. A checkout.session.completed event that wakes an agent has the same delivery semantics as one that updates a database row. The model call in the middle is new; the plumbing around it is the same webhook plumbing you've always had to get right. We'll come back to that. First, the patterns.

Pattern 1: AI providers calling you back

The most direct way webhooks and LLMs meet is the model provider sending you a webhook. As model work moved from synchronous chat completions toward long-running, agentic jobs (Batch runs over thousands of prompts, fine-tuning, background responses, deep research, video generation, hour-long agent sessions), polling an API for "are we done yet?" stopped being viable. So OpenAI, Google, and Anthropic all shipped webhooks.

If you've integrated one, the others feel familiar, because the three converged on nearly identical design choices:

  • Thin payloads. The delivery contains an event type and a resource ID, not the result. You call back into the API (or read from cloud storage) to hydrate the full object. This avoids serving stale data on a retry.
  • At-least-once delivery. All three explicitly warn that duplicates happen. You dedupe on a delivery ID.
  • No ordering guarantees. Lifecycle events can arrive out of sequence. You sort by a timestamp on the envelope, or better, treat each event as "go check current state."
  • Signed, time-bounded deliveries. HMAC (or JWT) signatures plus a roughly five-minute freshness window to resist replay attacks.
  • Generous-but-slow retries. 24 to 72 hours of exponential backoff, which protects the provider's infrastructure but means recovery after you fix a bug can be slow.
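The signature and freshness checks in that list can be sketched in a few lines. This is a minimal illustration, not any provider's official verifier: verifyDelivery is a hypothetical helper, the `${id}.${timestamp}.${body}` signing layout and "v1,<base64>" header format follow the Standard Webhooks convention, and the secret is assumed to be the raw HMAC key already decoded. The provider SDK helpers, where they exist, are the safer choice in production.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: verifies one signed delivery.
// Assumes an HMAC-SHA256 signature over `${id}.${timestamp}.${body}`;
// confirm the exact scheme in each provider's documentation.
function verifyDelivery({ secret, id, timestamp, body, signatureHeader, nowSeconds, toleranceSeconds = 300 }) {
  if (typeof signatureHeader !== 'string') return false;

  // Reject stale or future-dated deliveries to limit replay.
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > toleranceSeconds) {
    return false;
  }

  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${id}.${timestamp}.${body}`)
    .digest();

  // The header may carry several space-separated "v1,<base64>" entries.
  return signatureHeader.split(' ').some((entry) => {
    const [version, encoded] = entry.split(',');
    if (version !== 'v1' || !encoded) return false;
    const candidate = Buffer.from(encoded, 'base64');
    // timingSafeEqual throws on a length mismatch, so check length first.
    return candidate.length === expected.length && crypto.timingSafeEqual(candidate, expected);
  });
}
```

The body must be the raw request bytes as received; re-serializing parsed JSON usually changes the bytes and breaks the comparison.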

Where they differ is in the details that bite you in production. We've written a full guide to each:

  • OpenAI webhooks follow the Standard Webhooks spec, cover five event families (Background Responses, Batch, fine-tuning, evals, Realtime SIP), retry for up to 72 hours, and ship no first-party CLI for local development.
  • Gemini webhooks are also Standard Webhooks, but with a twist: two configuration models with two signing schemes. Static endpoints are signed with HMAC-SHA256; per-job dynamic endpoints are signed with RS256 JWTs you verify against Google's JWKS endpoint. Results land in Google Cloud Storage as gs:// pointers, and configuration is API-only.
  • Claude (Managed Agent) webhooks cover agent session and vault-credential lifecycle events, ship SDK verification helpers in seven languages, and add an auto-disable behavior: after roughly twenty consecutive failures, Anthropic switches your endpoint off until you re-enable it by hand.

The takeaway across all three is the same: a model provider's webhook is a notification, not a system of record. Verification, deduplication, ordering tolerance, the retry window, secret rotation, and the local-dev feedback loop are all yours to own. The thin-payload design even nudges you toward the right architecture: acknowledge fast, hydrate the full object asynchronously, and reconcile against the API for anything you might have missed.
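One way to picture "acknowledge fast, hydrate asynchronously" is a receiver that records only the resource ID and lets a background step ask the API for current state. The sketch below is illustrative: createReceiver and the event shape ({ type, data: { id } }) are hypothetical, the array stands in for a durable queue, and fetchResource is an injected placeholder for a real API call, kept synchronous here so the example runs on its own.

```javascript
// Hypothetical thin-payload receiver. `fetchResource` stands in for a call
// back into the provider's API (asynchronous in a real system).
function createReceiver(fetchResource) {
  const queue = [];            // stand-in for a durable queue
  const resources = new Map(); // local copy of hydrated resources

  return {
    // Called from the HTTP handler: record the event and return at once.
    accept(event) {
      queue.push({ type: event.type, resourceId: event.data.id });
      return 200;
    },
    // Called by a background worker: ignore what the event claimed and
    // store whatever the API reports right now.
    drain() {
      while (queue.length > 0) {
        const { resourceId } = queue.shift();
        resources.set(resourceId, fetchResource(resourceId));
      }
    },
    get(resourceId) {
      return resources.get(resourceId);
    },
  };
}
```

Because each event only triggers a re-fetch, a "completed" event that arrives before an "in progress" one still leaves the local copy in the state the API reports.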

Pattern 2: Webhooks that trigger an agent

This is the pattern most people mean by "AI automation": an external event fires, and an agent runs in response. Triage a Stripe dispute. Summarize a GitHub pull request and tag a reviewer. Classify an inbound support ticket and route it. The webhook is the trigger; the agent is the work.

The naive version is the one everyone writes first:

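// Anti-pattern: every step runs before the response is sent.
app.post('/webhook', async (req, res) => {
  // Slow model call; the sender is still waiting for a status code.
  const result = await agentSDK.run('classify this ticket', req.body);
  await db.tickets.update({ id: req.body.id, category: result.category });
  await slack.notify('#support', `New ${result.category} ticket`);
  // The 200 goes out only after all three calls have finished.
  res.status(200).send('OK');
});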

In development it works. In production it works until the day the model call takes 47 seconds instead of 4, your load balancer times the request out at 30, the webhook provider reads that as a failed delivery and retries, and your agent runs twice. Now a customer has two Slack pings and you have two database writes.

A lot of teams reach for a durable workflow engine at exactly this moment, and that's the trap. Most webhook-triggered agents don't need a workflow engine. The code above isn't a workflow. It's three function calls that finish in seconds, with no human in the loop, no compensation logic, and no state that needs to outlive the request. The failures it suffers (timeout-then-retry, crash mid-handler, transient downstream error) are delivery and retry problems, not execution-state problems. What it needs is a durable queue in front of it, retries on a curve you control, an idempotency key, event-level observability, and a reasonable timeout. That's a webhook gateway and a handler, not a workflow DSL.
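"Retries on a curve you control" can be as small as a delay function plus a loop that gives up into a dead-letter state. The following is a sketch under stated assumptions: retryDelayMs and processEvent are hypothetical names, the defaults are arbitrary, and the loop records the delays it would wait rather than actually sleeping, so it stays synchronous and easy to run.

```javascript
// Delay before retry number `attempt` (1-based), capped at `maxMs`.
function retryDelayMs(attempt, { baseMs = 1000, factor = 2, maxMs = 60000 } = {}) {
  return Math.min(maxMs, baseMs * factor ** (attempt - 1));
}

// Runs `handler` against one queued event, trying up to `maxAttempts` times.
// Returns the outcome and the delays that would be waited between attempts;
// a real worker would sleep or reschedule instead of looping immediately.
function processEvent(event, handler, { maxAttempts = 5, ...curve } = {}) {
  const delays = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      handler(event);
      return { status: 'delivered', attempts: attempt, delays };
    } catch {
      if (attempt < maxAttempts) delays.push(retryDelayMs(attempt, curve));
    }
  }
  // Out of attempts: park the event for inspection and manual replay.
  return { status: 'dead_lettered', attempts: maxAttempts, delays };
}
```

The point of owning the curve is that the webhook sender sees a fast 200 from the queueing step, while the slow or flaky work retries on your schedule rather than the provider's.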

When the workflow really is a workflow

Sometimes the work genuinely is multi-step and long-lived: five or more steps that can each fail independently, a human approval in the middle, a follow-up scheduled three days out, expensive non-idempotent steps you don't want to re-run on retry. That's when a durable runtime earns its place. Established players like Inngest, Trigger.dev, Temporal, Hatchet, and Restate journal each step so an agent can crash at step four of six and resume at step four, not step one.
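The "resume at step four, not step one" behavior comes from journaling each step's result. The toy below shows the idea only; it is not how any of the named runtimes is implemented, runWorkflow is a hypothetical function, and the journal is a plain array where a real runtime would use durable storage.

```javascript
// Runs `steps` in order, recording each result in `journal`. On a re-run,
// steps that already have a journal entry are skipped and their recorded
// results are reused as input to the next step.
function runWorkflow(steps, journal) {
  for (let i = 0; i < steps.length; i += 1) {
    if (i < journal.length) continue; // completed in an earlier run
    const previous = i > 0 ? journal[i - 1] : undefined;
    journal.push(steps[i](previous)); // a throw here leaves the journal intact
  }
  return journal[journal.length - 1];
}
```

If the third step throws, calling runWorkflow again with the same journal starts at the third step; the first two are not executed a second time, which is what protects expensive or non-idempotent steps.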

The mistake is treating gateways and runtimes as competitors. They solve different halves of the problem: a gateway handles ingress (verify, dedupe, queue, retry delivery, replay), a runtime handles execution (checkpoint between steps, per-step retries, waitpoints, scheduling). Our deep dive on webhook gateways versus durable runtimes walks through where each shines and why mature systems usually run both in sequence: producer to gateway to runtime to side effects.

For a concrete end-to-end build, GitHub + Trigger.dev + Claude shows the two-layer pattern in full: GitHub source authentication and deduplication at the gateway edge, durable Claude-powered tasks behind it for PR reviews, issue labeling, and Slack summaries, with two routing options (a single task router versus per-event routing configured at the edge).

One detail: some agent endpoints have no idempotency key at all. Anthropic's inbound routines.fire, which lets external systems kick off a Claude Code routine, is one: every call creates a new session and burns quota. If a GitHub event reaches it three times, you get three agent sessions and three times the quota. Deduplicating on the upstream event ID before you trigger the agent is what keeps a flaky delivery from turning into a surprise bill.
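Deduplicating before the trigger can be sketched as a wrapper keyed on the upstream event ID. Everything here is an assumption for illustration: createDedupedTrigger is a hypothetical name, fire is a placeholder for whatever starts the agent session, the 24-hour TTL is arbitrary, and the in-memory Map stands in for a shared store that would survive restarts and work across instances.

```javascript
// Wraps a non-idempotent trigger so each upstream event ID fires it at most
// once within `ttlMs`. `now` is injectable to make the behavior testable.
function createDedupedTrigger(fire, { ttlMs = 24 * 60 * 60 * 1000, now = Date.now } = {}) {
  const seen = new Map(); // eventId -> expiry timestamp (ms)

  return function trigger(eventId, payload) {
    const current = now();
    // Drop expired entries so the map does not grow without bound.
    for (const [id, expiresAt] of seen) {
      if (expiresAt <= current) seen.delete(id);
    }
    if (seen.has(eventId)) return { fired: false };
    // Record the ID before firing so a redelivery cannot slip through.
    seen.set(eventId, current + ttlMs);
    return { fired: true, result: fire(payload) };
  };
}
```

Recording the ID before calling fire favors "at most once": if fire throws, a redelivery of the same event is still suppressed. Whether to clear the entry on failure depends on whether a failed call might already have created a session.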

Pattern 3: Webhooks and MCP servers

The Model Context Protocol is how agents reach tools and data: an MCP server wraps an API (Stripe, GitHub, Shopify) and exposes it as tools any compatible client can call. It's become the default integration layer for agents, and it has a structural blind spot that webhooks expose.

MCP assumes the world is synchronous. A client calls tools/call, the server runs the operation, a result comes back. That's perfect for reads: fetch a customer, list issues, check a balance. But when an MCP tool starts something that finishes later (initiates a payment, creates a deployment, kicks off a fulfillment), the immediate response is just an acknowledgment. The real result arrives later, as a webhook from the third-party service. As Hookdeck's own writeup on building reliable MCP servers puts it, MCP gives you two patterns to support, synchronous request-response and event-driven callbacks, and the spec only handles the first well.

The protocol is closing part of this gap. The 2025-11-25 spec introduced Tasks, a "call-now, fetch-later" primitive: when a tool call can't return immediately, the server hands back a taskId and the client polls through states (working, input_required, completed, failed, cancelled). The 2026 work pushed further toward stateless, horizontally scalable servers over Streamable HTTP. But Tasks solve the problem from the agent's side: they give the client a way to wait. They don't solve the server's problem: reliably receiving the inbound webhook that says the work is done, verifying it, deduplicating it, and correlating it back to the original tool call. Native webhook support is still listed under "Triggers and Event-Driven Updates" on the roadmap, not in the spec.

So in practice, MCP and webhooks are complementary, and you wire them together yourself. A webhook fires, an agent gets triggered, and the agent uses MCP servers to carry out the multi-step work. Or, on the server side, you put webhook infrastructure between the third-party service and your MCP server so it receives clean, deduplicated, durably-queued events and turns them into Task state changes or client notifications, instead of every MCP server reimplementing ingestion, dedup, retries, and observability from scratch. The enterprise MCP gateways that have appeared (Kong, Lunar, TrueFoundry) focus almost entirely on the outbound tool-call side; inbound event ingestion is the half they leave out.
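The server-side correlation step might look like the registry below, which maps a third-party object ID back to the task that started it. This is a sketch, not MCP SDK code: createTaskRegistry and the webhook shape ({ externalId, succeeded, result }) are hypothetical, the state names are the ones listed above, and treating completed, failed, and cancelled as terminal is an assumption.

```javascript
// States named in the text; which ones count as terminal is an assumption.
const TERMINAL = new Set(['completed', 'failed', 'cancelled']);

function createTaskRegistry() {
  const tasks = new Map();        // taskId -> { state, result }
  const byExternalId = new Map(); // third-party object ID -> taskId

  return {
    // Called when a tool call starts asynchronous work upstream.
    start(taskId, externalId) {
      tasks.set(taskId, { state: 'working', result: null });
      byExternalId.set(externalId, taskId);
    },
    // Called for each verified, deduplicated inbound webhook.
    onWebhook({ externalId, succeeded, result }) {
      const taskId = byExternalId.get(externalId);
      if (!taskId) return null; // not ours, or arrived before start()
      const task = tasks.get(taskId);
      if (TERMINAL.has(task.state)) return taskId; // late or repeated delivery
      task.state = succeeded ? 'completed' : 'failed';
      task.result = result;
      return taskId;
    },
    // What a polling client would see for this task.
    get(taskId) {
      return tasks.get(taskId);
    },
  };
}
```

The null branch matters in practice: a fast third-party service can deliver its webhook before the tool call has finished recording the mapping, so unmatched events are better parked and retried than dropped.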

Pattern 4: Pushing webhooks into a live agent session

A newer pattern flips the timing: instead of an event triggering a fresh agent run, the event is pushed into an agent that's already running. Anthropic's Claude Code Channels (a research preview) do exactly this. A channel is an MCP server that Claude Code spawns as a subprocess and listens to for notifications. Wire a webhook into it and Claude Code can react to a CI failure, a payment, or a GitHub push while you're not even at the terminal.

The catch is the familiar one: the external service needs a public URL, and your channel server runs on localhost. That's the classic local-webhook-development problem, and it's exactly what the Hookdeck CLI was built for. It gives the local channel server a stable public URL (so you configure the provider once, not on every restart), full request inspection, and one-click event replay, which is the feature that matters most here. Iterating on how an agent reacts to a payload usually means tweaking, restarting, and re-sending the same event; replay turns that loop from minutes (push another commit, wait for the real event) into seconds. Our walkthrough builds the whole chain (channel server, public URL, GitHub events, Claude reacting), and you can send realistic mock events from the Hookdeck Console before touching a real provider.

Pattern 5: Agents and platforms sending webhooks out

Everything so far has been inbound. The other half of the story is agents and AI platforms emitting events.

Two cases are worth separating. The first is an AI platform notifying its own users: you've built a product where a generation, a transcription, or an analysis takes a while, and you want to tell customers when it's done. That's outbound webhook delivery, the same job Stripe does when it sends you invoice.paid, and it carries its own hard problems: per-tenant endpoints, signature schemes your customers can verify, retries with backoff, and a way for customers to see and debug their own deliveries. Building that well is enough work that there's a whole category for it; Hookdeck Outpost is purpose-built for the sending side, with native delivery to HTTP, queues, and event buses.
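On the sending side, the per-tenant signing step can be sketched as a function that turns an event into a ready-to-send request. The names are illustrative: buildDelivery and the tenant shape ({ url, secret }) are hypothetical, and the header names and "v1,<base64>" signature format mirror the Standard Webhooks convention rather than any particular product's API. Retries, backoff, and delivery logs would sit around this function.

```javascript
const crypto = require('node:crypto');

// Builds one signed delivery for a tenant's endpoint.
// `id` identifies the message; `timestamp` is Unix seconds at send time.
function buildDelivery(tenant, event, { id, timestamp }) {
  const body = JSON.stringify(event);
  const signature = crypto
    .createHmac('sha256', tenant.secret) // each tenant has its own secret
    .update(`${id}.${timestamp}.${body}`)
    .digest('base64');

  return {
    url: tenant.url,
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'webhook-id': id, // kept stable across retries so receivers can dedupe
      'webhook-timestamp': String(timestamp),
      'webhook-signature': `v1,${signature}`,
    },
    body,
  };
}
```

Signing the exact serialized body that goes on the wire, and sending that same string, is what lets customers verify the delivery byte for byte.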

The second case is agents talking to each other. Google's Agent2Agent (A2A) protocol, now under the Linux Foundation, is the emerging standard for cross-framework agent communication, and it leans on webhooks for exactly the long-running case where polling falls down. When a delegated task takes hours, or the requesting agent disconnects, A2A delivers asynchronous progress through push notifications to a secure, client-supplied webhook. A useful mental model: MCP is how an agent uses its hands, and A2A is how two agents shake them. Webhooks are how one agent tells another that the work it handed off is finally done.

The patterns differ; the failure modes don't

Step back and the five patterns share a spine. Whatever the AI framing (provider callback, agent trigger, MCP async result, live-session push, agent-to-agent update), the same five operational problems recur, because underneath they're all webhooks:

  1. Authentication. Every inbound delivery is an unauthenticated POST to a public URL until you verify it. The providers sign with HMAC-SHA256 or RS256 JWTs and add a freshness window to resist replay; you have to actually check both the signature and the timestamp, with a constant-time comparison, and handle overlapping secrets during rotation. Skipping verification means your agent does real work on receipt of any HTTP request from anyone.
  2. Delivery guarantees and duplicates. Every provider here is explicitly at-least-once, not exactly-once; they all warn that duplicates happen. Exactly-once delivery is effectively impossible over an unreliable network, so the contract is "we'll get it to you, possibly more than once," and the burden of idempotency moves to you. You dedupe on the delivery ID and make every side effect idempotent at the destination with unique constraints, upserts, or natural keys. (More in our guide to webhook idempotency.)
  3. Ordering. None of these systems guarantee order. A completed event can land before the event that preceded it. State machines that assume sequential delivery get into impossible states; sort by the envelope timestamp, or treat each event as a trigger to re-fetch current state. (See why ordering is hard.)
  4. Retries, backpressure, and recovery. Provider retry windows run 24 to 72 hours on slow exponential backoff, which is fine for short outages but painful when you've fixed a bug and are waiting hours for the next attempt. A slow model call or a flash-sale event spike can also bury your handler. You want a durable queue that absorbs bursts, retries on your curve, dead-letters what won't deliver, and lets you replay on demand instead of waiting for the provider.
  5. Observability. When an agent produces no result, the first question is whether the webhook even arrived. Without event-level visibility into what was sent, whether the signature was valid, whether a duplicate was dropped, and where delivery failed, you're debugging with console.error.
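The ordering and idempotency points combine naturally into one guard at the destination: apply an event only if it is newer than what is already stored, and write with an upsert. This is a minimal sketch; applyIfNewer and the event shape ({ resourceId, status, timestamp }) are hypothetical, and a Map stands in for a database table with a unique key on the resource ID.

```javascript
// Applies an event to `store` (a Map keyed by resource ID) only if it is
// newer than what is already recorded. Returns whether anything changed.
function applyIfNewer(store, event) {
  const current = store.get(event.resourceId);
  if (current && current.updatedAt >= event.timestamp) {
    return false; // duplicate or out-of-order delivery: nothing to do
  }
  // Upsert keyed on the resource ID, so repeating it is harmless.
  store.set(event.resourceId, { status: event.status, updatedAt: event.timestamp });
  return true;
}
```

One caveat with this approach: two distinct events that share a timestamp are treated as one, so coarse timestamps are a reason to prefer re-fetching current state over comparing envelopes.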

This is the real reason "AI webhooks" aren't special: the model is the new part, but the reliability work is the same work webhook developers have always had to do, now sitting on the critical path of systems that act on their own.

Choosing the right approach

A short decision guide, with the caveat that real systems are messier than any flowchart:

  • A provider is calling you back (OpenAI, Gemini, Claude async jobs): acknowledge fast, hydrate asynchronously, dedupe on the delivery ID, and reconcile against the API. Read the provider-specific guide for the details that differ.
  • An event triggers a short agent task (one to three side effects, finishes in seconds to a couple of minutes, no human in the loop): a webhook gateway in front of a plain handler is enough. Don't reach for a workflow engine yet.
  • An event triggers a real workflow (5+ steps, waitpoints, scheduling, expensive non-idempotent steps): use a durable runtime for execution, with a gateway in front for ingress.
  • You're building an MCP server that triggers async actions: put webhook infrastructure between the third-party service and your server, so inbound events are verified, deduplicated, and queued before they reach your tool logic.
  • You're pushing events into a live agent session (Claude Code Channels and similar): use a CLI tunnel with stable URLs and replay so your dev loop stays in seconds, not minutes.
  • Your agent or platform sends events out: that's outbound delivery (Outpost) or agent-to-agent push notifications (A2A), and the sending side has its own reliability surface.

Across all of these, one principle holds: start with the smallest reliable thing. A well-queued, well-retried, well-observed webhook into a stateless handler covers a surprising amount of agent work, and it doesn't paint you into a corner: when a genuine workflow emerges, you put a runtime behind the gateway and the producers never notice.

How Hookdeck helps

Every pattern in this article shares an inbound (and sometimes outbound) webhook problem, and that's the layer Hookdeck handles so your agent code doesn't have to. Event Gateway verifies signatures for 120+ providers, deduplicates at the edge, queues every event durably, retries on a curve you control, and gives you full-text searchable, replayable observability over everything that arrives, whether the consumer downstream is a plain handler, a durable runtime, an MCP server, or a live Claude Code session. Outpost does the same for the events your platform sends out. And the Hookdeck CLI gives you a stable public URL with one-click replay for local development.