Gareth Wilson

Anthropic shipped webhooks for Claude Managed Agents. Here's what they unlock.



Anthropic announced a wave of new capabilities for Claude Managed Agents — dreaming, outcomes, multiagent orchestration, and, quietly tucked into the list, webhooks. So now you can define an outcome, let the agent run, and get notified by a webhook when it's done.

The addition of webhooks turns Managed Agents from something you watch into something you wire up. But the moment you start wiring agents into the rest of your stack, you discover what every webhook-producing platform has discovered before you: at-least-once delivery, signature freshness windows, ordering, retries, and dead endpoints are your problem now.

So let's unpack what Anthropic shipped, the two webhook directions that matter, and where the reliability work lives.

What Anthropic shipped

The new Managed Agent webhooks are configured in the Claude Console. You register an HTTPS endpoint, pick the event types you care about, and Anthropic generates a signing secret prefixed whsec_ that you only see once.

The events split into two families:

Session events: session.status_run_started, session.status_idled, session.status_rescheduled, session.status_terminated, plus the multiagent equivalents session.thread_created, session.thread_idled, session.thread_terminated, and the new session.outcome_evaluation_ended for when an agent's work passes or fails its rubric.

Vault events: credential lifecycle events including the very ops-relevant vault_credential.refresh_failed.

The payloads are deliberately thin: just the event type and id. To get the full object, you call back into the Anthropic API. Anthropic's stated reason is to avoid serving stale data on retries — and that retry behaviour is the part worth reading carefully.

A few things from the docs that anyone integrating should know:

  • Delivery is at-least-once. Anthropic retries failures and the retry carries the same event.id. You're expected to dedupe on it.
  • Ordering is not guaranteed. A session.status_idled event might land before the session.outcome_evaluation_ended that caused it. Sort by created_at if order matters.
  • The signature header is X-Webhook-Signature. The SDK helper rejects payloads older than 5 minutes — replay protection, but it also means delivery latency is your problem.
  • Only 2xx responses count as a successful ack. 3xx redirects are treated as failures. After ~20 consecutive failures (or any private-IP DNS resolution, or any redirect) the endpoint is auto-disabled and has to be re-enabled in Console.
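
The dedupe, ordering, and freshness points above can be sketched in a few lines. This is a minimal illustration, not Anthropic's SDK: the `Event` fields, the in-memory `EventInbox`, and the `is_fresh` helper are all assumptions made for the example, and a real receiver would back the inbox with a durable store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# The 5-minute figure mirrors the SDK helper's freshness check described above.
FRESHNESS_WINDOW = timedelta(minutes=5)


@dataclass(frozen=True)
class Event:
    # Field names are assumptions for this sketch, not a documented schema.
    id: str
    type: str
    created_at: datetime


class EventInbox:
    """In-memory stand-in for a durable store keyed by event id."""

    def __init__(self) -> None:
        self._seen: Dict[str, Event] = {}

    def accept(self, event: Event) -> bool:
        """Return True for a first delivery, False for a retry of a known id."""
        if event.id in self._seen:
            return False
        self._seen[event.id] = event
        return True

    def in_order(self) -> List[Event]:
        """Arrival order is not guaranteed, so sort by created_at before use."""
        return sorted(self._seen.values(), key=lambda e: e.created_at)


def is_fresh(signed_at: datetime, now: datetime) -> bool:
    """Reject anything older than the freshness window."""
    return now - signed_at <= FRESHNESS_WINDOW


t0 = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
inbox = EventInbox()
# The idle event arrives before the evaluation event that preceded it.
idled = Event("evt_2", "session.status_idled", t0 + timedelta(seconds=2))
ended = Event("evt_1", "session.outcome_evaluation_ended", t0)
first = [inbox.accept(idled), inbox.accept(ended)]
retry = inbox.accept(idled)  # same id again: a retry, not a new event
ordered_types = [e.type for e in inbox.in_order()]
```

The inbox only answers "have I seen this id?", which is all at-least-once delivery requires of you. Sorting happens at read time because a later-created event can always arrive first.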

If you've integrated Stripe or GitHub before, none of this is shocking. If you haven't, it's a fast lesson in why webhook infrastructure exists.

The other webhook direction

Outbound notifications are only half the story. The other interesting half is the part that turns Claude into the brain of an event-driven workflow — the routines fire endpoint.

A "routine" is a saved Claude Code session: a prompt, a set of repos, connectors, and tools. Routines can be triggered from claude.ai/code, on a schedule, or by an HTTP POST from anywhere. So when Sentry fires, when a Stripe charge is disputed, when a Linear ticket lands, or when a GitHub Action fails, you can fire a routine and let Claude take it from there.

Two webhook directions, two different concerns:

| Direction | What it does | Who fires it |
| --- | --- | --- |
| Outbound (Anthropic → your app) | Tells your app an agent's status changed | Anthropic |
| Inbound (your providers → Anthropic) | Triggers an agent to do work | GitHub, Stripe, Sentry, Linear, your app… |

Both lanes need the same things — verification, dedup, retries, observability.

What this unlocks

The use cases are clear if you stop thinking about webhooks as plumbing and start thinking about them as event triggers for Claude:

  • A GitHub issue or CI failure fires a routine that triages, labels, and proposes a fix.
  • A Sentry alert fans into a multiagent investigation across deploys, logs, and metrics.
  • A Stripe dispute or failed payment triggers a routine that drafts a contextual customer reply and posts it for review in Slack.
  • A Zendesk or Intercom ticket fires a classification routine that routes to the right tier.
  • A session.outcome_evaluation_ended event tells your Slack channel that an agent's draft is ready for human review.
  • A vault_credential.refresh_failed event triggers your ops runbook.

The pattern is the same in every case:

Provider → (webhook) → Agent does the work → (webhook) → Human/System notified
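
The last hop of that pattern, where an outbound event turns into a notification, is usually a dispatch table keyed on event type. Here is a rough sketch. The event type strings come from the list above, but the payload keys (`type`, `id`) and both handler functions are hypothetical stand-ins for whatever your Slack or ops tooling exposes.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[dict], str]


def notify_review_channel(event: dict) -> str:
    # Hypothetical: a real handler would post to a Slack channel here.
    return f"review-ready:{event['id']}"


def start_ops_runbook(event: dict) -> str:
    # Hypothetical: a real handler would page on-call or open an incident.
    return f"runbook:{event['id']}"


HANDLERS: Dict[str, Handler] = {
    "session.outcome_evaluation_ended": notify_review_channel,
    "vault_credential.refresh_failed": start_ops_runbook,
}


def dispatch(event: dict) -> Optional[str]:
    """Route an event to its handler; unknown types are ignored, not errors."""
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None
    return handler(event)


routed = dispatch({"type": "vault_credential.refresh_failed", "id": "evt_9"})
ignored = dispatch({"type": "session.thread_created", "id": "evt_10"})
```

Returning `None` for unrecognised types matters more than it looks: the receiver should still acknowledge those deliveries with a 2xx, otherwise they count toward the failure streak.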

Where this gets messy

Anthropic's reliability promises are deliberately minimal: at-least-once delivery, no ordering guarantees, no idempotency key on the inbound routines.fire side at all. The docs are explicit: every retried call to fire a routine creates a new Claude session. Basically, dedupe upstream or pay twice.
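
"Dedupe upstream" can be as small as a lookup in front of the call that fires the routine. The sketch below assumes a hypothetical `fire` callable that wraps the HTTP POST and returns some reference to the session it created. Neither that signature nor the example event ids come from Anthropic's docs.

```python
from typing import Callable, Dict, List


class FireOnce:
    """Dedupe on the provider's event id before firing a routine."""

    def __init__(self, fire: Callable[[str], str]) -> None:
        # `fire` is a hypothetical wrapper around the HTTP POST that fires a
        # routine: it takes the prompt text and returns a session reference.
        self._fire = fire
        self._fired: Dict[str, str] = {}

    def handle(self, provider_event_id: str, text: str) -> str:
        # A provider retry carries the same id, so reuse the earlier result.
        if provider_event_id in self._fired:
            return self._fired[provider_event_id]
        session_ref = self._fire(text)
        self._fired[provider_event_id] = session_ref
        return session_ref


calls: List[str] = []


def fake_fire(text: str) -> str:
    # Test double that counts how many sessions would have been created.
    calls.append(text)
    return f"session-{len(calls)}"


gate = FireOnce(fake_fire)
a = gate.handle("gh-delivery-123", "CI failed on main")
b = gate.handle("gh-delivery-123", "CI failed on main")  # provider retry
c = gate.handle("gh-delivery-124", "Issue opened")
```

With concurrent workers, the in-memory dict would need to become an atomic insert-if-absent in a shared store, claimed before the call goes out, or two workers can still fire the same event.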

You can solve this with code, and plenty of teams will reach for a durable runtime to do it. We think this is usually overkill: most "agent workflows" are a webhook handler with an LLM call inside, and what they actually need is a durable queue, idempotent delivery, and retries with backoff. When you do need real workflow state, the gateway and the runtime do different jobs: gateways own ingress, runtimes own execution. Starting with the gateway keeps your options open.
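
For a sense of scale, "retries with backoff" is roughly the following. It is a sketch under assumptions: `send` is a hypothetical callable returning an HTTP status code, the set of retryable statuses is a policy choice, and the delays carry no jitter so the behaviour stays easy to follow.

```python
import time
from typing import Callable, Iterable, List, Optional


def backoff_delays(attempts: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 60.0) -> List[float]:
    """Exponential delays in seconds, capped so they never grow unbounded."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def deliver(send: Callable[[], int], delays: Iterable[float],
            sleep: Callable[[float], None] = time.sleep,
            retry_on: Iterable[int] = (429, 500, 502, 503, 504)) -> bool:
    """Call send() until it returns 2xx, a non-retryable status, or delays run out."""
    retryable = set(retry_on)
    schedule: List[Optional[float]] = [*delays, None]  # None marks the final try
    for delay in schedule:
        status = send()
        if 200 <= status < 300:
            return True
        if status not in retryable or delay is None:
            return False
        sleep(delay)
    return False


slept: List[float] = []
responses = iter([429, 429, 200])
ok = deliver(lambda: next(responses), backoff_delays(3), sleep=slept.append)
rejected = deliver(lambda: 400, backoff_delays(3), sleep=slept.append)
```

When the target is the call that fires a routine, the retryable set deserves extra care: every retried call creates a new Claude session, so a retry is only safe when the earlier attempt was definitely rejected, as with a 429.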

How Hookdeck can help

Hookdeck Event Gateway sits in the middle in both directions:

Inbound (provider → Hookdeck → routines.fire). Event Gateway verifies signatures from 120+ providers, dedupes on the provider's event ID, transforms the payload into the text field routines.fire expects, and retries with backoff when Anthropic returns a 429. Because routines.fire has no idempotency key, edge dedup at the gateway is what stops a single GitHub event from spawning three Claude sessions and burning quota. We walked through this pattern with GitHub, Trigger.dev, and Claude.

Outbound (Anthropic → Hookdeck → your app). Event Gateway verifies the X-Webhook-Signature once at the edge, queues events durably so a slow downstream consumer doesn't trip the 5-minute freshness window, fans out a single event to multiple subscribers, and replays anything that failed once you've fixed the bug. The auto-disable threshold means a flaky receiver can blackhole all your agent notifications; Event Gateway absorbs the volatility before Anthropic disables the endpoint.

For local development, the Hookdeck CLI gives you a stable public URL, full request inspection, and one-key event replay, which pairs especially nicely with Claude Code Channels when you want webhook events pushed straight into a live coding session.

Webhooks turning Managed Agents from a UI feature into an integration primitive is a big shift. The same boring infrastructure problems you'd solve for Stripe or GitHub apply here too, just with higher per-event stakes because each event is an agent run, not a row update.

If you're wiring an agent into your stack, start with Event Gateway. You can decide later whether you also need a runtime.