How Managed Webhook Services Implement Exactly-Once Delivery
If you've spent any time building event-driven systems, you've probably encountered the promise of exactly-once webhook delivery. It sounds like the gold standard—every event processed once and only once, with no duplicates and no gaps. But if you've tried to implement it yourself, you know the reality is messier than the marketing.
In distributed systems, exactly-once delivery is theoretically impossible. The Two Generals' Problem demonstrates this elegantly: two parties communicating over an unreliable channel can never be certain the other received their message, because the acknowledgment itself can be lost. If a protocol could guarantee exactly-once delivery, it would solve this unsolvable problem.
So what do managed webhook services actually do when they claim to offer exactly-once semantics? They don't break the laws of distributed computing. Instead, they layer a set of infrastructure-level mechanisms (deduplication, idempotency support, ordering controls, and intelligent retries) on top of an at-least-once delivery foundation. The result is a system that achieves exactly-once processing in practice, even if the network underneath remains stubbornly unreliable.
This article breaks down how these mechanisms work, where they fall short, and what you still need to handle in your own application code.
Why exactly-once delivery is impossible (and why that's okay)
Before getting into implementation, it helps to understand why strict exactly-once semantics are off the table at the network level.
Webhook delivery is an HTTP request from a sender to a receiver. The sender fires the request and waits for a response. Three things can happen:
- The receiver processes the event and returns a 200. Everyone is happy.
- The request never arrives. The sender times out and retries.
- The receiver processes the event, but the 200 response is lost in transit. The sender times out and retries—and the receiver now sees what looks like a new delivery of the same event.
Scenario three is the crux of the problem. The sender has no way to distinguish "the receiver never got it" from "the receiver got it but I didn't get the acknowledgment." The only safe choice is to retry, which means at-least-once delivery is the strongest guarantee you can make over an unreliable network.
This isn't a webhook-specific limitation. The FLP impossibility result (published by Fischer, Lynch, and Paterson in 1985) proved that in an asynchronous system, even a single faulty process makes distributed consensus impossible. Guaranteed exactly-once delivery would imply a solution to the Two Generals' Problem—and since that problem is provably unsolvable, so is exactly-once delivery.
The good news: you don't actually need exactly-once delivery. You need exactly-once processing. And that's an engineering problem with practical solutions.
At-least-once vs exactly-once: what the terms actually mean
The terminology around event delivery semantics can be confusing because different systems use the same words to mean different things. Here's a working taxonomy for webhook delivery guarantees:
| Guarantee | What it means | Trade-off |
|---|---|---|
| At-most-once | Fire and forget. No retries. If delivery fails, the event is lost. | Simple but unreliable. Acceptable for non-critical telemetry. |
| At-least-once | The sender persists the event and retries until it receives an acknowledgment. The receiver may see duplicates. | Reliable but requires the receiver to handle duplicates. |
| Exactly-once processing | At-least-once delivery combined with idempotent consumers that ensure duplicate deliveries don't produce duplicate side effects. | Reliable and safe, but requires cooperation between infrastructure and application code. |
When managed webhook services talk about exactly-once, they're talking about that third row. The infrastructure provides at-least-once delivery with deduplication and idempotency support. Your application provides idempotent handlers. Together, you get exactly-once processing—which, for all practical purposes, is what you actually want.
This is the same approach used by Apache Kafka's exactly-once semantics, where idempotent producers and transactional writes combine to prevent duplicate processing across brokers and consumers. Kafka achieves this within a closed system it controls end-to-end. Webhooks cross network boundaries you don't control, which makes the problem harder—and makes the infrastructure-level support from a managed service more valuable.
How managed services build toward exactly-once processing
A managed webhook gateway like Hookdeck Event Gateway sits between your webhook providers and your application. Every event passes through its infrastructure, which means it can apply reliability mechanisms before an event ever hits your endpoint. Here's how each piece works.
Webhook deduplication
Webhook deduplication is the first line of defense against duplicate processing. The idea is straightforward: if the service has already seen an identical event recently, it drops the duplicate before delivering it.
Hookdeck supports three deduplication strategies:
- Exact match: Compares the full payload. If two events are byte-for-byte identical within the deduplication window, the second one is dropped. This is effective against retry storms—where a sender rapidly retransmits the same event due to a timeout.
- Include fields: Compares only specific fields you designate, such as a request ID or transaction reference. This lets you define what "duplicate" means for your domain.
- Exclude fields: Ignores specified fields (like timestamps or sequence counters that change on each delivery attempt) and compares everything else.
When an event arrives, Hookdeck computes a hash based on your chosen strategy and checks it against a configurable time window—anywhere from 1 second to 1 hour. If a matching hash exists within that window, the event is suppressed.
This is effective, but it has limits. A deduplication window of one hour won't catch a duplicate that arrives two hours later. And if the provider sends two events with different payloads that represent the same logical operation (for example, two payment.updated events with slightly different metadata), field-level deduplication might not catch them. Deduplication is a load-reduction optimization, not a correctness guarantee. It cuts down the duplicates your application needs to handle, but it doesn't eliminate them entirely.
Webhook idempotency
Webhook idempotency is where correctness lives. An idempotent operation produces the same result regardless of how many times you execute it. If your webhook handler is idempotent, receiving the same event three times has the same effect as receiving it once.
Managed services support idempotency in two ways.
Infrastructure-provided idempotency keys. Every event Hookdeck delivers includes an X-Hookdeck-EventID header—a stable, unique identifier for that event. Even if network conditions cause Hookdeck to deliver the same event twice, your handler can use this ID to detect the duplicate. The pattern is simple:
- Extract the event ID from the header.
- Check your database for a record of that ID.
- If found, skip processing and return 200.
- If not found, process the event and record the ID.
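The four steps above can be sketched as a handler. The header name matches the X-Hookdeck-EventID header described here; everything else (the in-memory SQLite store, `handle_webhook`, `apply_event`) is an illustrative stand-in for your framework and database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (idempotency_key TEXT PRIMARY KEY)")

processed = []  # stand-in for the handler's real side effect

def apply_event(payload: dict) -> None:
    processed.append(payload)

def handle_webhook(headers: dict, payload: dict) -> int:
    event_id = headers["X-Hookdeck-EventID"]            # 1. extract the ID
    seen = db.execute(
        "SELECT 1 FROM processed_events WHERE idempotency_key = ?",
        (event_id,),
    ).fetchone()                                        # 2. check for a record
    if seen is not None:
        return 200                                      # 3. duplicate: ack, skip
    apply_event(payload)                                # 4. process the event...
    db.execute(
        "INSERT INTO processed_events (idempotency_key) VALUES (?)",
        (event_id,),
    )
    db.commit()                                         # ...and record the ID
    return 200
```

Note that "check then insert" has a race under concurrent delivery of the same event; the database-constraint approach described below closes it by letting the database arbitrate.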
This is the same approach Stripe recommends for their webhook events. Every Stripe event has a unique id field, and their documentation explicitly states that endpoints might receive the same event more than once. The industry consensus is clear: idempotent webhook processing is a receiver-side responsibility, and the infrastructure's job is to give you the tools to do it efficiently.
Database-level enforcement. A common implementation uses a database constraint to enforce uniqueness:
```sql
CREATE TABLE processed_events (
  idempotency_key TEXT PRIMARY KEY,
  processed_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
Attempt to insert the event ID. If the insert succeeds, the event is new. If it violates the unique constraint, it's a duplicate. This pattern is simple, battle-tested, and works with any relational database.
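In application code, "attempt the insert and let the constraint decide" looks like this. A Python sketch against in-memory SQLite; any relational database raises an equivalent constraint-violation error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_events ("
    " idempotency_key TEXT PRIMARY KEY,"
    " processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
)

def is_new_event(idempotency_key: str) -> bool:
    """True exactly once per key: the constraint arbitrates duplicates."""
    try:
        conn.execute(
            "INSERT INTO processed_events (idempotency_key) VALUES (?)",
            (idempotency_key,),
        )
        conn.commit()
        return True   # insert succeeded: first time we see this event
    except sqlite3.IntegrityError:
        return False  # primary-key violation: duplicate delivery
```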
For high-throughput systems, you might use a Redis set with TTL expiration instead of a database table. The trade-off is durability—if Redis restarts, you lose your deduplication state. For most webhook volumes, a database table with periodic cleanup of old entries (anything older than your maximum retry window) is the pragmatic choice.
Webhook retries
Webhook retries are what make at-least-once delivery possible. If the receiver doesn't acknowledge delivery with a success status code, the sender tries again.
The sophistication is in how you retry. Naive retry logic—just resend every 5 seconds—creates problems at scale. If a destination goes down, all pending events start retrying simultaneously, creating a thundering herd that can keep the destination down even after it recovers.
Hookdeck provides configurable retry strategies:
- Exponential backoff: Each retry waits twice as long as the last. The first retry might happen after 30 seconds, the next after 1 minute, then 2 minutes, and so on. This gives struggling endpoints breathing room to recover.
- Linear intervals: Retries at fixed intervals. Useful when you know your destination's recovery time is predictable.
- Custom schedules: You define exact retry times. Useful for aligning retries with known maintenance windows.
On top of this, you could add jitter—random variance added to retry intervals—which prevents synchronized retry storms when many events are failing simultaneously. Retry windows can extend up to a week, with up to 50 delivery attempts per event.
But retries interact with idempotency and deduplication in ways that matter. A retry is, by definition, a duplicate delivery. If your deduplication window is shorter than your retry window, late retries will pass through deduplication and hit your application. This is by design—the deduplication layer handles the common case (rapid retransmissions), and your idempotent handler catches everything else.
Webhook ordering
Webhook ordering is where things get particularly tricky for distributed systems webhook delivery. In theory, if a customer.created event is followed by a customer.updated event, you'd want to process them in that order. In practice, there's no guarantee they'll arrive that way.
Webhooks are typically delivered as independent HTTP requests, and HTTP makes no ordering guarantees across requests. A provider might send two events from different servers, or the network path for the first event might be slower than the path for the second. You cannot guarantee the order in which you receive webhooks.
Managed services offer a few mechanisms to help:
Concurrency control. Hookdeck lets you set a delivery rate per destination, including a concurrency limit. Setting concurrency to 1 gives you serial delivery—events are delivered one at a time, in the order Hookdeck received them. This doesn't guarantee they arrived at Hookdeck in order, but it prevents the reordering that parallel delivery introduces.
Timestamp-based conflict resolution. Most webhook payloads include a timestamp. Your handler can compare the event's timestamp against the last-processed timestamp for that resource. If the incoming event is older, skip it. If it's newer, process it. This ensures your system always reflects the most recent state, even when events arrive out of order.
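The timestamp comparison can be sketched as a per-resource last-writer-wins check. The in-memory map is an illustrative stand-in for wherever you persist the last-processed timestamp.

```python
last_updated: dict[str, float] = {}  # resource id -> last applied event time

def should_apply(resource_id: str, event_timestamp: float) -> bool:
    """Apply only events newer than what we've already processed."""
    prev = last_updated.get(resource_id)
    if prev is not None and event_timestamp <= prev:
        return False  # stale or duplicate: skip it
    last_updated[resource_id] = event_timestamp
    return True
```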
Fetch-before-process pattern. Instead of trusting the webhook payload, use it as a signal to fetch the current state from the provider's API. This sidesteps ordering entirely—you always get the latest data, regardless of which event triggered the fetch. The downside is added latency and API rate limit consumption, but for critical data it's the most robust approach.
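In the fetch-before-process pattern, the handler ignores the payload's data and pulls fresh state. In this sketch, `fetch_customer` is a hypothetical stand-in for a GET against the provider's API, and `store` for your datastore.

```python
store: dict[str, dict] = {}  # stand-in datastore

def fetch_customer(customer_id: str) -> dict:
    # In real code: an HTTP GET against the provider's API.
    return {"id": customer_id, "status": "active"}

def on_customer_event(payload: dict) -> None:
    customer_id = payload["data"]["id"]
    # The payload is only a signal; always write the authoritative state.
    store[customer_id] = fetch_customer(customer_id)
```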
The tension here is real: strict ordering requires serial processing, which limits throughput. In most systems, the right answer is to design your handlers to tolerate out-of-order delivery rather than forcing the infrastructure to prevent it.
Webhook failure modes that break guarantees
Even with all these mechanisms in place, webhook failure modes can undermine your delivery guarantees. Understanding where things go wrong helps you design for resilience rather than hoping for perfection.
Destination outages
When your endpoint goes down, events queue up. A managed service like Hookdeck will detect backpressure—the growing delay between event arrival and delivery—and can open an issue to notify your team. But during extended outages, the retry window for early events may expire before the destination recovers.
The circuit breaker pattern provides a safety valve here. When a destination consistently fails, the circuit breaker opens and stops sending requests, preventing wasted retries and giving the destination time to recover. After a cooldown, the circuit transitions to a half-open state and lets a few trial requests through; if they succeed, the circuit closes and normal delivery resumes.
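A minimal sketch of those state transitions, with illustrative thresholds (this is the general pattern, not Hookdeck's implementation):

```python
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown: float = 60.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True                    # closed: normal flow
        if now - self.opened_at >= self.cooldown:
            return True                    # half-open: allow a trial request
        return False                       # open: block the request

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None              # close the circuit again

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now           # trip: stop sending
```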
Poison events
Some events will never process successfully—maybe the payload references a resource that doesn't exist in your system, or triggers a code path with an unhandled edge case. Without intervention, these events retry indefinitely, consuming resources and clogging the pipeline.
The solution is a dead-letter queue. After a configured number of failed attempts, the event is moved to a separate queue where it can be inspected, debugged, and either fixed and replayed or discarded. Hookdeck's event management and retry system gives you visibility into failed deliveries and the ability to manually retry or discard events, which serves a similar purpose.
Split-brain deduplication
If your deduplication state lives in a single node and that node becomes unavailable, you temporarily lose deduplication. Events that would have been suppressed get through. This is a general problem with any stateful deduplication layer, whether managed or self-hosted. The mitigation is application-level idempotency—your last line of defense that works regardless of infrastructure state.
Provider-side duplicates
Sometimes the duplication happens before events reach the managed service. A provider might generate two distinct events for the same logical operation due to internal retries or eventual consistency in their own systems. These events will have different IDs, so neither infrastructure-level deduplication nor event ID-based idempotency will catch them. Your handler needs domain-level idempotency—checking, for example, that a payment hasn't already been recorded for a given invoice, regardless of which event triggered it.
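Domain-level idempotency means enforcing uniqueness on the business key rather than the event ID. In this sketch the constraint is "one payment per invoice", so two distinct events for the same payment still collapse to one recorded row; the schema is illustrative.

```python
import sqlite3

pay_conn = sqlite3.connect(":memory:")
pay_conn.execute(
    "CREATE TABLE payments ("
    " invoice_id TEXT PRIMARY KEY,"  # business key, not an event ID
    " amount INTEGER NOT NULL)"
)

def record_payment(invoice_id: str, amount: int) -> bool:
    """True if this call recorded the payment, False if one already existed."""
    try:
        pay_conn.execute(
            "INSERT INTO payments (invoice_id, amount) VALUES (?, ?)",
            (invoice_id, amount),
        )
        pay_conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False  # a payment for this invoice is already on record
```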
Network partitions
In a network partition scenario, nodes in the delivery pipeline can't communicate with each other. The managed service might believe an event wasn't delivered (because it never received the acknowledgment), while the destination actually processed it. The service retries, and your application sees a duplicate. Again, idempotent handlers are the answer.
The layered defense model
If you've noticed a theme, it's this: webhook reliability comes from layers, not from any single mechanism. Here's how the layers stack up:
| Layer | Mechanism | What it catches | What it misses |
|---|---|---|---|
| Infrastructure deduplication | Hash-based comparison within a time window | Retry storms, rapid re-transmissions | Late duplicates, semantically equivalent events with different payloads |
| Idempotency keys | Unique event ID in headers | Any exact re-delivery of the same event | Provider-side duplicates with different IDs |
| Application-level idempotency | Domain-specific checks (e.g., "has this invoice been paid?") | Everything, including provider-side duplicates | Nothing—this is the last line of defense |
| Ordering controls | Concurrency limits, timestamp comparison | Out-of-order processing | Events that arrive at the infrastructure out of order from the provider |
| Retry logic | Exponential backoff with jitter | Transient failures, brief outages | Extended outages beyond the retry window, poison events |
| Circuit breakers and DLQs | Failure detection and event isolation | Sustained destination failures, poison events | Intermittent failures that don't trip the breaker |
No single layer is sufficient. Infrastructure-level deduplication reduces load on your application. Idempotency keys catch what deduplication misses. Application-level idempotency catches everything else. This is why the distinction between at-least-once vs exactly-once delivery matters less in practice than whether you've built all the layers.
What this means for your architecture
If you're evaluating webhook infrastructure—whether to build it yourself or use a managed service—here are the practical takeaways:
Accept at-least-once delivery as your foundation. Every serious distributed systems webhook delivery implementation starts here. Trying to build true exactly-once at the network level is a fool's errand. Build on at-least-once and invest in the layers above it.
Implement idempotent handlers regardless of what your infrastructure provides. Even if your managed service deduplicates 99% of duplicates, you need to handle the other 1%. Use event IDs as idempotency keys, store them in your database, and design your state mutations to be safe when applied multiple times.
Use infrastructure deduplication to protect your application, not to replace idempotency. A deduplication window of 5 minutes will catch most retry storms and accidental re-deliveries. Think of it as a performance optimization that reduces the load on your idempotent handlers, not as a correctness mechanism.
Design for out-of-order delivery. Unless you absolutely need serial processing (and are willing to accept the throughput cost), build handlers that compare timestamps or fetch current state from the source API. Assume events can arrive in any order and design accordingly.
Favor upserts over inserts. When a webhook handler updates your database, use INSERT ... ON CONFLICT ... DO UPDATE (or your database's equivalent upsert) rather than separate "check then insert" logic. Upserts are naturally idempotent and handle both new events and duplicates gracefully.
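A sketch of the upsert pattern using SQLite's ON CONFLICT clause (PostgreSQL's syntax is near-identical); the table and handler names are illustrative. Applying the same event twice leaves exactly one row.

```python
import sqlite3

cust_conn = sqlite3.connect(":memory:")
cust_conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, status TEXT)")

def upsert_customer(customer_id: str, status: str) -> None:
    # Insert the row if it's new; otherwise overwrite with the incoming value.
    cust_conn.execute(
        "INSERT INTO customers (id, status) VALUES (?, ?) "
        "ON CONFLICT (id) DO UPDATE SET status = excluded.status",
        (customer_id, status),
    )
    cust_conn.commit()
```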
Monitor your delivery pipeline. Managed services provide metrics for delivery rate, error rate, queue length, and response latency. Use them. A growing queue or rising error rate is an early warning that your exactly-once processing guarantees are under stress.
Conclusion
Exactly-once webhook delivery as a network-level guarantee doesn't exist. The physics of distributed systems won't allow it. But exactly-once processing—ensuring that every event has its intended effect once and only once—is absolutely achievable with the right combination of infrastructure and application-level safeguards.
Managed webhook services like Hookdeck don't claim to break the laws of distributed computing. What they do is handle the heavy lifting of persistence, retries, deduplication, and delivery management so that you can focus on the one thing only your application can do: process events idempotently. The infrastructure reduces the duplicates your code needs to handle from "many" to "almost none." Your idempotent handlers catch the rest.
That's the honest answer to what exactly-once means in practice. Not a magic guarantee, but a well-engineered stack of mechanisms that, together, give you the webhook delivery guarantees your system needs.
Gain control over your webhooks
Try Hookdeck to handle your webhook security, observability, queuing, routing, and error recovery.