Deduplication

Deduplication removes redundant events by comparing them against previously processed events within a configurable time window. You control what makes events "identical" - from exact payload matching to comparing only specific fields that matter to your use case.

Deduplication addresses common scenarios where redundant events cause unnecessary processing:

  • Multi-app installations: Deduplicate by request ID when multiple apps send the same webhook
  • Noisy updates: Deduplicate by everything except volatile fields like inventory or timestamps
  • Request redelivery: Remove duplicate webhooks and events that are sent multiple times by the producer

Events identified as duplicates are ignored and not delivered to the destination. You can view events that were ignored as part of your Requests.

Deduplication strategies

Hookdeck offers two deduplication strategies, each applied within a configurable time window of 1 second to 1 hour:

  • Exact deduplication: The entire event is the key
  • Field-based deduplication: Choose which fields define the key (via inclusion or exclusion)

Deduplication is a best-effort feature and is not guaranteed. Always implement idempotent request handling in your destination.
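
Because a duplicate can still reach your destination, the handler itself should tolerate one. The following is a minimal sketch of idempotent handling, not a prescribed implementation. It assumes each event carries a stable identifier (the `event_id` argument is hypothetical; use whatever unique ID your producer sends) and uses an in-memory set where a real service would use a database or cache with a unique constraint.

```python
import threading


class IdempotentHandler:
    """Runs the side effect at most once per event ID within this process."""

    def __init__(self, process):
        self._process = process        # function that performs the real work
        self._seen = set()             # stand-in for a durable store
        self._lock = threading.Lock()

    def handle(self, event_id, payload):
        # Claim the ID under a lock so concurrent duplicates cannot both proceed.
        with self._lock:
            if event_id in self._seen:
                return False           # duplicate: acknowledge and do nothing
            self._seen.add(event_id)
        self._process(payload)
        return True


processed = []
handler = IdempotentHandler(processed.append)
first = handler.handle("evt_1", {"order": 42})    # work is performed
second = handler.handle("evt_1", {"order": 42})   # same ID, skipped
```

This sketch claims the ID before doing the work, so a failure inside `process` would leave the event marked as handled. A production handler would typically record the ID and apply the side effect in the same transaction.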

Exact deduplication

The entire payload serves as the deduplication key. Events must be identical to be considered duplicates.

{
  "type": "deduplicate",
  "window": 60000
}

Use when: You want to drop perfectly identical webhooks, such as retry storms.

Field-based deduplication

Define the deduplication key by either including specific fields or excluding volatile ones.

Include fields

Only specified fields serve as the deduplication key. Events with matching values in these fields are considered duplicates.

{
  "type": "deduplicate",
  "window": 300000,
  "include_fields": ["headers.x-request-id"]
}

Use when: You have a unique identifier that defines duplicate events, like request IDs or composite keys.

Exclude fields

Everything except specified fields serves as the deduplication key.

{
  "type": "deduplicate",
  "window": 300000,
  "exclude_fields": ["body.updated_at", "body.inventory_quantity"]
}

Use when: You want to ignore events where only non-essential fields changed.
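
To illustrate how excluding fields changes which events match, here is a rough sketch that removes the excluded paths and hashes what remains. The helper names (`drop_path`, `exclude_key`), the use of SHA-256, and the canonical JSON encoding are all assumptions made for illustration; this page does not describe how Hookdeck computes its hash internally.

```python
import copy
import hashlib
import json


def drop_path(event, path):
    """Remove a dotted path such as 'body.updated_at' if it exists."""
    *parents, leaf = path.split(".")
    node = event
    for part in parents:
        node = node.get(part) if isinstance(node, dict) else None
        if node is None:
            return                      # path not present, nothing to remove
    if isinstance(node, dict):
        node.pop(leaf, None)


def exclude_key(event, exclude_fields):
    """Hash everything except the excluded fields."""
    remaining = copy.deepcopy(event)    # leave the caller's event untouched
    for path in exclude_fields:
        drop_path(remaining, path)
    canonical = json.dumps(remaining, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


fields = ["body.updated_at", "body.inventory_quantity"]
a = {"body": {"id": 7, "title": "Mug", "updated_at": "2025-01-01T10:00:00Z", "inventory_quantity": 5}}
b = {"body": {"id": 7, "title": "Mug", "updated_at": "2025-01-01T10:00:05Z", "inventory_quantity": 4}}
c = {"body": {"id": 7, "title": "Cup", "updated_at": "2025-01-01T10:00:05Z", "inventory_quantity": 4}}

only_volatile_changed = exclude_key(a, fields) == exclude_key(b, fields)
title_changed = exclude_key(a, fields) != exclude_key(c, fields)
```

Events `a` and `b` differ only in the excluded fields, so they produce the same key. Event `c` changes `title`, which is still part of the key, so it would not be treated as a duplicate.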

Time window

The time window defines how long Hookdeck remembers previously seen events when applying deduplication rules.

When an event arrives, Hookdeck computes a hash based on your deduplication strategy. Hookdeck then checks whether the same hash has been seen within the configured time window:

  • If a match is found: The new event is discarded and marked as a duplicate
  • If no match is found: The event is delivered and the hash is stored for the duration of the window

Stored hashes are automatically evicted from the deduplication cache once their time window has elapsed.
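
The check-then-store behavior described above can be sketched as a small in-memory cache. This is an illustrative model only, not Hookdeck's implementation. The class name `DedupWindow`, the injectable `clock`, and the choice not to extend the window when a duplicate is seen are assumptions.

```python
import time


class DedupWindow:
    """Remembers key hashes for `window_ms` milliseconds."""

    def __init__(self, window_ms, clock=time.monotonic):
        self._window_s = window_ms / 1000.0
        self._clock = clock             # injectable so the sketch can be tested
        self._expiry = {}               # key hash -> time at which it is forgotten

    def seen_before(self, key_hash):
        now = self._clock()
        # Evict hashes whose window has elapsed.
        for expired in [k for k, t in self._expiry.items() if t <= now]:
            del self._expiry[expired]
        if key_hash in self._expiry:
            return True                 # match found: treat as duplicate
        self._expiry[key_hash] = now + self._window_s
        return False                    # no match: deliver and remember the hash


# Simulate a 60-second window with a fake clock (seconds).
fake_now = [0.0]
window = DedupWindow(60000, clock=lambda: fake_now[0])
results = []
for t in (0.0, 30.0, 61.0):
    fake_now[0] = t
    results.append(window.seen_before("abc123"))
```

In this run the first event is delivered, the repeat at 30 seconds is flagged as a duplicate, and the repeat at 61 seconds is delivered again because the stored hash has expired.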

Configuration

You can set the time window between 1 second and 1 hour.

Choosing a window size

  • Shorter windows (e.g., 1 minute)
    Let legitimate repeats through sooner. However, any redelivery that arrives after the window expires is treated as new and delivered again.

  • Longer windows (e.g., 1 hour)
    Better suppress retries that might occur well after the original event. Be cautious: if your source system legitimately emits multiple events with the same identifiers in that timeframe, they may be discarded as duplicates.

Changing the deduplication configuration resets the deduplication cache for the connection, so all events are treated as new from that point forward.

Field path resolution

When using field-based deduplication:

  • Field paths must start with headers, body, query, or path
  • Fields not present in the payload are treated as empty strings
  • Objects and arrays are converted to JSON strings for comparison
  • For non-JSON bodies (e.g., XML), body resolves to the full content, but body.field resolves to an empty string
  • Booleans, numbers, and strings are compared as their respective types
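
The rules above can be modeled with a short resolver. This is a sketch under stated assumptions: events are represented as plain dictionaries keyed by `headers`, `body`, `query`, and `path`, and the function names (`resolve`, `typed`, `include_key`) are hypothetical. How Hookdeck serializes objects internally is not specified here, so the compact, key-sorted JSON encoding is also an assumption.

```python
import json

ROOTS = ("headers", "body", "query", "path")


def resolve(event, field_path):
    """Resolve a dotted field path against an event."""
    root, *rest = field_path.split(".")
    if root not in ROOTS:
        raise ValueError(f"{field_path!r} must start with one of {ROOTS}")
    value = event.get(root, "")
    for part in rest:
        # Missing fields, and sub-fields of a non-JSON body, resolve to "".
        if not isinstance(value, dict) or part not in value:
            return ""
        value = value[part]
    if isinstance(value, (dict, list)):
        # Objects and arrays are compared as JSON strings.
        return json.dumps(value, sort_keys=True, separators=(",", ":"))
    return value


def typed(value):
    """Tag a scalar with its JSON type so True, 1, and "1" stay distinct."""
    if isinstance(value, bool):
        kind = "boolean"
    elif isinstance(value, (int, float)):
        kind = "number"
    elif isinstance(value, str):
        kind = "string"
    else:
        kind = "null"
    return (kind, value)


def include_key(event, include_fields):
    """Build a comparison key from the included fields only."""
    return tuple(typed(resolve(event, p)) for p in include_fields)


event = {
    "headers": {"x-request-id": "req_1"},
    "body": {"store_id": 3, "tags": ["a", "b"], "active": True},
}
xml_event = {"headers": {}, "body": "<order id='1'/>"}

request_id = resolve(event, "headers.x-request-id")
missing = resolve(event, "body.missing")
tags = resolve(event, "body.tags")
xml_body = resolve(xml_event, "body")
xml_field = resolve(xml_event, "body.id")
```

The explicit type tag matters in Python because `True == 1` evaluates to true. Without it, a boolean and a number could be mistaken for the same value.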

Limitations

  • Time windows are limited to 1 second to 1 hour
  • Changing the deduplication configuration resets the deduplication cache for the connection

Deduplication is not guaranteed due to distributed system constraints. Always implement idempotent request handling in your destination.

Create a deduplication rule

Apply deduplication rules to a connection, just like any other rule.

  1. Open the connection rules configuration.
  2. Click Add Rule and select Deduplication.
  3. Configure the deduplication settings:
    • Set the Time Window (1 second to 1 hour)
    • Select a Strategy:
      • Exact: Compare entire payloads
      • Include fields: Specify fields to use as the key
      • Exclude fields: Specify fields to ignore
  4. Click Save to apply your changes.

To create the rule through the API instead, include it in the rules array of the connection:

POST /2025-07-01/connections
{
  "name": "shopify-products",
  "source_id": "src_xyz",
  "destination_id": "dest_abc",
  "rules": [
    {
      "type": "deduplicate",
      "window": 300000,
      "exclude_fields": ["body.updated_at", "body.inventory_quantity"]
    }
  ]
}

Validation rules:

  • window: Required, between 1000ms (1 second) and 3600000ms (1 hour)
  • Cannot specify both include_fields and exclude_fields
  • Field paths must start with headers, body, query, or path
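
If you build rules programmatically, it can help to check them locally before sending the request. The following is a hedged sketch of such a check. `validate_rule` is a hypothetical helper, not part of any Hookdeck SDK. The lower bound uses the 1 second minimum stated under Limitations, and requiring an integer number of milliseconds is an assumption.

```python
MIN_WINDOW_MS = 1_000        # 1 second, the minimum stated under Limitations
MAX_WINDOW_MS = 3_600_000    # 1 hour
ALLOWED_ROOTS = {"headers", "body", "query", "path"}


def validate_rule(rule):
    """Return a list of problems found in a deduplication rule (empty if none)."""
    errors = []

    window = rule.get("window")
    if not isinstance(window, int) or isinstance(window, bool):
        errors.append("window is required and must be a number of milliseconds")
    elif not MIN_WINDOW_MS <= window <= MAX_WINDOW_MS:
        errors.append(f"window must be between {MIN_WINDOW_MS} and {MAX_WINDOW_MS} ms")

    if "include_fields" in rule and "exclude_fields" in rule:
        errors.append("specify include_fields or exclude_fields, not both")

    for name in ("include_fields", "exclude_fields"):
        for path in rule.get(name) or []:
            if path.split(".")[0] not in ALLOWED_ROOTS:
                errors.append(f"{name}: {path!r} must start with headers, body, query, or path")

    return errors


ok = validate_rule({
    "type": "deduplicate",
    "window": 300000,
    "exclude_fields": ["body.updated_at", "body.inventory_quantity"],
})
conflicting = validate_rule({
    "type": "deduplicate",
    "window": 300000,
    "include_fields": ["headers.x-request-id"],
    "exclude_fields": ["body.updated_at"],
})
```

Here `ok` is an empty list, while `conflicting` reports that both field lists were supplied. The API remains the source of truth; a local check like this only catches obvious mistakes earlier.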

Redundant events received on the connection are now removed before delivery.

Edit a deduplication rule

Edit a deduplication rule to change how redundant events are detected.

  1. Open the connection rules configuration.
  2. Click the deduplication rule to edit.
  3. Modify the time window or deduplication strategy.
  4. Click Save to apply your changes.

Changing the deduplication configuration resets the deduplication state for the connection. Events seen before the change are no longer used for comparison, and deduplication starts fresh.

Delete a deduplication rule

To delete a deduplication rule, open the connection rules configuration and click the trash icon to remove the deduplication rule.

Example scenarios

Scenario 1: Multi-app webhook deduplication

Deduplicate webhooks from multiple app installations using request ID:

{
  "type": "deduplicate",
  "window": 60000,
  "include_fields": ["headers.x-request-id"]
}

Only the request ID determines if events are redundant, regardless of other payload differences.

Scenario 2: Filtering noisy Shopify updates

Ignore product updates where only inventory or timestamps changed:

{
  "type": "deduplicate",
  "window": 300000,
  "exclude_fields": [
    "body.variants[].inventory_quantity",
    "body.updated_at",
    "headers.x-shopify-webhook-id"
  ]
}

Events are considered redundant if everything except these volatile fields matches within 5 minutes.

Scenario 3: Multi-tenant composite keys

Use multiple fields to create tenant-specific deduplication:

{
  "type": "deduplicate",
  "window": 600000,
  "include_fields": [
    "body.store_id",
    "body.product_id",
    "body.action"
  ]
}

Events are redundant only when the same store, product, and action combination occurs within 10 minutes.