Deduplication Implementation Guide
Webhook Deduplication goes far beyond simple duplicate removal. This guide shows you how to implement deduplication strategies that solve real business problems: reducing processing costs from Stripe payment retries, filtering noise from Shopify product updates, and building robust Events handling systems.
Understanding deduplication's dual purpose
Deduplication serves two distinct patterns that many developers miss:
- Duplicate suppression: Preventing identical events from processing multiple times (payment retries, network failures)
- Noise suppression: Filtering out irrelevant changes while preserving meaningful updates (inventory fluctuations vs product title changes)
This dual nature explains why deduplication is more powerful than simple exact matching - it's "duplicate suppression + noise suppression" working together.
Deduplication vs Filters: When to use each
The key difference between deduplication and filters isn't just functionality - it's about guarantees and state:
| Aspect | Filters | Deduplication |
|---|---|---|
| Behavior | Deterministic, stateless | Probabilistic, stateful |
| Guarantees | Always enforced | Best-effort, rare false negatives possible |
| Time awareness | No temporal component | Time-windowed comparison |
| Use case | Block or allow specific event types forever | Reduce retries and noise within time windows |
| Performance | Minimal overhead | Some overhead for state management |
Decision framework: Use Filters when you never want certain events. Use deduplication when you want to reduce duplicate processing and noise within time windows.
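To make the distinction concrete, compare the two rule types side by side. The deduplication rule matches the configurations used throughout this guide; the filter rule is a simplified sketch, so check the Filters documentation for the exact matching schema your connection expects.

```json
{
  "type": "filter",
  "body": { "type": "payment_intent.succeeded" }
}
```

The filter is stateless: it allows only matching event types and applies to every delivery, forever.

```json
{
  "type": "deduplication",
  "window": 3600000,
  "include_fields": ["body.id"]
}
```

The deduplication rule keeps state: it suppresses a delivery only if an event with the same `body.id` was already seen within the previous hour.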
Critical principle: Deduplication doesn't replace idempotency in your application logic. It's a load-reduction tool, not a correctness guarantee.
Implementation pattern 1: Stripe payment retries
Business scenario: Your payment processing system receives duplicate Stripe webhooks during network issues, leading to potential double-charging or failed reconciliation.
Real-world problem: Stripe retries webhook deliveries when acknowledgments aren't received. During network connectivity issues, you might receive identical `payment_intent.succeeded` events multiple times.
Configuration strategy
Use Stripe event ID for reliable payment deduplication:
```json
{
  "type": "deduplication",
  "window": 3600000,
  "include_fields": ["body.id"]
}
```
This configuration uses Stripe's unique event ID as the deduplication key, ensuring duplicate delivery attempts within 1 hour are suppressed while being explicit about the matching criteria.
Implementation considerations
Why this works: Each Stripe event has a unique `id` field that remains constant across retry attempts, making it the most reliable deduplication key for payment events.
Window sizing: A 1-hour window accounts for extended network issues without suppressing legitimate, separate payments that occur hours apart.
Downstream handling: Even with deduplication, implement idempotent payment processing using Stripe's event IDs.
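A minimal sketch of that downstream guard, assuming an in-memory `processedEvents` set and a hypothetical `recordPayment` function (in production the set would typically be a database table or cache keyed by Stripe's event ID):

```javascript
// Idempotent handler: skip work if this Stripe event ID was already processed
const processedEvents = new Set(); // stand-in for a durable store

async function handlePaymentWebhook(event) {
  // event.id is Stripe's unique event ID, the same value used as the deduplication key
  if (processedEvents.has(event.id)) {
    return; // already handled, nothing to do
  }
  await recordPayment(event.data.object); // your business logic (hypothetical helper)
  processedEvents.add(event.id); // mark as processed only after success
}
```

Deduplication reduces how often this guard fires; the guard is what makes double processing impossible.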
Monitoring approach
Track these metrics to validate your implementation:
- Duplicate detection rate by event type
- Payment processing latency before/after deduplication
- False negative incidents (duplicates that weren't caught)
- Revenue impact from prevented duplicate processing
Implementation pattern 2: Shopify product updates
Business scenario: Your product sync system receives Shopify `products/update` webhooks for every field change, but you only care about changes to title, description, and metafields - not inventory fluctuations.
Real-world problem: Shopify fires `products/update` webhooks for inventory changes, price updates, and administrative fields. Processing every update creates unnecessary API calls and database writes.
Configuration strategy
Use field-based deduplication focusing on business-critical fields:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "body.title",
    "body.body_html",
    "body.metafields",
    "body.vendor",
    "body.product_type"
  ]
}
```
This configuration only compares title, description, metafields, vendor, and product type. Changes to inventory, timestamps, or other operational fields won't trigger processing.
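For example, the hypothetical, heavily trimmed delivery sketched below illustrates how the comparison works: only the included fields participate, so operational values are ignored.

```json
{
  "id": 632910392,
  "title": "Classic Tee",
  "body_html": "<p>Soft cotton tee</p>",
  "vendor": "Acme",
  "product_type": "Shirts",
  "updated_at": "2024-05-01T10:00:00-04:00",
  "variants": [{ "inventory_quantity": 12 }]
}
```

A second delivery that differs only in `updated_at` and `inventory_quantity` is suppressed within the 5-minute window, while a delivery with a changed `title` or `vendor` is processed normally.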
Alternative: Exclude volatile fields
If your use case requires most fields except known noisy ones:
```json
{
  "type": "deduplication",
  "window": 300000,
  "exclude_fields": [
    "body.updated_at",
    "body.variants[].updated_at",
    "body.variants[].inventory_quantity",
    "body.variants[].old_inventory_quantity"
  ]
}
```
Implementation considerations
Business impact: Reduces product sync API calls while preserving all meaningful product changes.
Field selection: Include stable business identifiers. Exclude timestamp fields and inventory quantities that change frequently without representing meaningful business updates.
Window sizing: 5 minutes balances duplicate suppression with allowing legitimate rapid updates to different product aspects.
Production deployment strategy
- Baseline measurement: Track current webhook volume and processing costs
- Subset testing: Start with specific stores, tenants, or event types using filters
- Monitoring: Watch for missed important updates and adjust field lists
- Full deployment: Apply to all traffic once confident in configuration
Implementation pattern 3: IoT sensor data processing
Business scenario: Manufacturing system processing sensor data from thousands of devices where network issues cause duplicate telemetry submissions.
Real-world problem: Intermittent connectivity causes sensors to resend readings, creating data quality issues and inflating storage costs.
Configuration strategy
For effective deduplication with millisecond-level timestamp variations, combine Transformations with deduplication:
Step 1: Transform to normalize timestamps
```javascript
addHandler("transform", (request, context) => {
  // Round timestamp to nearest minute for deduplication
  const originalTimestamp = new Date(request.body.reading_timestamp);
  const roundedTimestamp = new Date(
    Math.floor(originalTimestamp.getTime() / 60000) * 60000
  );
  request.body.deduplication_timestamp = roundedTimestamp.toISOString();
  return request;
});
```
Step 2: Deduplicate using normalized timestamp
```json
{
  "type": "deduplication",
  "window": 600000,
  "include_fields": [
    "body.device_id",
    "body.deduplication_timestamp",
    "body.sensor_type"
  ]
}
```
Key insight: Transformation runs first to create a rounded timestamp field, then deduplication uses this normalized timestamp instead of the original millisecond-precise reading_timestamp.
Implementation considerations
Timestamp normalization: Round to appropriate intervals (1-5 minutes) based on expected retry patterns and acceptable data loss.
Rule ordering: Ensure the transformation rule appears before the deduplication rule in your Connections configuration, as shown in the sketch below.
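A rough sketch of that ordering in a connection's rules array. The deduplication rule matches the one above; the exact shape of the transform rule (for example, whether it references a saved transformation by ID or inline) depends on how your transformation is attached, so treat it as illustrative.

```json
{
  "rules": [
    {
      "type": "transform",
      "transformation_id": "trs_round_sensor_timestamp"
    },
    {
      "type": "deduplication",
      "window": 600000,
      "include_fields": [
        "body.device_id",
        "body.deduplication_timestamp",
        "body.sensor_type"
      ]
    }
  ]
}
```

Because rules run in order, the rounded `deduplication_timestamp` already exists by the time the deduplication rule evaluates it.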
Implementation pattern 4: Multi-tenant Shopify app
Business scenario: Your Shopify app is installed across hundreds of merchant stores, funneling all webhooks into a single Hookdeck Connection for centralized processing.
Real-world problem: Your Shopify app receives product updates from 500+ stores. Multiple stores often have identical product catalogs (franchises, dropshippers) or sync from the same ERP system, creating duplicate processing overhead.
Strategy 1: Cross-store deduplication
Suppress identical events across all stores when stores have similar catalogs:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "body.handle",
    "body.title",
    "body.product_type",
    "body.vendor"
  ]
}
```
Use case: Franchise network where multiple stores sell identical products. Only process one product update regardless of which store sent it.
Key insight: Using `body.handle` (product slug) instead of `body.id` allows deduplication across stores where the same logical product may have different Shopify IDs but identical handles and attributes.
Strategy 2: Per-store deduplication
Maintain store isolation while reducing noise within each store:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.title"
  ]
}
```
Use case: Independent stores with unique catalogs. Deduplicate retries within each store but process identical products from different stores separately.
Strategy 3: Hybrid approach
Global deduplication for common actions, per-store for specific updates:
```json
{
  "type": "deduplication",
  "window": 600000,
  "include_fields": [
    "body.product_type",
    "body.vendor",
    "body.tags"
  ]
}
```
Business impact:
- Efficiency: Reduces downstream API calls for franchise/dropship scenarios
- Cost reduction: Lower compute and database costs for catalog processing
- Consistency: Ensures meaningful product changes are processed once per logical unit
- Flexibility: Single connection handles hundreds of stores with appropriate deduplication scoping
Architecture benefits: Simplified connection management, centralized monitoring, and flexible tenant isolation through field selection.
Production deployment best practices
Production deployment methodology
Phase 1: Traffic analysis
- Identify current duplication patterns
- Measure baseline processing volumes
- Analyze peak traffic characteristics
Phase 2: Proof of concept
- Create separate test Connections with Filters to route specific subsets (e.g., `body.store_id` matching test stores); see the example filter rule after this list
- Validate field selection accuracy on real data
- Measure performance impact on limited scope
- Compare processing volumes between test and production connections
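A hedged sketch of such a test-routing filter, assuming the payload carries a `store_id` field as in the example above (adapt the field path and matching syntax to your actual Filters configuration):

```json
{
  "type": "filter",
  "body": {
    "store_id": "test-store-001"
  }
}
```

Only deliveries from the matching test store reach the proof-of-concept connection, which makes the volume comparison against production straightforward.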
Phase 3: Full deployment
- Apply deduplication rules to production connections
- Monitor duplicate suppression effectiveness
- Adjust configuration based on observed patterns
Monitoring and alerting
Key metrics to track:
- Duplicate detection rate by source/connection
- Processing volume reduction percentage
- False negative incidents
- Memory usage and cache performance
Alerting thresholds:
- Duplicate rate drops below expected baseline (potential configuration issues)
- Memory usage exceeds 80% of allocated cache
- Processing latency increases beyond acceptable thresholds
Advanced patterns
Multi-connection coordination
Challenge: E-commerce platform with separate Connections for orders, products, and customers, where Events might relate across connections.
Solution: Use connection-specific deduplication with shared business keys:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.admin_graphql_api_id"
  ]
}
```
Apply identical rules across related connections to maintain consistency.
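One way to keep those rules identical is to define the rule object once and apply it to each related connection. The sketch below assumes a hypothetical `updateConnectionRules` helper wrapping your Hookdeck API client, and the connection IDs are placeholders.

```javascript
// Shared deduplication rule applied to every related connection
const sharedDedupRule = {
  type: "deduplication",
  window: 300000,
  include_fields: [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.admin_graphql_api_id"
  ]
};

// Placeholder IDs for the orders, products, and customers connections
const relatedConnections = ["conn_orders", "conn_products", "conn_customers"];

async function applySharedRules() {
  for (const connectionId of relatedConnections) {
    // updateConnectionRules is a hypothetical helper around the Hookdeck API
    await updateConnectionRules(connectionId, [sharedDedupRule]);
  }
}
```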
Event sourcing integration
Pattern: Microservices publishing domain events where duplicate prevention must preserve event ordering.
Configuration: Include causation and correlation IDs:
```json
{
  "type": "deduplication",
  "window": 180000,
  "include_fields": [
    "body.aggregate_id",
    "body.event_version",
    "body.causation_id"
  ]
}
```
Key insight: Include aggregate version to prevent suppressing legitimate sequential events while catching network-level duplicates.
Troubleshooting common issues
"Events that look identical weren't deduplicated"
Root causes:
- Events arrived outside the configured window
- Subtle field differences in timestamps or metadata
- Best-effort mechanism experienced rare miss
Investigation steps:
- Check event timestamps against window configuration
- Compare raw payloads for field-level differences (see the comparison sketch after this list)
- Verify included/excluded field paths are correct
- Review Requests timeline for processing order
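For the payload comparison step, a small helper like the sketch below can surface exactly which fields differ between two deliveries you expected to be deduplicated. It is a simple flat comparison; nested paths and arrays may need deeper handling.

```javascript
// List top-level fields whose JSON-serialized values differ between two payloads
function diffPayloads(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  const differences = [];
  for (const key of keys) {
    if (JSON.stringify(a[key]) !== JSON.stringify(b[key])) {
      differences.push(key);
    }
  }
  return differences;
}

// Example with two trimmed payloads: the differing field is a candidate for exclude_fields
const firstDelivery = { title: "Classic Tee", updated_at: "2024-05-01T10:00:00Z" };
const secondDelivery = { title: "Classic Tee", updated_at: "2024-05-01T10:05:00Z" };
console.log(diffPayloads(firstDelivery, secondDelivery)); // ["updated_at"]
```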
"Important events are being suppressed"
Root causes:
- Overly aggressive field exclusion
- Business logic changed but deduplication config didn't update
- Legitimate rapid updates being treated as duplicates
Solutions:
- Refine field lists to be more specific
- Reduce time window if legitimate updates happen quickly
- Add business-specific fields to distinguish meaningful changes
Configuration validation checklist
Before production deployment:
Field selection validation:
- [ ] Test with real webhook samples
- [ ] Verify business-critical fields are included
- [ ] Confirm noisy fields are properly excluded
- [ ] Validate field paths match actual webhook structure
Business logic validation:
- [ ] Confirm deduplication aligns with business requirements
- [ ] Test edge cases (rapid updates, network issues)
- [ ] Verify downstream systems handle reduced event volume
- [ ] Validate monitoring and alerting coverage
Summary
Implementing effective webhook deduplication follows these key strategies:
Choose the right approach: Use exact deduplication for payment retry scenarios (Stripe `body.id`) and field-based deduplication for noise reduction (Shopify product updates excluding `updated_at`).
Start with proven patterns: Begin with the Shopify product update configuration for e-commerce or Stripe payment retry setup for payment processing to see immediate impact.
Deploy incrementally: Use the production deployment methodology - start with proof of concept on filtered traffic, measure impact, then scale to full production.
Combine with related features: Integrate transformations for timestamp normalization and maintain downstream idempotency for complete reliability.
Monitor and optimize: Track duplicate detection rates and processing volume reduction using the configuration validation checklist to ensure optimal performance.
Related resources
- Deduplication documentation - Complete feature documentation
- Connection Rules - How to configure deduplication rules
- Filters - Find out more about filters and when to use them
- Transformations - Transform event payloads within Hookdeck