Deduplication Implementation Guide
Webhook deduplication goes far beyond simple duplicate removal. This guide shows you how to implement deduplication strategies that solve real business problems: reducing processing costs from Stripe payment retries, filtering noise from Shopify product updates, and building robust event handling systems.
Understanding deduplication's dual purpose
Deduplication serves two distinct patterns that many developers miss:
- Duplicate suppression: Preventing identical events from processing multiple times (payment retries, network failures)
- Noise suppression: Filtering out irrelevant changes while preserving meaningful updates (inventory fluctuations vs product title changes)
This dual nature explains why deduplication is more powerful than simple exact matching - it's "duplicate suppression + noise suppression" working together.
Deduplication vs Filters: When to use each
The key difference between deduplication and filters isn't just functionality - it's about guarantees and state:
| Aspect | Filters | Deduplication |
| --- | --- | --- |
| Behavior | Stateless | Stateful |
| Guarantees | Always guaranteed | Best-effort, rare false negatives possible |
| Time awareness | No temporal component | Time-windowed comparison |
| Use case | Block or allow specific event types forever | Reduce retries and noise within time windows |
Decision framework: Use Filters when you never want certain events. Use deduplication when you want to reduce duplicate processing and noise within time windows.
Critical principle: Deduplication doesn't replace idempotency in your application logic. It's a load-reduction tool, not a correctness guarantee.
Implementation pattern 1: Stripe payment retries
Business scenario: Your payment processing system receives duplicate Stripe webhooks during network issues, leading to potential double-charging or failed reconciliation.
Real-world problem: Stripe retries webhook deliveries when acknowledgments aren't received. During network connectivity issues, you might receive identical `payment_intent.succeeded` events multiple times.
Configuration strategy
Use Stripe event ID for reliable payment deduplication:
```json
{
  "type": "deduplication",
  "window": 3600000,
  "include_fields": ["body.id"]
}
```
This configuration uses Stripe's unique event ID as the deduplication key, ensuring duplicate delivery attempts within 1 hour are suppressed while being explicit about the matching criteria.
Implementation considerations
Why this works: Each Stripe event has a unique `id` field that remains constant across retry attempts, making it the most reliable deduplication key for payment events.
Window sizing: 1 hour accounts for extended network issues while preventing legitimate duplicate payments processed hours apart.
Downstream handling: Even with deduplication, implement idempotent payment processing using Stripe's event IDs.
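A minimal sketch of that idempotent layer, assuming an in-memory set as a stand-in for a persistent unique-key store (in production you would use a database constraint or similar durable record):

```javascript
// Hedged sketch: idempotent downstream processing keyed on Stripe's event ID.
// processedEventIds stands in for a durable store; the real side effect
// (e.g. marking a payment as paid) is elided.
const processedEventIds = new Set();

function handleStripeEvent(event) {
  if (processedEventIds.has(event.id)) {
    return { status: "skipped", reason: "already processed" };
  }
  processedEventIds.add(event.id);
  // ...perform the real side effect here, exactly once per event ID...
  return { status: "processed" };
}

console.log(handleStripeEvent({ id: "evt_123" }).status); // "processed"
console.log(handleStripeEvent({ id: "evt_123" }).status); // "skipped"
```

Even if a duplicate slips past the best-effort deduplication window, the side effect still runs only once.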
Monitoring approach
Track these metrics to validate your implementation:
- Duplicate detection rate by event type
- Payment processing latency before/after deduplication
- False negative incidents (duplicates that weren't caught)
- Revenue impact from prevented duplicate processing
Implementation pattern 2: Shopify product updates
Business scenario: Your product sync system receives Shopify `products/update` webhooks for every field change, but you only care about changes to title, description, and metafields - not inventory fluctuations.
Real-world problem: Shopify fires `products/update` webhooks for inventory changes, price updates, and administrative fields. Processing every update creates unnecessary API calls and database writes.
Configuration strategy
Use field-based deduplication focusing on business-critical fields:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "body.title",
    "body.body_html",
    "body.metafields",
    "body.vendor",
    "body.product_type"
  ]
}
```
This configuration only compares title, description, metafields, vendor, and product type. Changes to inventory, timestamps, or other operational fields won't trigger processing.
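The noise-suppression effect can be sketched as follows (illustrative only, using top-level keys rather than full `body.` paths for brevity): an inventory-only change produces an identical comparison signature, while a title change produces a new one.

```javascript
// Illustrative sketch: with include_fields, only the listed values feed the
// comparison, so inventory-only changes look identical to the previous event.
function includeSignature(body, fields) {
  return JSON.stringify(fields.map((f) => body[f]));
}

const fields = ["title", "body_html", "vendor", "product_type"];
const before = { title: "Mug", body_html: "<p>Ceramic</p>", vendor: "Acme", product_type: "Kitchen", inventory_quantity: 10 };
const afterInventory = { ...before, inventory_quantity: 7 }; // noise
const afterTitle = { ...before, title: "Large Mug" };        // meaningful change

console.log(includeSignature(before, fields) === includeSignature(afterInventory, fields)); // true: suppressed
console.log(includeSignature(before, fields) === includeSignature(afterTitle, fields));     // false: processed
```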
Alternative: Exclude volatile fields
If your use case requires most fields except known noisy ones:
```json
{
  "type": "deduplication",
  "window": 300000,
  "exclude_fields": [
    "body.updated_at",
    "body.variants[].updated_at",
    "body.variants[].inventory_quantity",
    "body.variants[].old_inventory_quantity"
  ]
}
```
Implementation considerations
Business impact: Reduces product sync API calls while preserving all meaningful product changes.
Field selection: Include stable business identifiers. Exclude timestamp fields and inventory quantities that change frequently without representing meaningful business updates.
Window sizing: 5 minutes balances duplicate suppression with allowing legitimate rapid updates to different product aspects.
Production deployment strategy
- Baseline measurement: Track current webhook volume and processing costs
- Subset testing: Start with specific stores, tenants, or event types using filters
- Monitoring: Watch for missed important updates and adjust field lists
- Full deployment: Apply to all traffic once confident in configuration
Implementation pattern 3: IoT sensor data processing
Business scenario: A manufacturing system processes sensor data from thousands of devices, where network issues cause duplicate telemetry submissions.
Real-world problem: Intermittent connectivity causes sensors to resend readings, creating data quality issues and inflating storage costs.
Configuration strategy
For effective deduplication with millisecond-level timestamp variations, combine Transformations with deduplication:
Step 1: Transform to normalize timestamps
```javascript
addHandler("transform", (request, context) => {
  // Round timestamp down to the nearest minute for deduplication
  const originalTimestamp = new Date(request.body.reading_timestamp);
  const roundedTimestamp = new Date(
    Math.floor(originalTimestamp.getTime() / 60000) * 60000
  );
  request.body.deduplication_timestamp = roundedTimestamp.toISOString();
  return request;
});
```
Step 2: Deduplicate using normalized timestamp
```json
{
  "type": "deduplication",
  "window": 600000,
  "include_fields": [
    "body.device_id",
    "body.deduplication_timestamp",
    "body.sensor_type"
  ]
}
```
Key insight: Transformation runs first to create a rounded timestamp field, then deduplication uses this normalized timestamp instead of the original millisecond-precise `reading_timestamp`.
Implementation considerations
Timestamp normalization: Round to appropriate intervals (1-5 minutes) based on expected retry patterns and acceptable data loss.
Rule ordering: Ensure transformation rule appears before deduplication rule in your Connections configuration.
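The rounding step can be generalized to any interval; `roundTimestamp` below is a hypothetical helper where `intervalMs` is the tuning knob (60000 = 1 minute, 300000 = 5 minutes):

```javascript
// Generic timestamp-bucketing helper for the normalization step above.
// intervalMs controls the bucket size and is an assumption you tune
// against your expected retry patterns.
function roundTimestamp(isoString, intervalMs) {
  const ms = new Date(isoString).getTime();
  return new Date(Math.floor(ms / intervalMs) * intervalMs).toISOString();
}

// Two retried readings 20 seconds apart collapse into the same minute bucket.
console.log(roundTimestamp("2024-05-01T10:30:05.123Z", 60000)); // "2024-05-01T10:30:00.000Z"
console.log(roundTimestamp("2024-05-01T10:30:25.999Z", 60000)); // "2024-05-01T10:30:00.000Z"
```

Larger intervals suppress more retries but also risk collapsing distinct readings, which is the data-loss trade-off noted above.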
Implementation pattern 4: Multi-app Shopify builder
Business scenario: A Shopify app builder has developed multiple successful apps that register for overlapping webhook events. When installed on the same store, each app receives identical events, creating excessive noise and processing overhead.
Real-world problem: Your app builder platform has created apps for inventory management, order fulfillment, and customer analytics. All three apps register for `orders/created` webhooks. When a merchant installs all three apps, you receive the same order event three times - once per app installation.
Configuration strategy
Exclude app-specific identifiers to deduplicate at the store level:
```json
{
  "type": "deduplication",
  "window": 300000,
  "exclude_fields": [
    "headers.x-shopify-webhook-id",
    "headers.x-shopify-event-id",
    "headers.x-shopify-triggered-at",
    "headers.x-shopify-hmac-sha256"
  ]
}
```
Key insight: By excluding the webhook ID, event ID, timestamp, and signature fields that can change across duplicate events, you deduplicate based on the actual event content only. This ensures you receive each meaningful business event only once per store, regardless of how many of your apps are installed.
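A sketch of this exclude-based comparison (illustrative only, using bare header names rather than full `headers.` paths): after dropping the per-delivery headers, the same order arriving via two different app installations compares equal.

```javascript
// Illustrative sketch: drop excluded per-delivery headers, then compare
// what remains of the event.
function excludeSignature(event, excludedHeaders) {
  const headers = { ...event.headers };
  for (const h of excludedHeaders) delete headers[h];
  return JSON.stringify({ headers, body: event.body });
}

const excluded = ["x-shopify-webhook-id", "x-shopify-event-id", "x-shopify-triggered-at", "x-shopify-hmac-sha256"];

// The same order delivered to two app installations: only the webhook ID differs.
const fromAppA = { headers: { "x-shopify-webhook-id": "wh_a", "x-shopify-shop-domain": "store.myshopify.com" }, body: { id: 1001 } };
const fromAppB = { headers: { "x-shopify-webhook-id": "wh_b", "x-shopify-shop-domain": "store.myshopify.com" }, body: { id: 1001 } };

console.log(excludeSignature(fromAppA, excluded) === excludeSignature(fromAppB, excluded)); // true
```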
Alternative: Include store-specific fields
For more explicit control, include only the fields that matter for business logic:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.order_number",
    "body.total_price"
  ]
}
```
Implementation considerations
Why this works: Each app installation generates unique webhook IDs, event IDs, and other meta information. However, the underlying business event (order, product update, etc.) remains the same. Excluding these technical identifiers focuses deduplication on business content.
Window sizing: 5 minutes accounts for delivery timing differences between app installations while preventing suppression of legitimate rapid events.
Business impact:
- Noise reduction: Eliminates redundant processing of identical business events
- Cost efficiency: Reduces compute and storage costs for multi-app installations
- Simplified architecture: Single webhook endpoint handles all apps with automatic deduplication
- Improved reliability: Reduces downstream system load and potential race conditions
Monitoring approach: Track the ratio of received events to unique business events to measure deduplication effectiveness across different store configurations.
Advanced patterns
Multi-connection coordination
Challenge: An e-commerce platform with separate Connections for orders, products, and customers, where events may relate across connections.
Solution: Use connection-specific deduplication with shared business keys:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.admin_graphql_api_id"
  ]
}
```
Apply identical rules across related connections to maintain consistency.
Event sourcing integration
Pattern: Microservices publishing domain events where duplicate prevention must preserve event ordering.
Configuration: Include causation and correlation IDs:
```json
{
  "type": "deduplication",
  "window": 180000,
  "include_fields": [
    "body.aggregate_id",
    "body.event_version",
    "body.causation_id"
  ]
}
```
Key insight: Include aggregate version to prevent suppressing legitimate sequential events while catching network-level duplicates.
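The version-aware behavior can be sketched as follows (illustrative only; `eventKey` is a hypothetical helper): a network-level retry of the same version matches, while the next version of the same aggregate does not.

```javascript
// Illustrative sketch: including event_version keeps legitimate sequential
// events distinct while retries of the same version still match.
function eventKey(body) {
  return [body.aggregate_id, body.event_version, body.causation_id].join(":");
}

const v1 = { aggregate_id: "order-42", event_version: 1, causation_id: "cmd-9" };
const v1Retry = { ...v1 };                  // network-level duplicate
const v2 = { ...v1, event_version: 2 };     // legitimate next event

console.log(eventKey(v1) === eventKey(v1Retry)); // true: duplicate suppressed
console.log(eventKey(v1) === eventKey(v2));      // false: sequential event preserved
```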
Troubleshooting common issues
"Events that look identical weren't deduplicated"
Root causes:
- Events arrived outside the configured window
- Subtle field differences in timestamps or metadata
- The best-effort mechanism experienced a rare miss
Investigation steps:
- Check event timestamps against window configuration
- Compare raw payloads for field-level differences
- Verify included/excluded field paths are correct
- Review Requests timeline for processing order
"Important events are being suppressed"
Root causes:
- Overly aggressive field exclusion
- Business logic changed but deduplication config didn't update
- Legitimate rapid updates being treated as duplicates
Solutions:
- Refine field lists to be more specific
- Reduce time window if legitimate updates happen quickly
- Add business-specific fields to distinguish meaningful changes
Summary
Implementing effective webhook deduplication follows these key strategies:
Choose the right approach: Use exact deduplication for payment retry scenarios (Stripe `body.id`) and field-based deduplication for noise reduction (Shopify product updates excluding `updated_at`).
Start with proven patterns: Begin with the Shopify product update configuration for e-commerce or Stripe payment retry setup for payment processing to see immediate impact.
Combine with related features: Integrate transformations for timestamp normalization and maintain downstream idempotency for complete reliability.
Monitor and optimize: Track duplicate detection rates and processing volume reduction. For example, review ignored events within your metrics.
Related resources
- Deduplication documentation - Complete feature documentation
- Connection Rules - How to configure deduplication rules
- Filters - Find out more about filters and when to use them
- Transformations - Transform event payloads within Hookdeck