Deduplication Implementation Guide
Webhook Deduplication goes far beyond simple duplicate removal. This guide shows you how to implement deduplication strategies that solve real business problems: reducing processing costs from Stripe payment retries, filtering noise from Shopify product updates, and building robust Events handling systems.
Understanding deduplication's dual purpose
Deduplication serves two distinct patterns that many developers miss:
- Duplicate suppression: Preventing identical events from processing multiple times (payment retries, network failures)
- Noise suppression: Filtering out irrelevant changes while preserving meaningful updates (inventory fluctuations vs product title changes)
This dual nature explains why deduplication is more powerful than simple exact matching - it's "duplicate suppression + noise suppression" working together.
Deduplication vs Filters: When to use each
The key difference between deduplication and filters isn't just functionality - it's about guarantees and state:
| Aspect | Filters | Deduplication |
|---|---|---|
| Behavior | Deterministic, stateless | Probabilistic, stateful |
| Guarantees | Always enforced | Best-effort, rare false negatives possible |
| Time awareness | No temporal component | Time-windowed comparison |
| Use case | Block or allow specific event types forever | Reduce retries and noise within time windows |
| Performance | Minimal overhead | Some overhead for state management |
Decision framework: Use Filters when you never want certain events. Use deduplication when you want to reduce duplicate processing and noise within time windows.
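To make the distinction concrete, compare the two rule types side by side. The deduplication rule matches the configurations used throughout this guide; the filter rule is a simplified sketch, so check the Filters documentation for the exact matching schema your connection expects.

```json
{
  "type": "filter",
  "body": { "type": "payment_intent.succeeded" }
}
```

The filter is stateless: it allows only matching event types and applies to every delivery, forever.

```json
{
  "type": "deduplication",
  "window": 3600000,
  "include_fields": ["body.id"]
}
```

The deduplication rule keeps state: it suppresses a delivery only if an event with the same `body.id` was already seen within the previous hour.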
Critical principle: Deduplication doesn't replace idempotency in your application logic. It's a load-reduction tool, not a correctness guarantee.
Implementation pattern 1: Stripe payment retries
Business scenario: Your payment processing system receives duplicate Stripe webhooks during network issues, leading to potential double-charging or failed reconciliation.
Real-world problem: Stripe retries webhook deliveries when acknowledgments aren't received. During network connectivity issues, you might receive identical `payment_intent.succeeded` events multiple times.
Configuration strategy
Use Stripe event ID for reliable payment deduplication:
```json
{
  "type": "deduplication",
  "window": 3600000,
  "include_fields": ["body.id"]
}
```
This configuration uses Stripe's unique event ID as the deduplication key, ensuring duplicate delivery attempts within 1 hour are suppressed while being explicit about the matching criteria.
Implementation considerations
Why this works: Each Stripe event has a unique `id` field that remains constant across retry attempts, making it the most reliable deduplication key for payment events.
Window sizing: A 1-hour window accounts for extended network issues without suppressing legitimate, separate payments that occur hours apart.
Downstream handling: Even with deduplication, implement idempotent payment processing using Stripe's event IDs.
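A minimal sketch of that downstream guard, assuming an in-memory `processedEvents` set and a hypothetical `recordPayment` function (in production the set would typically be a database table or cache keyed by Stripe's event ID):

```javascript
// Idempotent handler: skip work if this Stripe event ID was already processed
const processedEvents = new Set(); // stand-in for a durable store

async function handlePaymentWebhook(event) {
  // event.id is Stripe's unique event ID, the same value used as the deduplication key
  if (processedEvents.has(event.id)) {
    return; // already handled, nothing to do
  }
  await recordPayment(event.data.object); // your business logic (hypothetical helper)
  processedEvents.add(event.id); // mark as processed only after success
}
```

Deduplication reduces how often this guard fires; the guard is what makes double processing impossible.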
Monitoring approach
Track these metrics to validate your implementation:
- Duplicate detection rate by event type
- Payment processing latency before/after deduplication
- False negative incidents (duplicates that weren't caught)
- Revenue impact from prevented duplicate processing
Implementation pattern 2: Shopify product updates
Business scenario: Your product sync system receives Shopify `products/update` webhooks for every field change, but you only care about changes to title, description, and metafields - not inventory fluctuations.
Real-world problem: Shopify fires `products/update` webhooks for inventory changes, price updates, and administrative fields. Processing every update creates unnecessary API calls and database writes.
Configuration strategy
Use field-based deduplication focusing on business-critical fields:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "body.title",
    "body.body_html",
    "body.metafields",
    "body.vendor",
    "body.product_type"
  ]
}
```
This configuration only compares title, description, metafields, vendor, and product type. Changes to inventory, timestamps, or other operational fields won't trigger processing.
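For example, the hypothetical, heavily trimmed delivery sketched below illustrates how the comparison works: only the included fields participate, so operational values are ignored.

```json
{
  "id": 632910392,
  "title": "Classic Tee",
  "body_html": "<p>Soft cotton tee</p>",
  "vendor": "Acme",
  "product_type": "Shirts",
  "updated_at": "2024-05-01T10:00:00-04:00",
  "variants": [{ "inventory_quantity": 12 }]
}
```

A second delivery that differs only in `updated_at` and `inventory_quantity` is suppressed within the 5-minute window, while a delivery with a changed `title` or `vendor` is processed normally.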
Alternative: Exclude volatile fields
If your use case requires most fields except known noisy ones:
```json
{
  "type": "deduplication",
  "window": 300000,
  "exclude_fields": [
    "body.updated_at",
    "body.variants[].updated_at",
    "body.variants[].inventory_quantity",
    "body.variants[].old_inventory_quantity"
  ]
}
```
Implementation considerations
Business impact: Reduces product sync API calls while preserving all meaningful product changes.
Field selection: Include stable business identifiers. Exclude timestamp fields and inventory quantities that change frequently without representing meaningful business updates.
Window sizing: 5 minutes balances duplicate suppression with allowing legitimate rapid updates to different product aspects.
Production deployment strategy
- Baseline measurement: Track current webhook volume and processing costs
- Subset testing: Start with specific stores, tenants, or event types using filters
- Monitoring: Watch for missed important updates and adjust field lists
- Full deployment: Apply to all traffic once confident in configuration
Implementation pattern 3: IoT sensor data processing
Business scenario: Manufacturing system processing sensor data from thousands of devices where network issues cause duplicate telemetry submissions.
Real-world problem: Intermittent connectivity causes sensors to resend readings, creating data quality issues and inflating storage costs.
Configuration strategy
For effective deduplication with millisecond-level timestamp variations, combine Transformations with deduplication:
Step 1: Transform to normalize timestamps
```javascript
addHandler("transform", (request, context) => {
  // Round timestamp to nearest minute for deduplication
  const originalTimestamp = new Date(request.body.reading_timestamp);
  const roundedTimestamp = new Date(
    Math.floor(originalTimestamp.getTime() / 60000) * 60000
  );
  request.body.deduplication_timestamp = roundedTimestamp.toISOString();
  return request;
});
```
Step 2: Deduplicate using normalized timestamp
```json
{
  "type": "deduplication",
  "window": 600000,
  "include_fields": [
    "body.device_id",
    "body.deduplication_timestamp",
    "body.sensor_type"
  ]
}
```
Key insight: Transformation runs first to create a rounded timestamp field, then deduplication uses this normalized timestamp instead of the original millisecond-precise reading_timestamp.
Implementation considerations
Timestamp normalization: Round to appropriate intervals (1-5 minutes) based on expected retry patterns and acceptable data loss.
Rule ordering: Ensure the transformation rule appears before the deduplication rule in your Connections configuration, as shown in the sketch below.
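A rough sketch of that ordering in a connection's rules array. The deduplication rule matches the one above; the exact shape of the transform rule (for example, whether it references a saved transformation by ID or inline) depends on how your transformation is attached, so treat it as illustrative.

```json
{
  "rules": [
    {
      "type": "transform",
      "transformation_id": "trs_round_sensor_timestamp"
    },
    {
      "type": "deduplication",
      "window": 600000,
      "include_fields": [
        "body.device_id",
        "body.deduplication_timestamp",
        "body.sensor_type"
      ]
    }
  ]
}
```

Because rules run in order, the rounded `deduplication_timestamp` already exists by the time the deduplication rule evaluates it.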
Implementation pattern 4: Multi-tenant Shopify app
Business scenario: Your Shopify app is installed across hundreds of merchant stores, funneling all webhooks into a single Hookdeck Connection for centralized processing.
Real-world problem: Your Shopify app receives product updates from 500+ stores. Multiple stores often have identical product catalogs (franchises, dropshippers) or sync from the same ERP system, creating duplicate processing overhead.
Strategy 1: Cross-store deduplication
Suppress identical events across all stores when stores have similar catalogs:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "body.handle",
    "body.title",
    "body.product_type",
    "body.vendor"
  ]
}
```
Use case: Franchise network where multiple stores sell identical products. Only process one product update regardless of which store sent it.
Key insight: Using `body.handle` (product slug) instead of `body.id` allows deduplication across stores where the same logical product may have different Shopify IDs but identical handles and attributes.
Strategy 2: Per-store deduplication
Maintain store isolation while reducing noise within each store:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.title"
  ]
}
```
Use case: Independent stores with unique catalogs. Deduplicate retries within each store but process identical products from different stores separately.
Strategy 3: Hybrid approach
Global deduplication for common actions, per-store for specific updates:
```json
{
  "type": "deduplication",
  "window": 600000,
  "include_fields": [
    "body.product_type",
    "body.vendor",
    "body.tags"
  ]
}
```
Business impact:
- Efficiency: Reduces downstream API calls for franchise/dropship scenarios
- Cost reduction: Lower compute and database costs for catalog processing
- Consistency: Ensures meaningful product changes are processed once per logical unit
- Flexibility: Single connection handles hundreds of stores with appropriate deduplication scoping
Architecture benefits: Simplified connection management, centralized monitoring, and flexible tenant isolation through field selection.
Production deployment best practices
Production deployment methodology
Phase 1: Traffic analysis
- Identify current duplication patterns
- Measure baseline processing volumes
- Analyze peak traffic characteristics
Phase 2: Proof of concept
- Create separate test Connections with Filters to route specific subsets (e.g., `body.store_id` matching test stores); see the example filter rule after this list
- Validate field selection accuracy on real data
- Measure performance impact on limited scope
- Compare processing volumes between test and production connections
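A hedged sketch of such a test-routing filter, assuming the payload carries a `store_id` field as in the example above (adapt the field path and matching syntax to your actual Filters configuration):

```json
{
  "type": "filter",
  "body": {
    "store_id": "test-store-001"
  }
}
```

Only deliveries from the matching test store reach the proof-of-concept connection, which makes the volume comparison against production straightforward.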
Phase 3: Full deployment
- Apply deduplication rules to production connections
- Monitor duplicate suppression effectiveness
- Adjust configuration based on observed patterns
Monitoring and alerting
Key metrics to track:
- Duplicate detection rate by source/connection
- Processing volume reduction percentage
- False negative incidents
- Memory usage and cache performance
Alerting thresholds:
- Duplicate rate drops below expected baseline (potential configuration issues)
- Memory usage exceeds 80% of allocated cache
- Processing latency increases beyond acceptable thresholds
Advanced patterns
Multi-connection coordination
Challenge: E-commerce platform with separate Connections for orders, products, and customers, where Events might relate across connections.
Solution: Use connection-specific deduplication with shared business keys:
```json
{
  "type": "deduplication",
  "window": 300000,
  "include_fields": [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.admin_graphql_api_id"
  ]
}
```
Apply identical rules across related connections to maintain consistency.
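One way to keep those rules identical is to define the rule object once and apply it to each related connection. The sketch below assumes a hypothetical `updateConnectionRules` helper wrapping your Hookdeck API client, and the connection IDs are placeholders.

```javascript
// Shared deduplication rule applied to every related connection
const sharedDedupRule = {
  type: "deduplication",
  window: 300000,
  include_fields: [
    "headers.x-shopify-shop-domain",
    "body.id",
    "body.admin_graphql_api_id"
  ]
};

// Placeholder IDs for the orders, products, and customers connections
const relatedConnections = ["conn_orders", "conn_products", "conn_customers"];

async function applySharedRules() {
  for (const connectionId of relatedConnections) {
    // updateConnectionRules is a hypothetical helper around the Hookdeck API
    await updateConnectionRules(connectionId, [sharedDedupRule]);
  }
}
```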
Event sourcing integration
Pattern: Microservices publishing domain events where duplicate prevention must preserve event ordering.
Configuration: Include causation and correlation IDs:
```json
{
  "type": "deduplication",
  "window": 180000,
  "include_fields": [
    "body.aggregate_id",
    "body.event_version",
    "body.causation_id"
  ]
}
```
Key insight: Include aggregate version to prevent suppressing legitimate sequential events while catching network-level duplicates.
Troubleshooting common issues
"Events that look identical weren't deduplicated"
Root causes:
- Events arrived outside the configured window
- Subtle field differences in timestamps or metadata
- Best-effort mechanism experienced rare miss
Investigation steps:
- Check event timestamps against window configuration
- Compare raw payloads for field-level differences (see the comparison sketch after this list)
- Verify included/excluded field paths are correct
- Review Requests timeline for processing order
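For the payload comparison step, a small helper like the sketch below can surface exactly which fields differ between two deliveries you expected to be deduplicated. It is a simple flat comparison; nested paths and arrays may need deeper handling.

```javascript
// List top-level fields whose JSON-serialized values differ between two payloads
function diffPayloads(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  const differences = [];
  for (const key of keys) {
    if (JSON.stringify(a[key]) !== JSON.stringify(b[key])) {
      differences.push(key);
    }
  }
  return differences;
}

// Example with two trimmed payloads: the differing field is a candidate for exclude_fields
const firstDelivery = { title: "Classic Tee", updated_at: "2024-05-01T10:00:00Z" };
const secondDelivery = { title: "Classic Tee", updated_at: "2024-05-01T10:05:00Z" };
console.log(diffPayloads(firstDelivery, secondDelivery)); // ["updated_at"]
```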
"Important events are being suppressed"
Root causes:
- Overly aggressive field exclusion
- Business logic changed but deduplication config didn't update
- Legitimate rapid updates being treated as duplicates
Solutions:
- Refine field lists to be more specific
- Reduce time window if legitimate updates happen quickly
- Add business-specific fields to distinguish meaningful changes
Configuration validation checklist
Before production deployment:
Field selection validation:
- [ ] Test with real webhook samples
- [ ] Verify business-critical fields are included
- [ ] Confirm noisy fields are properly excluded
- [ ] Validate field paths match actual webhook structure
Business logic validation:
- [ ] Confirm deduplication aligns with business requirements
- [ ] Test edge cases (rapid updates, network issues)
- [ ] Verify downstream systems handle reduced event volume
- [ ] Validate monitoring and alerting coverage
Summary
Implementing effective webhook deduplication follows these key strategies:
Choose the right approach: Use exact deduplication for payment retry scenarios (Stripe `body.id`) and field-based deduplication for noise reduction (Shopify product updates excluding `updated_at`).
Start with proven patterns: Begin with the Shopify product update configuration for e-commerce or Stripe payment retry setup for payment processing to see immediate impact.
Deploy incrementally: Use the production deployment methodology - start with proof of concept on filtered traffic, measure impact, then scale to full production.
Combine with related features: Integrate transformations for timestamp normalization and maintain downstream idempotency for complete reliability.
Monitor and optimize: Track duplicate detection rates and processing volume reduction using the configuration validation checklist to ensure optimal performance.
Related resources
- Deduplication documentation - Complete feature documentation
- Connection Rules - How to configure deduplication rules
- Filters - Find out more about filters and when to use them
- Transformations - Transform event payloads within Hookdeck