Gareth Wilson

Building CRM enrichment with HubSpot and OpenAI


A prospect fills out a "Request a demo" form on your marketing site. Within seconds of the contact being created in HubSpot, an AI enrichment step has identified the company from the email domain, classified the prospect's role from their job title, looked up the company's funding stage and headcount, scored the lead from 1 to 5 against your ICP, summarised what they wrote in the free-text field, and written all of it back to the HubSpot record as custom properties. By the time the SDR opens HubSpot in the morning, the lead is already triaged — they know which prospects to call first and what to say.

That's the workflow most B2B teams are after with HubSpot, an LLM, and a thin glue layer. The HubSpot side is easy. The LLM side is easy. The glue (what happens between the contact-created event and the property-update API call) is where it falls over, especially as you start ingesting leads from a second, third, and fourth source.

This guide walks through that glue layer end to end: the architecture, seven concrete steps to wire it up, and the production concerns you'll hit the moment you start running real volume.

The flow

flowchart TB
    A[HubSpot<br/>contact created/updated] --> B[HubSpot<br/>webhook]
    A2[Other lead sources<br/>Typeform, ads,<br/>partner APIs] --> C
    B -->|POST batch| C[Hookdeck<br/>inbound source]
    C -->|filter + transform<br/>+ rate limit| D[OpenAI<br/>enrichment handler]
    D -.->|LLM call| E[GPT-4 / GPT-4o]
    D -->|POST enriched record| F[Hookdeck<br/>callback source]
    F -->|transform + retry| G[HubSpot API<br/>property update]

There are two webhook flows that need to be reliable:

  1. HubSpot's contact event into the AI step — must not drop an event, and must not waste tokens on leads that don't deserve enrichment. HubSpot batches events and signs them with x-hubspot-signature-v3, but it doesn't filter, it doesn't rate-limit your downstream, and it doesn't deduplicate against prior runs.
  2. The enriched record back into HubSpot — must reach HubSpot reliably. HubSpot's API has rate limits and occasional 5xx; a failed update means the SDR sees an un-triaged lead.

Most teams build this with a POST /webhooks/hubspot endpoint that calls OpenAI inline, then calls HubSpot's API to write the result back. That's enough to demo. It's not enough to run, and the reasons it isn't enough are where Hookdeck Event Gateway helps.

What you'll need

  • A HubSpot account with a private app and webhook subscriptions enabled
  • An OpenAI API key with access to a chat completions model
  • A Hookdeck Event Gateway account — the free tier covers this entire workflow at low volume
  • Hookdeck CLI installed: npm install hookdeck-cli -g or brew install hookdeck/hookdeck/hookdeck
  • A handler somewhere — a Cloudflare Worker, a Vercel function, or a small Node service — that receives the lead payload, calls OpenAI, and POSTs the result back

Step 1: Create the Hookdeck Event Gateway source for HubSpot

In the Hookdeck Event Gateway dashboard:

  • Create Connection → New Source
  • Type: HubSpot (Event Gateway has a pre-configured source for HubSpot — see the HubSpot webhooks guide)
  • Name: hubspot-contact-events
  • Hookdeck will ask for your HubSpot client secret so it can verify the x-hubspot-signature-v3 header on every request
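For context, the check Hookdeck performs with that secret can be sketched in a few lines of Node. This is an illustration of HubSpot's v3 scheme (HMAC-SHA256 over method + URI + body + timestamp, base64-encoded, with requests older than five minutes rejected), not Hookdeck's actual implementation. The function names are hypothetical, and the sketch assumes you pass the raw request body and the full request URL exactly as HubSpot sent them.

```javascript
const crypto = require('node:crypto');

// Requests older than five minutes are rejected to limit replay attacks.
const MAX_AGE_MS = 5 * 60 * 1000;

// base64(HMAC-SHA256(clientSecret, method + uri + rawBody + timestamp))
function computeSignatureV3({ clientSecret, method, uri, rawBody, timestamp }) {
  return crypto
    .createHmac('sha256', clientSecret)
    .update(`${method}${uri}${rawBody}${timestamp}`, 'utf8')
    .digest('base64');
}

// `signature` comes from the x-hubspot-signature-v3 header and
// `timestamp` from the x-hubspot-request-timestamp header.
function isValidHubSpotRequest({ clientSecret, method, uri, rawBody, timestamp, signature, now = Date.now() }) {
  const sentAt = Number(timestamp);
  if (!Number.isFinite(sentAt) || Math.abs(now - sentAt) > MAX_AGE_MS) return false;

  const expected = Buffer.from(computeSignatureV3({ clientSecret, method, uri, rawBody, timestamp }));
  const received = Buffer.from(String(signature || ''));

  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

If you ever verify signatures yourself, the body must be the unparsed bytes: re-serialising parsed JSON can change whitespace and break the hash.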

Copy the generated source URL.

Step 2: Subscribe HubSpot webhooks to the Hookdeck URL

In HubSpot, open your Private App → Webhooks → Create subscription:

  • Target URL: paste the Hookdeck source URL from step 1
  • Subscriptions: enable contact.creation and contact.propertyChange (and any others you care about)
  • Active: yes

HubSpot batches events into a single request, so a single payload can carry multiple contact creations. Your handler — or Hookdeck's transformation — needs to handle that.
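To make the batching concrete, here is a small sketch of pulling the distinct contact IDs out of one batched request. The sample values are made up, and contactIdsToEnrich is a hypothetical helper name; the field names (subscriptionType, objectId, propertyName, propertyValue) follow HubSpot's webhook event format.

```javascript
// One HubSpot request body: an array of events, possibly for the same contact.
const sampleBatch = [
  { eventId: 1, subscriptionType: 'contact.creation', objectId: 101, occurredAt: 1700000000000 },
  { eventId: 2, subscriptionType: 'contact.propertyChange', objectId: 101, propertyName: 'jobtitle', propertyValue: 'CTO', occurredAt: 1700000001000 },
  { eventId: 3, subscriptionType: 'contact.creation', objectId: 102, occurredAt: 1700000002000 },
  { eventId: 4, subscriptionType: 'contact.creation', objectId: 101, occurredAt: 1700000003000 },
];

// Returns the unique contact IDs for one subscription type, in first-seen order.
function contactIdsToEnrich(events, wantedType = 'contact.creation') {
  const ids = new Set();
  // Tolerate a single event object as well as an array.
  for (const event of Array.isArray(events) ? events : [events]) {
    if (event && event.subscriptionType === wantedType) ids.add(event.objectId);
  }
  return [...ids];
}
```

Collapsing repeats inside a batch is cheap insurance: it avoids paying for two enrichments when the same contact appears twice in one request.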

If you want to centralise lead ingestion across additional sources (a Typeform, a Webflow form, an ad-network webhook, a partner API), point them all at the same Hookdeck source. The transformation step in step 4 normalises every shape into one canonical payload before it reaches OpenAI.

Step 3: Add the destination — your OpenAI enrichment handler

Your handler is a small service that:

  1. Receives the canonical lead payload
  2. Calls OpenAI with an enrichment prompt
  3. Parses the model's structured output
  4. POSTs the enriched record to a second Hookdeck source (step 6)

A minimal Cloudflare Worker version:

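export default {
  async fetch(request, env) {
    // The transformation in step 4 emits an array of canonical leads,
    // so accept either an array or a single lead object.
    const payload = await request.json();
    const leads = Array.isArray(payload) ? payload : [payload];

    for (const lead of leads) {
      // ENRICHMENT_PROMPT is your system prompt string, defined elsewhere in this module
      const completion = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'authorization': `Bearer ${env.OPENAI_API_KEY}`,
          'content-type': 'application/json',
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini',
          response_format: { type: 'json_object' },
          messages: [
            { role: 'system', content: ENRICHMENT_PROMPT },
            { role: 'user', content: JSON.stringify(lead) },
          ],
        }),
      });

      // Return a retryable status so Hookdeck redelivers instead of dropping the lead
      if (!completion.ok) {
        return new Response('OpenAI request failed', { status: completion.status === 429 ? 429 : 502 });
      }

      const result = await completion.json();
      const enrichment = JSON.parse(result.choices[0].message.content);

      // Send enriched record back through Hookdeck for write-back to HubSpot
      const callback = await fetch(env.HOOKDECK_CALLBACK_URL, {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify({
          contact_id: lead.hubspot_contact_id,
          enrichment,
        }),
      });

      if (!callback.ok) {
        return new Response('Callback failed', { status: 502 });
      }
    }

    return new Response('ok', { status: 200 });
  },
};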
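The Worker references ENRICHMENT_PROMPT without defining it and trusts whatever JSON the model returns. Here is one possible prompt plus a validator to run before posting the result onward. The prompt wording and the validateEnrichment helper are illustrative assumptions; the field names mirror the properties written back in step 6.

```javascript
// Hypothetical prompt. With response_format json_object, the prompt itself
// must mention JSON, which this one does.
const ENRICHMENT_PROMPT = [
  'You enrich B2B sales leads. Reply with a single JSON object containing:',
  '"score" (integer 1-5, fit against the ICP),',
  '"company_size" (string), "funding_stage" (string),',
  '"icp_fit" (string), "summary" (string, at most two sentences).',
  'Use null for any string field you cannot infer from the input.',
].join(' ');

const STRING_FIELDS = ['company_size', 'funding_stage', 'icp_fit', 'summary'];

// Accepts the raw model output (string or parsed object) and returns
// { ok: true, value } or { ok: false, error }.
function validateEnrichment(raw) {
  let data;
  try {
    data = typeof raw === 'string' ? JSON.parse(raw) : raw;
  } catch {
    return { ok: false, error: 'not valid JSON' };
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return { ok: false, error: 'expected a JSON object' };
  }
  if (!Number.isInteger(data.score) || data.score < 1 || data.score > 5) {
    return { ok: false, error: 'score must be an integer from 1 to 5' };
  }
  for (const key of STRING_FIELDS) {
    if (data[key] != null && typeof data[key] !== 'string') {
      return { ok: false, error: `${key} must be a string or null` };
    }
  }
  return { ok: true, value: data };
}
```

Rejecting malformed output in the handler keeps bad values out of HubSpot, where they are much harder to spot and clean up.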

Configure the destination in Hookdeck Event Gateway:

  • Type: HTTP
  • URL: your handler URL
  • Authentication: an HTTP header carrying a shared secret so your handler can verify the request came through Hookdeck
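On the handler side, checking that shared secret might look like the sketch below. The header name x-enrichment-secret is an assumption (use whatever you configured on the destination), and the example uses Node's crypto module; a Cloudflare Worker would need the equivalent Web Crypto calls.

```javascript
const crypto = require('node:crypto');

// Returns true only when the request carries the expected shared secret.
// `headers` is a plain object with lower-cased header names.
function hasValidSharedSecret(headers, expectedSecret, headerName = 'x-enrichment-secret') {
  const provided = headers[headerName];
  if (typeof provided !== 'string' || !expectedSecret) return false;

  // Hash both sides so the buffers are always the same length, then
  // compare in constant time to avoid leaking how many characters matched.
  const a = crypto.createHash('sha256').update(provided).digest();
  const b = crypto.createHash('sha256').update(expectedSecret).digest();
  return crypto.timingSafeEqual(a, b);
}
```

Respond with a 401 when the check fails. That status is not in the retry list in step 4, so a misconfigured secret shows up as a clear failure instead of a retry storm.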

Step 4: Add filter, transformation, and rate-limit rules

This is where Hookdeck Event Gateway does most of its work.

Transformation: normalise HubSpot's batched payload into the canonical lead shape your handler expects. A contact.creation event carries only the contact's ID (a contact.propertyChange event adds the name and new value of the one property that changed), so the webhook never contains the full record. The transformation step can fetch full contact details from HubSpot's API and emit a richer payload:

addHandler('transform', async (request, context) => {
  const events = request.body; // array of HubSpot events
  const enrichmentPayloads = [];

  for (const event of events) {
    if (event.subscriptionType !== 'contact.creation') continue;

    // Fetch contact details from HubSpot. `message` is HubSpot's default
    // free-text form property; swap in your own if your form uses a custom one.
    // (`notes_last_contacted` is a timestamp, not the text the prospect wrote.)
    const contact = await fetch(
      `https://api.hubapi.com/crm/v3/objects/contacts/${event.objectId}?properties=email,firstname,lastname,jobtitle,company,message`,
      { headers: { authorization: `Bearer ${context.secrets.HUBSPOT_TOKEN}` } }
    ).then(r => r.json());

    enrichmentPayloads.push({
      source: 'hubspot',
      hubspot_contact_id: event.objectId,
      email: contact.properties.email,
      first_name: contact.properties.firstname,
      last_name: contact.properties.lastname,
      job_title: contact.properties.jobtitle,
      company: contact.properties.company,
      notes: contact.properties.message,
      received_at: new Date().toISOString(),
    });
  }

  request.body = enrichmentPayloads;
  return request;
});

If you're also ingesting from Typeform or an ad webhook, add if/else branches that map each source into the same canonical shape.
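One way to keep those branches tidy is a table of per-source mapper functions instead of a growing if/else chain. The sketch below is plain JavaScript, not Hookdeck-specific, and the Typeform and ad-webhook input shapes are simplified assumptions; check each provider's real payload before relying on them.

```javascript
// Each mapper turns one source's payload into canonical field names.
const mappers = {
  // Assumed shape: { fields: { email, first, last, title, company, message } }
  typeform: ({ fields }) => ({
    email: fields.email,
    first_name: fields.first,
    last_name: fields.last,
    job_title: fields.title,
    company: fields.company,
    notes: fields.message,
  }),
  // Assumed shape: { lead: { email, full_name, company_name } }
  ads: ({ lead }) => {
    const [first, ...rest] = (lead.full_name || '').split(' ');
    return {
      email: lead.email,
      first_name: first || null,
      last_name: rest.join(' ') || null,
      company: lead.company_name,
    };
  },
};

// Builds the canonical lead; fields a source cannot supply stay null.
function toCanonicalLead(source, payload, now = new Date()) {
  const map = mappers[source];
  if (!map) throw new Error(`Unknown lead source: ${source}`);
  return {
    source,
    hubspot_contact_id: null,
    email: null,
    first_name: null,
    last_name: null,
    job_title: null,
    company: null,
    notes: null,
    ...map(payload),
    received_at: now.toISOString(),
  };
}
```

Every source produces the same keys, so the filter and the enrichment prompt never need to know where a lead came from.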

Filter — drop leads that don't deserve enrichment. Personal email domains, missing job titles, internal test contacts — none of those should burn OpenAI tokens:

{
  "body": [
    {
      "email": {
        "$exists": true,
        "$not": {
          "$regex": "@(gmail|yahoo|hotmail|outlook|aol|icloud)\\.com$"
        }
      },
      "job_title": {
        "$exists": true,
        "$ne": null
      }
    }
  ]
}

You can tune the filter as your ICP definition evolves — every change is a configuration update, not a redeploy.
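Before changing the rule in the dashboard, it can help to try candidate rules against sample leads locally. The function below is a plain-JavaScript approximation of the filter above, not Hookdeck's filter engine. It is deliberately a little stricter: it matches domains case-insensitively and rejects blank job titles. The name deservesEnrichment is hypothetical.

```javascript
// Same personal-email domains as the filter rule, matched case-insensitively.
const PERSONAL_DOMAIN = /@(gmail|yahoo|hotmail|outlook|aol|icloud)\.com$/i;

// Returns true when a canonical lead should be sent to the enrichment step.
function deservesEnrichment(lead) {
  const email = typeof lead.email === 'string' ? lead.email.trim() : '';
  const hasBusinessEmail = email.includes('@') && !PERSONAL_DOMAIN.test(email);
  const hasJobTitle = typeof lead.job_title === 'string' && lead.job_title.trim() !== '';
  return hasBusinessEmail && hasJobTitle;
}
```

Running a week of historical leads through a function like this gives a rough estimate of how many enrichments, and how many tokens, a rule change would save.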

Rate limit — protect OpenAI from bursty HubSpot batches. A new ad campaign or a marketing automation can fire hundreds of contact.creation events in a minute:

  • Rate: 5 per second
  • Burst: 10

Hookdeck Event Gateway queues the rest and feeds them to your handler at a sustainable rate.
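To build intuition for how a rate of 5 per second and a burst of 10 interact, here is a token-bucket sketch. It is a generic illustration of the technique, not a description of how Hookdeck implements its limiter. The clock is injectable so the behaviour can be checked without waiting.

```javascript
class TokenBucket {
  constructor({ ratePerSecond, burst, now = () => Date.now() }) {
    this.ratePerSecond = ratePerSecond;
    this.capacity = burst;   // most tokens the bucket can hold
    this.tokens = burst;     // start full, so an initial burst is allowed
    this.now = now;
    this.lastRefill = now();
  }

  // Returns true if a request may proceed now, false if it should wait in the queue.
  tryRemove() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Refill in proportion to elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefill = current;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With these numbers, the first 10 events in a spike pass immediately, and everything after that drains at 5 per second.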

Retry policy — handle transient OpenAI errors and your handler's cold-starts:

  • Initial delay: 30 seconds
  • Max attempts: 10
  • Max age: 24 hours
  • Apply on status codes: 408, 429, 500, 502, 503, 504
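The policy above can be read as a small decision function: given a failed attempt, should the event be retried, and after how long? The sketch below assumes exponential backoff (the delay doubles after each attempt), which the settings above do not specify; treat the schedule as illustrative. nextRetryDelayMs is a hypothetical name.

```javascript
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);

const inboundPolicy = {
  initialDelayMs: 30 * 1000,
  maxAttempts: 10,
  maxAgeMs: 24 * 60 * 60 * 1000,
};

// `attempt` is the number of deliveries tried so far (1 after the first failure).
// `ageMs` is how long ago the event was first received.
// Returns the delay before the next attempt, or null to stop retrying.
function nextRetryDelayMs({ status, attempt, ageMs }, policy = inboundPolicy) {
  if (!RETRYABLE_STATUS.has(status)) return null;   // e.g. a 400 will not fix itself
  if (attempt >= policy.maxAttempts) return null;   // attempt budget used up

  // Assumption: exponential backoff, 30s, 60s, 120s, ...
  const delay = policy.initialDelayMs * 2 ** (attempt - 1);
  if (ageMs + delay > policy.maxAgeMs) return null; // next try would exceed max age
  return delay;
}
```

The status list matters as much as the timing: retrying a 400 or 401 only repeats the same failure, while 429 and 5xx responses usually clear up on their own.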

Step 5: Test the inbound leg locally with the CLI

Before hitting the live OpenAI API, route the inbound connection to the CLI:

hookdeck login
hookdeck listen 3000 hubspot-contact-events

Run a local server that prints the transformed payload and stubs the OpenAI call:

// inspect.js
const http = require('http');
http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    console.log('Canonical lead:', JSON.parse(body));
    res.writeHead(200);
    res.end('ok');
  });
}).listen(3000);

Trigger a test contact creation in HubSpot. The CLI streams the transformed payload to your terminal so you can verify the shape before your handler sees it. Press r to replay the most recent event without re-triggering — useful when you're iterating on the prompt or the canonical shape.

Once it looks correct, point the destination back at your real handler.

Step 6: Wire the enriched record back through Hookdeck

When the handler finishes, it POSTs the enriched record to a second Hookdeck Event Gateway source that fans it out to HubSpot and any other system that needs to know.

Create a second connection:

  • Source: an HTTP source named lead-enrichment-results
  • Destination: HubSpot's /crm/v3/objects/contacts/{id} endpoint, authenticated with your private app access token sent as a Bearer token in the Authorization header

A transformation reshapes the enrichment output into HubSpot's property-update format:

addHandler('transform', (request, context) => {
  const { contact_id, enrichment } = request.body;

  request.url = `https://api.hubapi.com/crm/v3/objects/contacts/${contact_id}`;
  request.method = 'PATCH';
  request.body = {
    properties: {
      ai_lead_score: String(enrichment.score),
      ai_company_size: enrichment.company_size,
      ai_funding_stage: enrichment.funding_stage,
      ai_icp_fit: enrichment.icp_fit,
      ai_summary: enrichment.summary,
      ai_enriched_at: new Date().toISOString(),
    },
  };

  return request;
});

Retry policy — aggressive on this leg. Losing the write-back means the SDR sees an un-triaged lead:

  • Initial delay: 15 seconds
  • Max attempts: 15
  • Max age: 72 hours

If you want enrichment results to also reach Slack (for high-scoring leads) or a data warehouse, add additional connections from lead-enrichment-results with filters on enrichment.score.

Step 7: Run the full chain end to end

Create a test contact in HubSpot with a real business email. You should see, in order:

  1. HubSpot fires a contact.creation event into hubspot-contact-events
  2. Hookdeck Event Gateway fetches the contact's properties, normalises into the canonical shape, filters, and delivers to your handler
  3. The handler calls OpenAI and posts the enriched record to lead-enrichment-results
  4. Hookdeck Event Gateway transforms into HubSpot's PATCH format and writes back
  5. The HubSpot contact record now shows the AI-enriched properties

If anything fails (a malformed prompt, an OpenAI rate limit, a HubSpot 5xx) the Hookdeck Event Gateway dashboard tells you exactly where, with full request/response payloads at every step.

Why Hookdeck and not just a try/catch in your app server?

Three properties of CRM enrichment workflows make a direct integration the wrong choice once you're past the demo:

Lead events come from many sources, not just HubSpot. Over time you'll add a Typeform, an ad-network webhook, a Clearbit form, a partner API. Hookdeck Event Gateway consolidates all of them into a single ingestion point with a single canonical shape, so your enrichment logic doesn't grow a tangle of source-specific branches. Adding a new source becomes a new connection in the dashboard with a new transformation, not a refactor of the enrichment service.

Filtering keeps low-quality leads out of the AI step. OpenAI tokens have a cost. Personal email domains, missing job titles, test entries, and duplicate submissions are all noise. Hookdeck Event Gateway's filter rules cut them out before they reach your handler — saving tokens, money, and engineering time spent figuring out why your lead score column is full of null.

Replay lets you re-process leads when the prompt is updated. Enrichment prompts evolve. You realise the score should weight headcount more heavily, or you want to add a "buying intent" field. Without Hookdeck Event Gateway, re-running enrichment on the last 30 days of contacts means writing a backfill script. With it, you select the time range in the dashboard and click Replay, and every contact created since the cutoff is re-enriched against the new prompt.

You can build all of this on your own: a queue, a retry worker, a transformation step, a filter engine, an observability layer, a replay tool. That's the work Hookdeck Event Gateway collapses into a connection in a dashboard. The hours you don't spend building that infrastructure are hours you spend on the enrichment prompt, the ICP definition, and the downstream SDR experience.

Going to production

Observability that finds the silent failures. Hookdeck Event Gateway's Issues feature surfaces failure patterns automatically — repeated retries, signature verification failures, payload spikes. Set up a Slack alert when the enrichment write-back starts failing, so it doesn't go undetected for a week and SDRs lose trust in the scores.

Dedupe across sources. If the same prospect fills out two forms or hits two ad campaigns, you'll get two contact-creation events. Use Hookdeck Event Gateway's filtering and your handler's idempotency check (against hubspot_contact_id and a recent timestamp) to avoid double-enrichment.
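The handler-side idempotency check can be as simple as remembering which contact IDs were enriched recently. The sketch below keeps that memory in process, which only works for a single long-lived instance; on Workers or serverless functions you would back it with a shared store. The RecentlySeen class and the TTL value are assumptions for illustration.

```javascript
class RecentlySeen {
  constructor({ ttlMs, now = () => Date.now() }) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.expiry = new Map(); // key -> timestamp after which the key is forgotten
  }

  // Returns true the first time a key is claimed within the TTL window,
  // false for repeats that should be skipped.
  claim(key) {
    const current = this.now();
    // Drop expired entries so the map does not grow without bound.
    for (const [seenKey, expiresAt] of this.expiry) {
      if (expiresAt <= current) this.expiry.delete(seenKey);
    }
    if (this.expiry.has(key)) return false;
    this.expiry.set(key, current + this.ttlMs);
    return true;
  }
}

// Example: skip a contact that was already enriched in the last 10 minutes.
const recentlyEnriched = new RecentlySeen({ ttlMs: 10 * 60 * 1000 });
```

Call recentlyEnriched.claim(lead.hubspot_contact_id) before the OpenAI request and return a 200 without enriching when it comes back false.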

Cost control. Cap your daily enrichment volume with a rate limit at the source so an accidentally-imported CSV of 50,000 contacts doesn't run up an OpenAI bill overnight. Adjust the cap as you scale.
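A second line of defence is a hard daily budget inside the handler itself. The sketch below counts enrichments per UTC day and refuses work beyond the cap. Like any in-process counter it assumes a single instance; the DailyBudget class and the cap value are hypothetical.

```javascript
class DailyBudget {
  constructor({ maxPerDay, now = () => new Date() }) {
    this.maxPerDay = maxPerDay;
    this.now = now;
    this.day = null; // UTC date string the counter currently applies to
    this.used = 0;
  }

  // Returns true and counts the call if today's budget allows it, false otherwise.
  tryConsume() {
    const today = this.now().toISOString().slice(0, 10); // e.g. "2024-01-01"
    if (today !== this.day) {
      this.day = today; // new UTC day: reset the counter
      this.used = 0;
    }
    if (this.used >= this.maxPerDay) return false;
    this.used += 1;
    return true;
  }
}

// Example cap; size it against your own volume and OpenAI spend.
const enrichmentBudget = new DailyBudget({ maxPerDay: 2000 });
```

When tryConsume() returns false, responding with a 429 lets the retry policy from step 4 pick the event up again later instead of losing it.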

Handle PII responsibly. Names, emails, job titles, and any free-text notes pass through this pipeline. Hookdeck Event Gateway supports payload redaction so sensitive fields don't appear in dashboards or logs. Configure it before you go live.
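The same care applies to your own handler's logs. The helper below masks sensitive fields before anything is written to a log; it is a sketch for your application code, separate from Hookdeck's redaction feature. The list of sensitive keys is an assumption based on the canonical lead shape used in this guide.

```javascript
// Canonical-lead fields that should never appear in logs.
const SENSITIVE_KEYS = new Set(['email', 'first_name', 'last_name', 'notes']);

// Returns a deep copy of `value` with sensitive fields masked.
// The input is not modified.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        key,
        SENSITIVE_KEYS.has(key) ? '[REDACTED]' : redact(inner),
      ])
    );
  }
  return value; // strings, numbers, booleans, null, undefined pass through
}
```

Logging redact(lead) instead of lead keeps contact IDs and scores available for debugging while names, emails, and free text stay out of log storage.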

What to build next

This pattern generalises: the same architecture handles ticket classification in Zendesk, contract analysis triggered by a DocuSign event, transcript summarisation from a Gong webhook, and dozens of other "event happens, AI reasons about it, result writes back somewhere" workflows.

If you're building any of this, the fastest way to get past the demo phase is to stop maintaining your own webhook infrastructure. Start with the Hookdeck free tier (you can run this entire workflow without paying anything until you hit real volume) and use the CLI to keep your development loop fast.


Gareth Wilson

Product Marketing

Multi-time founding marketer, Gareth is PMM at Hookdeck and author of the newsletter, Community Inc.