# Building AI code review with GitHub and Claude
An engineer opens a pull request at 2:47pm: 312 lines changed across four files, a new background-job handler and the migration that supports it. Within thirty seconds of the PR being opened, Claude has read the full diff, identified that the new job handler doesn't wrap its database calls in a transaction, flagged that the migration adds an index without CONCURRENTLY on a table that takes locks during normal traffic, noticed a console.log that was clearly meant to be removed before commit, and posted three inline comments on the exact lines — plus a summary comment at the top of the PR that explains its reasoning. The author reads the feedback, fixes the transaction wrapping, adds the CONCURRENTLY, deletes the console.log, and pushes a new commit. Human reviewers, when they get to it, are reviewing a cleaner PR and can focus on the design choices that actually need their judgment.
That's the workflow engineering teams are building with GitHub and an LLM right now. The GitHub side is straightforward — pull_request webhooks fire reliably and are well-documented. The Claude side is one API call. The tricky part is the middle: filtering to the PRs you actually want reviewed, surviving spikes when an automated dependency-update bot opens 40 PRs at once, respecting Claude rate limits across multiple repos, and being able to replay every PR through a new reviewer prompt the day after you update it.
This guide walks through the glue layer end to end: the architecture, seven concrete steps to wire it up, and the production concerns that you need to get ahead of.
## The flow

```mermaid
flowchart TB
    A[Engineer opens<br/>or updates PR] --> B[GitHub webhook<br/>pull_request event]
    B -->|POST signed JSON| C[Hookdeck<br/>inbound source]
    C -->|filter + transform<br/>+ rate limit| D[Claude<br/>code review handler]
    D -.->|fetch diff| E[GitHub API]
    D -.->|messages API| F[claude-sonnet-4-5]
    D -->|POST review| G[Hookdeck<br/>callback source]
    G -->|route| H1[GitHub API<br/>inline comments]
    G -->|route| H2[GitHub API<br/>summary comment]
    G -->|route| H3[Metrics<br/>review log]
```
There are two webhook flows that need to be reliable:
- GitHub's `pull_request` events into the AI step — must filter aggressively (you don't want to review every Dependabot PR), respect rate limits (Claude and GitHub both have them), and queue during bursts.
- The AI's review back into GitHub as comments — must reach GitHub reliably even when GitHub's own API is rate-limiting you. PRs without comments aren't reviewed PRs, and silently failing on the write-back makes the entire pipeline pointless.
Most teams build this with a GitHub App handler that calls Claude inline, then calls the GitHub API to post comments. That might be enough for a demo, but at organization scale you need something like Hookdeck in the middle.
## What you'll need
- A GitHub organization with admin access to install a GitHub App and configure webhooks (see the GitHub webhooks guide)
- An Anthropic API key with access to a Claude model
- A Hookdeck Event Gateway account — the free tier covers this workflow at low volume
- Hookdeck CLI installed: `npm install hookdeck-cli -g` or `brew install hookdeck/hookdeck/hookdeck`
- A handler endpoint that fetches the diff, calls Claude, and POSTs the review back
## Step 1: Create the Hookdeck source for GitHub
In the Hookdeck dashboard:
- Create Connection → New Source
- Type: GitHub (Hookdeck has a pre-configured source that handles `x-hub-signature-256` verification)
- Name: `github-pr-events`
- Provide your GitHub webhook secret
Copy the generated source URL.
## Step 2: Register the GitHub webhook
If you're using a GitHub App, the webhook is configured in the App settings. If you're using a per-repo or organization webhook, configure it in the UI (or script it via the REST API, as sketched after the list):
- Settings → Webhooks → Add webhook
- Payload URL: paste the Hookdeck source URL
- Content type: `application/json`
- Secret: the same secret you gave Hookdeck
- Events: select "Let me select individual events" → Pull requests
- Save
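If you prefer to script this rather than click through settings, GitHub's REST API exposes the same configuration via `POST /repos/{owner}/{repo}/hooks`. A sketch — the repo name and environment variable names are placeholders:

```javascript
// Sketch: registering the webhook via GitHub's REST API instead of the UI.
// "acme/payments" and the env variable names are illustrative.
const res = await fetch('https://api.github.com/repos/acme/payments/hooks', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
    accept: 'application/vnd.github+json',
    'user-agent': 'acme-code-review',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    name: 'web', // the only valid value for repo webhooks
    config: {
      url: process.env.HOOKDECK_SOURCE_URL,       // the URL from Step 1
      content_type: 'json',
      secret: process.env.GITHUB_WEBHOOK_SECRET,  // the same secret Hookdeck verifies
    },
    events: ['pull_request'], // only the events this pipeline consumes
    active: true,
  }),
});
if (!res.ok) throw new Error(`webhook registration failed: ${res.status}`);
```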
Open a draft PR or push a commit to an existing one to fire a test event. The full payload should appear in the Hookdeck dashboard within a second. The test and replay GitHub webhooks guide covers local testing in more depth.
## Step 3: Add the destination — your Claude code review handler
The handler:
- Receives the canonical PR payload
- Fetches the diff from the GitHub API (the webhook itself doesn't contain the full diff)
- Calls Claude with a code review prompt that returns structured inline comments + a summary
- POSTs the result to a second Hookdeck source for fan-out
A minimal handler:
```javascript
export default {
  async fetch(request, env) {
    const event = await request.json();

    // Fetch the diff — the webhook payload doesn't include it
    const diffResponse = await fetch(
      `https://api.github.com/repos/${event.repo}/pulls/${event.pr_number}`,
      {
        headers: {
          authorization: `Bearer ${env.GITHUB_TOKEN}`,
          accept: 'application/vnd.github.v3.diff',
          'user-agent': 'acme-code-review',
        },
      }
    );
    if (!diffResponse.ok) {
      // A non-2xx response propagates back to Hookdeck, which retries per the policy in Step 4
      return new Response('diff fetch failed', { status: 502 });
    }
    const diff = await diffResponse.text();

    // One Messages API call with the review prompt
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': env.ANTHROPIC_API_KEY,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: 'claude-sonnet-4-5',
        max_tokens: 4096,
        system: CODE_REVIEW_PROMPT,
        messages: [{
          role: 'user',
          content: [
            `Repository: ${event.repo}`,
            `PR title: ${event.title}`,
            `PR description: ${event.body_text || '(none)'}`, // field name set by the Step 4 transformation
            `Diff:\n${diff}`,
          ].join('\n\n'),
        }],
      }),
    });
    const result = await response.json();
    const review = JSON.parse(result.content[0].text); // the prompt instructs Claude to return strict JSON

    // Hand the review to the callback source for fan-out (Step 6)
    await fetch(env.HOOKDECK_CALLBACK_URL, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        repo: event.repo,
        pr_number: event.pr_number,
        head_sha: event.head_sha,
        review,
      }),
    });

    return new Response('ok', { status: 200 });
  },
};
```
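The handler assumes `CODE_REVIEW_PROMPT` instructs Claude to return strict JSON; the `JSON.parse` on the response depends on it. A minimal sketch of such a prompt, shaped to match what the fan-out step in Step 6 consumes:

```javascript
// Sketch of the system prompt the handler references. The JSON shape must
// match what Step 6 consumes: review.summary and
// review.inline_comments[{ path, position, body }].
const CODE_REVIEW_PROMPT = `You are a senior engineer reviewing a pull request.
Respond with ONLY a JSON object, no prose or markdown fences, in this shape:
{
  "summary": "two to four sentences on the change and its main risks",
  "inline_comments": [
    { "path": "<file path from the diff>", "position": <position in the diff>, "body": "<comment>" }
  ]
}
Prioritize correctness issues: missing transactions, unsafe migrations,
leftover debug logging, unparameterised queries, missing error handling.
Skip pure style nits.`;
```

In production, wrap the `JSON.parse` in a try/catch and return a non-2xx status on failure so the gateway retries rather than posting a garbled review.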
Configure the Hookdeck Event Gateway destination:
- Type: HTTP
- URL: your handler URL
- Authentication: an HTTP header carrying a shared secret
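On the handler side, verifying that header is a short guard at the top of `fetch()`. The header name here is whatever you chose in the destination config above, not something Hookdeck mandates:

```javascript
// Inside the handler's fetch(request, env), before any processing:
// reject anything that didn't come through the gateway. The header
// name "x-handler-secret" is an illustrative choice.
if (request.headers.get('x-handler-secret') !== env.HANDLER_SECRET) {
  return new Response('unauthorized', { status: 401 });
}
```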
## Step 4: Add filter, transformation, and rate-limit rules
This is where Hookdeck Event Gateway does most of the work for code review specifically.
**Transformation** — flatten GitHub's verbose payload to what your handler needs:
```javascript
addHandler('transform', (request, context) => {
  const body = request.body;
  request.body = {
    event_type: body.action, // opened, synchronize, ready_for_review, labeled, etc.
    repo: body.repository.full_name,
    pr_number: body.pull_request.number,
    title: body.pull_request.title,
    body_text: body.pull_request.body,
    head_sha: body.pull_request.head.sha,
    base_branch: body.pull_request.base.ref,
    author: body.pull_request.user.login,
    is_draft: body.pull_request.draft,
    additions: body.pull_request.additions,
    deletions: body.pull_request.deletions,
    changed_files: body.pull_request.changed_files,
    labels: body.pull_request.labels.map(l => l.name),
  };
  return request;
});
```
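After the transformation, a canonical event looks something like this (values illustrative). Every field the filter below references sits at the top level:

```json
{
  "event_type": "opened",
  "repo": "acme/payments",
  "pr_number": 481,
  "title": "Add background job for invoice retries",
  "body_text": "Adds a retry worker and its migration.",
  "head_sha": "9f2c1ab",
  "base_branch": "main",
  "author": "jsmith",
  "is_draft": false,
  "additions": 240,
  "deletions": 31,
  "changed_files": 4,
  "labels": []
}
```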
**Filter** — only review the PRs you actually want reviewed. The rules below skip drafts, very large PRs, and PRs from automation bots:
```json
{
  "body": {
    "event_type": { "$in": ["opened", "synchronize", "ready_for_review"] },
    "is_draft": false,
    "author": { "$not": { "$regex": "(\\[bot\\]$|dependabot|renovate)" } },
    "changed_files": { "$lt": 50 },
    "additions": { "$lt": 1500 },
    "labels": { "$not": { "$elemMatch": { "$eq": "skip-ai-review" } } }
  }
}
```
The `skip-ai-review` label gives developers an opt-out. Add a `force-ai-review` label and a parallel rule for the opposite.
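A sketch of that opposite rule, using the same filter syntax on a parallel connection (which would also bypass the size guards):

```json
{
  "body": {
    "labels": { "$elemMatch": { "$eq": "force-ai-review" } }
  }
}
```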
**Rate limit** — protect Claude and GitHub's API limits when a refactor PR cascades into 20 dependent PRs:

- Rate: 5 per second
- Burst: 15
**Retry policy** — Claude and GitHub both rate-limit; handle both with exponential backoff:

- Initial delay: 30 seconds
- Max attempts: 15
- Max age: 12 hours
- Apply on status codes: 408, 429, 500, 502, 503, 504, 529
For code review specifically, a 12-hour max age makes sense: if the review doesn't happen in 12 hours, the PR has probably moved on, and a stale review is worse than no review.
## Step 5: Test the inbound leg locally with the CLI
Route the inbound connection to the CLI:
```bash
hookdeck login
hookdeck listen 3000 github-pr-events
```
A local inspector:
```javascript
// inspect.js — log every canonical event the gateway delivers
const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => (body += chunk));
  req.on('end', () => {
    console.log('Canonical PR event:', JSON.parse(body));
    res.writeHead(200);
    res.end('ok');
  });
}).listen(3000);
```
Open a real test PR on a sandbox repo. Verify the canonical event lands and that drafts, bot PRs, and oversized PRs are filtered out as expected. Press `r` to replay the same event when iterating on the prompt or transformation.
## Step 6: Wire reviews back through Hookdeck
The review fans out:
- GitHub — inline comments on specific lines, plus a top-level summary comment
- Metrics — log to a database for "how often does the AI catch a real issue?" reporting
Create a second connection with the source `pr-review-results` and one destination per downstream.

For inline comments, a transformation builds the GitHub API call. GitHub's review API accepts a list of comments in a single `reviews` call, which is what you want — it's one network round-trip per review, not one per comment:
```javascript
addHandler('transform', (request, context) => {
  const { repo, pr_number, head_sha, review } = request.body;

  request.url = `https://api.github.com/repos/${repo}/pulls/${pr_number}/reviews`;
  request.method = 'POST';
  request.headers = {
    ...request.headers,
    authorization: `Bearer ${context.secrets.GITHUB_TOKEN}`,
    accept: 'application/vnd.github.v3+json',
    'user-agent': 'acme-code-review',
    'content-type': 'application/json',
  };
  request.body = {
    commit_id: head_sha,
    body: review.summary,
    event: 'COMMENT', // never auto-block; let humans decide
    comments: review.inline_comments.map(c => ({
      path: c.path,
      position: c.position,
      body: c.body,
    })),
  };

  return request;
});
```
A second connection from the same source writes a row per review to a metrics database for later analysis ("what categories of issue does the AI catch most? Which authors push back?").
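The metrics leg can reuse the same transformation pattern, reducing the review to a flat row. A sketch, with a hypothetical ingestion endpoint:

```javascript
// Sketch: reduce the review to a row for the metrics store.
// The ingestion URL is hypothetical; substitute your own sink.
addHandler('transform', (request, context) => {
  const { repo, pr_number, review } = request.body;

  request.url = 'https://metrics.internal.example/review-log';
  request.method = 'POST';
  request.body = {
    repo,
    pr_number,
    comment_count: review.inline_comments.length,
    reviewed_at: new Date().toISOString(),
  };

  return request;
});
```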
**Retry policy** on these callbacks:

- Initial delay: 15 seconds
- Max attempts: 10
- Max age: 12 hours
## Step 7: Run the full chain end to end
Open a PR on your sandbox repo with a deliberate issue — an obvious `console.log`, a missing `await`, a query inside a loop. You should see, in order:
- GitHub fires `pull_request.opened` into `github-pr-events`
- Hookdeck Event Gateway verifies, filters (assume it passes), transforms, and delivers to your handler
- Your handler fetches the diff and calls Claude
- The review lands in `pr-review-results`
- GitHub receives the review with inline comments on the offending lines and a summary at the top
- The metrics database logs the review
If anything fails, the Hookdeck dashboard tells you exactly where, with payload and response visibility at every hop.
## Why Hookdeck and not just a GitHub App handler?
Three properties of code-review workflows make a direct integration the wrong choice once you're past the demo:
**PR events come from every repo at once.** A Dependabot mass-update opens 40 PRs in five minutes. A monorepo refactor produces a cascade of dependent PRs. Without queueing and rate limiting, your handler tries to make 40 concurrent Claude calls and hits the rate limit within the first second. Hookdeck Event Gateway queues the rest and feeds them through at a sustainable rate — every PR still gets reviewed, just spread over a couple of minutes instead of all at once.
**Filtering keeps cost and noise under control.** Most teams don't want every PR reviewed — drafts, bot PRs, doc-only PRs, and oversized PRs are all candidates to skip. Hookdeck Event Gateway filters cut them out before they hit your handler, so you don't burn tokens on `chore: bump @types/node`. Adjusting the filter as your standards evolve is a configuration change, not a redeploy of the handler.
**Replay lets you re-review when the prompt changes.** Review prompts evolve constantly — new patterns to catch, new languages added, new internal standards. Event Gateway's replay lets you re-run a week of PRs through the new prompt against a preview destination — useful for tuning before flipping live. It also lets you backfill reviews on PRs that were skipped during an outage.
You can build all of this on your own: a queue, a retry worker, a filter engine, a transformation step, an observability layer, a replay tool. That's the work Hookdeck Event Gateway collapses into a connection in a dashboard. The hours you don't spend on infrastructure are hours you can spend on the review prompt and the developer experience instead.
## Going to production
**Observability that engineering will use.** Hookdeck's Issues feature surfaces failure patterns. A spike in 403s from the GitHub API means your token is rotating or rate-limited — surface it in a #dev-infra Slack channel before developers notice the silence.
**Tune the filter as you learn.** The initial filter rules will be wrong. Some teams want a 50-file PR reviewed; some don't want Dependabot reviewed at all, while others want it reviewed by a different (lighter) prompt. Iterate; each change is a dashboard update.
**Replay deliberately when the prompt changes.** When you add a new rule (e.g. "flag any new SQL query that doesn't use parameterised arguments"), replay the last week of PRs against a preview destination first. Check the false-positive rate before flipping live.
**Handle source code carefully.** PR diffs may contain secrets, credentials, or proprietary algorithms. Configure Hookdeck's payload redaction on fields that commonly contain such data, and ensure the destination handler doesn't log the diff to a long-retention sink.
**Plan for GitHub's secondary rate limits.** GitHub enforces both primary (5,000 requests/hour) and secondary (varies) limits. The rate limit on the callback connection should keep you well within them — typically 2-3 reviews per second is comfortable for an organization under 200 engineers.
## What to build next
This pattern generalizes: apply it to commit-message linting (`push` events), release-notes generation (`release.published`), CI failure triage (`workflow_run.completed`), or issue triage (`issues.opened`). The plumbing stays the same; the prompts and the destinations change.
If you're building any of this, the fastest way to get past the demo phase is to stop maintaining your own webhook infrastructure. Start with the Hookdeck free tier (you can run this entire workflow without paying anything until you hit real volume) and use the CLI to keep your development loop fast.