# Building AI code review with GitHub and Claude
An engineer opens a pull request at 2:47pm: 312 lines changed across four files, a new background-job handler and the migration that supports it. Within thirty seconds of the PR being opened, Claude has read the full diff, identified that the new job handler doesn't wrap its database calls in a transaction, flagged that the migration adds an index without CONCURRENTLY on a table that takes locks during normal traffic, noticed a console.log that was clearly meant to be removed before commit, and posted three inline comments on the exact lines — plus a summary comment at the top of the PR that explains its reasoning. The author reads the feedback, fixes the transaction wrapping, adds the CONCURRENTLY, deletes the console.log, and pushes a new commit. Human reviewers, when they get to it, are reviewing a cleaner PR and can focus on the design choices that actually need their judgment.
That's the workflow engineering teams are building with GitHub and an LLM right now. The GitHub side is straightforward — pull_request webhooks fire reliably and are well-documented. The Claude side is one API call. The tricky part is the middle: filtering to the PRs you actually want reviewed, surviving spikes when an automated dependency-update bot opens 40 PRs at once, respecting Claude rate limits across multiple repos, and being able to replay every PR through a new reviewer prompt the day after you update it.
This guide walks through the glue layer end to end: the architecture, seven concrete steps to wire it up, and the production concerns that you need to get ahead of.
## The flow

```mermaid
flowchart TB
    A[Engineer opens<br/>or updates PR] --> B[GitHub webhook<br/>pull_request event]
    B -->|POST signed JSON| C[Hookdeck<br/>inbound source]
    C -->|filter + transform<br/>+ rate limit| D[Claude<br/>code review handler]
    D -.->|fetch diff| E[GitHub API]
    D -.->|messages API| F[claude-sonnet-4-5]
    D -->|POST review| G[Hookdeck<br/>callback source]
    G -->|route| H1[GitHub API<br/>inline comments]
    G -->|route| H2[GitHub API<br/>summary comment]
    G -->|route| H3[Metrics<br/>review log]
```
There are two webhook flows that need to be reliable:
- GitHub's `pull_request` events into the AI step — must filter aggressively (you don't want to review every Dependabot PR), respect rate limits (Claude and GitHub both have them), and queue during bursts.
- The AI's review back into GitHub as comments — must reach GitHub reliably even when GitHub's own API is rate-limiting you. PRs without comments aren't reviewed PRs, and silently failing on the write-back makes the entire pipeline pointless.
Most teams build this with a GitHub App handler that calls Claude inline, then calls the GitHub API to post comments. That might be enough for a demo, but at organization scale you need something like Hookdeck in the middle.
## What you'll need
- A GitHub organization with admin access to install a GitHub App and configure webhooks (see the GitHub webhooks guide)
- An Anthropic API key with access to a Claude model
- A Hookdeck Event Gateway account — the free tier covers this workflow at low volume
- Hookdeck CLI installed: `npm install hookdeck-cli -g` or `brew install hookdeck/hookdeck/hookdeck`
- A handler endpoint that fetches the diff, calls Claude, and POSTs the review back
## Step 1: Create the Hookdeck source for GitHub
In the Hookdeck dashboard:
- Create Connection → New Source
- Type: GitHub (Hookdeck has a pre-configured source that handles `x-hub-signature-256` verification)
- Name: `github-pr-events`
- Provide your GitHub webhook secret
Copy the generated source URL.
## Step 2: Register the GitHub webhook
If you're using a GitHub App, the webhook is configured in the App settings. If you're using a per-repo or organization webhook, configure it in the UI (or script it via the REST API, as sketched after the list):
- Settings → Webhooks → Add webhook
- Payload URL: paste the Hookdeck source URL
- Content type: `application/json`
- Secret: the same secret you gave Hookdeck
- Events: select "Let me select individual events" → Pull requests
- Save
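If you prefer to script this rather than click through settings, GitHub's REST API exposes the same configuration via `POST /repos/{owner}/{repo}/hooks`. A sketch — the repo name and environment variable names are placeholders:

```javascript
// Sketch: registering the webhook via GitHub's REST API instead of the UI.
// "acme/payments" and the env variable names are illustrative.
const res = await fetch('https://api.github.com/repos/acme/payments/hooks', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
    accept: 'application/vnd.github+json',
    'user-agent': 'acme-code-review',
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    name: 'web', // the only valid value for repo webhooks
    config: {
      url: process.env.HOOKDECK_SOURCE_URL,       // the URL from Step 1
      content_type: 'json',
      secret: process.env.GITHUB_WEBHOOK_SECRET,  // the same secret Hookdeck verifies
    },
    events: ['pull_request'], // only the events this pipeline consumes
    active: true,
  }),
});
if (!res.ok) throw new Error(`webhook registration failed: ${res.status}`);
```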
Open a draft PR or push a commit to an existing one to fire a test event. The full payload should appear in the Hookdeck dashboard within a second. The test and replay GitHub webhooks guide covers local testing in more depth.
## Step 3: Add the destination — your Claude code review handler
The handler:
- Receives the canonical PR payload
- Fetches the diff from the GitHub API (the webhook itself doesn't contain the full diff)
- Calls Claude with a code review prompt that returns structured inline comments + a summary
- POSTs the result to a second Hookdeck source for fan-out
A minimal handler:
```javascript
export default {
  async fetch(request, env) {
    const event = await request.json();

    // Fetch the diff — the webhook payload doesn't include it
    const diffResponse = await fetch(
      `https://api.github.com/repos/${event.repo}/pulls/${event.pr_number}`,
      {
        headers: {
          authorization: `Bearer ${env.GITHUB_TOKEN}`,
          accept: 'application/vnd.github.v3.diff',
          'user-agent': 'acme-code-review',
        },
      }
    );
    if (!diffResponse.ok) {
      // A non-2xx response propagates back to Hookdeck, which retries per the policy in Step 4
      return new Response('diff fetch failed', { status: 502 });
    }
    const diff = await diffResponse.text();

    // One Messages API call with the review prompt
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': env.ANTHROPIC_API_KEY,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: 'claude-sonnet-4-5',
        max_tokens: 4096,
        system: CODE_REVIEW_PROMPT,
        messages: [{
          role: 'user',
          content: [
            `Repository: ${event.repo}`,
            `PR title: ${event.title}`,
            `PR description: ${event.body_text || '(none)'}`, // field name set by the Step 4 transformation
            `Diff:\n${diff}`,
          ].join('\n\n'),
        }],
      }),
    });
    const result = await response.json();
    const review = JSON.parse(result.content[0].text); // the prompt instructs Claude to return strict JSON

    // Hand the review to the callback source for fan-out (Step 6)
    await fetch(env.HOOKDECK_CALLBACK_URL, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        repo: event.repo,
        pr_number: event.pr_number,
        head_sha: event.head_sha,
        review,
      }),
    });

    return new Response('ok', { status: 200 });
  },
};
```
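The handler assumes `CODE_REVIEW_PROMPT` instructs Claude to return strict JSON; the `JSON.parse` on the response depends on it. A minimal sketch of such a prompt, shaped to match what the fan-out step in Step 6 consumes:

```javascript
// Sketch of the system prompt the handler references. The JSON shape must
// match what Step 6 consumes: review.summary and
// review.inline_comments[{ path, position, body }].
const CODE_REVIEW_PROMPT = `You are a senior engineer reviewing a pull request.
Respond with ONLY a JSON object, no prose or markdown fences, in this shape:
{
  "summary": "two to four sentences on the change and its main risks",
  "inline_comments": [
    { "path": "<file path from the diff>", "position": <position in the diff>, "body": "<comment>" }
  ]
}
Prioritize correctness issues: missing transactions, unsafe migrations,
leftover debug logging, unparameterised queries, missing error handling.
Skip pure style nits.`;
```

In production, wrap the `JSON.parse` in a try/catch and return a non-2xx status on failure so the gateway retries rather than posting a garbled review.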
Configure the Hookdeck Event Gateway destination:
- Type: HTTP
- URL: your handler URL
- Authentication: an HTTP header carrying a shared secret
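On the handler side, verifying that header is a short guard at the top of `fetch()`. The header name here is whatever you chose in the destination config above, not something Hookdeck mandates:

```javascript
// Inside the handler's fetch(request, env), before any processing:
// reject anything that didn't come through the gateway. The header
// name "x-handler-secret" is an illustrative choice.
if (request.headers.get('x-handler-secret') !== env.HANDLER_SECRET) {
  return new Response('unauthorized', { status: 401 });
}
```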
## Step 4: Add filter, transformation, and rate-limit rules
This is where Hookdeck Event Gateway does most of the work for code review specifically.
**Transformation** — flatten GitHub's verbose payload to what your handler needs:
```javascript
addHandler('transform', (request, context) => {
  const body = request.body;
  request.body = {
    event_type: body.action, // opened, synchronize, ready_for_review, labeled, etc.
    repo: body.repository.full_name,
    pr_number: body.pull_request.number,
    title: body.pull_request.title,
    body_text: body.pull_request.body,
    head_sha: body.pull_request.head.sha,
    base_branch: body.pull_request.base.ref,
    author: body.pull_request.user.login,
    is_draft: body.pull_request.draft,
    additions: body.pull_request.additions,
    deletions: body.pull_request.deletions,
    changed_files: body.pull_request.changed_files,
    labels: body.pull_request.labels.map(l => l.name),
  };
  return request;
});
```
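After the transformation, a canonical event looks something like this (values illustrative). Every field the filter below references sits at the top level:

```json
{
  "event_type": "opened",
  "repo": "acme/payments",
  "pr_number": 481,
  "title": "Add background job for invoice retries",
  "body_text": "Adds a retry worker and its migration.",
  "head_sha": "9f2c1ab",
  "base_branch": "main",
  "author": "jsmith",
  "is_draft": false,
  "additions": 240,
  "deletions": 31,
  "changed_files": 4,
  "labels": []
}
```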
**Filter** — only review the PRs you actually want reviewed. The rules below skip drafts, very large PRs, and PRs from automation bots:
```json
{
  "body": {
    "event_type": { "$in": ["opened", "synchronize", "ready_for_review"] },
    "is_draft": false,
    "author": { "$not": { "$regex": "(\\[bot\\]$|dependabot|renovate)" } },
    "changed_files": { "$lt": 50 },
    "additions": { "$lt": 1500 },
    "labels": { "$not": { "$elemMatch": { "$eq": "skip-ai-review" } } }
  }
}
```
The `skip-ai-review` label gives developers an opt-out. Add a `force-ai-review` label and a parallel rule for the opposite.
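A sketch of that opposite rule, using the same filter syntax on a parallel connection (which would also bypass the size guards):

```json
{
  "body": {
    "labels": { "$elemMatch": { "$eq": "force-ai-review" } }
  }
}
```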
**Rate limit** — protect Claude and GitHub's API limits when a refactor PR cascades into 20 dependent PRs:

- Rate: 5 per second
- Burst: 15
**Retry policy** — Claude and GitHub both rate-limit; handle both with exponential backoff:

- Initial delay: 30 seconds
- Max attempts: 15
- Max age: 12 hours
- Apply on status codes: 408, 429, 500, 502, 503, 504, 529
For code review specifically, a 12-hour max age makes sense: if the review doesn't happen in 12 hours, the PR has probably moved on, and a stale review is worse than no review.
## Step 5: Test the inbound leg locally with the CLI
Route the inbound connection to the CLI:
```bash
hookdeck login
hookdeck listen 3000 github-pr-events
```
A local inspector:
```javascript
// inspect.js — log every canonical event the gateway delivers
const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => (body += chunk));
  req.on('end', () => {
    console.log('Canonical PR event:', JSON.parse(body));
    res.writeHead(200);
    res.end('ok');
  });
}).listen(3000);
```
Open a real test PR on a sandbox repo. Verify the canonical event lands and that drafts, bot PRs, and oversized PRs are filtered out as expected. Press `r` to replay the same event when iterating on the prompt or transformation.
## Step 6: Wire reviews back through Hookdeck
The review fans out:
- GitHub — inline comments on specific lines, plus a top-level summary comment
- Metrics — log to a database for "how often does the AI catch a real issue?" reporting
Create a second connection with the source `pr-review-results` and one destination per downstream.

For inline comments, a transformation builds the GitHub API call. GitHub's review API accepts a list of comments in a single `reviews` call, which is what you want — it's one network round-trip per review, not one per comment:
```javascript
addHandler('transform', (request, context) => {
  const { repo, pr_number, head_sha, review } = request.body;

  request.url = `https://api.github.com/repos/${repo}/pulls/${pr_number}/reviews`;
  request.method = 'POST';
  request.headers = {
    ...request.headers,
    authorization: `Bearer ${context.secrets.GITHUB_TOKEN}`,
    accept: 'application/vnd.github.v3+json',
    'user-agent': 'acme-code-review',
    'content-type': 'application/json',
  };
  request.body = {
    commit_id: head_sha,
    body: review.summary,
    event: 'COMMENT', // never auto-block; let humans decide
    comments: review.inline_comments.map(c => ({
      path: c.path,
      position: c.position,
      body: c.body,
    })),
  };

  return request;
});
```
A second connection from the same source writes a row per review to a metrics database for later analysis ("what categories of issue does the AI catch most? Which authors push back?").
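The metrics leg can reuse the same transformation pattern, reducing the review to a flat row. A sketch, with a hypothetical ingestion endpoint:

```javascript
// Sketch: reduce the review to a row for the metrics store.
// The ingestion URL is hypothetical; substitute your own sink.
addHandler('transform', (request, context) => {
  const { repo, pr_number, review } = request.body;

  request.url = 'https://metrics.internal.example/review-log';
  request.method = 'POST';
  request.body = {
    repo,
    pr_number,
    comment_count: review.inline_comments.length,
    reviewed_at: new Date().toISOString(),
  };

  return request;
});
```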
**Retry policy** on these callbacks:

- Initial delay: 15 seconds
- Max attempts: 10
- Max age: 12 hours
## Step 7: Run the full chain end to end
Open a PR on your sandbox repo with a deliberate issue — an obvious `console.log`, a missing `await`, a query inside a loop. You should see, in order:
- GitHub fires `pull_request.opened` into `github-pr-events`
- Hookdeck Event Gateway verifies, filters (assume it passes), transforms, and delivers to your handler
- Your handler fetches the diff and calls Claude
- The review lands in `pr-review-results`
- GitHub receives the review with inline comments on the offending lines and a summary at the top
- The metrics database logs the review
If anything fails, the Hookdeck dashboard tells you exactly where, with payload and response visibility at every hop.
## Why Hookdeck and not just a GitHub App handler?
Three properties of code-review workflows make a direct integration the wrong choice once you're past the demo:
**PR events come from every repo at once.** A Dependabot mass-update opens 40 PRs in five minutes. A monorepo refactor produces a cascade of dependent PRs. Without queueing and rate limiting, your handler tries to make 40 concurrent Claude calls and hits the rate limit within the first second. Hookdeck Event Gateway queues the rest and feeds them through at a sustainable rate — every PR still gets reviewed, just spread over a couple of minutes instead of all at once.
**Filtering keeps cost and noise under control.** Most teams don't want every PR reviewed — drafts, bot PRs, doc-only PRs, and oversized PRs are all candidates to skip. Hookdeck Event Gateway filters cut them out before they hit your handler, so you don't burn tokens on `chore: bump @types/node`. Adjusting the filter as your standards evolve is a configuration change, not a redeploy of the handler.
**Replay lets you re-review when the prompt changes.** Review prompts evolve constantly — new patterns to catch, new languages added, new internal standards. Event Gateway's replay lets you re-run a week of PRs through the new prompt against a preview destination — useful for tuning before flipping live. It also lets you backfill reviews on PRs that were skipped during an outage.
You can build all of this on your own: a queue, a retry worker, a filter engine, a transformation step, an observability layer, a replay tool. That's the work Hookdeck Event Gateway collapses into a connection in a dashboard. The hours you don't spend on infrastructure are hours you can spend on the review prompt and the developer experience instead.
## Going to production
**Observability that engineering will use.** Hookdeck's Issues feature surfaces failure patterns. A spike in 403s from the GitHub API means your token is rotating or rate-limited — surface it in a #dev-infra Slack channel before developers notice the silence.
**Tune the filter as you learn.** The initial filter rules will be wrong. Some teams want a 50-file PR reviewed; some don't want Dependabot reviewed at all, while others want it reviewed by a different (lighter) prompt. Iterate; each change is a dashboard update.
**Replay deliberately when the prompt changes.** When you add a new rule (e.g. "flag any new SQL query that doesn't use parameterised arguments"), replay the last week of PRs against a preview destination first. Check the false-positive rate before flipping live.
**Handle source code carefully.** PR diffs may contain secrets, credentials, or proprietary algorithms. Configure Hookdeck's payload redaction on fields that commonly contain such data, and ensure the destination handler doesn't log the diff to a long-retention sink.
**Plan for GitHub's secondary rate limits.** GitHub enforces both primary (5,000 requests/hour) and secondary (varies) limits. The rate limit on the callback connection should keep you well within them — typically 2-3 reviews per second is comfortable for an organization under 200 engineers.
## What to build next
This pattern generalizes: apply it to commit-message linting (`push` events), release-notes generation (`release.published`), CI failure triage (`workflow_run.completed`), or issue triage (`issues.opened`). The plumbing stays the same; the prompts and the destinations change.
If you're building any of this, the fastest way to get past the demo phase is to stop maintaining your own webhook infrastructure. Start with the Hookdeck free tier (you can run this entire workflow without paying anything until you hit real volume) and use the CLI to keep your development loop fast.