

AI · April 22, 2026

AI PR review on GitHub: what we automate, what we never let it touch, and what we got back

Closing the case study. The PR review rules we wrote, the four lines AI never crosses, and the bottlenecks from Post 1 — revisited.

  • Copilot code review handles obvious quality issues so humans can focus on judgment
  • AI never blocks a merge — only humans hold that authority
  • Security, schema, sharing rules, and money-touching code stay humans-only
  • PR cycle time drops because the obvious nits are handled before a human looks
  • Revisit your governance every quarter — the line moves as the tools improve

The Salesforce DX MCP server post wrapped on a tension we had not resolved. Agentic workflows generate more code, faster, across more files at once — which makes the bottleneck on the way out, the code review queue, worse, not better, the moment the rest of the stack starts working. A senior staring down a 14-file PR is not a productivity story.

This post closes the case study by automating the review itself, with hard rules for what the automation is allowed to decide. GitHub Copilot code review went GA in 2026. We turned it on for the repo, drew four lines it is not allowed to cross, and revisited the four bottlenecks the series opened with.


Setup

Copilot code review is the GA feature that lets Copilot act as a reviewer on a pull request, leaving inline comments and a top-level summary the way a human reviewer does (concept docs). We use the auto-review variant — every PR opened against the default branch gets a Copilot review automatically, no per-PR opt-in.

Enabling Copilot code review on the repo

Exact UI path, taken from the auto-review configuration docs:

  1. Repo Settings → Rules → Rulesets → New branch ruleset.
  2. Enforcement Status: Active.
  3. Add target branches. For us, the default branch (main) and the long-lived release branches.
  4. Under Branch rules, enable Automatically request Copilot code review.
  5. Optional, both of which we turned on: Review new pushes (Copilot re-reviews when the author pushes new commits) and Review draft pull requests (feedback arrives during the messy phase, not after).

The ruleset auto-enables Copilot on every PR against those branches, regardless of whether the PR author has a Copilot license. The per-PR alternative (“Request a review” → Copilot) requires individuals to remember, which means it would not happen consistently.

A forward note worth pinning: from 2026-06-01, Copilot code review runs consume GitHub Actions minutes against your billing. If you are reading this after that date, factor it into the rollout calculus.

Branch protection rules we kept

Copilot code review does not replace the protections we already had. On every protected branch we still require: at least one human reviewer with write access, the full CI suite green (Apex tests, LWC Jest, lint, metadata diff), and force-pushes blocked. Copilot’s review is additional signal, never a substitute for those gates.
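
For reference, the same gates expressed against GitHub's REST branch-protection endpoint, via the gh CLI. A sketch only: OWNER/REPO and the check-context names are placeholders standing in for our four CI jobs.

# Sketch: the gates above as a branch-protection API call.
# OWNER/REPO and the context names are placeholders.
gh api -X PUT repos/OWNER/REPO/branches/main/protection --input - <<'JSON'
{
  "required_pull_request_reviews": { "required_approving_review_count": 1 },
  "required_status_checks": {
    "strict": true,
    "contexts": ["apex-tests", "lwc-jest", "lint", "metadata-diff"]
  },
  "enforce_admins": true,
  "restrictions": null,
  "allow_force_pushes": false
}
JSON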

Path-based reviewer assignment

A CODEOWNERS file at the repo root routes specific paths to specific humans. Anything under force-app/main/default/objects/ routes to the architect. Anything matching *Sharing* or under permissionsets/ routes to the security-minded senior. Anything under classes/billing/ routes to two reviewers. Copilot still reviews these PRs; the merge waits on the named human.
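
A sketch of what that file looks like; the handles are hypothetical stand-ins for our people. Two things about CODEOWNERS semantics matter here: when patterns overlap, the later rule wins, and a pattern without a path prefix (like *Sharing*) matches anywhere in the tree.

# CODEOWNERS sketch — handles hypothetical, routing mirrors the prose above.
force-app/main/default/objects/           @acme/architect
*Sharing*                                 @acme/security-senior
force-app/main/default/permissionsets/    @acme/security-senior
force-app/main/default/classes/billing/   @acme/billing-reviewers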


The review rules we wrote

We pinned these to the team wiki the day we turned auto-review on. They are short on purpose.

  1. AI flags style and obvious quality issues. Always. Missing null checks, unbulkified SOQL, dead branches, tests that assert nothing, with sharing omitted on a class that handles user-supplied input. If Copilot can catch it in the first pass, we want it caught there (a minimal Apex sketch of the flag-and-fix pattern follows this list).
  2. AI suggests test additions. Sometimes. When Copilot proposes a missing test case, we read it. We accept maybe a third; the rest are either already covered or not worth the maintenance weight.
  3. AI never blocks a merge. Only humans block.
  4. AI comments are advisory; the author may dismiss with a one-line reason. A dismissal is signal. Three dismissals in a row on the same kind of comment means our prompt context needs tuning — not that the author is being stubborn.
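
A minimal Apex illustration of the kind of thing rule 1 is about — not code from our repo. The first method is what Copilot flags; the second is the shape of the fix.

// Illustrative only. The first method runs one query per Account,
// which is exactly the SOQL-in-loop pattern rule 1 wants flagged.
public with sharing class ContactRoller {
    public static void countContactsBad(List<Account> accounts) {
        for (Account a : accounts) {
            // Flagged: SOQL inside a loop.
            a.Description = String.valueOf(
                [SELECT COUNT() FROM Contact WHERE AccountId = :a.Id]);
        }
    }

    // The fix: one aggregate query, then in-memory lookups.
    public static void countContactsGood(List<Account> accounts) {
        Map<Id, Integer> counts = new Map<Id, Integer>();
        for (AggregateResult ar : [
                SELECT AccountId aid, COUNT(Id) c
                FROM Contact
                WHERE AccountId IN :accounts
                GROUP BY AccountId]) {
            counts.put((Id) ar.get('aid'), (Integer) ar.get('c'));
        }
        for (Account a : accounts) {
            Integer c = counts.get(a.Id);
            a.Description = String.valueOf(c == null ? 0 : c);
        }
    }
}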

Rule 3 is the one that matters most, and the one most teams misunderstand. Per the Copilot code review concept docs, Copilot leaves a review of type Comment — not Approve, not Request changes. That is a structural fact about the feature, not a policy choice: a Comment-type review cannot count toward required approvals and cannot block a merge. Even if we wanted Copilot to gate the merge, the feature does not let it. We like that constraint, and we wrote Rule 3 to make the alignment explicit rather than rely on people remembering the implementation detail.
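
You can check the structural fact yourself on any PR Copilot has reviewed, with the gh CLI (the PR number is a placeholder):

# Copilot's review comes back with state COMMENTED — never APPROVED
# or CHANGES_REQUESTED — so it cannot satisfy a required-review rule.
gh pr view 123 --json reviews \
  --jq '.reviews[] | {author: .author.login, state: .state}'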

The custom-instructions file we keep at .github/copilot-instructions.md is short. The fragment that matters for review tone:

# Repo-wide Copilot review instructions

## What to comment on
- Bulkification: flag SOQL/DML inside loops; flag List<sObject>
  mutations without batched DML.
- Sharing: flag `without sharing` on any class that takes user
  input; flag missing FLS / CRUD checks on data writes.
- Tests: flag tests that assert only on `true == true` or that
  do not assert at all.
- Naming: flag method names that promise behaviour the body does
  not deliver (e.g. `bulkSafeSave` that DMLs inside a loop).

## What to skip
- Style nits already enforced by our linter.
- Redundant null checks on framework objects (`System.UserInfo`).
- Comments asking the author to "consider" something without a
  concrete suggestion.

## Tone
- One-line comments. No paragraphs. If a fix needs a paragraph,
  the author needs a human.

That last line is doing real work. We added it after a week of paragraph-long Copilot comments that no one read.


The four lines AI never crosses

The rules above are the floor. The four below are the ceiling: categories where AI’s review comments are still allowed to appear, but where the decision sits with a named human, routed through CODEOWNERS, with no ambiguity about who signs off.

1. Security paths

Anything touching with sharing / without sharing, WITH SECURITY_ENFORCED in SOQL, FLS / CRUD checks, or OAuth flows. Why: the cost of a missed security gap is unbounded. We can audit a human’s reasoning for a sharing-rule decision when an auditor asks three months later. We cannot meaningfully audit “the model thought it looked fine.” CODEOWNERS routes those paths to the security-minded senior; Copilot’s comments still appear, but the merge waits on the named human.
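
For readers outside the Salesforce bubble, a minimal sketch of the two enforcement mechanisms named above (the custom object is hypothetical):

// Illustrative only — the enforcement patterns these review rules guard.
public with sharing class CaseNoteService {
    public static void addNotes(List<CaseNote__c> notes) {
        // FLS/CRUD check on a data write: strip any fields the running
        // user cannot create before the DML happens.
        SObjectAccessDecision decision = Security.stripInaccessible(
            AccessType.CREATABLE, notes);
        insert decision.getRecords();
    }

    public static List<Case> visibleCases(Id accountId) {
        // Object- and field-level security enforced in the query itself;
        // throws System.QueryException if the user lacks access.
        return [SELECT Id, Subject FROM Case
                WHERE AccountId = :accountId
                WITH SECURITY_ENFORCED];
    }
}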

2. Schema changes

sObject creates, field type changes, relationship changes — anything under force-app/main/default/objects/. Why: a field type change is a decision the team lives with for years; every report, integration, and test class downstream pays the cost of getting it wrong. Copilot reads the diff in front of it; it cannot read the data warehouse pipeline that depends on the field. CODEOWNERS routes the path to the architect.

3. Sharing rule changes

Manual sharing, criteria-based sharing, role hierarchy edits, group membership changes. Why: a one-line change can hand visibility to thousands of users if the criteria expand a touch beyond what was intended. We have not yet seen an AI tool that reasons about cumulative sharing impact across role hierarchy, public groups, and territory management at the same time. CODEOWNERS routes sharing paths to two humans, not one.

4. Money-touching code

Opportunity stage transitions, quote calculation, invoice generation — anything producing a number a customer or finance team will see. Why: a wrong stage transition rule does not break loudly. It quietly mis-forecasts a quarter, and finance finds out at close. Copilot will flag unbulkified SOQL in the same file; it will not notice that the rounding direction on a discount calculation just flipped from banker’s rounding to half-up. CODEOWNERS routes classes/billing/ and classes/forecasting/ to two reviewers from a named pool.
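
The rounding flip is small enough to show in full. A minimal anonymous-Apex illustration, not our billing code:

// The flip described above, in miniature. Values are illustrative.
Decimal lineAmount = 0.125;

// Banker's rounding: ties go to the even neighbour -> 0.12
Decimal bankers = lineAmount.setScale(2, RoundingMode.HALF_EVEN);

// Half-up: ties go away from zero -> 0.13
Decimal halfUp = lineAmount.setScale(2, RoundingMode.HALF_UP);

System.assertEquals(0.12, bankers);
System.assertEquals(0.13, halfUp);
// One cent per line item is invisible in a diff and very visible
// at quarter close.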

⚠️ The line we hold

AI’s job is to clear the noise. Humans hold the line on consequence. The minute the model is signing off on a sharing change, a schema migration, or a stage-transition rule, we are not using a tool. We are outsourcing judgement onto a system that cannot be cross-examined when something breaks. CODEOWNERS is the structural enforcement; the four categories above are the policy.


What changed: the four bottlenecks revisited

The series opened with four bottlenecks AI dev tools were supposed to dent. We are deliberately not putting numbers on the dent — we did not run a controlled experiment, we ran a release cycle. Here is what we observed, in the same order Post 1 named them.

PR review latency. The shape of a review changed before the timing did. Copilot comments land within minutes of the PR opening: missing null checks, unbulkified SOQL, a test asserting on the wrong field. By the time a human opens the PR, the surface noise is addressed or dismissed, and the senior’s twenty minutes go to things that need a brain. PRs that used to sit for two or three days now usually clear within a day. Architectural diffs still take longer; the difference is that the trivial PRs no longer wait behind them.

Junior dev ramp-up. The junior is not learning the platform faster because Copilot taught him. He is learning faster because Copilot’s review of his PR catches the small-pattern mistakes before a senior has to. He sees the comment, fixes it, ships. By week two or three he is independently shipping non-trivial features. The senior teaching loop still happens for the genuinely hard questions, just no longer interrupted by “how do I structure a Database.SaveResult loop again.”

Repetitive scaffolding. Mostly addressed by Copilot autocomplete and Vibes in earlier posts (our Agentforce platform primer covers where Vibes sits in the wider platform); what review-time AI added was a consistency check that flags scaffold output drifting from team patterns. The reclaimed time is not vanishing into the next ticket — it is going into design conversations and into tests we used to skip because they felt like overhead.

Test-class drudgery. Data setup is genuinely a solved problem now, between Vibes generating the factory and Copilot review flagging when the factory drifts from current schema. The assertion logic, which encodes what the test is actually verifying, still gets written by humans. We notice this less as time saved and more as a quality shift: tests are being written that we used to skip because the setup was too painful.
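
What the split looks like in miniature: the setup half is the part that now gets generated, the assertion is the part that stays human. Illustrative Apex against a standard object, not our actual factory:

@IsTest
private class AccountSetupTest {
    // The generated half: bulk data setup that used to make these
    // tests feel like overhead.
    static List<Account> makeAccounts(Integer n) {
        List<Account> accts = new List<Account>();
        for (Integer i = 0; i < n; i++) {
            accts.add(new Account(Name = 'Acct ' + i));
        }
        insert accts;
        return accts;
    }

    @IsTest
    static void insertedAccountsAreQueryable() {
        List<Account> accts = makeAccounts(5);
        // The human half: the assertion encodes what the test verifies.
        System.assertEquals(5,
            [SELECT COUNT() FROM Account WHERE Id IN :accts]);
    }
}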

The honest version: every one of these is a qualitative shift, not a measurement. Anyone who tells you they cut PR cycle time by exactly 53% with Copilot is either running a different experiment than we did, or making the number up.


What we’d do differently, and what we’d refuse next

If we ran this rollout again, we would turn on Copilot code review first, not last. The series order — autocomplete, then platform-aware assistant, then agentic workflows, then review — is the order we did it in, but the leverage is the wrong way around. Review automation acts on every PR regardless of who or what wrote the code. Adding it earlier would have taken some of the rougher Copilot output from Post 2 off the senior’s desk faster.

The thing we would refuse, even though vendors are pushing hard: AI-driven merge automation. There is a class of tooling already pitching “Copilot approves the PR if the diff is small enough and CI is green.” We will not turn it on. The Comment-type-review boundary in Rule 3 exists because the platform built a structural guardrail; the moment we install something that bridges over it, we are back to outsourcing judgement, and the audit story falls apart the first time something goes wrong in production.

That closes the case study. Five posts, one stack, one team. The thread running through all of it, from the four bottlenecks AI actually fixes through to here, is that an AI dev tool is only as useful as the boundary you draw around it, and only as safe as the human review at the other end. We turned a lot of things on. The thing we did not turn off, on any of the five layers, was the senior reading the diff before it ships.
