There are five of us. One architect, two seniors, one mid, one junior. Two-week sprints. Enterprise B2B SaaS org running Sales Cloud, Service Cloud, and a stack of custom Apex grown over years. We ship features, fix production incidents, and live with whatever AI tooling we plug in.
This is the opening post of a 5-part series on how we wired an AI dev stack into our day-to-day. Before we get to specific tools (those come in the next four posts), we want to be honest about what the question even is.
Our thesis after a couple of release cycles: AI dev tools fix four very specific bottlenecks really well, and they tempt teams to misuse them on three things they are quietly bad at. Get those seven calls right and the rest of the rollout is mostly logistics.
The four wins worth chasing
These bottlenecks existed before any AI tooling. AI did not invent them; it just turned out to be the cheapest way to dent them.
1. PR review latency
Symptom: a senior opens a PR for a non-trivial Apex class on Tuesday afternoon. The architect is in a design review, the other senior is on a customer escalation, the mid flags it back as “needs senior eyes.” By Friday the PR is still sitting there with a merge conflict forming. We have all watched a sprint bleed two days because three good developers could not be in the same headspace at the same hour.
What AI does here: it picks up the obvious stuff within ten minutes of the PR opening. Missing null checks, an unbulkified SOQL pattern that snuck in, a test that asserts true == true, a method named bulk-safe that is not. By the time a human gets to the PR, the surface noise is addressed, and the senior spends their 20 minutes on what needs a brain: is this the right abstraction, does it handle the multi-currency edge case, did we just leak a sharing rule? Latency drops because the human review got shorter, not because we hired more humans.
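To make "unbulkified SOQL pattern" concrete, here is a sketch of the kind of thing the first pass flags. It is illustrative, not lifted from our codebase; the objects and fields are generic examples.

```apex
// Illustrative only; objects and fields are generic examples.
// Unbulkified: one SOQL query per record. This hits the 100-query
// governor limit as soon as the trigger runs against a bulk load.
for (Opportunity opp : Trigger.new) {
    Account acct = [SELECT OwnerId FROM Account WHERE Id = :opp.AccountId];
    opp.OwnerId = acct.OwnerId;
}

// Bulkified: collect the keys, query once, assign from a map.
Set<Id> accountIds = new Set<Id>();
for (Opportunity opp : Trigger.new) {
    accountIds.add(opp.AccountId);
}
Map<Id, Account> accountsById = new Map<Id, Account>(
    [SELECT Id, OwnerId FROM Account WHERE Id IN :accountIds]
);
for (Opportunity opp : Trigger.new) {
    Account acct = accountsById.get(opp.AccountId);
    if (acct != null) {
        opp.OwnerId = acct.OwnerId;
    }
}
```

Nothing in that fix needs a senior's judgement, which is exactly why it is a good thing to catch before a human opens the diff.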
2. Junior dev ramp-up
Symptom: our junior is good. He is also six months into Salesforce-specific work, so he still pauses to look up things the rest of us internalised years ago — the right way to handle Database.SaveResult, why with sharing matters here but not there, how to avoid the trigger-recursion footgun. Each pause used to be a Slack DM to a senior, which broke the senior’s flow too. One junior question often cost two people fifteen minutes.
What AI does here: it answers most of those questions in-line, with examples that reflect the file the junior is looking at. The hard questions, the tradeoff ones, still come to us, but the basic pattern questions resolve without anyone else's day breaking. Six weeks in, his PRs come in cleaner and he is asking better questions. Not because AI taught him the platform; because it removed the friction from the loop where he was teaching himself.
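As one example of the pattern questions that now resolve in-line: the partial-success DML shape around Database.SaveResult looks roughly like this. A sketch only; buildContacts() is a hypothetical helper standing in for whatever builds the records.

```apex
// Sketch of partial-success DML handling. buildContacts() is hypothetical.
// allOrNone = false lets valid records commit while failures come back
// per record instead of one DmlException aborting the whole batch.
List<Contact> contacts = buildContacts();
Database.SaveResult[] results = Database.insert(contacts, false);

for (Integer i = 0; i < results.size(); i++) {
    if (!results[i].isSuccess()) {
        for (Database.Error err : results[i].getErrors()) {
            // Record the failure somewhere visible rather than swallowing it.
            System.debug(LoggingLevel.ERROR, 'Insert failed for ' +
                contacts[i].LastName + ': ' + err.getMessage());
        }
    }
}
```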
3. Repetitive Apex/LWC scaffolding
Symptom: every new LWC is roughly 70% boilerplate. Folder structure, .js-meta.xml, @api declarations, wire service plumbing, the standard error-handling shape we use across the codebase. Same with new Apex services — class header, constructor injection pattern, @TestVisible mock wiring. None of it is hard, but a senior dev typing it out costs the same as a senior dev solving the actual problem the file is supposed to solve.
What AI does here: it produces the boilerplate at the speed of “tab to accept.” We write the meaningful part: the business logic, the specific query, the actual UI. AI scaffolds, humans think. We feel less typing fatigue at the end of the day, and the boilerplate is more consistent because it is being generated from the same patterns already in the codebase.
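For the Apex side, the scaffold we mean is roughly the shape below: sharing declared, constructor injection, a @TestVisible seam for mocks. QuoteService and QuoteRepository are placeholder names, not classes from our org.

```apex
// Placeholder names; this shows the shape, not a real class from our org.
public with sharing class QuoteService {

    private final QuoteRepository repository;

    // Production code uses the no-arg constructor; tests inject a mock
    // repository through the @TestVisible one.
    public QuoteService() {
        this(new QuoteRepository());
    }

    @TestVisible
    private QuoteService(QuoteRepository repository) {
        this.repository = repository;
    }

    public void recalculateTotals(Set<Id> quoteIds) {
        // The business logic is the part a human still writes.
    }
}
```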
4. Test-class drudgery
Symptom: writing data setup for a test that needs an Account, a Contact, an Opportunity, three OpportunityLineItems, a custom Quote with two Quote Lines, and the right pricebook entries is half an hour of plumbing before the assertion you actually care about. Multiply by the 75% coverage rule and test plumbing becomes a real chunk of every sprint.
What AI does here: it generates the setup scaffolding by reading the sObject schema and the existing test factory. We write the assertions — the part encoding what the test verifies. The interesting work (what should this code do, what edge case are we protecting) stays human. The boring work (build twelve interlocking records so the trigger fires) becomes a function call.
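A sketch of that split, with TestDataFactory standing in for whatever factory class the codebase already has and QuoteRollupService as a hypothetical class under test:

```apex
@IsTest
private class QuoteRollupTest {

    @TestSetup
    static void makeData() {
        // The plumbing the model generates from the schema and the factory.
        Account acct = TestDataFactory.createAccount();
        Opportunity opp = TestDataFactory.createOpportunity(acct.Id);
        TestDataFactory.createLineItems(opp.Id, 3);
    }

    @IsTest
    static void rollupCoversAllLineItems() {
        Opportunity opp = [SELECT Id FROM Opportunity LIMIT 1];

        Test.startTest();
        QuoteRollupService.recalculate(new Set<Id>{ opp.Id });
        Test.stopTest();

        // The assertion stays human-written: it encodes what the code
        // is supposed to do, not just that records exist.
        Opportunity updated = [SELECT Amount FROM Opportunity WHERE Id = :opp.Id];
        System.assertNotEquals(null, updated.Amount,
            'Rollup should populate the opportunity amount');
    }
}
```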
The three traps we refuse
The wins above are real. They are also the only places we let AI lead. Three categories the model never owns; in those, AI is at most a supervised typing assistant.
1. Business-logic decisions
When a sales ops manager says “deals over $50k from existing customers should skip the second approval step but only if the AE has hit quota this quarter,” that is not a technical question with a right answer. It is a policy decision with tradeoffs that affect commission, forecasting, and which deals slip. AI will generate a flow that implements something plausible-looking — and miss that the real question, what counts as “existing customer,” has three different answers depending on which department you ask. The team owns those decisions, in conversation with the business. AI does not get to vote.
2. Schema design
Adding a field is almost free. Adding a new sObject is a decision you live with for years. Every report, every integration, every test class downstream pays the cost. AI optimises for the schema that looks cleanest in the file open right now. It does not weigh how this object integrates with the data warehouse, or the fact that the object you are about to duplicate is already half-supporting this use case. Schema decisions get the architect, a whiteboard, and at least one person from data ops. They do not get an autocomplete suggestion.
3. Security review of new code paths
Sharing-rule changes, FLS-aware queries, CRUD checks, anything touching a field marked PII, any new class that runs without sharing. We do not let AI sign off on any of it. The reason is not that AI is bad at security; it is that the cost of a missed security gap is unbounded, and we cannot meaningfully verify the model’s confidence per decision. Human security review is slower, but it is auditable and we know what the reviewer was looking for. If we are going to be wrong, we want to be wrong on the record.
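For reference, this is the kind of code path that lands in that review queue. The snippet is illustrative (class and method names are made up), and it is a candidate for review, not a substitute for it: WITH USER_MODE makes the query respect CRUD and FLS for the running user instead of running in system context, and on older API versions WITH SECURITY_ENFORCED plays a similar role.

```apex
// Illustrative only; class and method names are made up.
// The query enforces object- and field-level security for the running user.
public with sharing class ContactLookup {
    public static List<Contact> emailsForAccount(Id accountId) {
        return [
            SELECT Id, Email
            FROM Contact
            WHERE AccountId = :accountId
            WITH USER_MODE
        ];
    }
}
```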
On every win above, AI leads the work and a human verifies. On every trap, humans lead the decision and AI is a typing assistant at most. The minute that ordering flips, with AI deciding business policy, signing off on a sharing change, or proposing a new sObject, we are no longer using a tool. We are outsourcing judgement. That is the line we will not cross.
Why we didn’t just turn on Copilot
The naive version of this rollout is “give everyone a license, see what happens.” We tried a soft version of that for two weeks before getting serious, and the failure mode was the same one every time: overlapping tools without orchestration rules.
The IDE autocomplete suggested one shape for a method, the chat tool suggested a different shape, and the in-platform AI nudged toward a third. All fluent enough to look credible. With no shared rules for which tool owns which task, every developer chose differently, and code review turned into “why did you use that tool here” instead of “is this the right design.” The architect ended up doing more, not less.
The fix was not to abandon any of the tools. It was to write down, explicitly, which tool owns which task, in what order, with what verification. That is what the next four posts work out. The starting point was our existing mental model of how AI fits the org, especially the Agentforce platform primer we wrote earlier — without that shared vocabulary of agents, topics, and guardrails, none of the rest of the stack made sense to discuss.
What’s coming in the next four posts
Four posts, one tool layer per post, each one covering what we tried, what worked, what we threw out:
- GitHub Copilot for Salesforce devs — the IDE autocomplete and chat layer. Where it earned its keep, the three rules we wrote for when to accept a suggestion, and the Apex-specific failure modes we hit until we tuned the prompts.
- Layering Agentforce Vibes on top of Copilot — what changes when the platform itself is AI-native. How we use Vibes for the Salesforce-specific parts (flows, prompt templates, agent topics) without it stepping on the IDE tooling.
- From autocomplete to agentic — the Salesforce DX MCP server — moving from “AI suggests text” to “AI runs commands against the org.” What we let it do, what we did not, and how we kept the junior’s machine from accidentally deploying to production.
- AI PR review on GitHub + governance + results — AI as first-pass reviewer on every PR, the governance rules we wrote, and the honest measurement of what changed.
By Post 5 we’ll measure what this rollout actually changed.