
AI · April 15, 2026

From autocomplete to agentic: the Salesforce DX MCP server

Where Copilot suggests a line, MCP builds a feature. Setting up the Salesforce DX MCP server (beta) and a worked example of scaffolding a custom object end-to-end.

☕ 11 min read 📅 April 15, 2026
  • MCP shifts AI from line-by-line suggestion to multi-file feature workflows
  • The Salesforce DX MCP server (beta) connects Claude Code, Cursor, and other MCP-aware clients to your org
  • A single natural-language prompt can scaffold an sObject, perm set, LWC, controller, and tests
  • Watch out for org-permission scope, accidental destructive deploys, and runaway costs
  • Pair MCP workflows with mandatory PR review (covered in Post 5)

Layering Agentforce Vibes on top of Copilot ran into a real ceiling. Two assistants, one editor, one rule per file, and we still spent the better part of a sprint hand-stitching the seven files a single new feature touched. Both assistants suggested code well. Neither touched the org, neither ran sf commands, neither retrieved metadata, and neither moved past the boundary of the file we had open.

The Salesforce DX MCP server (beta) is what we reached for next. It is a different shape of tool. Where Copilot and Vibes finish a line, MCP finishes a feature: generate the sObject, write the permission set, scaffold the LWC, generate the controller, write the tests, deploy the bundle. One natural-language prompt, multiple files, a pause for review, then the next step. That is what “agentic” actually means in this corner of the stack: cross-file, cross-metadata workflows triggered by a single intent.

This post is the rollout story. What we let MCP do, what we did not, and the worked example we use when we explain it to a new hire.


What MCP actually is, in one paragraph

Model Context Protocol is the open standard that gives a large language model a stable way to call tools, read data, and trigger actions in another system. Think of it as the USB-C port for AI assistants: one plug, many devices. An MCP server exposes a catalogue of named tools (each with a schema for inputs and outputs); an MCP client (your editor, your CLI, your chat app) discovers those tools and can invoke them on the model’s behalf. The model never touches your system directly; it asks the client, the client calls the server, the server returns a structured result, and the model reads it. That layer of indirection is what makes MCP composable across vendors, and what makes it possible to swap in a Salesforce-aware server without rewriting the editor.
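To make that indirection concrete, here is a minimal sketch of a tool invocation on the wire. MCP messages are JSON-RPC 2.0; the tool name run_soql_query exists in the DX server's catalogue, but the exact arguments and the response payload below are illustrative, not a captured transcript:

```python
import json

# A tools/call request, as an MCP client sends it to a server (JSON-RPC 2.0).
# The query string and the response payload are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_soql_query",
        "arguments": {"query": "SELECT Id, Name FROM Account LIMIT 5"},
    },
}

# The server replies with a structured result; the client hands it to the model.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "{\"totalSize\": 0, \"records\": []}"}]},
}

print(json.dumps(request, indent=2))
```

The layering is visible in the shape itself: the model only ever reads request and response as structured text; the client owns the actual call into the server.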

There are two MCP surfaces that get confused all the time, so it is worth drawing the line up front. The Salesforce DX MCP server is a small Node process that runs on your laptop, talks to the orgs you have already authorized with sf, and exposes about 60 developer-focused tools (deploy, retrieve, query, test, scaffold). That is the focus of this post, and it is in beta. The separate Salesforce Hosted MCP Servers (GA in April 2026) run on Salesforce infrastructure, are reached over OAuth, and are aimed at business teams and integrations rather than developer workflows. We mention them so the next time someone says “Salesforce MCP,” you can ask which one. The rest of this post is the local DX server.


Setting up the Salesforce DX MCP server

The Salesforce DX MCP server is beta. Salesforce documents it as a pilot or beta service subject to Salesforce’s beta terms, and we treat it accordingly: useful enough to be in our daily flow, not yet trusted to be a hard dependency for anything customer-facing, and pinned to specific scratch and sandbox aliases on every developer’s machine. Treat the setup that follows as a working baseline, not a load-bearing piece of infrastructure.

The server itself runs as an npx command. There is no global install; your MCP client launches it on demand:

npx -y @salesforce/mcp --orgs DevHubAlias,ScratchOrgAlias --toolsets orgs,metadata,data,users,testing

Two flags do the work. --orgs is required: it pins the server to one or more aliases you have already authorized with sf org login web. We always pass explicit aliases rather than the catch-all ALLOW_ALL_ORGS, because once the model has access to “everything,” it will eventually pick the wrong one. --toolsets is optional but worth using on day one: the server bundles tools into named groups (orgs, metadata, data, users, testing, code-analysis, lwc-experts, and more), and loading only the ones you need keeps both the prompt context and the blast radius small. The full toolset list is in the Salesforce DX Developer Guide; the GitHub source of truth lives at salesforcecli/mcp.

You wire the server into a client through that client’s MCP config. Three of the clients listed in the official README are in regular use on our team:

With Claude Code

We use Claude Code, the Anthropic command-line agent (not Claude Desktop, which is a different product). A .mcp.json at the root of the Salesforce project keeps the configuration version-controlled per project:

{
  "mcpServers": {
    "salesforce-dx": {
      "command": "npx",
      "args": [
        "-y", "@salesforce/mcp",
        "--orgs", "ScratchOrgAlias",
        "--toolsets", "orgs,metadata,data,users,testing"
      ]
    }
  }
}

With Cursor

Cursor reads the same shape from ~/.cursor/mcp.json (or a project-scoped .cursor/mcp.json). Same npx command, same flags, same alias pinning rule. Restart Cursor after the file lands and the tools appear in the MCP panel.

With one of the other MCP-aware clients

The README also documents Cline, Trae, Windsurf, and Zed. The setup shape is the same: point the client at npx -y @salesforce/mcp with your --orgs and --toolsets. The server speaks standard MCP, so any MCP-aware client, including GitHub Copilot Chat, can host it; the README just does not enumerate Copilot Chat as a first-class example, so expect to do a little wiring of your own there.

ℹ️ No new auth: the server uses your CLI session

Authenticate first with sf org login web --alias ScratchOrgAlias (and --set-default-dev-hub if it is your Dev Hub). The MCP server does not implement its own OAuth flow; it uses whatever sessions the sf CLI has already cached for the aliases you pass to --orgs. If sf org display --target-org ScratchOrgAlias works in a terminal, the MCP server will work too. If it does not, fix the CLI auth first; the server has no other way in.
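In practice that is two commands, shown here with the alias from the config above (substitute your own):

```shell
# One-time login; the MCP server reuses the cached CLI session.
sf org login web --alias ScratchOrgAlias

# Sanity check: if this succeeds, the MCP server can reach the org.
sf org display --target-org ScratchOrgAlias
```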

That is the whole setup. Two minutes of editor config, one sf login per org, one .mcp.json per project. The interesting work starts the moment the server is reachable.


Worked example: scaffold a feature end-to-end

Here is the example we walk new hires through. It is small enough to read in one sitting and large enough to show why MCP changes the shape of the work.

The brief: Sales Ops want a lightweight customer-feedback capture point in the service console. One custom object, three fields, a permission set, an LWC that captures the feedback, an Apex controller, and the test class. Without MCP, this is half a sprint of file shuffling. With MCP, it is one prompt, three review pauses, and a deploy.

The prompt we used, verbatim:

Scaffold a Customer_Feedback__c custom object with three fields: Comment__c (Long Text Area, 500), Sentiment__c (Picklist with values Positive, Neutral, Negative), and Source__c (Text, 80). Generate a permission set Customer_Feedback_User granting Read/Create/Edit on the object and all three fields. Build an LWC customerFeedbackCapture with a textarea, a sentiment picker, and a submit button, wired to an Apex controller CustomerFeedbackController with an @AuraEnabled saveFeedback(comment, sentiment, source) method. Generate a CustomerFeedbackControllerTest test class with a positive case and a missing-required-field negative case. Place all metadata under force-app/main/default/. Do not deploy yet; show me the diff first.

The model wrote each layer’s metadata XML and source files directly: the Customer_Feedback__c object and field definitions, the Customer_Feedback_User permission set, the Apex controller, and the test class. For the LWC layer it called orchestrate_lwc_component_creation (which fans out to create_lwc_component_from_prd and create_lwc_jest_tests under the hood). After each layer it called deploy_metadata in dry-run mode to surface the manifest diff, paused, and asked us to confirm before promoting the deploy from dry-run to actual. Below are the three diff blocks we read before approving the deploy.

Layer 1: sObject metadata + permission set
<?xml version="1.0" encoding="UTF-8"?>
<!-- force-app/main/default/objects/Customer_Feedback__c/Customer_Feedback__c.object-meta.xml -->
<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
    <label>Customer Feedback</label>
    <pluralLabel>Customer Feedback</pluralLabel>
    <nameField>
        <label>Feedback Number</label>
        <type>AutoNumber</type>
        <displayFormat>FB-{00000}</displayFormat>
    </nameField>
    <deploymentStatus>Deployed</deploymentStatus>
    <sharingModel>Private</sharingModel>
</CustomObject>
<!-- force-app/main/default/objects/Customer_Feedback__c/fields/Comment__c.field-meta.xml -->
<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
    <fullName>Comment__c</fullName>
    <label>Comment</label>
    <type>LongTextArea</type>
    <length>500</length>
    <visibleLines>5</visibleLines>
</CustomField>

<!-- Sentiment__c — Picklist (Positive / Neutral / Negative) -->
<!-- Source__c — Text(80) -->
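<!-- For reference, a sketch of the elided Sentiment__c picklist (values from the prompt); the generated file may differ in detail: -->
<CustomField xmlns="http://soap.sforce.com/2006/04/metadata">
    <fullName>Sentiment__c</fullName>
    <label>Sentiment</label>
    <type>Picklist</type>
    <valueSet>
        <valueSetDefinition>
            <sorted>false</sorted>
            <value>
                <fullName>Positive</fullName>
                <default>false</default>
                <label>Positive</label>
            </value>
            <value>
                <fullName>Neutral</fullName>
                <default>false</default>
                <label>Neutral</label>
            </value>
            <value>
                <fullName>Negative</fullName>
                <default>false</default>
                <label>Negative</label>
            </value>
        </valueSetDefinition>
    </valueSet>
</CustomField>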
<!-- force-app/main/default/permissionsets/Customer_Feedback_User.permissionset-meta.xml -->
<PermissionSet xmlns="http://soap.sforce.com/2006/04/metadata">
    <label>Customer Feedback User</label>
    <objectPermissions>
        <object>Customer_Feedback__c</object>
        <allowRead>true</allowRead>
        <allowCreate>true</allowCreate>
        <allowEdit>true</allowEdit>
    </objectPermissions>
    <fieldPermissions>
        <field>Customer_Feedback__c.Comment__c</field>
        <readable>true</readable>
        <editable>true</editable>
    </fieldPermissions>
    <!-- Sentiment__c, Source__c — same shape -->
</PermissionSet>

What it got right: the AutoNumber Name field with a sensible display format (we did not ask for it; the platform conventions implied it), sharingModel Private (a correct default for a feedback object), and explicit per-field permissions rather than object-level access alone. What we redirected: the model originally added allowDelete to the permission set; we asked it to remove that, and it complied without re-deriving the rest.

Layer 2: LWC + Apex controller
<!-- force-app/main/default/lwc/customerFeedbackCapture/customerFeedbackCapture.html -->
<template>
    <lightning-card title="Capture Customer Feedback" icon-name="standard:feedback">
        <div class="slds-p-horizontal_medium slds-p-bottom_medium">
            <lightning-textarea
                label="Comment"
                value={comment}
                max-length="500"
                onchange={handleCommentChange}
                required>
            </lightning-textarea>

            <lightning-combobox
                label="Sentiment"
                value={sentiment}
                options={sentimentOptions}
                onchange={handleSentimentChange}
                required>
            </lightning-combobox>

            <lightning-button
                variant="brand"
                label="Submit Feedback"
                onclick={handleSubmit}
                disabled={isSubmitting}
                class="slds-m-top_medium">
            </lightning-button>
        </div>
    </lightning-card>
</template>
// customerFeedbackCapture.js
import { LightningElement } from 'lwc';
import { ShowToastEvent } from 'lightning/platformShowToastEvent';
import saveFeedback from '@salesforce/apex/CustomerFeedbackController.saveFeedback';

export default class CustomerFeedbackCapture extends LightningElement {
    comment = '';
    sentiment = '';
    isSubmitting = false;

    sentimentOptions = [
        { label: 'Positive', value: 'Positive' },
        { label: 'Neutral',  value: 'Neutral'  },
        { label: 'Negative', value: 'Negative' },
    ];

    handleCommentChange(e)   { this.comment   = e.target.value; }
    handleSentimentChange(e) { this.sentiment = e.detail.value; }

    async handleSubmit() {
        this.isSubmitting = true;
        try {
            await saveFeedback({
                comment:   this.comment,
                sentiment: this.sentiment,
                source:    'Service Console',
            });
            this.dispatchEvent(new ShowToastEvent({
                title: 'Thanks',
                message: 'Feedback captured.',
                variant: 'success',
            }));
            this.comment = ''; this.sentiment = '';
        } catch (err) {
            this.dispatchEvent(new ShowToastEvent({
                title: 'Could not save feedback',
                message: err?.body?.message ?? 'Unknown error',
                variant: 'error',
            }));
        } finally {
            this.isSubmitting = false;
        }
    }
}
// force-app/main/default/classes/CustomerFeedbackController.cls
public with sharing class CustomerFeedbackController {

    @AuraEnabled
    public static Id saveFeedback(String comment, String sentiment, String source) {
        if (String.isBlank(comment) || String.isBlank(sentiment)) {
            throw new AuraHandledException('Comment and sentiment are required.');
        }
        Customer_Feedback__c fb = new Customer_Feedback__c(
            Comment__c   = comment,
            Sentiment__c = sentiment,
            Source__c    = source
        );
        insert fb;
        return fb.Id;
    }
}

What it got right: with sharing defaulted on the controller, AuraHandledException for validation (so the LWC sees a clean error), the isSubmitting state on the button, the toast for both success and failure paths, the lightning-combobox rather than a hand-rolled select. What we redirected: the model initially used a generic Exception instead of AuraHandledException, which would have leaked the stack trace to the LWC. We asked for the change and the test class in the next step adapted to it.

Layer 3: Apex test class
// force-app/main/default/classes/CustomerFeedbackControllerTest.cls
@IsTest
private class CustomerFeedbackControllerTest {

    @IsTest
    static void saveFeedback_savesRecord_whenInputsValid() {
        Test.startTest();
        Id newId = CustomerFeedbackController.saveFeedback(
            'Loved the new portal flow.',
            'Positive',
            'Service Console'
        );
        Test.stopTest();

        Customer_Feedback__c saved = [
            SELECT Id, Comment__c, Sentiment__c, Source__c
            FROM Customer_Feedback__c
            WHERE Id = :newId
        ];
        System.assertEquals('Positive',         saved.Sentiment__c);
        System.assertEquals('Service Console',  saved.Source__c);
        System.assert(saved.Comment__c.contains('portal'));
    }

    @IsTest
    static void saveFeedback_throws_whenCommentBlank() {
        Boolean threw = false;
        try {
            Test.startTest();
            CustomerFeedbackController.saveFeedback('', 'Positive', 'Service Console');
            Test.stopTest();
        } catch (AuraHandledException e) {
            threw = true;
        }
        System.assert(threw, 'Expected AuraHandledException for blank comment.');
    }
}

What it got right: a positive-path assertion against the actual queried record (not just the returned Id), a negative-path assertion that catches AuraHandledException specifically rather than the base Exception, Test.startTest() / Test.stopTest() bracketing the call. What we redirected: the first version asserted on Comment__c with an exact string match; we asked for contains so the test does not break if a future trigger normalises punctuation.

After approving each layer, the final tool call was a deploy_metadata against the scratch org, this time without the dry-run flag. Two minutes from prompt to a working component in the org. The team member doing the demo went from “I do not see how this saves time” to asking for the config snippet before the meeting ended.


Where MCP wins over plain Copilot or Agentforce Vibes

Three things land differently the moment you cross from autocomplete into agentic.

The first is cross-file, cross-metadata workflows. Copilot can write a great @AuraEnabled method. Vibes can write a better one because it knows the schema. Neither can also create the sObject, generate the permission set, scaffold the LWC that calls the method, and write the test class, all in one prompt and all consistent with each other. MCP can. The files in the worked example reference the same field names, the same picklist values, and the same exception type, because one model produced them in one session.

The second is schema-aware code generation grounded in your live org. Vibes knows the schema as context. MCP can read it as a tool: run_soql_query against EntityDefinition and FieldDefinition at the moment of generation. When the model needs to know whether Account.Customer_Tier__c exists, it does not guess from training data; it asks. Stale-metadata bugs, the most common failure mode of every other Salesforce assistant, drop sharply.
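The check itself is an ordinary query against the metadata objects. A query of roughly this shape (the field name is the hypothetical from the paragraph above) answers the question directly; zero rows means the field does not exist:

```sql
SELECT QualifiedApiName, DataType, Label
FROM FieldDefinition
WHERE EntityDefinition.QualifiedApiName = 'Account'
  AND QualifiedApiName = 'Customer_Tier__c'
```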

The third is tests that match the controller it just generated. The test class in Layer 3 catches the exact AuraHandledException the controller in Layer 2 throws, with the field names from the sObject in Layer 1, because the same agentic loop produced all three. There is no drift between the controller’s contract and the test’s expectation. That alone removes a category of “the test was passing because it was lying” mistakes that quietly accumulates in any large Apex codebase.


Where MCP goes wrong (and how we fence it)

The ceiling is higher, and so is the cost of a mistake. These are the three failure modes we hit in the first month and the fences we wrote for each.

Org-permission scope

MCP can only do what the authenticated sf user can do, and that is the fence. We never authorize a production org on a developer’s machine that has the MCP server wired in. Every machine in the team has scratch and sandbox aliases authenticated; production lives on the architect’s machine, in a separate OS profile, with no MCP config in sight. The model cannot deploy where it has no token. That is structurally enforced by the absence of the alias, not by a rule we hope someone reads.

Accidental destructive deploys

Even on a sandbox, the model will occasionally propose a deploy that drops a field (because it inferred the field is no longer used) or removes a permission (because it tidied up the manifest). The fence is the manifest diff: we always require the model to surface the deploy package and we read it before approving. The worked example above does this explicitly with a deploy_metadata call in dry-run mode between scaffold and deploy. For destructive changes, the answer is almost always “no, leave the existing field, we will clean up later.” The agentic loop is fine with the redirection; it is much less fine with discovering after the fact that it dropped a field still referenced by a report.
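The same fence is available from a plain terminal if you want to rehearse it outside the agentic loop; an invocation of roughly this shape, assuming the alias from the setup section:

```shell
# Validate the deploy and surface the package without changing the org.
sf project deploy start --dry-run --source-dir force-app --target-org ScratchOrgAlias
```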

⚠️ Never connect MCP to production

The single non-negotiable rule. Authenticate scratch orgs and sandboxes only. If an MCP-driven session ever needs to act against production, the action should leave the MCP loop entirely: open a normal change request, deploy through the regular release pipeline, with the regular human approvals. The local DX MCP server is a development accelerator, not a release tool. Treat it like an SSH session into your own laptop, not into the org.

Runaway costs

Long agentic loops can chain many tool calls and many model invocations. Each invocation is API tokens; each token costs money. The loops that scaffold a small feature in two minutes are cheap. The loops that try to refactor an entire trigger framework can spend an hour and a noticeable share of the month’s API budget before producing anything mergeable. The fence is intent-sized prompts: ask for one feature at a time, review at the diff boundary, and resist the temptation to say “now also clean up the related test classes” in the same session. If the work is large enough that one prompt cannot describe it cleanly, it is large enough to need a design doc, not a longer prompt.


Closing

Three tools in, the shape of the stack is starting to set. Copilot owns the line; Vibes owns the platform-aware file; MCP owns the cross-file feature. Each one earns its place because the next layer cannot do its job, and each one only earns it because we wrote the rules for what it does not do.

Next week closes the case study with AI PR review on GitHub: governance and results. It covers what happens when Copilot reviews every PR before a human does: the four things we never let it sign off on, the rule that kept its comments useful instead of noise, and an honest measurement of what changed across a release cycle. The thread that ties it back to here, and to the Agentforce platform primer, is this: an agentic tool is only as safe as the boundary you draw around it.
