Salesforce Data Mask for Sandbox Privacy - Setup and the Fields Teams Always Forget

A full-copy sandbox is a privacy time-bomb. It holds your real customer names, real email addresses, and real phone numbers. In regulated industries it also holds data that production surrounds with access controls, audit logging, and encryption, but that the sandbox exposes to any developer who can log in.

Salesforce Data Mask is the tool that fixes this. It’s a managed package. Installation is straightforward. Configuration takes an afternoon. Running it takes minutes after a sandbox refresh. There is no reason not to have this in place.

And yet most teams either haven’t set it up, or have set it up but left three categories of fields unmasked, which defeats the purpose entirely. Here’s the setup and the three-category checklist that closes the gap.

How Data Mask Actually Works

Data Mask — Production Install, Sandbox Execution

Per Salesforce’s Data Mask help documentation, Data Mask is a managed package installed into an Unlimited or Enterprise production org. You configure your masking rules there. When you refresh a sandbox from that production org, the sandbox inherits the configuration, and you run the masking job from within the sandbox itself.

Two things about this architecture matter:

The unmasked window. Between the refresh completing and the masking job finishing, the sandbox is a full-fidelity copy of production. This is a real, finite exposure window — close it as fast as operationally possible.
The masking is irreversible. Unlike a stage-and-commit flow, Data Mask does a destructive replacement. Once run, the original values are gone from the sandbox. That’s the point.

The Four Masking Types

Per the Salesforce Data Mask Trailhead module, Data Mask supports four replacement strategies. Pick the right one per field:

Strategy	What it does	Good for
Anonymization	Replaces values with random library data (names, emails, addresses)	Name, Email, Phone, most PII
Pseudonymization	Replaces consistently — same input produces same output	Fields you join on in testing (same customer = same mask)
Pattern-matching	Replaces values with generated data that fits a regex or format (e.g., 16-digit card format)	Card numbers, account IDs
Deletion	Clears the field entirely	Fields you don’t need in sandbox at all
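
To make the differences concrete, here is a minimal Python sketch of the four strategies as the table describes them. It is an illustration only, not how Data Mask is implemented: the name library, the salt, and all four function names are hypothetical.

```python
import hashlib
import random
import re

# Hypothetical replacement library; Data Mask ships its own.
FIRST_NAMES = ["Avery", "Jordan", "Morgan", "Riley", "Quinn"]


def anonymize(value: str, rng: random.Random) -> str:
    """Return a random library value that is unrelated to the input."""
    return rng.choice(FIRST_NAMES)


def pseudonymize(value: str, salt: str = "sandbox-salt") -> str:
    """Return a library value chosen deterministically from the input,
    so the same input always produces the same replacement."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    return FIRST_NAMES[digest[0] % len(FIRST_NAMES)]


def mask_pattern(value: str, rng: random.Random) -> str:
    """Replace every digit with a random digit, keeping length and separators."""
    return re.sub(r"\d", lambda _: str(rng.randint(0, 9)), value)


def delete(value: str) -> None:
    """Clear the field entirely."""
    return None


rng = random.Random()
print(anonymize("Dana", rng))                    # differs from run to run
print(pseudonymize("Dana"))                      # stable for the same input and salt
print(mask_pattern("4111-1111-1111-1111", rng))  # still looks like a card number
print(delete("free-text note"))
```

The practical difference to notice is between the first two: a join on a pseudonymized value still lines up across records, while a join on an anonymized value does not.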

The three categories of fields teams consistently miss are the ones where the default “leave it alone” is the wrong choice, and the right choice is not obvious until you look at the field carefully.

The Three Fields Teams Forget

1. Long-Text and Rich-Text Fields

Custom Long Text Area and Rich Text Area fields are the biggest gap in most Data Mask rollouts. Teams dutifully mask Email, Phone, and BillingAddress, then leave Service_Notes__c unmasked.

Look inside a typical Notes field on a Case object. It will contain real customer names, real email addresses, real account numbers, the product version of a confidential contract, a medical complaint a patient made — everything the structured fields were designed to keep out of plain text. Developers who can’t see Patient.SSN__c but can read the notes are not actually constrained.

⚠️ What to do with long-text fields

Either anonymize to random library-generated text, or delete entirely if the field isn’t needed for sandbox testing. Do not leave as-is. If you must preserve some structure for testing (e.g., the notes still say “customer called about X”), build a pattern that replaces values but preserves the template.
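
As a rough illustration of “replace values but preserve the template”, the sketch below swaps recognisable identifiers in a note for placeholders and leaves the surrounding sentence intact. The three regular expressions and the `scrub_note` name are assumptions made for this example, and the account-number rule (a run of eight or more digits) is hypothetical. Regexes cannot find names or medical details in free text, which is why anonymizing or deleting the whole field remains the safer default.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ACCOUNT = re.compile(r"\b\d{8,}\b")            # hypothetical account-number format
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")   # loose match for formatted phone numbers


def scrub_note(text: str) -> str:
    """Replace emails, account numbers, and phone numbers with placeholders."""
    text = EMAIL.sub("[email]", text)
    # Account numbers go first, because the loose phone pattern would
    # otherwise swallow long digit runs as well.
    text = ACCOUNT.sub("[account]", text)
    text = PHONE.sub("[phone]", text)
    return text


note = ("Customer jane.doe@example.com called from +1 (415) 555-0100 "
        "about account 12345678.")
print(scrub_note(note))
```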

2. Chatter Feed Posts and Email Message Bodies

Chatter feed content (FeedItem.Body) and email messages (EmailMessage.TextBody, EmailMessage.HtmlBody) are conversation records. They contain exactly the kinds of information the organization otherwise tries to protect — customer names, negotiated prices, confidential project codes, medical descriptions.

These objects are often missed because they are not the objects developers think of when designing masking rules. They come along for the ride in a sandbox refresh.

💡 Pro Tip

For Chatter and email bodies, deletion is usually the right answer unless the dev team has a specific use case for the content. If they do, anonymize — don’t leave the real conversations.
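
One cheap safeguard is a check that fails whenever the conversation fields have no rule at all. The sketch below assumes you keep a copy of your masking rules as a plain Python dict. That dict, the strategy labels, and the `missing_rules` name are hypothetical; Data Mask itself is configured through its own UI.

```python
# Conversation fields named in this section.
REQUIRED = {
    ("FeedItem", "Body"),
    ("EmailMessage", "TextBody"),
    ("EmailMessage", "HtmlBody"),
}


def missing_rules(configured):
    """Return the required (object, field) pairs that have no masking rule."""
    return REQUIRED - set(configured)


# Hypothetical snapshot of the rules configured so far.
rules = {
    ("Contact", "Email"): "anonymize",
    ("FeedItem", "Body"): "delete",
}
for obj, field_name in sorted(missing_rules(rules)):
    print(f"No masking rule for {obj}.{field_name}")
```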

3. Legacy Data in Unusual Places

Every org I’ve worked with has at least one of these:

A custom Description__c field on a custom object that was used as a dumping ground five years ago
A set of custom checkbox fields where somebody stored categorical PII (Is_HIV_Positive__c) that should not exist even in production, let alone sandbox
Free-text fields that sit next to system audit fields such as SystemModstamp and hold hand-written audit notes

The only reliable way to find these is to enumerate the fields: go through Object Manager, list every text, long-text, and rich-text field, and classify each. Don’t trust “our schema is clean.” It isn’t.

ℹ️ The field audit that pays off

Before your first Data Mask configuration, run a one-time exercise: list every text-ish field in every object and classify each as “clean”, “masked”, or “deleted”. Keep this inventory as a Custom Metadata Type. Review it quarterly, because any field added between reviews stays unmasked by default.
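
The quarterly review is easier if a script tells you which fields have never been classified. The sketch below assumes you have already exported the field list and the inventory into Python structures. The `Field` record, the type labels in `TEXT_TYPES`, and the example field names other than those mentioned above are all hypothetical.

```python
from dataclasses import dataclass

# Assumed labels for text-like field types in the exported field list.
TEXT_TYPES = {"string", "textarea"}


@dataclass(frozen=True)
class Field:
    obj: str
    name: str
    type: str


def unclassified(fields, inventory):
    """Return text-like fields that have no entry in the inventory,
    sorted so the review list is stable between runs."""
    return sorted(
        (f.obj, f.name)
        for f in fields
        if f.type in TEXT_TYPES and (f.obj, f.name) not in inventory
    )


fields = [
    Field("Case", "Service_Notes__c", "textarea"),
    Field("Case", "Priority", "picklist"),
    Field("Contact", "Title", "string"),
    Field("Invoice__c", "Description__c", "textarea"),  # added since the last review
]
inventory = {
    ("Case", "Service_Notes__c"): "deleted",
    ("Contact", "Title"): "clean",
}
for obj, name in unclassified(fields, inventory):
    print(f"Needs classification: {obj}.{name}")
```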

The Minimum Viable Setup

What to configure on day one

1. Install the managed package in production.

2. Assign Data Mask User permission set licences to the people who will configure and run the masking jobs.

3. Start with the standard objects that have PII: Contact, Lead, Account, Case, Opportunity, and any PersonAccount fields.

4. Configure masking rules for the fields that unambiguously contain PII:

Object	Field	Strategy
Contact	FirstName, LastName	Anonymization (library)
Contact	Email	Anonymization (email library)
Contact	Phone, MobilePhone	Anonymization (phone library)
Contact	MailingStreet, MailingCity, MailingPostalCode	Anonymization (address library)
Account	Name	Anonymization
Case	SuppliedEmail, SuppliedName, SuppliedPhone	Anonymization

5. Extend to long-text fields on Case, Opportunity, and your top five custom objects.

6. Add Chatter and email bodies — typically deletion.

7. Run the mask in a test sandbox, review the output, confirm nothing important broke, then roll out.
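
For the review in step 7, one approach is to seed a handful of known “canary” values and confirm that none of them survive in a sample exported from the masked sandbox. The sketch below is a minimal version of that check. The row format, the canary value, and the `surviving_values` name are assumptions, and a list of real production values would itself be sensitive, so keep it short and store it carefully.

```python
def surviving_values(sandbox_rows, known_values):
    """Return (record Id, field name) for every string value in the sandbox
    sample that exactly matches a known pre-mask value, ignoring case."""
    known = {v.strip().lower() for v in known_values}
    leaks = []
    for row in sandbox_rows:
        for field_name, value in row.items():
            if isinstance(value, str) and value.strip().lower() in known:
                leaks.append((row.get("Id"), field_name))
    return leaks


# Hypothetical sample exported from the sandbox after the masking run.
sample = [
    {"Id": "003A", "Email": "avery.quinn@example.invalid", "LastName": "Quinn"},
    {"Id": "003B", "Email": "Real.Person@corp.example", "LastName": "Riley"},
]
canaries = ["real.person@corp.example"]

for record_id, field_name in surviving_values(sample, canaries):
    print(f"Unmasked value found: {record_id}.{field_name}")
```

This only catches exact matches in structured fields. It will not notice a real email address buried inside a notes field, so it complements the long-text rules rather than replacing them.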

The Operational Habit

🚨 Real-World Scenario

Problem: A financial-services client had Data Mask configured and ran it “whenever they remembered after a sandbox refresh.” The on-call team refreshed a sandbox at 16:00 on a Friday, developers logged in on Monday morning, and the masking job did not run until 14:00 on Tuesday. That Tuesday morning, a developer shared their screen in a training session with full client account numbers on display.

Fix: Automate the masking job to run immediately at sandbox-refresh-complete. Treat the unmasked sandbox like an open vault. Minutes matter, not days.

The operational rule that prevents this: masking runs before the sandbox is made available to developers. Not “after the refresh.” Not “when the admin gets around to it.” Before access is granted.

This is typically achieved by:

Coupling the sandbox-refresh workflow with a masking-job trigger
Having the sandbox in a restricted access state until masking completes
Logging the masking run for compliance evidence
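
The three pieces above can be sketched as a small gate object that your refresh automation consults before opening the sandbox to developers. Everything here is hypothetical scaffolding: the `SandboxGate` class, its method names, and the calendar dates are assumptions, and how you detect refresh completion or trigger the masking job depends on your own tooling.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

log = logging.getLogger("sandbox_gate")


@dataclass
class SandboxGate:
    """Tracks whether a sandbox is safe to open to developers."""
    name: str
    refreshed_at: Optional[datetime] = None
    masked_at: Optional[datetime] = None

    def refresh_completed(self, at: datetime) -> None:
        # A refresh brings back unmasked data, so earlier masking no longer counts.
        self.refreshed_at = at
        self.masked_at = None
        log.info("%s refreshed at %s; access blocked", self.name, at.isoformat())

    def masking_completed(self, at: datetime) -> None:
        if self.refreshed_at is None or at < self.refreshed_at:
            raise ValueError("masking must follow a completed refresh")
        self.masked_at = at
        # This log line doubles as compliance evidence of the run.
        log.info("%s masked at %s; unmasked for %s",
                 self.name, at.isoformat(), at - self.refreshed_at)

    def may_grant_access(self) -> bool:
        return self.masked_at is not None

    def exposure_window(self) -> Optional[timedelta]:
        if self.refreshed_at is None or self.masked_at is None:
            return None
        return self.masked_at - self.refreshed_at


# Timeline from the scenario above (the calendar dates are illustrative).
gate = SandboxGate("fullcopy")
gate.refresh_completed(datetime(2024, 3, 1, 16, 0))   # Friday 16:00
monday_access = gate.may_grant_access()               # False: still unmasked
gate.masking_completed(datetime(2024, 3, 5, 14, 0))   # Tuesday 14:00
hours_unmasked = gate.exposure_window().total_seconds() / 3600
print(f"Access on Monday: {monday_access}; unmasked for {hours_unmasked:.0f} hours")
```

Run against the scenario’s timeline, the gate refuses access on Monday and reports a 94-hour unmasked window, treating 14:00 as the moment masking finished. With the masking job triggered at refresh completion, that figure should be minutes.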

What Data Mask Does Not Do

Four limits that matter

1. Data Mask does not mask the schema. Field names, object labels, and custom metadata travel as-is. If your field name itself is sensitive (SSN_Last_Four__c on a custom object), rename it before the field reaches a sandbox.

2. Data Mask does not encrypt — it destroys and replaces. If you need reversible masking for a specific test scenario, Data Mask is not the tool. Look at vendor alternatives or custom patterns.

3. Data Mask only runs in sandboxes. It cannot mask production data. Production PII controls are a separate concern — Shield Platform Encryption, field-level security, access controls.

4. Data Mask does not mask files. Attachments, Files, ContentDocuments carry their original content. If PDFs in your org contain customer PII, you need a separate strategy for those.
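
For the first limit, a quick scan of field API names can surface the ones that give something away by name alone. The sketch below assumes you have the API names as a list of strings. The token list is a hypothetical starting point that you would extend for your own industry.

```python
# Hypothetical starter list of tokens that make a field name sensitive.
SENSITIVE_TOKENS = {"ssn", "dob", "hiv", "passport", "diagnosis"}


def sensitive_field_names(api_names):
    """Return the API names that contain a sensitive token, where tokens
    are the underscore-separated parts of the name, compared in lower case."""
    flagged = []
    for name in api_names:
        tokens = {part.lower() for part in name.split("_") if part}
        if tokens & SENSITIVE_TOKENS:
            flagged.append(name)
    return flagged


names = ["SSN_Last_Four__c", "Is_HIV_Positive__c", "Service_Notes__c"]
for name in sensitive_field_names(names):
    print(f"Field name reveals sensitive data: {name}")
```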

Where the Official Docs Live

Secure Your Sandbox Data with Salesforce Data Mask (help) — the canonical overview
Salesforce Data Mask Trailhead module — step-by-step setup, masking-type examples
Salesforce Data Masking Techniques (Trailhead unit) — the four strategies in depth

The Trailhead module is the most practical starting point — it walks through the four strategies with concrete examples.

When did your team last run Data Mask on your full-copy sandbox — and do you have the automation to guarantee it runs before developers log in next time?
