A full-copy sandbox is a privacy time-bomb. It has your real customer names, real email addresses, real phone numbers, and in regulated industries, it has data that in production is surrounded by access controls, audit logging, and encryption — but in the sandbox is available to any developer who can log in.
Salesforce Data Mask is the tool that fixes this. It’s a managed package. Installation is straightforward. Configuration takes an afternoon. Running it takes minutes after a sandbox refresh. There is no reason not to have this in place.
And yet — most teams either haven’t set it up, or have set it up but left three categories of fields unmasked, which defeats the purpose entirely. Here’s the setup and the three-field checklist that closes the gap.
How Data Mask Actually Works
Per Salesforce’s Data Mask help documentation, Data Mask is a managed package installed into an Unlimited or Enterprise production org. You configure your masking rules there. When you refresh a sandbox from that production org, the sandbox inherits the configuration, and you run the masking job from within the sandbox itself.
Two things about this architecture matter:
- The unmasked window. Between the refresh completing and the masking job finishing, the sandbox is a full-fidelity copy of production. This is a real, finite exposure window — close it as fast as operationally possible.
- The masking is irreversible. Unlike a stage-and-commit flow, Data Mask does a destructive replacement. Once run, the original values are gone from the sandbox. That’s the point.
The Four Masking Types
Per the Salesforce Data Mask Trailhead module, Data Mask supports four replacement strategies. Pick the right one per field:
| Strategy | What it does | Good for |
|---|---|---|
| Anonymization | Replaces values with random library data (names, emails, addresses) | Name, Email, Phone, most PII |
| Pseudonymization | Replaces consistently — same input produces same output | Fields you join on in testing (same customer = same mask) |
| Pattern-matching | Replaces matching a regex or format (e.g., 16-digit card format) | Card numbers, account IDs |
| Deletion | Clears the field entirely | Fields you don’t need in sandbox at all |
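The difference between the four strategies can be sketched in plain Python. This illustrates the concepts only and is not how Data Mask is implemented; the tiny name library, the HMAC-based mapping, and every function name here are assumptions made for the example.

```python
import hashlib
import hmac
import random
import re

# Hypothetical replacement library; Data Mask ships its own.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Morgan"]


def anonymize(_value: str, rng: random.Random) -> str:
    """Random replacement: the output has no relationship to the input."""
    return rng.choice(FIRST_NAMES)


def pseudonymize(value: str, secret: bytes) -> str:
    """Consistent replacement: the same input always maps to the same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return FIRST_NAMES[digest[0] % len(FIRST_NAMES)]


def mask_pattern(value: str, rng: random.Random) -> str:
    """Format-preserving replacement: digits are swapped, separators stay."""
    return re.sub(r"\d", lambda _m: str(rng.randint(0, 9)), value)


def delete(_value: str) -> None:
    """Deletion: the field is cleared."""
    return None
```

The property to notice is that `pseudonymize("Jane", key)` returns the same mask every time for a given key, which is what keeps joins working in tests, while `anonymize` makes no such promise.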
The three categories of fields teams consistently miss are the ones where the default “leave it alone” is the wrong choice, and the right choice is not obvious until you look at the field carefully.
The Three Fields Teams Forget
1. Long-Text and Rich-Text Fields
Custom Long Text Area and Rich Text Area fields are the biggest gap in most Data Mask rollouts. Teams dutifully mask Email, Phone, and Billing_Address, then leave Service_Notes__c unmasked.
Look inside a typical Notes field on a Case object. It will contain real customer names, real email addresses, real account numbers, the product version of a confidential contract, a medical complaint a patient made — everything the structured fields were designed to keep out of plain text. Developers who can’t see Patient.SSN__c but can read the notes are not actually constrained.
Either anonymize to random library-generated text, or delete entirely if the field isn’t needed for sandbox testing. Do not leave as-is. If you must preserve some structure for testing (e.g., the notes still say “customer called about X”), build a pattern that replaces values but preserves the template.
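A template-preserving pattern of this kind can be sketched as a handful of regular expressions applied to the note text. This is a minimal illustration outside of Data Mask, not its pattern syntax; the `ACC-` account-number format and the placeholder tokens are hypothetical, and regexes like these will not catch free-form names, which is one reason deletion is often the safer choice.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
ACCOUNT_RE = re.compile(r"\bACC-\d{6}\b")  # hypothetical account-number format


def scrub_notes(text: str) -> str:
    """Replace identifiers with placeholders while keeping the sentence shape."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = ACCOUNT_RE.sub("[ACCOUNT]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


note = "Customer jane.doe@example.com called from +1 (415) 555-0100 about ACC-123456."
print(scrub_notes(note))
```

The surrounding words ("Customer ... called from ... about ...") survive, so tests that depend on the shape of the note still have something realistic to work with.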
2. Chatter Feed Posts and Email Message Bodies
Chatter feed content (FeedItem.Body) and email messages (EmailMessage.TextBody, EmailMessage.HtmlBody) are conversation records. They contain exactly the kinds of information the organization otherwise tries to protect — customer names, negotiated prices, confidential project codes, medical descriptions.
These objects are often missed because they are not the objects developers think of when designing masking rules. They come along for the ride in a sandbox refresh.
For Chatter and email bodies, deletion is usually the right answer unless the dev team has a specific use case for the content. If they do, anonymize — don’t leave the real conversations.
3. Legacy Data in Unusual Places
Every org I’ve worked with has at least one of these:
- A custom Description__c field on a custom object that was used as a dumping ground five years ago
- A set of custom checkbox fields where somebody stored categorical PII (Is_HIV_Positive__c) that should not exist even in production, let alone sandbox
- SystemModstamp-adjacent text fields used for audit notes
The only reliable way to find these is to enumerate the fields — go through the object manager, list every text, long-text, and rich-text field, and classify each. Don’t trust “our schema is clean.” It isn’t.
Before your first Data Mask configuration, run a one-time exercise: list every text-ish field in every object, classify as “clean”, “masked”, or “deleted”. Keep this inventory as a Custom Metadata Type. Review it quarterly — new fields added between reviews are the ones that become unmasked by default.
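The quarterly review amounts to a set difference: text-like fields that exist in the schema minus fields that have a classification. A minimal sketch follows, assuming the field list has already been exported from the org in some form; the `Field` shape, the type labels in `TEXT_TYPES`, and the inventory dictionary are all hypothetical stand-ins for whatever your export and Custom Metadata Type actually contain.

```python
from dataclasses import dataclass

# Assumed labels for text-like field types in the exported schema.
TEXT_TYPES = {"string", "textarea", "email", "phone", "url"}


@dataclass(frozen=True)
class Field:
    sobject: str
    name: str
    type: str


def unclassified_text_fields(fields, inventory):
    """Return text-like fields that have no entry in the inventory."""
    return sorted(
        f"{f.sobject}.{f.name}"
        for f in fields
        if f.type in TEXT_TYPES and f"{f.sobject}.{f.name}" not in inventory
    )


schema = [
    Field("Case", "Service_Notes__c", "textarea"),
    Field("Case", "Priority_Score__c", "double"),
    Field("Contact", "Email", "email"),
]
inventory = {"Contact.Email": "masked"}  # classification: clean / masked / deleted

print(unclassified_text_fields(schema, inventory))
```

Anything this returns is a field that was added, or never reviewed, and will be left unmasked by default until someone classifies it.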
The Minimum Viable Setup
What to configure on day one
1. Install the managed package in production.
2. Assign Data Mask User permission set licences to the people who will configure and run the masking jobs.
3. Start with the standard objects that have PII: Contact, Lead, Account, Case, Opportunity, and any PersonAccount fields.
4. Configure masking rules for the fields that unambiguously contain PII:
| Object | Field | Strategy |
|---|---|---|
| Contact | FirstName, LastName | Anonymization (library) |
| Contact | Email | Anonymization (email library) |
| Contact | Phone, MobilePhone | Anonymization (phone library) |
| Contact | MailingStreet, MailingCity, MailingPostalCode | Anonymization (address library) |
| Account | Name | Anonymization |
| Case | SuppliedEmail, SuppliedName, SuppliedPhone | Anonymization |
5. Extend to long-text fields on Case, Opportunity, and your top five custom objects.
6. Add Chatter and email bodies — typically deletion.
7. Run the mask in a test sandbox, review the output, confirm nothing important broke, then roll out.
The Operational Habit
Problem: A financial-services client had Data Mask configured and ran it “whenever they remembered after a sandbox refresh.” The on-call team refreshed a sandbox at 16:00 on a Friday, developers logged in on Monday morning, and the masking job did not run until 14:00 on Tuesday. That Tuesday morning, before the job had run, a developer shared their screen in a training session with full client account numbers on display.
Fix: Automate the masking job to run immediately at sandbox-refresh-complete. Treat the unmasked sandbox like an open vault. Minutes matter, not days.
The operational rule that prevents this: masking runs before the sandbox is made available to developers. Not “after the refresh.” Not “when the admin gets around to it.” Before access is granted.
This is typically achieved by:
- Coupling the sandbox-refresh workflow with a masking-job trigger
- Having the sandbox in a restricted access state until masking completes
- Logging the masking run for compliance evidence
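The three mechanisms above can be modelled as a small state machine in whatever orchestration layer drives your refreshes. The sketch below is a hedged illustration only: `SandboxGate`, its states, and its methods are hypothetical names, and the actual calls that start a masking job or grant user access are deliberately left out because they depend on your tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class SandboxState(Enum):
    REFRESHED = "refreshed"  # full-fidelity copy of production, unmasked
    MASKING = "masking"      # masking job in progress
    MASKED = "masked"        # safe to open to developers


@dataclass
class SandboxGate:
    name: str
    state: SandboxState = SandboxState.REFRESHED
    audit_log: list = field(default_factory=list)

    def _log(self, event: str) -> None:
        # Timestamped entries double as compliance evidence.
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), event))

    def start_masking(self) -> None:
        if self.state is not SandboxState.REFRESHED:
            raise RuntimeError(f"cannot start masking from {self.state.value}")
        self.state = SandboxState.MASKING
        self._log("masking started")

    def finish_masking(self) -> None:
        if self.state is not SandboxState.MASKING:
            raise RuntimeError(f"cannot finish masking from {self.state.value}")
        self.state = SandboxState.MASKED
        self._log("masking completed")

    def can_grant_access(self) -> bool:
        """Developers get access only once masking has completed."""
        return self.state is SandboxState.MASKED
```

The design choice is that access is denied by default: a freshly refreshed sandbox starts in `REFRESHED`, and nothing short of a completed masking run flips `can_grant_access()` to true.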
What Data Mask Does Not Do
Four limits that matter
1. Data Mask does not mask the schema. Field names, object labels, and custom metadata travel as-is. If your field name itself is sensitive (SSN_Last_Four__c on a custom object), rename it before migrating.
2. Data Mask does not encrypt — it destroys and replaces. If you need reversible masking for a specific test scenario, Data Mask is not the tool. Look at vendor alternatives or custom patterns.
3. Data Mask only runs in sandboxes. It cannot mask production data. Production PII controls are a separate concern — Shield Platform Encryption, field-level security, access controls.
4. Data Mask does not mask files. Attachments, Files, ContentDocuments carry their original content. If PDFs in your org contain customer PII, you need a separate strategy for those.
Where the Official Docs Live
- Secure Your Sandbox Data with Salesforce Data Mask (help) — the canonical overview
- Salesforce Data Mask Trailhead module — step-by-step setup, masking-type examples
- Salesforce Data Masking Techniques (Trailhead unit) — the four strategies in depth
The Trailhead module is the most practical starting point — it walks through the four strategies with concrete examples.
When did your team last run Data Mask on your full-copy sandbox — and do you have the automation to guarantee it runs before developers log in next time?