I’ve worked on orgs ranging from a few thousand records to over 50 million, and the difference in how you need to think about data is enormous. What works perfectly at 100K records can bring an org to its knees at 5 million. Large Data Volume (LDV) is one of those topics that’s rarely taught until something breaks in production.
Let me walk through the strategies I rely on — from the quick wins to the architectural decisions that require planning from day one.
What Salesforce Considers “Large Data Volume”
Salesforce’s official LDV threshold is typically cited as 1 million records per object. In practice, you’ll feel the pain earlier — often around 500K records — depending on your query patterns, the number of fields, and the automation layered on top.
The symptoms are predictable:
- List views time out
- Reports take minutes instead of seconds
- Apex queries hit governor limits
- Batch jobs slow to a crawl
None of these are inevitable. They’re architecture problems with known solutions.
LDV Decision Framework
Scenario: A manufacturing org’s Case list views and reports started timing out after passing 2 million records. The SOQL queries were filtering on Status__c and Region__c — both unindexed fields — causing full table scans on every load.
A custom index request was submitted to Salesforce Support for both fields. After the 2-day turnaround, list view and report query times dropped from 30+ seconds to under 2 seconds. No code changes, no schema changes — just an index.
Custom Indexes — The Fastest Win
Standard Salesforce indexes cover a handful of fields out of the box: Id, Name, OwnerId, CreatedDate, SystemModstamp, RecordTypeId, lookup and master-detail relationship fields, and fields marked as external IDs or unique. Filtering on any other field forces a full table scan.
If your queries filter on a non-indexed field — say, Status__c or Account_Region__c — you can request a custom index from Salesforce Support. This is free and typically takes 1-2 business days.
Fields that benefit most from custom indexes:
- Fields used in WHERE clauses across large result sets
- Fields used in report filters with millions of rows
- Fields used in list view filters
Index limitations to know
- Null values are not indexed by default, so filters like Status__c = null can't use the index (Support can enable null inclusion on a custom index)
- Multi-currency orgs may have restrictions
- Indexes on formula fields are not supported
- For fields with skewed data (e.g., 80% of records have Status__c = 'Closed'), a standard index won't help. You'll need a two-column index or a different query strategy.
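When a filter value is skewed, one way to restore selectivity is to pair it with a selective, indexed filter. A minimal sketch (field choices are illustrative):

```apex
// Filtering on skewed Status__c alone forces a near-full scan.
// Pairing it with an indexed date range keeps the query selective,
// similar in effect to a two-column index on (CreatedDate, Status).
List<Case> recentClosed = [
    SELECT Id, Subject
    FROM Case
    WHERE CreatedDate = LAST_N_DAYS:30
    AND Status = 'Closed'
    LIMIT 200
];
```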
Skinny Tables
A skinny table is a Salesforce-internal feature where Support creates a secondary, narrow table that contains only the columns your most-common queries need. Queries against that object read the skinny table instead of the full table, which is dramatically faster.
Skinny tables are ideal when you have a report or list view that pulls 5-10 fields from an object with 5 million+ records, and the query runs frequently (dashboard, home page, API integration). Request them from Salesforce Support — specify the object and the exact fields you need.
Query Optimization in Apex
Even with indexes in place, poorly written SOQL will hurt you at volume:
Bad Pattern
// BAD: retrieves all fields, no selective filter, no limit
List<Case> cases = [SELECT Id, Subject, Status, Account.Name FROM Case];

Optimized Pattern
// BETTER: filter on indexed fields, limit columns, use LIMIT
List<Case> cases = [
SELECT Id, Subject, Status
FROM Case
WHERE Status = 'Open'
AND CreatedDate >= :Date.today().addDays(-30)
ORDER BY CreatedDate DESC
LIMIT 200
];

Rules I follow for LDV queries:
- Always filter on indexed fields — use CreatedDate, OwnerId, or your custom indexed fields
- Use date literals (LAST_N_DAYS:30) where possible; they're index-friendly
- Never use != or NOT IN on large sets; these force full table scans
- Avoid OFFSET for pagination on large result sets; use cursor-based pagination with WHERE Id > :lastId instead
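A minimal sketch of cursor-based (keyset) pagination; in real use this loop would live in a batch or queueable context to stay within governor limits:

```apex
Id lastId = null;
Integer pageSize = 200;
List<Case> page;
do {
    // First page has no cursor; later pages resume after the last Id seen.
    page = (lastId == null)
        ? [SELECT Id, Subject FROM Case WHERE Status = 'Open'
           ORDER BY Id ASC LIMIT :pageSize]
        : [SELECT Id, Subject FROM Case WHERE Status = 'Open' AND Id > :lastId
           ORDER BY Id ASC LIMIT :pageSize];
    if (!page.isEmpty()) {
        lastId = page[page.size() - 1].Id;
        // ...process the page here...
    }
} while (page.size() == pageSize);
```

Because Id is always indexed and the sort order is stable, each page query stays fast no matter how deep into the result set you are, unlike OFFSET, which re-scans everything it skips.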
Batch Apex at Scale
Standard Batch Apex has a default chunk size of 200 records. For LDV processing, I often reduce this:
Database.executeBatch(new MyBatchClass(), 50);

Why smaller batch sizes help at scale
Smaller batches mean more transactions, but each transaction is less likely to hit CPU or DML limits. For read-heavy operations (no DML), you can often go higher — 2000 records per chunk.
For very large datasets, use indexed WHERE clauses so the initial query completes fast:
public Database.QueryLocator start(Database.BatchableContext bc) {
return Database.getQueryLocator([
SELECT Id, Status__c FROM My_Object__c
WHERE Status__c = 'Pending'
AND CreatedDate < :Date.today().addDays(-7)
]);
}

Add an Is_Archived__c checkbox to any object that will grow large. Exclude it from all list view, report, and trigger filters from day one. When record counts climb, flip older records to archived in a nightly batch job. Your active working set stays small, your automation stays fast, and you avoid the painful retroactive work of adding archive logic after slowdowns appear.
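A sketch of what that nightly archive job might look like (the object name and the two-year cutoff are illustrative; tune the cutoff to your retention needs):

```apex
// Nightly batch that flips records older than two years to archived.
public class ArchiveOldRecordsBatch implements Database.Batchable<SObject> {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Both filters are index-friendly: a boolean flag plus CreatedDate.
        return Database.getQueryLocator([
            SELECT Id FROM My_Object__c
            WHERE Is_Archived__c = false
            AND CreatedDate < :Date.today().addYears(-2)
        ]);
    }
    public void execute(Database.BatchableContext bc, List<My_Object__c> scope) {
        for (My_Object__c rec : scope) {
            rec.Is_Archived__c = true;
        }
        update scope;
    }
    public void finish(Database.BatchableContext bc) {}
}
```

Schedule it with System.schedule or a Scheduled Apex class so the working set shrinks every night without manual intervention.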
BigObjects — Unlimited Historical Storage
BigObjects are a Salesforce feature for storing massive volumes of data (billions of records) directly in the platform. They don’t count against your standard storage limit.
When to Use
- Audit trails and compliance logs
- Historical transaction records
- Email send history, click/open tracking
- Any data that’s written once and rarely updated
Limitations
- No standard UI — you must access data via SOQL, Apex, or API
- No triggers, workflows, or flows on BigObjects
- You can only query on the indexed fields defined in the object’s index
- No UPDATE DML: you insert and you delete, but you can't update rows in place (re-inserting a row with the same index field values overwrites the existing row)
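Because queries must filter on the index, a BigObject SOQL query looks like this sketch (field names follow the My_Archive__b example in this section, and assume Account_Id__c is the leading index field):

```apex
// BigObject queries can only filter on the fields defined in the
// object's index, referenced in index order, leading fields first.
List<My_Archive__b> events = [
    SELECT Account_Id__c, Event_Date__c, Event_Type__c
    FROM My_Archive__b
    WHERE Account_Id__c = :accountId
];
```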
// Insert to BigObject
My_Archive__b record = new My_Archive__b();
record.Account_Id__c = accountId;
record.Event_Date__c = Datetime.now();
record.Event_Type__c = 'StatusChange';
Database.insertImmediate(record);

External Objects — Data Where It Lives
If historical data lives in an external system (data warehouse, legacy ERP), you don’t need to import it into Salesforce. External Objects via Salesforce Connect let you query external data as if it were native, using OData or custom adapters.
The benefit: zero Salesforce storage consumption, data stays current at source. The tradeoff: each query makes a real-time callout to the external system, so latency matters.
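External objects carry the __x suffix but query like native objects. A hypothetical sketch, assuming an Order__x external object mapped via an OData adapter (all names here are illustrative):

```apex
// This SOQL triggers a real-time callout to the external system,
// so keep filters selective and result sets small.
List<Order__x> orders = [
    SELECT ExternalId, Order_Total__c
    FROM Order__x
    WHERE Customer_Id__c = :customerId
    LIMIT 50
];
```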
Data Partitioning Strategies
For record-triggered automations and reports, partitioning your data logically can prevent slowdowns. Common patterns:
- Archive flag: add an Is_Archived__c checkbox and filter it out of all reports and triggers
- Fiscal year partitioning: use a formula field to assign records to a year partition, then build separate list views per year
- Soft delete pattern: set Is_Deleted__c = true instead of hard-deleting, and batch purge nightly
Why the archive flag pattern is so powerful
The archive flag pattern is particularly powerful because it requires minimal schema change but dramatically reduces the working set that triggers and reports operate on. A single checkbox, added proactively, can defer the need for BigObjects or external objects by years.
The Conversation to Have Before You Build
The most valuable LDV work happens before a record is written to the database. When designing a new object or integration, I always ask:
- How many records will this hold in 3 years?
- What fields will we filter on in reports and list views?
- Do we need all historical records accessible in the UI, or just recent ones?
- Are there compliance requirements that force retention of old records?
Answers to these questions determine whether you need custom indexes, skinny tables, BigObjects, or external objects — before you’ve written a line of code.
What LDV problem is your org currently struggling with, and which of these strategies are you considering? If you've used BigObjects in production, I'd especially love to hear how you handled the lack of UI access; share your approach in the comments.