What is LDV?

The acronym LDV stands for Large Data Volumes. Admittedly, the term has a fuzzy definition. According to Salesforce, if your deployment has tens of thousands of users, tens of millions of records, or hundreds of gigabytes of total record storage, then it qualifies and you need to consider the consequences of LDV. I consider you to be in LDV territory when Salesforce reporting breaks down, SOQL queries start returning more rows than the limits allow, or query performance degrades because of table sizes.

What to do when you have LDV?

There are already a lot of other really good resources out there for building around LDV and ensuring applications are still performant. (See links to the standard reading below.) This article is focused on the testing of LDV after you’ve come up with a good LDV architecture and made all of your performance optimizations.

Here are a few links to the standard reading regarding governors and LDV.

Governors:  https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_gov_limits.htm

LDV:  https://developer.salesforce.com/docs/atlas.en-us.salesforce_large_data_volumes_bp.meta/salesforce_large_data_volumes_bp/

Query Selectivity:  https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_SOQL_VLSQ.htm

Identify where LDV will affect you.

Not every transaction in your application will be affected by LDV, I hope! Usually there are going to be one or two objects you'll want to focus on; these can be, but are not limited to, junction objects, transaction logs, or grandchild objects. I like to get these objects onto the minds of everyone who will be working on the project, so I identify them as early as possible. The best tool I've found is a simple spreadsheet listing every business object, which I use to ask the SMEs what sort of volume to expect. From there I can run the calculations based on the number of users and the relationships to see where LDV will hit us, focusing specifically on the total number of children and grandchildren expected under the projected parent and grandparent records. This spreadsheet is never very complex, but it wouldn't be difficult to build a tool in SFDC, or a more elaborate spreadsheet, that does the work for you.
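
To make that concrete, here's the kind of back-of-the-envelope math the spreadsheet captures, with purely illustrative numbers: 2,000 users each creating 25 orders per month is 600,000 order records per year, and at an average of 10 line items per order that's 6 million grandchild rows before the archival window even closes. The order object may never worry you; the line item object is where LDV will hit.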

What can go wrong?

If LDV is ignored, there's a good chance your customers will find your LDV problems for you. That is, of course, the worst time to find issues like this, because by then the number of records involved is huge and not easily migrated or worked around. This is especially true when you're working with a managed package and can't easily alter the object model or indexes.

Unit Tests

Unit tests are the first type of testing we'll focus on. You already know that a 'good' unit test exercises bulk operations, and does so with at least 200 records; if your code behaves with 200 records it can reasonably be considered bulk safe. Bulk safety is related to what we're discussing here, but it's been covered so many times that I'd probably be doing a disservice by rehashing how to do it properly. Again, we're thinking about LDV only, so let's assume your production code is 100% bulk safe from here on out.

Based on what you identified as LDV areas before breaking ground, you'll want to call out focus areas where more stringent unit testing is required. Then, when writing the tests, add a few extra tests focused on larger-than-normal volumes. This can be difficult once you consider the governor limits around DML in test setup: how can you properly test tens of millions of records when you can only insert 10k in a test? Well, there's always seeAllData=true. It's generally reserved for when you're stuck working with pricebook entries or the few other objects that can't be manipulated in a test, but it makes a lot of sense here too. We always have several developer edition orgs or sandboxes on the path to production, and it's usually not difficult to stand up one additional DE org or sandbox with a massive amount of data in your LDV objects that unit tests can be run against.
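
Here's a minimal sketch of what one of these tests might look like, assuming a data-heavy sandbox or DE org is the target. The LdvReportingService class and its summarizeContacts method are illustrative names, not anything from a real library:

@isTest(SeeAllData=true)
private class LdvContactSummaryTest {

    @isTest
    static void summarizesAgainstPreloadedVolumes() {
        // Runs against whatever volume the org already holds, so point this
        // class at the sandbox/DE org you loaded up with LDV-scale data.
        Test.startTest();
        Integer processed = LdvReportingService.summarizeContacts();
        Test.stopTest();

        System.assert(processed > 0, 'Expected the summary to process the pre-loaded data');
    }
}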

If this idea disgusts you (it does a lot of people!), don't use it. While you'll never be able to approximate a real-world million-record test in an Apex unit test with seeAllData=false, you can still measure execution times against a smaller data set and assert on them. In this case I would establish a baseline amount of time a given transaction should execute in, with wiggle room of 10% or so. If this test ever fails, I know a developer has added code to this area of the system; that alerts me to inspect the change, and I can either adjust the expected execution time or recommend optimizations. The idea is that any small amount of time added to a transaction in an LDV area could end up being 10x more severe in the real world. This style won't alert you to SOQL selectivity issues, but it will show you when new loops show up where you don't want them.
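
If you prefer to stay with seeAllData=false, a timing guardrail can look something like the sketch below. The 2,000-record volume, the hypothetical LdvReportingService.summarizeContacts() call, and the 5,000 ms budget (a measured baseline plus roughly 10%) are all illustrative assumptions:

@isTest
private class LdvTimingGuardrailTest {

    @isTest
    static void staysWithinTimingBaseline() {
        // Smaller-than-production volume, but enough to expose new loops or
        // queries creeping into the LDV hot path.
        Account acct = new Account(Name = 'Timing Baseline Account');
        insert acct;

        List<Contact> contacts = new List<Contact>();
        for (Integer i = 0; i < 2000; i++) {
            contacts.add(new Contact(LastName = 'Perf-' + i, AccountId = acct.Id));
        }
        insert contacts;

        Long started = System.currentTimeMillis();
        LdvReportingService.summarizeContacts();
        Long elapsed = System.currentTimeMillis() - started;

        // A failure here means someone added work to this transaction; inspect
        // the change before simply raising the budget.
        System.assert(elapsed < 5000, 'LDV transaction took ' + elapsed + ' ms, over the agreed baseline');
    }
}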

Functional Testing

The best way I've found to uncover issues in your LDV areas is functional testing. Sometimes it can be hard to even know what to look for on a new codebase; in that case, just load the system up with crazy data (a small data-generation sketch follows the list below).

Here’s what I mean by “crazy data”:

  • Millions of Accounts

    • Skew on a good number of them, such as 12k contacts under each of 20k of those account records.

  • Millions of Leads

  • Millions of records under your Custom objects along with the required junctions and transactions to simulate the number of records you will realistically hit within your archival period
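
Loading millions of rows is really a job for the Bulk API, Data Loader, or a scripted data plan rather than Apex, but a single "skew slice" is easy to stamp out from Execute Anonymous and repeat until you hit the shape you want. A minimal sketch, keeping in mind that a single transaction can only insert 10,000 rows; the object, names, and field values are illustrative:

// One skewed parent with a 9,999-contact slice underneath it (9,999 children
// plus the parent keeps us under the 10,000-row DML ceiling). Rerun, or chain
// via Queueable/Batch, to push a parent well past the skew threshold.
Account skewed = new Account(Name = 'LDV Skew Account 001');
insert skewed;

List<Contact> children = new List<Contact>();
for (Integer i = 0; i < 9999; i++) {
    children.add(new Contact(LastName = 'Skew-' + i, AccountId = skewed.Id));
}
insert children;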

Transactions to test during this time:

  • Are my Visualforce page controllers performant?

  • Do my batch classes sit in 'Preparing' status forever?

  • Do I see any table scans in the query planner in the developer console?

    • This one is big. You should check it whenever a change is made to your LDV queries, to ensure you're not accidentally introducing a filter on a null value or a table scan, and that indexes are actually being used.

  • What happens when I have five or ten or 100 users all doing the same thing?

    • What happens when I have them doing the same thing while a batch is running?

    • What happens when I have them doing the same thing while a sharing recalc is happening?

    • You’re looking for row locks.
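
Those row locks can be provoked on purpose by pointing several asynchronous jobs at children of the same skewed parent while your batch or sharing recalc runs. A rough sketch; ContactTouchJob and the field it touches are illustrative, not from any real package:

public class ContactTouchJob implements Queueable {
    private List<Id> contactIds;

    public ContactTouchJob(List<Id> contactIds) {
        this.contactIds = contactIds;
    }

    public void execute(QueueableContext ctx) {
        List<Contact> contacts = [SELECT Id, Description FROM Contact WHERE Id IN :contactIds];
        for (Contact c : contacts) {
            c.Description = 'LDV lock test ' + String.valueOf(System.now());
        }
        // Updating a Contact locks its parent Account, so several of these
        // running at once against one skewed account will surface
        // UNABLE_TO_LOCK_ROW errors if your code doesn't handle contention.
        update contacts;
    }
}

Enqueue a handful of these from Execute Anonymous against different chunks of contacts that all share one parent, then kick off your batch or sharing recalc at the same time and watch for lock errors.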

Whether you automate what I describe above or have humans do it is up to you and your team's capabilities. Some things, like running queries through the query planner, will have to remain manual for now, but they're worth the effort.

Case Studies

“Aggregate query has too many rows …. “

This was a fun one to discover. I'm sure you've all seen this issue before, probably during your first few weeks of working with Apex: you queried the Account table without a filter, and because there were 10,001 records you couldn't assign them to your list, so instead you learned to loop over them directly. However, the error can show up in more places when a developer makes a simple mistake. Here's one: imagine you have 50 Account records total, but each Account has 2,000 child Contact records. 50 accounts is not too many to assign to a list, and neither is 2,000. But the code below will still trigger the error:

for (Account acct : [SELECT Id, (SELECT Id FROM Contacts LIMIT 1000) FROM Account]) {
    // do something
}

For some reason, Force.com considers the combination of 1,000 contacts each for 50 accounts to be the same as directly assigning 50,000 contacts to a single list. We missed this in testing because we never imagined it would be a problem; had we built one of the crazy LDV testing orgs described earlier in this article, we would have discovered it before our customer did. In the end we really didn't need an unbound contact subquery here. We likely never would have found this problem in a simple Apex unit test, because there's no way we could have exceeded the 10k limit on test data inserts to begin with.
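
The general direction of the fix was to bound the subquery to the handful of children the logic actually consumed. A sketch, not the exact code we shipped; the LIMIT 5 is purely illustrative:

for (Account acct : [SELECT Id, (SELECT Id FROM Contacts LIMIT 5) FROM Account]) {
    // only pull the few child rows this logic actually needs
    for (Contact c : acct.Contacts) {
        // do something
    }
}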

“Query is not selective …. “

Here's another fun one; this one bit me personally a few years ago. Salesforce has laid out a lot of guidelines for making queries selective, and they're detailed enough that you should be able to estimate whether a query is selective or not. In my experience, though, the platform doesn't always apply those thresholds consistently, so it's easy to get yourself into trouble with a non-selective query: SFDC will do its best to deliver the results even when you're way beyond the selectivity thresholds.

In my case I had a table with about 4.5 million records representing a year's worth of transactions; each month added roughly 1/12th of that 4.5 million, or about 375k records. These transactions were queried by a batch job that tabulated everything. In the early days it was very performant, since each month was only adding 375k records: month #2 we had around 750k, month #3 about 1.1 million, and so on. We had a two-year archival strategy and thought we had done a good job with a selective query powering the batch. What we never picked up on was that, by month #13, our batch was sitting in 'Preparing' until it timed out after 10 minutes. We had a table scan in there by mistake, and we never caught it because we didn't test with realistic data volumes and didn't run our queries through the query planner to catch table scans as soon as they were introduced. Either of the methods described in the functional testing section above would have alerted the team to the problem immediately and saved many hours of processing time.
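
To make the table-scan point concrete, here is the kind of contrast the query planner would have flagged for us early. Transaction__c and its fields are illustrative stand-ins, not our actual schema:

// Non-selective: a negative filter on an unindexed custom field can't use an
// index, so the optimizer falls back to a full scan of millions of rows.
SELECT Id, Amount__c FROM Transaction__c WHERE Status__c != 'Archived'

// Selective: a bounded filter on an indexed field (CreatedDate is indexed by
// default) lets the optimizer drive the query from the index instead.
SELECT Id, Amount__c FROM Transaction__c WHERE CreatedDate = LAST_N_DAYS:30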

Summary

My hope is that I've left you with some new ways of thinking about large data volumes, specifically around testing them. We'll publish another article soon with mitigation strategies. It's worth repeating that large data volumes on the Force.com platform can have a major impact on user enjoyment and adoption, and they're worth identifying as early as possible. Good luck!


Does your app work with LDV?

Do you have a client that just signed their largest deal ever? Do you know if your app will work with large data volumes? We can help get answers fast. Let’s talk!