To achieve a good customer experience, the performance of your Sitecore-based digital platform is essential. This blog post is intended for Sitecore developers and architects: I will address some of the basics of Sitecore caching and some best practices you can follow to boost the performance of your Sitecore solution. My goal is to give you the tools and knowledge necessary to start improving your Sitecore website's performance!
While working for a client, we decided to migrate from Sitecore 8.1 to 9.0.2 and to combine this migration with a change in hosting architecture. Our website was running on IaaS resources and we decided to move to PaaS. This was a big step forward, though not without challenges along the way!
Before swapping the IaaS production environment with the PaaS one, we ran a couple of load tests to get a feel for the performance of our new PaaS environments. The outcome was not as good as we had expected, so we started investigating all the possible performance tweaks we could apply.
To simplify things: when I refer to cache, I mean an in-memory cache, i.e. a memory space in RAM.
Sitecore caching structure
It is important to understand the structure of the Sitecore caches, what the purpose of each cache layer is, and why you should use them.
Why use cache?
The point of using caches is that communication between the client (browser) and the databases is relatively slow. Every step a request takes towards the database adds significantly to the access time, so a request that has to go all the way to the database is far slower than one served from memory. Imagine every call from the user's browser triggering a database search to fetch the required items! A heavy process would run in the background to fetch all these items while the user waited for a response. To speed this up, we can leverage caches, which bring these items "closer" to the client (compared to the databases) and make the whole process a lot faster. You may be wondering:
Why don’t we place all the items in the cache?
Answer: There are several reasons that prevent us from doing so. The main one is that we want each cache to stay relatively small (I will explain why in detail below). Another is that we simply cannot fit all our data inside the caches: the software caches live in the RAM of the machine that hosts our website, and the maximum size of all the caches combined is limited by the available RAM (keep in mind that part of the RAM is reserved/occupied by the OS and other services running on the machine).
Why are there many different layers of cache?
There are multiple layers of cache that grow in volume as you move further away from the client. To explain the reason behind this, it's important to note that as a cache grows in size, the latency to find data in it grows as well. If searching a cache is done with a linear search, the time complexity is O(N), where N is the number of items in the cache; as N grows, the search time grows with it. In other words, the fewer items stored in a cache, the faster it is to retrieve a single item from it. Now we can answer the question "Why are there many different layers of cache?". When a request comes in, we want to check the cache for the items needed to build the response. We want to respond to the user as fast as possible, so the first layer should be a really fast cache. If we cannot find what we are looking for in this layer, we fall through to another layer that is bigger than the first (and thus a bit slower). The search continues through the remaining cache layers until we reach the DB.
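The fallthrough described above can be sketched in a few lines of Python; the plain dicts standing in for the HTML, Item, and Data caches are purely illustrative, not Sitecore's actual implementation:

```python
def lookup(key, layers, fetch_from_db):
    """Walk the cache layers from fastest/smallest to slowest/largest,
    falling back to the database; backfill every layer that missed."""
    missed = []
    for store in layers:
        if key in store:
            value = store[key]
            break
        missed.append(store)
    else:
        # Every layer missed: pay the full database penalty.
        value = fetch_from_db(key)
    # Populate the layers that missed so the next request is faster.
    for store in missed:
        store[key] = value
    return value

# Plain dicts stand in for the HTML, Item and Data caches.
html_cache, item_cache, data_cache = {}, {}, {}
db = {"home": "<html>...</html>"}

page = lookup("home", [html_cache, item_cache, data_cache], db.__getitem__)
```

After the first call, the value sits in every layer, so the second request for the same key never gets past the first (fastest) cache.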
One important thing to note here is that the purpose of the HTML cache is to store HTML fragments that have already been rendered. It doesn't store the data required to produce the HTML, but the actual result. This removes the CPU overhead of recreating a piece of HTML, so when you use the HTML cache you save both time and resources (killing two birds with one stone)!
The HTML cache (the one closest to the client) is designed to maximize the hit rate (the probability of the desired data being in the cache) while keeping the cache latency as low as possible. The Item cache (which comes next in the hierarchy) is much larger, and is designed to minimize the miss penalty (the delay incurred when an HTML cache miss happens). The same principle applies to the rest of the cache layers.
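To make the "stores the result, not the inputs" distinction concrete, here is a minimal hypothetical sketch of an output cache: on a hit, the finished markup is returned and the rendering pipeline is never invoked:

```python
html_cache = {}
render_calls = 0

def render_component(item):
    """Stand-in for an expensive rendering pipeline."""
    global render_calls
    render_calls += 1
    return f"<div>{item['title']}</div>"

def get_html(cache_key, item):
    # On a hit we return the finished HTML; no CPU is spent re-rendering.
    if cache_key not in html_cache:
        html_cache[cache_key] = render_component(item)
    return html_cache[cache_key]

first = get_html("header:en", {"title": "Welcome"})   # renders once
second = get_html("header:en", {"title": "Welcome"})  # pure cache hit
```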
Is it a good practice to follow the pyramid design of the cache layers?
YES! The previous paragraphs explained what determines how fast a cache can be and how tightly its size is coupled to its performance. In the first cache layer, the closest to the client, we want a small, fast cache; in the second layer a bigger cache that is still fast; and so on. This results in a pyramid-shaped design for the cache layers, which turns out to be the most efficient.
How to set up the perfect patch for Sitecore caches
You will need two things in order to set up the patch file:
- Prepare a simple load test that requests the most important pages of your website,
- Open the Cache Administration Page of your Sitecore instance ("hostname"/Sitecore/admin/cache.aspx) and keep it open on the side.
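If you don't have a load-testing tool at hand, even a small script will do for a first impression. Below is a minimal sketch using only the Python standard library; the hostname and page list are placeholders you would replace with your own site's key pages:

```python
import concurrent.futures
import time
import urllib.request

def timed_get(url, opener=urllib.request.urlopen):
    """Fetch a URL and return (url, elapsed_seconds). The opener can be
    swapped out, e.g. for testing without a network."""
    start = time.perf_counter()
    with opener(url) as resp:
        resp.read()
    return url, time.perf_counter() - start

def load_test(base, paths, rounds=50, workers=10,
              opener=urllib.request.urlopen):
    """Request the given pages concurrently and collect response times."""
    urls = [base + p for p in paths] * rounds
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: timed_get(u, opener), urls))

# Usage (replace with your own hostname and most important pages):
# results = load_test("https://hostname", ["/", "/products", "/news"])
```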
You need to identify the most important caches that must be configured properly and set their initial size to at least 100MB. Let's use the web DB as an example and set the sizes of its Data and Item caches to 100MB each.
While running the load test, monitor the cache utilization, and especially the Delta column. The Delta is the approximate change in the size of the cache since the last refresh of the cache administration page.
- If the Delta for a cache fluctuates constantly, or if the size of the cache is consistently above 80% of the size limit, then increase the size of the cache by at least 50%.
Example: the cache size limit is 300MB and current utilization is 290MB. (290MB / 300MB) * 100% ≈ 97%, which is above 80%. The size of the cache should be increased to 450MB (300MB + 300MB * 50%).
- If a cache size seems to be a lot bigger than necessary, then reduce it so that the maximum value observed during the load test is around 65%-75% of the total cache size.
Example: the cache size limit is 300MB and the maximum value during the load test is 70MB. The size limit of this cache should be around 100MB (70MB / 0.70).
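The two rules above can be captured in a small helper. Note that the thresholds are this article's heuristics, not official Sitecore guidance:

```python
def recommend_cache_size(limit_mb, peak_usage_mb):
    """Suggest a new cache size limit from the peak usage seen under load."""
    utilization = peak_usage_mb / limit_mb
    if utilization > 0.80:
        # Rule 1: the cache is running hot; grow it by at least 50%.
        return limit_mb * 1.5
    if utilization < 0.65:
        # Rule 2: the cache is oversized; shrink it so the observed
        # peak lands around 65%-75% of the new limit.
        return peak_usage_mb / 0.70
    return limit_mb  # already in a healthy range

recommend_cache_size(300, 290)  # rule 1: grows 300MB to 450MB
recommend_cache_size(300, 70)   # rule 2: shrinks 300MB to 100MB
```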
Your patch file should look like this one:
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <databases>
      <!-- Core DB -->
      <database id="core">
        <cacheSizes hint="setting">
          <data>100MB</data>
          <items>50MB</items>
          <paths>1MB</paths>
          <itempaths>20MB</itempaths>
          <standardValues>1MB</standardValues>
        </cacheSizes>
      </database>
      <!-- Master DB -->
      <database id="master">
        <cacheSizes hint="setting">
          <data>1GB</data>
          <items>500MB</items>
          <paths>10MB</paths>
          <itempaths>50MB</itempaths>
          <standardValues>10MB</standardValues>
        </cacheSizes>
      </database>
      <!-- Web DB -->
      <database id="web">
        <cacheSizes hint="setting">
          <data>2GB</data>
          <items>1GB</items>
          <paths>10MB</paths>
          <itempaths>50MB</itempaths>
          <standardValues>10MB</standardValues>
        </cacheSizes>
      </database>
    </databases>
  </sitecore>
</configuration>
Cache size vs Performance
After setting up the patch file for the cache sizes, you might wonder whether your cache is too big or too small, and whether the size is actually affecting the performance of your website... Unfortunately, there is no golden ratio for this! The reason is that every Sitecore solution differs, and the same applies to the items stored in the database of each solution.
For example, let's assume that we have two Sitecore solutions, A and B. The item we use the most in solution A is item A, with a size of 0.5GB (I am exaggerating a bit for the sake of the example), and the item we use the most in solution B is item B, with a size of 1MB. If we set the size of the Item Cache to 1.5GB, that means solution A can store 3 copies of item A, while solution B can store roughly 1,500 copies of item B. In the scenario where both caches are fully occupied with items, finding an item in the Item Cache of solution A is faster than in solution B (remember that the cache size is the same in both solutions). So retrieving an item from the cache is faster, but the price you pay is in the number of items you are able to store in it.
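The arithmetic behind the example is straightforward (using 1GB = 1,000MB for simplicity):

```python
cache_size_mb = 1500  # the 1.5GB Item Cache from the example
item_a_mb = 500       # solution A's dominant item (~0.5GB)
item_b_mb = 1         # solution B's dominant item (1MB)

capacity_a = cache_size_mb // item_a_mb  # only 3 items fit
capacity_b = cache_size_mb // item_b_mb  # 1500 items fit
```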
Another important thing to note is that these are software-implemented caches! Software performance is tightly coupled to the hardware (plus the OS and anything else that might interfere with execution time) it runs on. Since it is recommended to host your website in dedicated environments, you should not worry about this too much, but it is good to keep in mind.
The optimum cache size (on each layer) depends on the structure of each solution, the type/size of the most commonly used items, and the hosting environment. After spending some time executing load tests with different parameters, you can find what suits your custom solution best.
In my next blog post I will analyze performance tweaks/best practices with focus on Sitecore instances running on Azure PaaS.