Fundamentals of Caching in Drupal 8
This article at a glance:
Caching is an essential part of high-performance websites. Storing a web page (or part of a page) after generating it, so that it can be reused for future visitors, results in faster load times than would otherwise be possible. Drupal includes numerous tools for caching content that can work for your site right out of the box, and it's important to know what they are and what they do.
Drupal 8 has a whole host of great new features compared to Drupal 7 and previous versions, but one which is perhaps a bit overlooked is its fantastic caching system. Even if it isn't discussed all that often, it is a vital piece of what makes Drupal 8 such a good choice for fast and high-performance websites. At Ashday, we really delved into Drupal's caching system while building a number of high-traffic, content-heavy sites for EnsembleIQ, and not only did it help us on that project, but also on all the Drupal 8 sites we've worked on since.
This post is the first in a series I'll be writing about caching in Drupal 8. Over the course of the series, I'll be sharing some of the things we learned, not only from the EnsembleIQ project but also from the other sites we have developed since. I assume that readers have at least some familiarity with the fundamental concept of caching; if not, consider checking out Drupal.org's documentation about the concept before continuing on to the more technical details below.
Today, I'll be looking at Drupal's built-in caching modules and at the various ways Drupal makes it possible to invalidate cache entries.
Drupal includes two separate modules for caching, which sound similar at first but are actually quite distinct. These are the Internal Page Cache module and the Internal Dynamic Page Cache module. It's important to realize, however, that most of Drupal's caching functionality is built into core and is enabled even without these modules. All the modules do is implement Drupal's existing caching capabilities in a way which should work well for most websites.
The Internal Page Cache module provides functionality somewhat similar to Drupal 7's built-in caching, but with many improvements. While the module is enabled, any time a visitor goes to a page on the site while not logged in, that entire page gets stored in the cache. Future anonymous visitors will then see the same content loaded from the cache, which is stunningly fast since it means the page doesn't have to be put together again from scratch. This module is very important for sites which have a lot of visitors without accounts, but if most of your users are logged in, then it won't be as helpful.
The Internal Dynamic Page Cache module is designed to cache small parts of each page (such as individual blocks) for all users, whether or not they are logged in. Then, when the page needs to be displayed again later to the same or a different user, the module can pull in those individual parts to speed up the building of the page. A great example of this is the header menu of a site. If you have the same header on every page (or even just on two pages), there's no real need for Drupal to recreate all of that HTML each time it assembles a page, since it's just going to get the same result each time. Storing the header's HTML in the dynamic page cache means it only has to get created once to work on all pages, and that sort of time savings can definitely add up, especially on complex sites.
These modules also work in conjunction with each other. If the header is already cached by the dynamic page cache and an anonymous visitor then goes to a page on the site, Drupal will reuse the cached version of the header, build the rest of the page on the fly, and then store the whole final result in the internal page cache.
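To make that interplay concrete, here is a small Python sketch of the idea. It is a toy model only, not Drupal code: the names fragment_cache, page_cache, cached_fragment, and handle_request are hypothetical and exist purely for illustration.

```python
from typing import Callable, Dict

# Toy model only: these names are hypothetical and are not Drupal APIs.
fragment_cache: Dict[str, str] = {}  # stands in for cached parts of a page
page_cache: Dict[str, str] = {}      # stands in for whole pages stored for anonymous visitors
render_calls = {"header": 0, "body": 0}


def cached_fragment(name: str, build: Callable[[], str]) -> str:
    """Return a stored fragment, building and storing it on first use."""
    if name not in fragment_cache:
        fragment_cache[name] = build()
    return fragment_cache[name]


def build_header() -> str:
    render_calls["header"] += 1
    return "<header>Main menu</header>"


def build_body(path: str) -> str:
    render_calls["body"] += 1
    return f"<main>Content for {path}</main>"


def handle_request(path: str, logged_in: bool) -> str:
    # Anonymous visitors can be served a complete stored page.
    if not logged_in and path in page_cache:
        return page_cache[path]
    # Otherwise assemble the page, reusing the cached header if there is one.
    html = cached_fragment("header", build_header) + build_body(path)
    if not logged_in:
        page_cache[path] = html
    return html


handle_request("/about", logged_in=True)   # builds the header and the body
handle_request("/news", logged_in=False)   # reuses the header, builds the body, stores the page
handle_request("/news", logged_in=False)   # served entirely from page_cache
print(render_calls)  # {'header': 1, 'body': 2}
```

The header is built once across all three requests, and the second anonymous visit to /news does no rendering work at all.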
Unless there is a compelling reason not to, both of these modules should generally be enabled on almost any Drupal 8 site. One exception: if all of your users' activity happens while they're logged in, you may wish to consider uninstalling the Internal Page Cache module, since it won't do much of anything for your site.
Tags, Contexts, and Max-Age
In order for a caching system to be useful, it needs to know when not to use the cached version of something - if it doesn't, then there's a chance that it may display something which is out-of-date or intended to be displayed in other circumstances. Fortunately, Drupal has the perfect architecture for handling this, thanks to its use of tags, contexts, and max-age.
Tags are used to invalidate cache entries when something on the site changes (invalidating simply means that the cache entry won't get used, and will get recreated the next time that piece of content is rendered). Drupal includes many cache tags to account for all sorts of different scenarios. For instance, the cache tag "node:5" gets invalidated any time the Drupal content node with ID 5 gets modified. Whenever content gets cached which relies on something related to node 5, the cache entry keeps track of that tag; then, saving the node causes that cache entry to get invalidated.
There are different cache tags for all sorts of things, from individual nodes and blocks, to site configuration settings, to menus. Basically, any time you save something in Drupal, an appropriate tag gets invalidated.
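As a rough illustration of how tag-based invalidation behaves, here is a toy Python model. It is a sketch under simplifying assumptions and not how Drupal implements it: the functions cache_set, cache_get, and invalidate_tags are hypothetical, the "node:7" entries are made up for the example, and this version simply deletes invalidated entries.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Toy model, not Drupal's API: every entry remembers the tags it depends on.
cache: Dict[str, Tuple[str, Set[str]]] = {}


def cache_set(cid: str, value: str, tags: Iterable[str]) -> None:
    cache[cid] = (value, set(tags))


def cache_get(cid: str) -> Optional[str]:
    entry = cache.get(cid)
    return entry[0] if entry is not None else None


def invalidate_tags(tags: Iterable[str]) -> None:
    """Drop every entry that carries at least one of the given tags."""
    changed = set(tags)
    stale = [cid for cid, (_, entry_tags) in cache.items() if entry_tags & changed]
    for cid in stale:
        del cache[cid]


# A teaser of node 5, a listing that shows nodes 5 and 7, and a teaser of node 7.
cache_set("teaser:5", "<article>Node 5 teaser</article>", ["node:5"])
cache_set("listing:front", "<ul>Nodes 5 and 7</ul>", ["node:5", "node:7"])
cache_set("teaser:7", "<article>Node 7 teaser</article>", ["node:7"])

# Saving node 5 invalidates its tag, and with it everything that relied on node 5.
invalidate_tags(["node:5"])

print(cache_get("teaser:5"))       # None
print(cache_get("listing:front"))  # None
print(cache_get("teaser:7"))       # <article>Node 7 teaser</article>
```

Note that the listing is invalidated too, even though only one of the two nodes it displays was changed. That is the point of tags: anything that relied on node 5 gets rebuilt.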
Contexts are a bit different. Cache contexts get stored alongside cache entries and are designed to allow content to vary depending on the circumstances in which it is displayed. A simple example: Say you have a site with users of several different roles, and one block on the site shows something different depending on which roles the viewing user has. This isn't something that can be accomplished with cache tags alone (since nothing about the site itself changes between two different users viewing the block, so nothing invalidates the cache entry). But rather than leaving the block completely uncached, it can instead have the "user.permissions" context applied to it. With this context, the block can actually get cached multiple times: specifically, one time for each combination of roles that the users seeing the block have. This way, an "administrator" can see something different from an "editor", who in turn can see something different from a user who has both roles.
Cache contexts exist for numerous things; there are user-based contexts, as in this scenario, as well as contexts which can vary content by the URL the content is being viewed at or by what theme or language the content is being viewed in.
Max-Age is the final way of handling cache invalidation and is quite simply a way to set how long the content should be cached for. This can range from 0 seconds (to not cache the content at all) to as long as you please, or it can be set to -1 to cache the content indefinitely until tags or context indicate otherwise. Assuming that all of your tags and contexts are working as intended, this can be set to indefinite (which is what Drupal usually does by default), since those can cover most scenarios where cached content might need to be recreated.
However, sometimes it is necessary to set a specific, time-based limit on how long any one cache entry should be used. For instance, say you have a page that gets its content from a remote service. The content returned by that remote service changes several times each day. When it does so, however, it does nothing to notify your Drupal site that the content has changed, so no cache tags can be invalidated, and no context is helpful since the content doesn't vary by the situations in which it is displayed. If you set a max-age of 3600 on the page, then it will cache its content for up to one hour before automatically invalidating, at which point the next person who views the page would get a brand-new updated version (fresh with new content from the remote service) which would then get cached for another hour. This way, you get all the benefits of caching without causing your site to stop updating itself with content from the remote service.
One caveat of doing this is that the content from the service shown on your site might be close to an hour out of date compared with what the service currently provides. If the service updates at 3 PM and somebody views your site at 2:55 PM, your site will be "stuck" showing the older content until 3:55 PM. For many use cases, this is an acceptable discrepancy, with the extra speed provided by caching making it well worth some occasionally out-of-date content. On the other hand, if you need up-to-the-second updates from the remote service, perhaps for something like displaying the scores of an ongoing event, then simple max-age caching may not be the right choice.
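The 2:55 PM scenario can be traced with a short Python sketch. This is a toy model built on assumptions, not Drupal code: MaxAgeCache, remote_content, and view are hypothetical names, and times are passed in explicitly as seconds since midnight so that the example is repeatable.

```python
from typing import Callable, Dict, Optional, Tuple

PERMANENT = -1  # mirrors the "-1 means cache indefinitely" convention described above


class MaxAgeCache:
    """Toy cache whose entries expire a fixed number of seconds after being stored."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Optional[int]]] = {}

    def get_or_build(self, cid: str, build: Callable[[], str], max_age: int, now: int) -> str:
        entry = self._entries.get(cid)
        if entry is not None:
            value, expires = entry
            if expires is None or now < expires:
                return value  # still fresh, so reuse it
        value = build()
        if max_age == 0:
            return value  # a max-age of 0 means "do not cache at all"
        expires = None if max_age == PERMANENT else now + max_age
        self._entries[cid] = (value, expires)
        return value


def remote_content(now: int) -> str:
    # Hypothetical remote service that publishes new content at 3 PM.
    return "old content" if now < 15 * 3600 else "new content"


page_cache = MaxAgeCache()


def view(now: int) -> str:
    return page_cache.get_or_build("remote_page", lambda: remote_content(now), 3600, now)


print(view(14 * 3600 + 55 * 60))  # 2:55 PM -> old content (cached until 3:55 PM)
print(view(15 * 3600 + 30 * 60))  # 3:30 PM -> old content (stale, but still cached)
print(view(15 * 3600 + 55 * 60))  # 3:55 PM -> new content (entry expired, page rebuilt)
```

The 3:30 PM visitor sees the old content even though the service updated half an hour earlier, which is exactly the trade-off described above.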
So, to recap: Tags make sure cached content reflects changes made to the site, Contexts make sure that content which displays differently in different circumstances does so correctly, and Max-Age simply caches things for a set amount of time.
Those are the fundamentals of how Drupal 8 caches content out of the box. As this series continues, we'll cover how caching, including tags, contexts, and max-age, can impact the custom code you might write for your site, and we'll also look at how to determine what rules your site might need for caching its content in ways that Drupal might not handle out of the box.