Cache with Versioned Tags
A couple of years ago I had an idea: a tagged cache with versioned tags. It works just like an associative array but with an added hidden layer in the cache interface that salts the key with any tags provided and their version numbers. This facilitates classified mass extinctions by the simple act of incrementing a tag version. The entire cache or any tagged subset can be “flushed” in this way. I call it extinction because although the items are dead (irretrievable by a normal read operation) they are not removed and may still be exhumed—Jurassic Park-style—until an expiration or eviction takes them out.
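A minimal sketch of the idea, not the actual WordPress client: a plain dict stands in for memcached, and the class name `TaggedCache` and its methods are made up for illustration. The key is salted with each tag and its current version, so bumping a version makes older entries unreachable without deleting them.

```python
class TaggedCache:
    """Toy versioned-tag cache; a dict plays the role of memcached."""

    def __init__(self):
        self.store = {}     # salted key -> cached value
        self.versions = {}  # tag -> current version number

    def _salted_key(self, key, tags):
        # Sort the tags so the same tag set always produces the same salt.
        salt = ",".join(f"{tag}={self.versions.get(tag, 0)}" for tag in sorted(tags))
        return f"{key}|{salt}"

    def set(self, key, value, tags=()):
        self.store[self._salted_key(key, tags)] = value

    def get(self, key, tags=()):
        # Returns None on a miss, including for "extinct" items.
        return self.store.get(self._salted_key(key, tags))

    def flush_tag(self, tag):
        # Extinction: bump the version; nothing is removed from the store.
        self.versions[tag] = self.versions.get(tag, 0) + 1


cache = TaggedCache()
cache.set("recent_posts", ["Hello world"], tags=["posts"])
before = cache.get("recent_posts", tags=["posts"])  # hit
cache.flush_tag("posts")
after = cache.get("recent_posts", tags=["posts"])   # miss: the item is extinct
leftover = len(cache.store)                         # the dead entry is still stored
```

In this sketch the reader has to pass the same tags that were used when the item was stored, since the tags are part of the salt.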
WordPress.com depends largely on memcached clusters, so I modified the WordPress memcached client to operate with versioned tags, and it did work as planned. It is not in use today because the core changes needed to make the benefits of the new cache outweigh the performance cost never happened. WordPress does use a “cache group” concept, and we have used it successfully to create different cache scopes, but it does not use a multi-tagging cache and it does not have a mechanism to flush groups of items. However, let’s keep WordPress in mind because the basic data structures of a blog are simple and thus good example fodder.
For one usage example, there could be a “posts” tag indicating that the cached resource derives from data in the posts table, and you would increment the “posts” version every time you changed that table. When you ask the cache for an item tagged “posts”, it will only find the item if it was stored since the last time the “posts” version was incremented by a flush. Any operation that changes the table’s structure or its contents would trigger a flush of the “posts” tag.
To add another degree of utility, you can apply any number of tags when storing a resource in the cache. Thus if you stored a rendered page, you could tag it with every table queried while generating that resource and rely on tag flushes instead of specifically deleting the item from the cache. For finer-grained control, tags could be specified for individual table rows (e.g. “posts:142”) but the detrimental effects of increased version array size would have to be outweighed by the benefits of excluding the rest of the “posts” items from an extinction.
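To make the multi-tag case concrete, here is a small sketch under the same assumptions as before: a dict instead of memcached, and hypothetical names (`MultiTagCache`, the key strings, the table names). A rendered page is tagged with every table it touched, while single posts carry row-level tags.

```python
class MultiTagCache:
    """Toy cache where every item may carry several versioned tags."""

    def __init__(self):
        self.items = {}     # salted key -> value
        self.versions = {}  # tag -> version; this is the "version array"

    def _key(self, key, tags):
        salt = ",".join(f"{t}@{self.versions.get(t, 0)}" for t in sorted(set(tags)))
        return f"{key}#{salt}"

    def set(self, key, value, tags):
        self.items[self._key(key, tags)] = value

    def get(self, key, tags):
        return self.items.get(self._key(key, tags))

    def flush(self, tag):
        self.versions[tag] = self.versions.get(tag, 0) + 1


cache = MultiTagCache()

# A rendered page depends on three tables, so it gets three tags.
page_tags = ["posts", "comments", "users"]
cache.set("page:/hello-world/", "<html>...</html>", page_tags)

# Row-level tags: each post is tied only to its own row.
cache.set("post:142", {"title": "Hello"}, ["posts:142"])
cache.set("post:143", {"title": "Again"}, ["posts:143"])

# Editing row 142 kills only the item tagged with that row.
cache.flush("posts:142")
post_142 = cache.get("post:142", ["posts:142"])      # miss
post_143 = cache.get("post:143", ["posts:143"])      # still a hit

# A new comment flushes "comments", which takes the page with it.
cache.flush("comments")
page = cache.get("page:/hello-world/", page_tags)    # miss
tracked_tags = len(cache.versions)                   # version array grows per flushed tag
```

The last line hints at the cost mentioned above: every row-level tag that gets flushed adds an entry to the version array.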
The central idea behind this scheme is that you should have multiple paths to remove a key or set of keys without knowing which keys are stored in the cache. This works if your caching substrate has a good eviction scheme, because extinct items are never deleted and have to be pushed out by eviction or expiration. If you can reliably determine the generating factors (tables queried, etc.) for every cached resource, you can forget about cache expirations and let the tags do all the work.
There is also the concept of “tag scope”. It is not universally applicable but in WordPress MU, where each blog has its own tables, each blog must also have its own array of tag versions so that extinctions are not unnecessarily applied out of context. There are also certain global keys that never vary between blogs. By specifying the scope of some tags as local to the blog and others as global, and storing global and local versions separately, we can specify whether to increment a tag version on one or all blogs.
Each blog would have a local array version that can be incremented to flush all tags in the local scope, making it easy to clear out one blog’s poisoned cache. A global array version allows us to trigger a universal cache extinction. With many dedicated memcached servers, this would be a neat time-saver.
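The scoping rules can be sketched roughly like this. Everything here is an assumption made for illustration: the class name `ScopedTagCache`, the `__array__` slot used to hold an array version, and the simplification that every item is stored for a particular blog.

```python
class ScopedTagCache:
    """Toy cache with global and per-blog (local) tag version arrays."""

    def __init__(self, global_tags):
        self.store = {}
        self.global_tags = set(global_tags)      # tags shared by all blogs
        self.global_versions = {"__array__": 0}  # global version array
        self.local_versions = {}                 # blog_id -> local version array

    def _local(self, blog_id):
        return self.local_versions.setdefault(blog_id, {"__array__": 0})

    def _key(self, blog_id, key, tags):
        local = self._local(blog_id)
        # Both array versions salt every key, followed by each tag's version
        # taken from whichever scope the tag belongs to.
        parts = [f"G{self.global_versions['__array__']}",
                 f"blog{blog_id}",
                 f"L{local['__array__']}"]
        for tag in sorted(tags):
            scope = self.global_versions if tag in self.global_tags else local
            parts.append(f"{tag}={scope.get(tag, 0)}")
        return key + "|" + "|".join(parts)

    def set(self, blog_id, key, value, tags=()):
        self.store[self._key(blog_id, key, tags)] = value

    def get(self, blog_id, key, tags=()):
        return self.store.get(self._key(blog_id, key, tags))

    def flush_tag(self, tag, blog_id=None):
        if tag in self.global_tags:
            scope = self.global_versions         # extinction on all blogs
        elif blog_id is None:
            raise ValueError("a local tag needs a blog_id")
        else:
            scope = self._local(blog_id)         # extinction on one blog only
        scope[tag] = scope.get(tag, 0) + 1

    def flush_blog(self, blog_id):
        # Bump the local array version: clears one blog's whole cache.
        self._local(blog_id)["__array__"] += 1

    def flush_all(self):
        # Bump the global array version: universal extinction.
        self.global_versions["__array__"] += 1


cache = ScopedTagCache(global_tags={"themes"})
tags = ["posts", "themes"]
for blog in (1, 2, 3):
    cache.set(blog, "front_page", f"page for blog {blog}", tags)

cache.flush_tag("posts", blog_id=1)   # only blog 1 loses its front page
cache.flush_blog(2)                   # blog 2's entire cache goes extinct
```

After these two calls, blog 3 still gets a hit for its front page; `cache.flush_tag("themes")` or `cache.flush_all()` would take it out as well.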
Metaversions are the cherry on top: a global “metaversion” for any local tag can be stored in local as well as global version arrays. The local metaversion is checked against the global metaversion and if they differ, the local version and the local metaversion are incremented. Thus when the “posts” tag metaversion is incremented in the global array, each local “posts” version will be incremented. This way, we can trigger mass extinctions of any tag across all scopes.
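The metaversion check might look something like the sketch below. The names (`MetaversionedTags`, `bump_everywhere`, `bump_local`) are invented for this example, and there is one deliberate simplification: when the metaversions differ, the local metaversion is set to the global value rather than incremented by one, so a blog that missed several global bumps catches up in a single step.

```python
class MetaversionedTags:
    """Toy local tag versions that follow a global metaversion per tag."""

    def __init__(self):
        self.global_meta = {}  # tag -> metaversion in the global array
        self.local = {}        # blog_id -> {tag: {"version": n, "meta": m}}

    def _entry(self, blog_id, tag):
        tags = self.local.setdefault(blog_id, {})
        if tag not in tags:
            # A new local entry starts in sync with the global metaversion.
            tags[tag] = {"version": 0, "meta": self.global_meta.get(tag, 0)}
        return tags[tag]

    def bump_local(self, blog_id, tag):
        # Ordinary flush of a local tag on one blog.
        self._entry(blog_id, tag)["version"] += 1

    def bump_everywhere(self, tag):
        # One write to the global array; each blog notices on its next lookup.
        self.global_meta[tag] = self.global_meta.get(tag, 0) + 1

    def version(self, blog_id, tag):
        # This is the number that would salt the cache key for this blog.
        entry = self._entry(blog_id, tag)
        current = self.global_meta.get(tag, 0)
        if entry["meta"] != current:
            entry["version"] += 1    # local extinction for this tag
            entry["meta"] = current  # simplification: sync instead of += 1
        return entry["version"]


tags = MetaversionedTags()
v1_start = tags.version(1, "posts")   # blog 1 starts at version 0
v2_start = tags.version(2, "posts")   # blog 2 starts at version 0

tags.bump_local(1, "posts")           # affects blog 1 only
tags.bump_everywhere("posts")         # affects every blog, lazily

v1_after = tags.version(1, "posts")   # 0 -> 1 (local) -> 2 (metaversion)
v2_after = tags.version(2, "posts")   # 0 -> 1 (metaversion)
```

The global bump costs a single write; the per-blog increments happen lazily, the next time each blog looks up the tag.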
My experimental client did all of this but it increased page generation times by 5-10%. I am still sure that if the software made smart use of cache tags we could have seen great benefits. Unfortunately, with a package as large as WordPress, the time to improve cache utilization would be many times longer than the two weekends I spent hacking the cache client.