Batcache for WordPress

[I meant to publicize this after a period of quiet testing and feedback but the watchdogs at WLTC upended the kitten bag and forced my hand. Batcache comes with all the usual disclaimers. If you try it on a production server expect the moon to fall on your head.]

People say WordPress can’t perform under pressure. The way most people set it up, that’s true. For those who host their blog for $7.99 a month (do they also run Vista on an 8086?) the best bet is to serve static pages rather than dynamic pages. Donncha’s WP-Super-Cache does that brilliantly. I’ve seen it raise a server’s capacity for blog traffic by one hundred times or more. It’s a cheapskate’s dream.

WP-Super-Cache is good for anyone with a single web server with a writable wp-content/cache directory. To them, the majority, I say use WP-Super-Cache. What about enterprises with multiple servers that don’t share disk space? If you can’t or won’t use file-based caching, I have something for you. It’s based on what WordPress.com uses. It’s Batcache.

Batcache will protect you

Batcache implements a very simplistic caching model that shields your database and web servers from traffic spikes: after a document has been requested X times in Y seconds, the document is cached for Z seconds and all new users are served the cached copy.

New users are defined as anybody who hasn’t interacted with your domain—once they’ve left a comment or logged in, their cookies will ensure they get fresh pages. People arriving from Digg won’t notice that the comments are a minute or two behind but they’ll appreciate your site being up.

You don’t need PHP skills to install Batcache but you do have to get Memcached working first. That can be easy or hard. We use Memcached because it’s awesome. Once you know how to install it you can create the same kind of distributed, persistent cache that underpins web giants like WordPress.com and Facebook.

What Batcache does

The first thing Batcache does is decide whether the visitor is eligible to receive cached documents. If their cookies don’t show evidence of previous interaction on that domain they are eligible. Next it decides whether the request is eligible for caching. For example, Batcache won’t interfere when a comment is being posted.
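Here is a rough sketch of those two eligibility checks, written in Python for brevity rather than Batcache's own PHP. The cookie-name prefixes and the GET-only rule are assumptions made for illustration, not a copy of what Batcache actually inspects.

```python
# Hypothetical cookie-name prefixes that suggest a login or a previous comment.
INTERACTION_COOKIE_PREFIXES = ("wordpress", "comment_author")


def visitor_is_eligible(cookies):
    """Return True when no cookie name hints at earlier interaction."""
    return not any(name.startswith(INTERACTION_COOKIE_PREFIXES) for name in cookies)


def request_is_eligible(method):
    """Assumption: only plain GET requests are cacheable; a comment POST is not."""
    return method.upper() == "GET"


def should_use_cache(method, cookies):
    """Both the request and the visitor must qualify."""
    return request_is_eligible(method) and visitor_is_eligible(cookies)
```

A first-time reader arriving with no cookies passes both checks. Someone who has already commented carries a cookie that fails the first one and gets a fresh page.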

If the visitor and the request are eligible, Batcache enters its traffic metering routine. By default it looks for URLs that receive more than two hits from unrecognized users in two minutes. When a URL’s traffic crosses that threshold, Batcache caches the document for five minutes. You can configure these numbers any way you like, or turn off traffic metering and send documents right to the cache.
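The metering routine can be sketched like this. It is a toy Python model, not Batcache's code: plain dictionaries stand in for Memcached, and the names `TrafficMeter`, `times`, `seconds` and `max_age` are hypothetical. The defaults mirror the numbers above (more than two hits in two minutes, cached for five minutes).

```python
import time


class TrafficMeter:
    """Toy model: cache a URL once it gets more than `times` hits in `seconds`."""

    def __init__(self, times=2, seconds=120, max_age=300, clock=time.time):
        self.times = times      # hits that must be exceeded before caching
        self.seconds = seconds  # length of the counting window
        self.max_age = max_age  # how long a cached document lives
        self.clock = clock
        self.hits = {}          # url -> (window_start, count)
        self.cache = {}         # url -> (expires_at, document)

    def fetch(self, url, generate):
        """Return (document, served_from_cache)."""
        now = self.clock()
        entry = self.cache.get(url)
        if entry and entry[0] > now:
            return entry[1], True

        # Count this hit, starting a new window if the old one has run out.
        start, count = self.hits.get(url, (now, 0))
        if now - start >= self.seconds:
            start, count = now, 0
        count += 1
        self.hits[url] = (start, count)

        document = generate()
        if count > self.times:
            # Threshold crossed: keep the document until it expires.
            self.cache[url] = (now + self.max_age, document)
        return document, False
```

Setting `times=0` in this sketch sends every document straight to the cache, which corresponds to turning traffic metering off. Note that nothing here deletes a cached document early; it is simply served until `max_age` runs out.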

Once a document has been cached, it is served to eligible visitors until it expires. This is one place where Batcache is different. Most other caches delete cached documents as soon as the underlying data changes. Batcache doesn’t care if it’s serving old data because “old” is relative (and configurable).

What Batcache doesn’t do

It doesn’t guarantee a current document. I repeat this because reliable cache invalidation is a typical feature that was purposefully omitted from Batcache. There is a routine in the included plugin that tries to trigger regeneration of updated and commented posts but in some situations a document will still live in the cache until it expires. This routine will be improved over time but it is only an afterthought.

Batcache doesn’t automatically know the difference between document variants. Variants exist when two requests for the same URL can yield two different documents. Common examples are user agent-dependent variants formatted for mobile devices and referrer-dependent variants with Google search terms highlighted. In these cases you MUST take extra steps to inform Batcache about variants to avoid serving a variant to the wrong audience. The source code includes examples of how to turn off caching of uncommon variants (search term highlighting) or cache common variants separately (mobile versions).
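To make the two strategies concrete, here is a small Python sketch (not the examples shipped with Batcache). The helpers `is_mobile`, `cache_key` and `cacheable` are hypothetical, and the user agent and referrer tests are deliberately crude.

```python
import hashlib


def is_mobile(user_agent):
    """Crude, hypothetical mobile check used only for this illustration."""
    return "mobile" in user_agent.lower()


def cache_key(url, user_agent):
    """Cache a common variant separately by folding it into the key."""
    variant = "mobile" if is_mobile(user_agent) else "desktop"
    raw = f"{url}|{variant}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def cacheable(referrer):
    """Skip the cache for an uncommon variant: visitors arriving from a Google search."""
    return "google." not in referrer.lower()
```

With this arrangement a phone and a desktop browser asking for the same URL get different keys, so neither is served the other's layout, while pages with highlighted search terms are never stored at all.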

Where Batcache is going

I want to make Batcache easier to configure by adding a configuration page and storing the main settings in Memcached as well as the database. This way you won’t have to deploy a code change to update the configuration. However, conditional configurations (e.g. “never cache URLs matching some pattern”) and variant detection will probably always live in PHP.

I want to have Batcache serve correct headers more reliably. On some servers it can detect the headers that were sent with a newly generated page and serve them again from the cache. But when that doesn’t work you will have to take extra steps to serve certain headers. For example, you must specify the Content-Encoding header in the Batcache configuration or add it to php.ini. I want this sort of thing to be done automatically for all server setups.
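The idea of replaying headers from the cache, with configured headers as a fallback, looks roughly like this in Python. The `store` and `serve` functions and the dictionary layout are assumptions for illustration; they are not how Batcache stores its entries.

```python
def store(cache, key, body, headers):
    """Keep whatever headers were detected alongside the generated page."""
    cache[key] = {"body": body, "headers": dict(headers)}


def serve(cache, key, default_headers=None):
    """Return (headers, body), or None when the key is not cached.

    Configured defaults fill the gaps; detected headers win when both exist.
    """
    entry = cache.get(key)
    if entry is None:
        return None
    headers = dict(default_headers or {})
    headers.update(entry["headers"])
    return headers, entry["body"]
```

When detection fails, the stored header set is empty and only the configured defaults are sent, which is the manual step described above.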

I know that Batcache is not ideal for most WordPress installations. It saves us a lot of headaches and expense at WordPress.com, so maybe it can help other large installations. If you try it, I want to hear from you whether it worked and how well. I am also keen to see what new configurations and modifications you use.

As always, this software is provided without claims or warranties. It’s so experimental that it doesn’t even have a version number! Until the project grows to need its own blog, keep an eye on the Trac browser for updates.