Help test Stats 1.5 beta

At first we thought it was a good idea to use iframes to display reports in the Stats plugin. We’ve seen since then a lot of problems with browsers and cookies. To help resolve these issues, and in anticipation of future features, I am updating the plugin and the stats reporting system to remove the iframes. I just posted 1.5 beta 1. If you host your own WordPress 2.7+ blog and you use the Stats plugin, why not contribute to its development by installing this testing version? Anyone can download the beta but I don’t recommend it unless you are able to cope with potentially unstable software.

  • What are the risks of using this beta?
    You won’t lose any stats. If something goes horribly wrong it’s probably a bad download; just reinstall the latest version of Stats.
  • How does it work?
    The plugin connects to WordPress.com to get the stats reports when you request them. It uses your WordPress.com API key to authenticate.
  • Aside from fixing cookie problems, how is this better?
    Now it’s possible for anyone who can publish posts on your blog to see blog stats. They don’t have to be logged into a WordPress.com account. They only need the publish_posts capability (Author role) to view stats reports.
  • Where did the dropdown blog switcher go?
    Because the plugin uses a single API key to authenticate, the service doesn’t know whether the visitor is the owner of that key or some other user. So it doesn’t make much sense to show the list of blogs belonging to the API key owner. You can still use the switcher if you view your stats on your WordPress.com dashboard.
  • Where did the Stats Access panel go?
    This is also related to single-API-key authentication. Maybe in the future we will bring administrative access back to the plugin. But until then, we have left the Stats Access panel intact on WordPress.com dashboards. You might want to bookmark it if you need these features on a regular basis.
  • Will this be a required upgrade?
    You mean, will older versions of Stats be broken? Not by 1.5. Later versions may break compatibility but for now you can keep using earlier versions of Stats if you like.
  • What if I install this and still see iframes?
    This happens because your server is unable to connect to WordPress.com. I set it up to use SSL (https) in the hope that most hosts support it. If yours does not work, I’d like to hear from you and do some testing on your host.

Upgrade Memcached Before WordPress

Self-hosted WordPress and WordPress MU administrators: if you are using the memcached object cache (a prerequisite for batcache), upgrade it before upgrading WordPress. There is a bug that keeps the old db_version in the options cache, preventing WordPress from remembering that it has been upgraded, and this causes it to ask you to upgrade again. In a pinch you can resolve the problem by restarting the memcached daemon.
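If restarting the daemon is inconvenient, flushing memcached over the wire clears the stale options cache just as well. This is a sketch assuming the default port 11211 and that netcat is installed; the restart command varies by init system.

```shell
# Flush every cached item, including the stale db_version in the options cache.
echo "flush_all" | nc -q 1 localhost 11211

# Or restart the daemon outright (command depends on your init system):
sudo service memcached restart
```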

The memcached object cache can’t be automatically upgraded because it’s not a normal plugin. Make sure to use the right version: 1.0 (sockets) or 2.0 (PECL). Only one line was changed, so you might prefer to update by hand: 1.0, 2.0.

If you aren’t sure whether you are using memcached, look for a file named object-cache.php in wp-content. If that file exists, look inside to see if the plugin name is “Memcached”.

Batcache for WordPress

[I meant to publicize this after a period of quiet testing and feedback but the watchdogs at WLTC upended the kitten bag and forced my hand. Batcache comes with all the usual disclaimers. If you try it on a production server expect the moon to fall on your head.]

People say WordPress can’t perform under pressure. The way most people set it up, that’s true. For those who host their blog for $7.99 a month (do they also run Vista on an 8086?) the best bet is to serve static pages rather than dynamic pages. Donncha’s WP-Super-Cache does that brilliantly. I’ve seen it raise a server’s capacity for blog traffic by one hundred times or more. It’s a cheapskate’s dream.

WP-Super-Cache is good for anyone with a single web server with a writable wp-content/cache directory. To them, the majority, I say use WP-Super-Cache. What about enterprises with multiple servers that don’t share disk space? If you can’t or won’t use file-based caching, I have something for you. It’s based on what WordPress.com uses. It’s Batcache.

Batcache will protect you

Batcache implements a very simple caching model that shields your database and web servers from traffic spikes: after a document has been requested X times in Y seconds, the document is cached for Z seconds and all new users are served the cached copy.

New users are defined as anybody who hasn’t interacted with your domain—once they’ve left a comment or logged in, their cookies will ensure they get fresh pages. People arriving from Digg won’t notice that the comments are a minute or two behind but they’ll appreciate your site being up.
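The "new user" test above amounts to a cookie check. Here is a minimal Python sketch of the idea; the cookie prefixes are the standard WordPress ones, but treat the exact list as an illustration rather than Batcache's actual code.

```python
# Cookie-name prefixes that indicate prior interaction (login, comments,
# admin settings). Any such cookie disqualifies the visitor from the cache.
UNCACHED_PREFIXES = ("wordpress_", "wp-", "comment_author")

def is_eligible_for_cache(cookies):
    """cookies: dict mapping request cookie names to values."""
    return not any(name.startswith(prefix)
                   for name in cookies
                   for prefix in UNCACHED_PREFIXES)
```

A first-time visitor from Digg has none of these cookies and gets the cached copy; a logged-in user or recent commenter always gets a freshly generated page.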

You don’t need PHP skills to install Batcache but you do have to get Memcached working first. That can be easy or hard. We use Memcached because it’s awesome. Once you know how to install it you can create the same kind of distributed, persistent cache that underpins web giants like WordPress.com and Facebook.

What Batcache does

The first thing Batcache does is decide whether the visitor is eligible to receive cached documents. If their cookies don’t show evidence of previous interaction on that domain they are eligible. Next it decides whether the request is eligible for caching. For example, Batcache won’t interfere when a comment is being posted.

If the visitor and the request are eligible, Batcache enters its traffic metering routine. By default it looks for URLs that receive more than two hits from unrecognized users in two minutes. When a URL’s traffic crosses that threshold, Batcache caches the document for five minutes. You can configure these numbers any way you like, or turn off traffic metering and send documents right to the cache.
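The metering routine can be sketched like this. It is a Python stand-in for Batcache's PHP, with in-memory dicts standing in for memcached; the defaults are the ones quoted above (more than two hits in two minutes caches the page for five minutes).

```python
import time

TIMES = 2        # X: hit threshold from unrecognized users
SECONDS = 120    # Y: metering window in seconds
MAX_AGE = 300    # Z: lifetime of a cached copy in seconds

hits = {}        # url -> timestamps of recent eligible hits
cache = {}       # url -> (document, time it was cached)

def serve(url, generate, now=None):
    """Serve url for an eligible visitor; generate() builds a fresh page."""
    now = time.time() if now is None else now
    cached = cache.get(url)
    if cached and now - cached[1] < MAX_AGE:
        return cached[0]                      # still fresh: serve the copy
    recent = [t for t in hits.get(url, []) if now - t < SECONDS]
    recent.append(now)
    hits[url] = recent
    doc = generate()
    if len(recent) > TIMES:                   # threshold crossed: cache it
        cache[url] = (doc, now)
    return doc
```

Setting TIMES to zero reproduces the "send documents right to the cache" option: every eligible request is cached immediately.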

Once a document has been cached, it is served to eligible visitors until it expires. This is one place where Batcache is different. Most other caches delete cached documents as soon as the underlying data changes. Batcache doesn’t care if it’s serving old data because “old” is relative (and configurable).

What Batcache doesn’t do

It doesn’t guarantee a current document. I repeat this because reliable cache invalidation is a typical feature that was purposefully omitted from Batcache. There is a routine in the included plugin that tries to trigger regeneration of updated and commented posts but in some situations a document will still live in the cache until it expires. This routine will be improved over time but it is only an afterthought.

Batcache doesn’t automatically know the difference between document variants. Variants exist when two requests for the same URL can yield two different documents. Common examples are user-agent-dependent variants formatted for mobile devices and referrer-dependent variants with Google search terms highlighted. In these cases you MUST take extra steps to inform Batcache about variants to avoid serving a variant to the wrong audience. The source code includes examples of how to turn off caching of uncommon variants (search term highlighting) or cache common variants separately (mobile versions).

Where Batcache is going

I want to make Batcache easier to configure by adding a configuration page and storing the main settings in memcached as well as the database. This way you won’t have to deploy a code change to update the configuration. However, conditional configurations (e.g. “never cache URLs matching some pattern”) and variant detection will probably always live in PHP.

I want to have Batcache serve correct headers more reliably. On some servers it can detect the headers that were sent with a newly generated page and serve them again from the cache. But when that doesn’t work you will have to take extra steps to serve certain headers. For example you must specify the Content-Encoding header in the Batcache configuration or add it to php.ini. I want this sort of thing to be done automatically for all server setups.

I know that Batcache is not ideal for most WordPress installations. It saves us a lot of headaches and expense at WordPress.com, so maybe it can help other large installations. If you try it, I want to hear from you whether it worked and how well. I am also keen to see what new configurations and modifications you use.

As always, this software is provided without claims or warranties. It’s so experimental that it doesn’t even have a version number! Until the project grows to need its own blog, keep an eye on the Trac browser for updates.

Austin WordPress Professional Office

We’re thinking of opening an Automattic office in Austin. Being the only local employee, I imagine sharing the space with independent/satellite WordPress professionals. Are you interested?

Candidates should be earning all or part of their income working on WordPress (developing, designing, or servicing) and be able to defend their choice of editor. Benefits may include collaboration, networking, social opportunities, fortune, fame, romance, and French pressed Ruta Maya coffee.

We don’t have any locations in mind yet. Please include your home zip code for geographical tabulation. If you can recommend a cool location with flexible space, good bandwidth, and no long-term commitments, please do.

Attachment Stats

WordPress 2.5 with its new uploader results in a lot more post attachments. Until now, the Stats plugin didn’t keep track of which attachments were viewed. Stats 1.2.1 contains a minor change that lets us show you which of your attachments are most popular.

Remember, attachment views are only counted if you link to the attachment’s Post URL when inserting from the uploader.

Cache with Versioned Tags

A couple of years ago I had an idea: a tagged cache with versioned tags. It works just like an associative array but with a hidden layer in the cache interface that salts each key with any tags provided and their version numbers. This facilitates classified mass extinctions by the simple act of incrementing a tag version: the entire cache, or any tagged subset, can be “flushed” this way. I call it extinction because although the items are dead (irretrievable by a normal read operation) they are not removed and may still be exhumed, Jurassic Park-style, until an expiration or eviction takes them out.

WordPress.com depends largely on memcached clusters, so I modified the WordPress memcached client to operate with versioned tags, and it worked as planned. It is not in operation today because the core changes needed to make the benefits of the new cache outweigh its performance cost never happened. WordPress does use a “cache group” concept, and we have used it successfully to create different cache scopes, but it does not use a multi-tagging cache and it has no mechanism to flush groups of items. However, let’s keep WordPress in mind, because the basic data structures of a blog are simple and thus good example fodder.

For one usage example, there could be a “posts” tag indicating that the cached resource derives from data in the posts table and you would increment the “posts” version every time you changed that table. When you ask the cache for items tagged “posts”, it will only find resources that were stored since the last time the “posts” tag was incremented by a flush. Any operation that changes the table’s structure or its contents would trigger a flush of the “posts” tag.

To add another degree of utility, you can apply any number of tags when storing a resource in the cache. Thus if you stored a rendered page, you could tag it with every table queried while generating that resource and rely on tag flushes instead of specifically deleting the item from the cache. For finer-grained control, tags could be specified for individual table rows (e.g. “posts:142”) but the detrimental effects of increased version array size would have to be outweighed by the benefits of excluding the rest of the “posts” items from an extinction.
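Here is a minimal Python sketch of the salting idea, with a plain dict standing in for memcached. The class and method names are illustrative, not from the actual modified client.

```python
import hashlib

class TaggedCache:
    """Versioned-tag cache: keys are salted with each tag's current version,
    so bumping a version makes every item stored under that tag unreachable."""

    def __init__(self):
        self.store = {}      # salted key -> value (stand-in for memcached)
        self.versions = {}   # tag -> current version number

    def _salt(self, key, tags):
        # Fold each tag and its version into the storage key.
        parts = [key] + ["%s:%d" % (t, self.versions.get(t, 0))
                         for t in sorted(tags)]
        return hashlib.md5("|".join(parts).encode()).hexdigest()

    def set(self, key, value, tags=()):
        self.store[self._salt(key, tags)] = value

    def get(self, key, tags=()):
        # Reads must supply the same tags used when storing the item.
        return self.store.get(self._salt(key, tags))

    def flush(self, tag):
        # An "extinction": the old items stay in the store but become
        # irretrievable because the salted keys no longer match.
        self.versions[tag] = self.versions.get(tag, 0) + 1
```

A rendered page stored with tags ("posts", "comments") disappears from view the moment either tag is flushed, with no need to know which page keys exist.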

The central idea behind this scheme is that you should have multiple paths to remove a key or set of keys without knowing which keys are stored in the cache. If your caching substrate has a good eviction scheme this works. If you can reliably determine the generating factors (tables queried, etc.) for every cached resource, you can forget about cache expirations and let the tags do all the work.

There is also the concept of “tag scope”. It is not universally applicable but in WordPress MU, where each blog has its own tables, each blog must also have its own array of tag versions so that extinctions are not unnecessarily applied out of context. There are also certain global keys that never vary between blogs. By specifying the scope of some tags as local to the blog and others as global, and storing global and local versions separately, we can specify whether to increment a tag version on one or all blogs.

Each blog would have a local array version that can be incremented to flush all tags in the local scope, making it easy to clear out one blog’s poisoned cache. A global array version allows us to trigger a universal cache extinction. With many dedicated memcached servers, this would be a neat time-saver.

Metaversions are the cherry on top: a global “metaversion” for any local tag can be stored in local as well as global version arrays. The local metaversion is checked against the global metaversion and if they differ, the local version and the local metaversion are incremented. Thus when the “posts” tag metaversion is incremented in the global array, each local “posts” version will be incremented. This way, we can trigger mass extinctions of any tag across all scopes.
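A sketch of how local and global scopes could interact with metaversions, again with invented names and in-memory dicts standing in for memcached:

```python
class ScopedVersions:
    """Per-blog (local) tag versions plus global metaversions. Bumping a
    metaversion lazily increments that tag's local version on every blog."""

    def __init__(self):
        self.local_versions = {}   # (blog_id, tag) -> local version
        self.global_meta = {}      # tag -> global metaversion
        self.local_meta = {}       # (blog_id, tag) -> last-seen metaversion

    def local_version(self, blog_id, tag):
        # If the global metaversion moved since we last looked, bump the
        # local version so this blog's tagged items go extinct too.
        gm = self.global_meta.get(tag, 0)
        key = (blog_id, tag)
        if self.local_meta.get(key, 0) != gm:
            self.local_meta[key] = gm
            self.local_versions[key] = self.local_versions.get(key, 0) + 1
        return self.local_versions.get(key, 0)

    def flush_local(self, blog_id, tag):
        # Extinction scoped to one blog's tagged items.
        key = (blog_id, tag)
        self.local_versions[key] = self.local_versions.get(key, 0) + 1

    def flush_everywhere(self, tag):
        # One increment in the global array; every blog catches up on its
        # next read instead of being touched individually.
        self.global_meta[tag] = self.global_meta.get(tag, 0) + 1
```

The appeal is that flush_everywhere is a single write no matter how many blogs or memcached servers exist; the per-blog work happens lazily on the next lookup.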

My experimental client did all of this but it increased page generation times by 5-10%. I am still sure that if the software made smart use of cache tags we could have seen great benefits. Unfortunately, with a package as large as WordPress, the time to improve cache utilization would be many times longer than the two weekends I spent hacking the cache client.