WordPress Search plugin

My Google Summer of Code 2009 student, Justin Shreve, has done an excellent job creating a new search API for WordPress. We hope this API will be integrated into the WordPress core because it would simplify replacing the core search functionality and encourage developers to create many more options for searching blogs.

Justin’s Search plugin is actually a package of three plugins. The first plugin installs the API that lets other plugins do the searching. The other two plugins use the API to provide search systems that we think will please most users who are dissatisfied with the built-in WordPress search results: MySQL Fulltext and Google Custom Search. (The Google plugin requires a Google account.) For the search-savvy, Justin also wrote a Sphinx-based plugin, Sphinx Search. This last one involves installing additional software on the server.

I’m running the Fulltext plugin on my personal blog so you can try it. Enter a search in the sidebar. On the search results page you can refine your search by choosing whether to search posts, pages, or comments. You can also sort the results by relevance, by date, or alphabetically. The Advanced Search link leads to a form where you can specify author, categories, tags, and date range.

Self-hosted WordPress users can install Search. (It is not available for WordPress.com… yet.) After activating the main Search plugin you must also activate one of the other plugins: MySQL Fulltext, Google Custom Search, or Sphinx Search. We are eager to know what you think of it. Justin plans to keep improving the search system, so he will need lots of user feedback.

Help test Stats 1.5 beta

At first we thought it was a good idea to use iframes to display reports in the Stats plugin. Since then we’ve seen a lot of problems with browsers and cookies. To help resolve these issues, and in anticipation of future features, I am updating the plugin and the WordPress.com stats reporting system to remove the iframes. I just posted 1.5 beta 1. If you host your own WordPress 2.7+ blog and you use the Stats plugin, why not contribute to its development by installing this testing version? Anyone can download the beta but I don’t recommend it unless you are able to cope with potentially unstable software.

  • What are the risks of using this beta?
    You won’t lose any stats. If something goes horribly wrong it’s probably a bad download; just reinstall the latest version of Stats.
  • How does it work?
    The plugin connects to WordPress.com to get the stats reports when you request them. It uses the API key to authenticate.
  • Aside from fixing cookie problems, how is this better?
    Now it’s possible for anyone who can publish posts on your blog to see blog stats. They don’t have to be logged into a WordPress.com account. They only need the publish_posts capability (Author role) to view stats reports.
  • Where did the dropdown blog switcher go?
    Because the plugin uses a single API key to authenticate, the service doesn’t know whether the visitor is the owner of that key or some other user. So it doesn’t make much sense to show the list of blogs belonging to the API key owner. You can still use the switcher if you view your stats on any WordPress.com dashboard.
  • Where did the Stats Access panel go?
    This is also related to single API key authentication. Maybe in the future we will bring administrative access back to the plugin. Until then, we have left the Stats Access panel intact on WordPress.com dashboards. You might want to bookmark dashboard.wordpress.com if you need these features on a regular basis.
  • Will this be a required upgrade?
    You mean, will older versions of Stats be broken? Not by 1.5. Later versions may break compatibility but for now you can keep using earlier versions of Stats if you like.
  • What if I install this and still see iframes?
    This happens because your server is unable to connect to WordPress.com. I set it up to use SSL (https) in the hopes that most hosts support this. If yours does not work, I’d like to hear from you and do some testing on your host.

Batcache for WordPress

[I meant to publicize this after a period of quiet testing and feedback but the watchdogs at WLTC upended the kitten bag and forced my hand. Batcache comes with all the usual disclaimers. If you try it on a production server expect the moon to fall on your head.]

People say WordPress can’t perform under pressure. The way most people set it up, that’s true. For those who host their blog for $7.99 a month (do they also run Vista on an 8086?) the best bet is to serve static pages rather than dynamic pages. Donncha’s WP-Super-Cache does that brilliantly. I’ve seen it raise a server’s capacity for blog traffic by one hundred times or more. It’s a cheapskate’s dream.

WP-Super-Cache is good for anyone with a single web server with a writable wp-content/cache directory. To them, the majority, I say use WP-Super-Cache. What about enterprises with multiple servers that don’t share disk space? If you can’t or won’t use file-based caching, I have something for you. It’s based on what WordPress.com uses. It’s Batcache.

Batcache will protect you

Batcache implements a very simplistic caching model that shields your database and web servers from traffic spikes: after a document has been requested X times in Y seconds, the document is cached for Z seconds and all new users are served the cached copy.

New users are defined as anybody who hasn’t interacted with your domain—once they’ve left a comment or logged in, their cookies will ensure they get fresh pages. People arriving from Digg won’t notice that the comments are a minute or two behind but they’ll appreciate your site being up.

You don’t need PHP skills to install Batcache but you do have to get Memcached working first. That can be easy or hard. We use Memcached because it’s awesome. Once you know how to install it you can create the same kind of distributed, persistent cache that underpins web giants like WordPress.com and Facebook.

What Batcache does

The first thing Batcache does is decide whether the visitor is eligible to receive cached documents. If their cookies don’t show evidence of previous interaction on that domain they are eligible. Next it decides whether the request is eligible for caching. For example, Batcache won’t interfere when a comment is being posted.
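The two eligibility checks can be sketched roughly as follows. This is an illustration, not Batcache's actual code: the function names, the cookie prefixes, and the "GET requests only" rule are all assumptions made for the example.

```php
<?php
// Sketch only. The function names, cookie prefixes, and the GET-only rule
// are assumptions for illustration, not Batcache's real implementation.

/** Returns true when no cookie suggests the visitor has interacted with the site. */
function sketch_visitor_is_eligible(array $cookies): bool {
    $prefixes = array('wordpress', 'comment_author'); // assumed prefixes
    foreach (array_keys($cookies) as $name) {
        foreach ($prefixes as $prefix) {
            // A cookie whose name starts with a known prefix marks a returning user.
            if (strpos((string) $name, $prefix) === 0) {
                return false;
            }
        }
    }
    return true;
}

/** Returns true when the request is a plain page view (assumed rule: GET only). */
function sketch_request_is_eligible(string $method): bool {
    return strtoupper($method) === 'GET';
}
```

In a real request you would pass `$_COOKIE` and `$_SERVER['REQUEST_METHOD']`; a comment being posted arrives as a POST, so under the assumed rule it would never be cached.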

If the visitor and the request are eligible, Batcache enters its traffic metering routine. By default it looks for URLs that receive more than two hits from unrecognized users in two minutes. When a URL’s traffic crosses that threshold, Batcache caches the document for five minutes. You can configure these numbers any way you like, or turn off traffic metering and send documents right to the cache.
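To make the metering rule concrete, here is a minimal sketch using the defaults described above (more than two hits in 120 seconds, cached for 300 seconds). A plain PHP array stands in for Memcached, and the class and method names are hypothetical. It shows the idea, not how Batcache itself is written.

```php
<?php
// Sketch only. A PHP array stands in for Memcached, and every name here
// is hypothetical; this is not Batcache's code.

class SketchTrafficMeter {
    private $hits = array();   // url => timestamps of recent eligible hits
    private $cached = array(); // url => time at which the cached copy expires
    private $times;
    private $seconds;
    private $maxAge;

    public function __construct(int $times = 2, int $seconds = 120, int $maxAge = 300) {
        $this->times = $times;     // hits that must be exceeded before caching
        $this->seconds = $seconds; // length of the metering window
        $this->maxAge = $maxAge;   // how long a cached document lives
    }

    /** Records one eligible hit. Returns true if it would be served from the cache. */
    public function hit(string $url, int $now): bool {
        if (isset($this->cached[$url]) && $this->cached[$url] > $now) {
            return true; // a cached copy exists and has not expired
        }
        unset($this->cached[$url]);

        // Keep only the hits that fall inside the metering window.
        $recent = array();
        foreach ($this->hits[$url] ?? array() as $t) {
            if ($t > $now - $this->seconds) {
                $recent[] = $t;
            }
        }
        $recent[] = $now;
        $this->hits[$url] = $recent;

        // Crossing the threshold: this request generates the page and stores it.
        if (count($recent) > $this->times) {
            $this->cached[$url] = $now + $this->maxAge;
            $this->hits[$url] = array();
        }
        return false;
    }
}
```

Constructing the meter with `$times = 0` mimics turning metering off: the first hit generates the page and stores it, and every later hit is served from the cache until it expires.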

Once a document has been cached, it is served to eligible visitors until it expires. This is one place where Batcache is different. Most other caches delete cached documents as soon as the underlying data changes. Batcache doesn’t care if it’s serving old data because “old” is relative (and configurable).

What Batcache doesn’t do

It doesn’t guarantee a current document. I repeat this because reliable cache invalidation is a typical feature that was purposefully omitted from Batcache. There is a routine in the included plugin that tries to trigger regeneration of updated and commented posts but in some situations a document will still live in the cache until it expires. This routine will be improved over time but it is only an afterthought.

Batcache doesn’t automatically know the difference between document variants. Variants exist when two requests for the same URL can yield two different documents. Common examples are user agent-dependent variants formatted for mobile devices and referrer-dependent variants with Google search terms highlighted. In these cases you MUST take extra steps to inform Batcache about variants to avoid serving a variant to the wrong audience. The source code includes examples of how to turn off caching of uncommon variants (search term highlighting) or cache common variants separately (mobile versions).
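One way to picture variant handling is to fold the variant values into the cache key, so each variant gets its own cached copy. The sketch below is an assumption-laden illustration: the function names, the key format, and the crude user-agent and referrer tests are all made up for the example and are not the examples shipped in the Batcache source.

```php
<?php
// Sketch only. Names, key format, and detection rules are hypothetical.

/** Builds a cache key from the URL plus whatever distinguishes this variant. */
function sketch_cache_key(string $url, array $variants): string {
    ksort($variants); // the same variants in any order give the same key
    return md5($url . '|' . serialize($variants));
}

/** Deliberately crude mobile test, for illustration only. */
function sketch_is_mobile(string $userAgent): bool {
    return stripos($userAgent, 'Mobile') !== false;
}

/** True when the referrer looks like a Google search (assumed host test). */
function sketch_has_search_referrer(string $referrer): bool {
    $host = parse_url($referrer, PHP_URL_HOST);
    return is_string($host) && stripos($host, 'google.') !== false;
}
```

With helpers like these, a common variant such as the mobile version would be cached separately by passing `array('mobile' => sketch_is_mobile($ua))` into the key, while an uncommon one such as search term highlighting would simply skip the cache whenever `sketch_has_search_referrer()` returns true.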

Where Batcache is going

I want to make Batcache easier to configure by adding a configuration page and storing the main settings in memcached as well as the database. This way you won’t have to deploy a code change to update the configuration. However, conditional configurations (e.g. “never cache URLs matching some pattern”) and variant detection will probably always live in PHP.

I want to have Batcache serve correct headers more reliably. On some servers it can detect the headers that were sent with a newly generated page and serve them again from the cache. But when that doesn’t work you will have to take extra steps to serve certain headers. For example you must specify the Content-Encoding header in the Batcache configuration or add it to php.ini. I want this sort of thing to be done automatically for all server setups.

I know that Batcache is not ideal for most WordPress installations. It saves us a lot of headaches and expense at WordPress.com, so maybe it can help other large installations. If you try it, I want to hear from you whether it worked and how well. I am also keen to see what new configurations and modifications you use.

As always, this software is provided without claims or warranties. It’s so experimental that it doesn’t even have a version number! Until the project grows to need its own blog, keep an eye on the Trac browser for updates.

Automattic Stats for self-hosted WordPress

The new Automattic Stats plugin is available for download. It lets self-hosted WordPress bloggers use the exact same traffic metrics system we provide to WordPress.com users. It tracks post and page views, referrers, search terms, and clicks on your external links. It takes moments to install if you already have a WordPress blog and a WordPress.com API key. And it’s totally free.

Although the code is almost exclusively my work, I must give thanks for Matt’s guidance, Barry’s systems wrangling, and Rudy’s barbecue, each of which was indispensable.

The rest of this post will cover technical details of the system, how it works and why it’s cool. If you have a question I didn’t answer, leave a comment and I’ll do my best to answer.

How does it work?

The plugin adds a tiny image to your blog, and that image is hosted on our servers. Every time your blog is viewed by a browser with javascript enabled, the browser downloads that image and we see a new line in our server logs. We then process the server logs and insert the data into a big MySQL database that we use to generate the lists and charts on your stats page.

There’s a little more to it than that. The plugin adds the post ID and referrer to the image URL so we know what the visitor is looking at and where they came from. We examine the referrer and if it looks like a search engine, we sift out the search terms and save those instead. Our servers also communicate with your blog from time to time, such as when you update the title of a post.
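The idea of carrying the post ID and referrer on the image URL, and of pulling search terms out of a referrer, can be sketched like this. The host name, the query parameter names, and the single-engine rule are placeholders invented for the example; the real URL format and search engine list are not documented here.

```php
<?php
// Sketch only. Host, path, parameter names, and the one-engine rule are
// hypothetical; they do not describe the real stats service.

/** Builds an image URL that carries the blog ID, post ID, and referrer. */
function sketch_tracking_url(int $blogId, int $postId, string $referrer): string {
    $query = http_build_query(array(
        'blog' => $blogId,
        'post' => $postId,
        'ref'  => $referrer, // http_build_query URL-encodes the value
    ));
    return 'http://stats.example.com/g.gif?' . $query;
}

/** Returns the search terms if the referrer looks like a search, else null. */
function sketch_search_terms(string $referrer): ?string {
    $host = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null;
    }
    if (stripos($host, 'google.') === false) {
        return null; // assumed: only one engine is recognized in this sketch
    }
    parse_str($query, $params); // decodes the query string into an array
    if (isset($params['q']) && is_string($params['q']) && $params['q'] !== '') {
        return $params['q'];
    }
    return null;
}
```

When `sketch_search_terms()` returns a string, the log processor would store those terms in place of the raw referrer; otherwise the referrer is kept as it is.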

What makes it fast?

When you run your own stats system, your blog server has to do a lot of extra work to track each visit. We take that load off your server to keep it snappy.

By serving the javascript from stats.wordpress.com, we take advantage of the browser cache so that no matter how many blogs are visited, the script is only loaded once per week.

Clicks are reported asynchronously. Rather than the more common method of mangling URLs and forcing the visitor to wait during a redirect, the click stats are tracked using elements of AJAX. Your hrefs are safe and your visitors experience no delays.

The tracking gif loads fast because WordPress.com infrastructure just plain rocks.

What’s with the smiley face?

When we started developing stats for WordPress.com in 2005, Matt thought it would be cute. That’s his artwork.

No doubt, people will want to hide the smiley face. There are wrong ways to do this. Basically, anything that causes the image not to be loaded by the browser will break your stats.

Applying “display:none” to the image will break your stats. Don’t do it. If you want to hide the smiley face, add this to your stylesheet:

img#wpstats{width:0px;height:0px;overflow:hidden}

Why do my links point to WordPress.com?

All stats reports are rendered by our servers. We designed it this way for a lot of reasons. It’s faster this way because your server doesn’t have to connect to our server every time you look at your stats. It’s also better because we can update the reporting UI without forcing you to upgrade your plugin.

How much traffic can you handle?

The stats hardware is currently handling millions of views every day and we’re nowhere near capacity. We built this system with growth in mind. The software is ready to run on as many servers as we allocate for the purpose. Growing pains are inevitable but if we’ve done our job, you will never feel them.

Can I install this on my non-WordPress sites?

The short answer is that the system only supports WordPress blogs.

The long answer is that anyone with a thorough understanding of WordPress and XMLRPC could clone the plugin to work with other blogging platforms. I can’t prevent it, I won’t discourage it, I do expect it, and I don’t even mind it. There are pitfalls, however, and I do not plan to document the requirements. Here be dragons.

Anyone found abusing the system, causing undue loads on the servers, or inflicting headaches on me or Barry or anyone else, will be subject to having their API Key revoked and their name written in giant, fiery letters across the night sky to be cursed by all who see it. Please don’t abuse this free service.

Why does the date change before/after midnight?

To keep things fast and consistent, we are ignoring time zones and keeping all stats in UTC.
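A small sketch shows why the day can appear to roll over early or late. The function names are hypothetical and the time zone is just an example. The same view lands on different calendar days depending on whether it is bucketed in UTC or in the blogger's local time.

```php
<?php
// Sketch only. Function names are hypothetical.

/** The day a view is counted under, when everything is kept in UTC. */
function sketch_stats_day(int $timestamp): string {
    return gmdate('Y-m-d', $timestamp);
}

/** The day the same moment falls on in a blogger's own time zone. */
function sketch_local_day(int $timestamp, string $timezone): string {
    $dt = new DateTime('@' . $timestamp); // '@' timestamps are always UTC
    $dt->setTimezone(new DateTimeZone($timezone));
    return $dt->format('Y-m-d');
}
```

For a view at 02:00 UTC on 5 May 2007, `sketch_stats_day()` gives 2007-05-05, while `sketch_local_day()` with `'America/Chicago'` gives 2007-05-04, because it is still the previous evening there.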

Minimalist Plugin: Cap Comments

I whipped this up for a particularly busy blog. If anybody wants to use it or flesh it out with way too many options and UI enhancements, feel free.

<?php
/*
Plugin Name: Cap Comments
Author: Andy Skelton
Description: Turn off comments at a pre-set comment count
License: GPL
*/

// Close discussion once a post has this many comments.
define('CC_COMMENT_LIMIT', 100);
// Which kinds of discussion to close when the limit is reached.
define('CC_CLOSE_COMMENTS', true);
define('CC_CLOSE_PINGS', true);

// Close comments and/or pings on a post, saving only if something changed.
function cc_comments_off($post) {
	// Initialize so the check below never reads an undefined variable.
	$update = false;

	if ( $post->comment_status != 'closed' && CC_CLOSE_COMMENTS ) {
		$post->comment_status = 'closed';
		$update = true;
	}

	if ( $post->ping_status != 'closed' && CC_CLOSE_PINGS ) {
		$post->ping_status = 'closed';
		$update = true;
	}

	if ( $update )
		return wp_update_post($post);
}

// After each new comment, check whether its post has reached the limit.
function cc_comment_post($comment_ID) {
	$comment = get_comment($comment_ID);
	if ( ! $comment )
		return;

	$post = get_post($comment->comment_post_ID);
	if ( $post && $post->comment_count >= CC_COMMENT_LIMIT )
		cc_comments_off($post);
}
add_action('comment_post', 'cc_comment_post');

?>

No support will be offered. No warranty will be honored. No fool will be suffered.

Buddy Cards from 30 Boxes adorn comments

The good folks at 30 Boxes have gone and created one of the coolest WordPress plugins I’ve seen in a long while. Narendra gave me a sneak peek of it at WordCamp 2006 and I was very excited. Now it’s available to all 30 Boxes users and you can see a screenshot of my Buddy Card on the feature page.

Buddy Cards are a Web 2.0 social calendar site’s solution to the problem of truncated identity in blog comments. Where your comments would have a mere link to the URL you provide, they can now link to a centralized profile page with a rich set of features including buddy pics (a.k.a. gravatars) and automatic buddy harvesting from other places like Flickr.

My description surely fails. Buddy Cards are a visual and interactive thing. Head over to a recent post on my personal blog to see Buddy Cards in action.

Installing the Buddy Cards plugin on your WordPress blog is easy: get the plugin; unzip and upload it to your plugins directory; activate. (This plugin is not available for WordPress.com blogs at this time.)

I am not involved with the development of this plugin; I just think it’s cool. If you have bug reports or support questions, please bring them to the 30 Boxes Forums.