Automatically open remote files in local emacs

I prefer to edit text locally in emacs. Most of the files I edit reside on remote servers so I use TRAMP to open remote files locally. What kills me is using emacs remotely via terminal when a shell command invokes $EDITOR (e.g. svn commit). With my new setup, the default editor on the remote machine is my local emacs. I love this.

First, I configure SSH to forward a remote port to my machine. This means that whenever the remote machine tries to connect to itself on that port (localhost:9999) it actually connects to port 9999 on my local OS X machine. I like to keep these details in my ssh_config file (my local ~/.ssh/config):

Host des
User wpdev
ControlMaster auto
ControlPath ~/.ssh/des.sock
RemoteForward 9999 localhost:9999

(I use abbreviated hostnames to save keystrokes. There is a matching entry in my hosts file.)

Second, I configure my local emacs to start the server and copy the server file to the remote host. The server file tells emacsclient how to connect to the server. Adding this to emacs-startup-hook adds a few seconds to my emacs startup time but I rarely start emacs more than once in a day so that’s fine. This is in my local ~/.emacs:

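;; Listen on TCP port 9999 so the forwarded remote port can reach this server.
(setq server-use-tcp t
      server-port    9999)
(defun server-start-and-copy ()
  "Start the Emacs server, then copy its server file to the remote host."
  (server-start)
  (copy-file "~/.emacs.d/server/server" "/des:.emacs.d/server/server" t))
(add-hook 'emacs-startup-hook 'server-start-and-copy)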

Third, I create a bash script on the remote host which calls emacsclient with the necessary TRAMP path prefixed to its arguments. (If you try running emacsclient remotely without the TRAMP path you’ll get an empty emacs buffer.) Here is the script I put in remote ~/bin/ec and then chmod +x:


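#!/bin/bash
# Pass -n and +LINE through untouched; prefix every file with its TRAMP path.
params=()
for p in "$@"; do
    if [ "$p" == "-n" ]; then
        params+=( "$p" )
    elif [ "${p:0:1}" == "+" ]; then
        params+=( "$p" )
    else
        params+=( "/ssh:des:$(readlink -f "$p")" )
    fi
done
emacsclient "${params[@]}"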
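
To make the argument rewriting easier to see, here is a rough Python sketch of the same logic. It is only an illustration, not something I run: the function name rewrite_args is made up, and os.path.realpath stands in for readlink -f.

```python
import os

# Host alias from the ssh config above; change it to match your own entry.
TRAMP_PREFIX = "/ssh:des:"


def rewrite_args(args, prefix=TRAMP_PREFIX):
    """Pass -n and +LINE options through; prefix file paths for TRAMP."""
    out = []
    for arg in args:
        if arg == "-n" or arg.startswith("+"):
            # Options meant for emacsclient itself are left alone.
            out.append(arg)
        else:
            # Resolve to an absolute path so local Emacs can find the file.
            out.append(prefix + os.path.realpath(arg))
    return out
```

So a call like ec -n +12 notes.txt ends up as emacsclient -n +12 followed by /ssh:des: and the absolute path of notes.txt.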

Finally, I set up $EDITOR on the remote machine. I also add my bin directory to $PATH so I can invoke ec. This is in my remote ~/.bashrc:

export PATH=~/bin:$PATH
export EDITOR=~/bin/ec

That’s it! More elegant solutions are possible but my new tool is sufficiently sharp and I have work to do!

Batcache for WordPress

[I meant to publicize this after a period of quiet testing and feedback but the watchdogs at WLTC upended the kitten bag and forced my hand. Batcache comes with all the usual disclaimers. If you try it on a production server expect the moon to fall on your head.]

People say WordPress can’t perform under pressure. The way most people set it up, that’s true. For those who host their blog for $7.99 a month (do they also run Vista on an 8086?) the best bet is to serve static pages rather than dynamic pages. Donncha’s WP-Super-Cache does that brilliantly. I’ve seen it raise a server’s capacity for blog traffic by one hundred times or more. It’s a cheapskate’s dream.

WP-Super-Cache is good for anyone with a single web server with a writable wp-content/cache directory. To them, the majority, I say use WP-Super-Cache. What about enterprises with multiple servers that don’t share disk space? If you can’t or won’t use file-based caching, I have something for you. It’s based on what we use. It’s Batcache.

Batcache will protect you

Batcache implements a very simplistic caching model that shields your database and web servers from traffic spikes: after a document has been requested X times in Y seconds, the document is cached for Z seconds and all new users are served the cached copy.

New users are defined as anybody who hasn’t interacted with your domain—once they’ve left a comment or logged in, their cookies will ensure they get fresh pages. People arriving from Digg won’t notice that the comments are a minute or two behind but they’ll appreciate your site being up.

You don’t need PHP skills to install Batcache but you do have to get Memcached working first. That can be easy or hard. We use Memcached because it’s awesome. Once you know how to install it you can create the same kind of distributed, persistent cache that underpins web giants like Facebook.

What Batcache does

The first thing Batcache does is decide whether the visitor is eligible to receive cached documents. If their cookies don’t show evidence of previous interaction on that domain they are eligible. Next it decides whether the request is eligible for caching. For example, Batcache won’t interfere when a comment is being posted.
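
The two eligibility checks can be sketched in a few lines. This is Python rather than Batcache's PHP, and the cookie prefixes and function names are assumptions chosen for illustration, not a copy of the real code.

```python
# Assumed cookie-name prefixes that signal earlier interaction (illustrative).
INTERACTION_PREFIXES = ("wordpress", "wp", "comment_author")


def visitor_is_eligible(cookies):
    """True when no cookie suggests the visitor logged in or commented."""
    return not any(name.startswith(INTERACTION_PREFIXES) for name in cookies)


def request_is_eligible(method):
    """Only plain reads qualify; a POST such as a new comment never does."""
    return method.upper() in ("GET", "HEAD")


def may_serve_cached(cookies, method):
    """Both the visitor and the request must be eligible."""
    return visitor_is_eligible(cookies) and request_is_eligible(method)
```

A visitor carrying only an analytics cookie would pass, while one with a comment_author cookie would always get a fresh page.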

If the visitor and the request are eligible, Batcache enters its traffic metering routine. By default it looks for URLs that receive more than two hits from unrecognized users in two minutes. When a URL’s traffic crosses that threshold, Batcache caches the document for five minutes. You can configure these numbers any way you like, or turn off traffic metering and send documents right to the cache.
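
Here is a minimal Python sketch of that metering idea, using the default numbers from the paragraph above. The class and parameter names are hypothetical, and a pair of in-process dicts stands in for Memcached, so treat it as a model of the behavior rather than the plugin itself.

```python
import time


class MeteredCache:
    """Cache a URL only after it gets more than `times` hits in `seconds`."""

    def __init__(self, times=2, seconds=120, max_age=300, clock=time.time):
        self.times = times        # hit threshold; 0 turns metering off
        self.seconds = seconds    # metering window
        self.max_age = max_age    # lifetime of a cached document
        self.clock = clock        # injectable so the sketch can be tested
        self._hits = {}           # url -> timestamps of recent hits
        self._pages = {}          # url -> (expires_at, body)

    def fetch(self, url, generate):
        now = self.clock()
        cached = self._pages.get(url)
        if cached and cached[0] > now:
            return cached[1]      # still fresh: serve the cached copy
        body = generate()         # otherwise build the page as usual
        recent = [t for t in self._hits.get(url, []) if now - t < self.seconds]
        recent.append(now)
        self._hits[url] = recent
        if len(recent) > self.times:
            # Threshold crossed: keep this copy until it expires.
            self._pages[url] = (now + self.max_age, body)
        return body
```

Note that nothing here deletes a cached page when the underlying data changes. The copy simply lives until max_age runs out.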

Once a document has been cached, it is served to eligible visitors until it expires. This is one place where Batcache is different. Most other caches delete cached documents as soon as the underlying data changes. Batcache doesn’t care if it’s serving old data because “old” is relative (and configurable).

What Batcache doesn’t do

It doesn’t guarantee a current document. I repeat this because reliable cache invalidation is a typical feature that was purposefully omitted from Batcache. There is a routine in the included plugin that tries to trigger regeneration of updated and commented posts but in some situations a document will still live in the cache until it expires. This routine will be improved over time but it is only an afterthought.

Batcache doesn’t automatically know the difference between document variants. Variants exist when two requests for the same URL can yield two different documents. Common examples are user agent-dependent variants formatted for mobile devices and referrer-dependent variants with Google search terms highlighted. In these cases you MUST take extra steps to inform Batcache about variants to avoid serving a variant to the wrong audience. The source code includes examples of how to turn off caching of uncommon variants (search term highlighting) or cache common variants separately (mobile versions).
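
One way to picture the two strategies is a cache key that includes the variant, plus a check that skips caching altogether. The sketch below is Python with made-up helper names and deliberately crude detection rules (both are assumptions); the real examples ship with the Batcache source.

```python
import hashlib
import json


def cache_key(url, variants):
    """Build a key from the URL plus every dimension that changes the page."""
    payload = json.dumps({"url": url, "variants": variants}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def variants_for(user_agent):
    """Common variant: cache mobile and desktop pages separately."""
    # Crude, assumed detection rule; real mobile sniffing is more involved.
    return {"mobile": "Mobile" in user_agent}


def should_cache(referrer):
    """Uncommon variant: skip caching when search terms get highlighted."""
    # Assumed rule: any Google referrer may produce a highlighted page.
    return "google." not in referrer.lower()
```

With this, a phone and a desktop browser requesting the same URL get different keys, and a visitor arriving from a Google search bypasses the cache.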

Where Batcache is going

I want to make Batcache easier to configure by adding a configuration page and storing the main settings in memcached as well as the database. This way you won’t have to deploy a code change to update the configuration. However, conditional configurations (e.g. “never cache URLs matching some pattern”) and variant detection will probably always live in PHP.

I want to have Batcache serve correct headers more reliably. On some servers it can detect the headers that were sent with a newly generated page and serve them again from the cache. But when that doesn’t work you will have to take extra steps to serve certain headers. For example you must specify the Content-Encoding header in the Batcache configuration or add it to php.ini. I want this sort of thing to be done automatically for all server setups.

I know that Batcache is not ideal for most WordPress installations. It saves us a lot of headaches and expense, so maybe it can help other large installations. If you try it, I want to hear from you whether it worked and how well. I am also keen to see what new configurations and modifications you use.

As always, this software is provided without claims or warranties. It’s so experimental that it doesn’t even have a version number! Until the project grows to need its own blog, keep an eye on the Trac browser for updates.

Fast MySQL Range Queries on MaxMind GeoIP Tables

A few weeks ago I read Jeremy Cole’s post on querying MaxMind GeoIP tables but I didn’t know what all that geometric magic was about so I dropped a comment about how we do it here. (Actually, Nikolay beat me to it.) Jeremy ran some benchmarks and added them to his post. He discovered that my query performed favorably.

Today I saw an article referencing that comment and I wished I had published it here, so here it goes. There is a bonus at the end to make it worth your while if you witnessed the original discussion.

The basic problem is this: you have a MySQL table with columns that define the upper and lower bounds of mutually exclusive integer ranges. You need the row whose range contains a given integer, and you need it fast.

The basic solution is this: you create an index on the upper bound column and find the first row for which that value is greater than or equal to the given value.

The logic is this: MySQL scans the integer index in ascending order. Every range below the matching range will have an upper bound less than the given value. The first range with an upper bound not less than the given value will include that value if the ranges are contiguous.

Assuming contiguous ranges (no possibility of falling between ranges) this query will find the correct row very quickly:

SELECT * FROM ip2loc WHERE ip_to >= 123456789 LIMIT 1

The MySQL server can find the row with an index scan, a sufficiently fast operation. I can’t think of a faster way to get the row (except maybe reversing the scan when the number is known to be in the upper half of the entire range).

The bonus is this: because the time to scan the index is related to the length of the index, you should keep the index as small as possible. Nikolay found that our GeoIP table had gaps between some ranges and decided to rectify this condition by filling in the gaps with “no country” rows, ensuring that the query would return “no country” instead of a wrong country. I would advise against doing that because it lengthens the index and adds precious query time. Instead, check that the found range’s lower bound is less than or equal to the given value after you have retrieved the row.
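
Here is a small runnable sketch of the query plus the lower-bound check. It uses Python with an in-memory SQLite table as a stand-in for MySQL. The column names ip_from and country and the sample rows are assumptions for the demo, and I added an explicit ORDER BY so the "first row" does not depend on how the engine happens to walk the index.

```python
import sqlite3


def lookup(conn, ip_int):
    """Return the country whose range contains ip_int, or None for a gap."""
    row = conn.execute(
        "SELECT ip_from, ip_to, country FROM ip2loc "
        "WHERE ip_to >= ? ORDER BY ip_to LIMIT 1",
        (ip_int,),
    ).fetchone()
    # No row at all, or the value falls in a gap below the found range.
    if row is None or row[0] > ip_int:
        return None
    return row[2]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip2loc (ip_from INTEGER, ip_to INTEGER, country TEXT)")
conn.execute("CREATE INDEX idx_ip_to ON ip2loc (ip_to)")
# Made-up ranges with a deliberate gap between 299 and 400.
conn.executemany(
    "INSERT INTO ip2loc VALUES (?, ?, ?)",
    [(100, 199, "AA"), (200, 299, "BB"), (400, 499, "CC")],
)
```

A lookup for 350 finds the row starting at 400, fails the lower-bound check, and comes back as None without any filler rows in the table.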

Free tech gear

Here’s a clever WordPress site: Take My Tech is giving away used gadgets by selecting a random commenter when the number of comments on a gadget reaches one hundred. The site will pay for shipping (and hopefully make the owner a few bucks) with the revenues from advertisements. It is a bit of a gamble for the owner, since Craigslist or eBay would be the more obvious choices, but I find it clever all the same.

The lottery process, including checking for duplicate comment emails/IPs, could be automated with a plugin. I suggested building on top of my own Cap Comments plugin.