Fast MySQL Range Queries on MaxMind GeoIP Tables

By Andy

A few weeks ago I read Jeremy Cole’s post on querying MaxMind GeoIP tables but I didn’t know what all that geometric magic was about so I dropped a comment about how we do it here on WordPress.com. (Actually, Nikolay beat me to it.) Jeremy ran some benchmarks and added them to his post. He discovered that my query performed favorably.

Today I saw an article referencing that comment and I wished I had published it here, so here it goes. There is a bonus at the end to make it worth your while if you witnessed the original discussion.

The basic problem is this: you have a MySQL table with columns that define the upper and lower bounds of mutually exclusive integer ranges and you need the row for which a given integer fits within the range and you need it fast.

The basic solution is this: you create an index on the upper bound column and find the first row for which that value is greater than or equal to the given value.

The logic is this: MySQL scans the integer index in ascending order. Every range below the matching range will have an upper bound less than the given value. The first range with an upper bound not less than the given value will include that value if the ranges are contiguous.

Assuming contiguous ranges (no possibility of falling between ranges) this query will find the correct row very quickly:

SELECT * FROM ip2loc WHERE ip_to >= 123456789 LIMIT 1

The MySQL server can find the row with an index scan, a sufficiently fast operation. I can’t think of a faster way to get the row (except maybe reversing the scan when the number is known to be in the upper half of the entire range).

The bonus is this: because the time to scan the index is related to the length of the index, you should keep the index as small as possible. Nikolay found that our GeoIP table had gaps between some ranges and decided to rectify this condition by filling in the gaps with “no country” rows, ensuring that the query would return “no country” instead of a wrong country. I would advise against doing that because it lengthens the index and adds precious query time. Instead, check that the found range’s lower bound is less than or equal to the given value after you have retrieved the row.

Tags: ,

3 Responses to “Fast MySQL Range Queries on MaxMind GeoIP Tables”

  1. _ck_ Says:

    I’ve actually used this technique for a couple years now after realizing not only does it speed up the search but reduces the db table to just over 1mb which can be cached far better (and dropping the unneeded columns)

    However I discovered it’s a little more accurate and gives better “missing” results if you do it backwards using the ENDING column and descend – searching backwards essentially. MySQL does it just as fast, and if not found, the next lower result is better.

    I’ve had to manually patch the maxmind ranges about two dozen times now. The free db has several holes and inaccuracies, especially with ISPs like AOL. It also lists EU for several spots that should be more country specific. We should share results when we find holes and make a group project for patches?

  2. Tim Says:

    We also refer to the FAQs in another company for some useful insights about the steps required to speed up the query performance.

    http://www.ip2location.com/faqs-ip-country.aspx

  3. Alex Says:

    I love it when I’m googling up a problem and the answer turns up on a co-worker’s blog. Even better when it means the code I need is already in our repo.

Leave a Reply