Many moons ago, I tried to use the GeoIP database, via MySQL – and discovered that MySQL was doing the wrong thing with the queries. Naturally, I found some tricks to make it faster.
Of course, MaxMind has long provided their own custom data file and code for reading it efficiently, but for a variety of reasons I did not want to use it. The problem was that the obvious query for the table turns out to be rather horrendously inefficient – on the order of a couple seconds per lookup. Oy!
As it turns out, there’s a couple ways to fool MySQL’s query planner into doing a much more efficient lookup. I discovered today that even in MySQL 5.6, these tricks are still needed. Despite all the lovely improvements in query planning, MySQL still wears its pants squarely on its head when it comes to the type of query you want to do here.
For those of you who haven’t used MaxMind’s GeoIP database, the schema looks like this:
|
The obvious query you’d want to do is going to look like this:
|
Unfortunately, even on my Haswell-based laptop, with an SSD, this winds up
taking about 1.5 seconds
to run.
Being Smart on Behalf of MySQL
As I said above, there’s a couple ways to trick to the query planner. But
first, let’s look at the EXPLAIN
output for the above query:
|
There are some 1.7 million rows in the table, and MySQL believes it will need
to look at about half of them. The Using index condition
is related to one
of the new optimizations in MySQL 5.6: Index Condition Pushdown.
Judging by the performance of this query it seems like MySQL is, in fact, looking at all those rows. Ugh.
So, how can we make this better?
Trick 1: Coercing MySQL via Query Section
Consider this query and its corresponding EXPLAIN
plan:
|
|
So, the only difference here is that now we’re using the ip_end
index,
instead of the ip_start
index. Interestingly though, this query is so fast
that MySQL just reports the execution time as 0.00 seconds
! MySQL is clearly
being much smarter and is obviously able to early-out quickly instead of
actually looking at all those rows – even though it doesn’t realize it will
be able to do so when preparing the execution plan.
Running this query 100,000 times on my latop takes about 12.6 seconds
– or
about 0.000126 seconds per query
. That’s excellent! So, we’re done, right?
Well… No, not quite.
While this query will always return the same results as the original, it just feels a bit ugly – and, one has to wonder if perhaps we can’t do a bit better in terms of performance.
Trick 2: FORCE INDEX
, To The Rescue!
Consider:
|
|
So, no difference in query plan, but the use of FORCE INDEX
makes it explicit
and obvious that we are coercing the query planner behavior, which certainly
helps with the maintainability of things. But what about the performance?
Running this query 100,000 times on my latop takes about 9 seconds
– or
about 0.00009 seconds per query
. Fantastic! We’ve actually improved the
already very fast query above by about 30%
, while improving the clarity of
the code!
What About Different Indexes?
In this case, it turns out that a different choice of indexes doesn’t help.
Strictly speaking, a UNIQUE INDEX
on (ip_start, ip_end)
should give MySQL
more opportunity to optimize its execution strategy, and a covering index
(adding location_id
as the last column of the index) ought to reduce the
number of random I/Os involved by letting MySQL return data directly from the
index without going back to the actual table to fetch the row in order to
obtain the location_id
value.
Unfortunately, it seems the increased physical size of the indexes outweighs the gains.
Wrapping Up
So what have we learned? Just the obvious:
- MySQL’s query planner is still not terribly bright.
- What MySQL thinks it will need to do doesn’t always wind up being what it must do.
- Benchmark, benchmark, benchmark.