I was out sick yesterday, but I managed to spend a little time working on “SB”. I’ve been playing with Ruby on Rails for a couple months, but today I really began to grok where the limits are with respect to Ruby, Rails, and performance.
Today, I focused on one particular action of one particular controller. This is actually the heart of the application in many ways and is also very complex. It involves fetching a hierarchical dataset from the database where each level of the hierarchy is represented by a different model (not a “nested set” in other words). For the sake of keeping things clean while prototyping the database was put together in a highly normalized manner. Unfortunately this meant I had to fetch the data bottom-up making it hard to constrain the data being fetched because the information required to decide on constraining it is in the next level up.
Once the data’s in memory, I have to process it bottom-up, then make a second bottom-up pass, then one last top-down pass.
Anyway, the first version was remarkably clean and worked quite nicely. I figured it wouldn’t scale well though, so I decided to do some “what if” tests. The first of which became the beginning of my personal rabbit-hole.
“What if each of my six test users have 50,000 records each and the second-fifth users are all friends with the first user?”
End result: 18 minutes to resolve the request, with about 3 minutes spent in the database. On top of that, this resulted in 54MB of data in development.log. For one request. ACK!
It was immediately apparent that there was no identity mapping or query caching going on. Even though the database access was only three minutes of the execution time I decided to address this side first because writing 54MB of log data every time I hit the URL made debugging anything else somewhat annoying.
Looking around revealed some efforts to add caching to ActiveRecord in Rails 2.0, along with the revelation that identity mapping had been considered and rejected. Wow. That kinda floored me. Six years ago I built an ORM system for PHP and then for Java and the very first things I did for each was to implement identity mapping and query caching. It just seemed so obvious and necessary. Of course, I was using InnoDB exclusively and utilizing transactions as a matter of course. Now, if you didn’t know, InnoDB by default runs in “repeatable-read” isolation – which as isolation levels go is second from the top in terms of strictness guarantees, and usually considered a really good way to kill performance but somehow they make it fly. The key characteristic of this mode is that within the scope of a transaction, you will only ever see one version of a row. You can fetch the same row twenty different ways from twenty different queries and it will always be the same as it was the very first time you saw it. This makes cache invalidation an easy problem: You keep a per-transaction cache, and flush the whole thing at the end. Simple. Apparently though things are a bit more complex when you’re not using transaction (or perhaps using transaction at a lower isolation level such as read-committed or, god forbid, read-uncommitted). Nonetheless I kept looking, and found this. At first glance, it doesn’t appear to actually read from the cache ever but I figured perhaps I was just missing some of the magic.
Ultimately, I couldn’t get it working at all. So I gave up, and just moved on to query caching. I found SqlCache which worked really nicely. It shaved a couple minutes off the execution time, and cut the per-request log data size from 54MB to 22KB. Much better.
Next up, the Ruby code itself. This involved going over each of the transformations piece by piece and seeing how they performed, then looking for other ways to do the same thing and seeing how they worked. The first step was something like the following:
This was taking 7-10 seconds to run. So I tried the following:
This was much more consistent, and a bit faster. ~5.8 seconds, with only one anomalous score of 8 seconds. Better, but not great. I tried a few iterations that varied a bit in their performance but ultimately wound up approaching things differently, by representing the intermediate data in a different way. It wound up being a bit more complicated than I want to get into right now but ultimately it made things quite a bit faster not just through that step, but through all of the remaining steps.
It did involve one notable “cheat” though. The records at the bottom of the hierarchy must be sorted. Whether they are sorted ascending or descending is a function of the parent record. My solution let me trim the database access substantially by being able to sort the whole list, and clamp it to the top/bottom N records without having to send a bunch of stuff to the web server to be sorted in-memory. Basically, I’m storing the numeric value as NEGATIVE in the DB and have the model class hide the abstraction. I can do this because by definition the numeric value logically can never be negative.
The end result though was pretty huge: A roughly similar amount of code that is actually easier to read and takes only 17 seconds to resolve my test query. Not too shabby, but certainly not perfect. The biggest learning for me was that I need to avoid the problem: Make it impossible for users to have 50,000 records. Strictly speaking, they’ll never need more than N, where N is actually a quite small number. So when they’re adding new ones, I just need to ditch the old ones.
Next time I poke around, I’ll be answering the question “What if a user has 9,000 friends?”, and I suspect the Scaling Twitter slideshow recommendation about keeping a list of friend IDs in a text field in the user table will be relevant there.
Also, be sure to check your indexes. As in, make sure you have appropriate indexes for your queries. A neat feature for Rails might be to log SQL statements (in dev mode) in green/yellow/red based on how long they’re taking. Configurable cutoffs would be nice but 99% of cases can be simplified down to:
- If the execution time is less than or equal to 0.001 seconds, it’s green.
- If the execution time is less than or equal to 0.01 seconds, it’s yellow.
- Otherwise, it’s red.
Anyway, time for sleep.