Over the last week, at Delta Projects, I've been working on replacing MemcacheDB with Cassandra. We have run with MemcacheDB for a while, because we store data that we can't store client side and we need some way of storing this serverside. Since the data needs to be available really quickly and has to be persistent, MemcacheDB (a server that exposes more or less the same API as Memcache, but persists to disk) was put in place some time ago. This worked well for a while, but since we're expanding, the requirements changed for storing data server side. The initial requirements were pretty clear: we can get away with a key/value store, latency is important and the data should be persistent. However, since we're going to run with multiple data centers, the requirements have changed. The initial requirements are still valid, but when running over multiple locations, we would like to have all data available in all locations. Next to that, scaling MemcacheDB is hard. We ran with two MemcacheDB servers, where data was sharded, but the databases on both machines outgrew the physical memory in the servers, which caused a huge impact on performance. Since data is sharded and not replicated, you can't just pop in an extra box and lessen the load on the other servers. You have to rebalance manually. So we went on a quest for different server side storage solutions that would fulfill our requirements.
Long story short, we decided on Cassandra.
Cassandra is more than a key/value store. It's a distributed data store that uses Bigtable's ColumnFamily design, which will allow us to use our data differently than just storing and retrieving data by using keys; we can actually query the data. Anyway, the decision was made to move from MemcacheDB to Cassandra and we had to come up with a migration path. The path we envisioned would be:
- Keep reading and writing from and to MemcacheDB but also write to Cassandra.
- Stop writing to MemcacheDB and start reading from Cassandra first and have MemcacheDB as a fall-back if we can't read from Cassandra.
- In the mean time copy data from MemcacheDB. Keys that are not in Cassandra will be copied, the others will be skipped
- Disable reading from MemcacheDB.
This meant that we had to run with both Cassandra and MemcacheDB at the same time.
While investigating the use of Cassandra, we found out that the Ruby client for Cassandra doesn't deliver. Mainly because Cassandra uses Thrift and the Thrift client for Ruby isn't too good. However, Hector, a Java client for Cassandra, worked flawlessly. This, together with some other discoveries1 made us decide to move our application from MRI Ruby (the standard Ruby interpreter) to JRuby (a Ruby interpreter on the JVM). The nice thing about JRuby is that we were able to use best of both the Java and the Ruby world. Most Ruby code runs well in Ruby, but sometimes code has to be adjusted a little or sometimes it's better to use a JRuby version of a specific library; one that's more suited for running on the JVM.
So, we added Cassandra support to our application and moved from the standard Ruby memcache client to the JRuby version. The JRuby version is a small wrapper that exposes the same API as the Ruby version and wraps an existing Java Memcache client. However, after some testing, we found out that the way keys are distributed over the MemcacheDB servers is not the same in both implementations. Both implementations used different algorithms to determine which key should be placed on which server. The Java version provides the ability to choose between three different algorithms, while the Ruby version only provides one, but neither one is compatible with the others.
For getting our JRuby application to talk to our MemcacheDB servers, we had two options; patch the Java version so it behaved like the Ruby version, or just run with the Ruby version (JRuby is still Ruby, right?). Initially, we chose the latter option, which seemed to be the quickest fix. Just pop in the gem and you're done.
Right.
In the MRI Ruby version of our application, we used one MemcacheDB client per process (and then running with multiple processes), while in the JRuby version, we had to run with one client per Java thread. The standard Ruby version of the client is not very good with Java threads and apparently, TCP sockets are handled differently in Java then when using the MRI interpreter. Sockets would linger for a very long time and when the MemcacheDB server would be unresponsive for a while and we would try to reconnect, we would end up with a lot of established connections that were never going to be used, but also never closed. The JRuby version didn't seem to have all these problems and is (obviously) much better suited to run in the JVM; it uses nice socket pools and such.
Not to waste time, we started patching the JRuby client to mimic the behavior of it's Ruby counterpart, which worked out pretty well. We still have to test it a little bit more thoroughly, but the first tests look promising.
The last couple of days have been a head-breaking quest, but I think we've finally nailed it. We really need to move to Cassandra soon, since MemcacheDB is only going to give us a lot more headaches. At least I got to code some Java again and I learned quite a few things about MemcacheDB and how sharding actually works. I also learned that MemcacheDB is a bad choice. Implementing it is done fairly quickly (since you can just use a Memcache client), but it doesn't scale and apparently the sharding differs from one implementation to the other (what if you run multiple applications accessing MemcacheDB and all are written in different languages?). It's too bad that we spent a lot of time finding a solution for something that is going to be replaced very soon.
If you're writing a new application and need persistent cache or something other than a SQL database, check out alternatives like Cassandra or MongoDB.
1. We decided not to use Scribe for writing to HDFS, but rolled our own code using ActiveMQ, in-process.
Recent Comments