Over the last couple of weeks, my colleagues and I at Delta Projects have been working on some extra functionality for Facebook's Scribe server. Scribe is a server that logs messages. It can write to multiple "stores" and fails over from one store to another if one of them fails. In our case, we wanted Scribe to log messages from our ad servers to HDFS and, in case of an HDFS failure, write to local disk and replay those logs once HDFS became available again.
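To make the failover-and-replay idea concrete, here is a minimal sketch in C++. It is not Scribe's code or configuration: the `FailoverLogger` class and its `Sink` callback are hypothetical names, and an in-memory vector stands in for the local-disk buffer we actually wanted.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: tries a primary sink (think HDFS) and buffers
// messages while it is down. The vector stands in for local disk.
class FailoverLogger {
public:
    // A sink returns true when the message was stored successfully.
    using Sink = std::function<bool(const std::string&)>;

    explicit FailoverLogger(Sink primary) : primary_(std::move(primary)) {}

    void log(const std::string& msg) {
        replay();  // first try to flush anything buffered earlier
        // If older messages are still pending, queue behind them to keep order.
        if (!buffer_.empty() || !primary_(msg)) {
            buffer_.push_back(msg);
        }
    }

    // Resends buffered messages in order and stops at the first failure.
    // Returns the number of messages that were delivered.
    std::size_t replay() {
        std::size_t sent = 0;
        while (sent < buffer_.size() && primary_(buffer_[sent])) {
            ++sent;
        }
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(sent));
        return sent;
    }

    std::size_t pending() const { return buffer_.size(); }

private:
    Sink primary_;
    std::vector<std::string> buffer_;
};
```

Messages logged while the primary sink is down stay in the buffer and are delivered, in their original order, on the first call after the sink recovers.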
After some investigation, we noticed that Scribe was very static in the way it constructed file paths, so we couldn't fulfill our requirements: we wanted our log files laid out in a directory structure like /year/month/day/hour/adserver.log.
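As an illustration of the layout we were after, a path like that can be derived from a timestamp in a few lines of C++. This is only a sketch, not the patch we wrote: `hourlyLogPath` is a hypothetical helper, and the zero-padded fields are an assumption about the format.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not part of Scribe): builds
// /year/month/day/hour/<category>.log from a broken-down time.
std::string hourlyLogPath(const std::tm& t, const std::string& category) {
    char dir[64];
    // tm_year counts from 1900 and tm_mon from 0, so adjust both.
    std::snprintf(dir, sizeof(dir), "/%04d/%02d/%02d/%02d/",
                  t.tm_year + 1900, t.tm_mon + 1, t.tm_mday, t.tm_hour);
    return std::string(dir) + category + ".log";
}
```

Calling it with the current local time and the category "adserver" gives the file that messages arriving in that hour should be appended to.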
Here is where open source software is great: you have the code, so you can modify it yourself. So we did. We spent a couple of hours implementing dynamic file paths and then committed the feature to our own GitHub fork of Scribe. We posted the patch to the Scribe mailing list and talked about it on IRC with some guys from Facebook and Twitter, but not much happened.
Another requirement was that we store our data compressed on HDFS, but Scribe doesn't come with compression. There is a patch for LZO from Twitter, but because of license issues, this patch will never make it into the main branch of Scribe. Because of this, we decided to use either FastLZ or BZip2 for our purposes. Either way, Scribe has no built-in support for compression.
Since we had already poked around in the Scribe code, we decided to implement compression ourselves. The problem with Scribe, however, is that it is basically untested. There are a couple of tests written in PHP, but they don't cover the whole application. On top of that, the Scribe source (written in C++) is a mess. It's one big hack: no neat design patterns, a lot of dead code everywhere, classes with way too much responsibility, etc. So, for our implementation of compression, we figured that it would be nice not to hack it in there (even the LZO patch from Twitter seemed like a quick hack), but to do it correctly, so that we would not only introduce new functionality, but also clean up a lot and refactor the ugly parts of the code. We extracted and abstracted a lot of code and put it in the right place (using a lot of the practices that Uncle Bob preaches in "Clean Code"). And as icing on the cake, we threw in tests. We introduced Google Test, a C++ testing framework, and wrote tests for every piece of code that we wrote.
Then the day came that our work was finished, so we committed our changes to our GitHub branch and posted our patch to the Scribe mailing list. The patch was around 200k (including the tests), which is big, and apparently it scared the guys at Facebook, because after that, nothing happened. The only response was that the patch was too big, and we were asked if we could cut it up into smaller pieces so that reviewing the code would be easier. Unfortunately, we couldn't do this, since so many interwoven concepts had been pulled apart and put into different places. We would have had to retrace our steps and create a patch for every one of them, while keeping a working system after every step.
The end result is that we have a lot of (imho really nice) code that probably nobody is going to use. I even think that we're not going to use it ourselves, since it's in a branch of Scribe that is maintained only by us. If it were in the main branch of Facebook's Scribe, the code would be maintained by "the community", but keeping it only in our branch means our branch will diverge from the main one over time, and we don't want to maintain a spin-off of Scribe for ages to come.
I think it's great that companies like Facebook and Twitter open source a lot of software, but when there is no large community outside of those companies, adding features might not be worth it. Scribe only seems to be used (on a large scale) by Twitter and Facebook, and only whatever suits them will make it into the main branch. If only Scribe had been covered by tests, the maintainers could have quickly verified our work. As it stands, there is no reason for Facebook to merge our patch, so there is no need for them to allocate resources to review our changes. And again, forking the whole project and maintaining it isn't an option for us. We'll move on to other technology.