Ruby’s garbage collector and caching

My current job at Delta Projects is great when it comes to working with high volumes. Serving upwards of 50 million ads a day calls for different approaches to storing and retrieving data. Our ad servers, for example, are completely independent of the backend. Business logic and models are exported as code by our backend systems and pushed to our array of ad servers. The content itself is put into a git repository, which the ad servers pull from regularly. The backend system is unaware of which ad servers exist. Raw data is pushed back from the ad servers and processed later.

Because we want to serve ads as quickly as possible (and add the least possible delay to the pages that show them), we cache ads on the ad servers. Some weeks ago, however, we noticed that CPU usage was climbing steeply over time. Memory usage was also increasing, but not as quickly as CPU usage. The memory increase was caused by the fact that we stored ad metadata in memory inside our application. Since we run around 10 Unicorn processes, each ad would be in memory 10 times, but that wasn't such a big deal, since we have a lot of RAM.

The CPU consumption was more worrying. After some investigation we found that once all ads were loaded, we had a lot of string objects in memory (around 1.2 million) that were retained because a global cache array kept a reference to them. In other words, those objects are never garbage collected. But since Ruby 1.8 has a non-generational GC, every GC run still inspects all of them, and having 10 processes each walk 1.2 million objects every now and then caused a lot of CPU load.
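To get a feel for this kind of problem, a rough sketch like the one below counts the live strings and times a full GC run while a global array holds on to them. `AD_CACHE`, the helper names, and the 100,000 dummy strings are made up for illustration; this is not our actual code, and newer Ruby versions have a generational GC, so the timings will not match what we saw on 1.8.

```ruby
require 'benchmark'

# Hypothetical stand-in for the global cache array described above.
AD_CACHE = []

# Number of live String objects the interpreter currently knows about.
# ObjectSpace.each_object returns the count of objects it visited.
def live_string_count
  ObjectSpace.each_object(String) { }
end

# Wall-clock time of one full garbage collection run, in seconds.
def full_gc_seconds
  Benchmark.realtime { GC.start }
end

# Fill the cache with dummy "ad metadata" strings. Because AD_CACHE
# references them, no GC run can ever free them.
100_000.times { |i| AD_CACHE << "ad-meta-#{i}" }

puts "live strings:      #{live_string_count}"
puts "full GC run took:  #{full_gc_seconds} s"
```

Running this with larger and larger caches shows how the cost of a collection grows with the number of retained objects, even though nothing can be freed.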

So we needed a better way of caching. Memcached was our first idea, but depending on yet another process didn't feel like a good one. Since it's a local cache, my colleague Kalle came up with the idea of storing our cache data on tmpfs. Because tmpfs lives in RAM and is shared by all processes, the data is held once instead of once per Unicorn process, and it stays out of the Ruby heap, where the GC would have to scan it. Our application takes care of filling the cache (since it serializes the ad metadata) and of reading from it. Cache items are now invalidated through a git hook that simply removes from tmpfs any file that has been updated or deleted.
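A minimal sketch of such a file-backed cache could look like this. The class name `FileCache`, the use of `Marshal` for serialization, the `.cache` file suffix, and the example key are all assumptions made for illustration, not our production code. The example writes to a temporary directory so it runs anywhere; in production the directory would be on a tmpfs mount.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical file-backed cache: one serialized file per cache key.
# Keys are assumed to be simple identifiers without path separators.
class FileCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Return the cached value for key. On a miss, build the value with
  # the given block, store it, and return it.
  def fetch(key)
    path = path_for(key)
    begin
      return Marshal.load(File.binread(path))
    rescue Errno::ENOENT
      # Cache miss (or the file was just invalidated): rebuild below.
    end
    value = yield
    write_atomically(path, value)
    value
  end

  # Remove the cached file, if any. This is what a hook would call
  # for each piece of content that was updated or deleted.
  def invalidate(key)
    FileUtils.rm_f(path_for(key))
  end

  private

  def path_for(key)
    File.join(@dir, "#{key}.cache")
  end

  # Write to a temporary file first and rename it into place, so that
  # other processes never read a half-written file.
  def write_atomically(path, value)
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, Marshal.dump(value))
    File.rename(tmp, path)
  end
end

cache = FileCache.new(Dir.mktmpdir('ad-cache'))
meta  = cache.fetch('ad-42') { { 'title' => 'Example ad', 'width' => 300 } }
puts meta['title']
cache.invalidate('ad-42')
```

Each read deserializes a fresh, short-lived object, so nothing stays referenced between requests and the long-lived heap stays small.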

This all led to a tenth of the memory consumption and much lower, constant CPU usage.
