Archive for the 'ruby' Category

Ruby’s garbage collector and caching

My current job at Delta Projects is great when it comes to working with high volumes. Serving more than 50 million ads a day calls for different approaches to storing and retrieving data. Our ad servers, for example, are completely independent of the backend. Business logic and models are exported as code by our backend systems and pushed to our array of ad servers. The content itself is kept in a git repository, and each ad server regularly updates its copy. The backend system is unaware of which ad servers exist. Raw data is pushed back from the ad servers and processed later.

Because we want to serve ads as quickly as possible (and add as little delay as possible to the pages that show our ads), we cache ads on the ad servers. Some weeks ago, however, we noticed that CPU usage was increasing at a very high rate over time. Memory usage was also increasing, but not as quickly. The memory increase was caused by storing ad metadata in memory inside our application. Since we run around 10 Unicorn processes, each ad was held in memory 10 times, but that wasn't a big deal, since our machines have a lot of RAM.

The CPU consumption was more worrying. After some investigation, we found that once all ads were loaded, around 1.2 million string objects were retained in memory, because a global cache array kept references to them. In other words, those objects were never garbage collected. But since Ruby 1.8 has a non-generational, mark-and-sweep GC, every GC run has to inspect all live objects, and having 10 processes each regularly running the GC over 1.2 million objects caused a lot of CPU load.
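To make the effect visible, here is a minimal sketch (not our production code) that retains a lot of small strings in a global array and compares the live string count and the duration of a full GC run before and after. It assumes MRI Ruby 1.9 or later, since ObjectSpace.count_objects is MRI-specific; CACHE and the string contents are made up. Modern Ruby (2.1+) has a generational GC whose minor collections mostly skip old objects, but GC.start still forces a full mark that visits every live object, which is roughly what every run looked like on Ruby 1.8. Timings will vary per machine and Ruby version.

```ruby
# Hypothetical stand-in for the global cache described above: a constant
# array that keeps every cached string reachable, so the GC can never free it.
CACHE = []

# Run a full GC, then count the string objects that are still alive.
# ObjectSpace.count_objects is MRI-specific.
def live_strings
  GC.start
  ObjectSpace.count_objects[:T_STRING]
end

# Time one full (marking) GC run using a monotonic clock.
def full_gc_time
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.start
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

before_count = live_strings
before_time  = full_gc_time

# Simulate loading ad metadata: 200,000 small strings kept alive by CACHE.
200_000.times { |i| CACHE << "ad-metadata-#{i}" }

after_count = live_strings
after_time  = full_gc_time

puts "live strings: #{before_count} -> #{after_count}"
puts format("full GC run: %.4fs -> %.4fs", before_time, after_time)
```

Every string in CACHE survives each collection, so every full GC run has to mark all of them again; with 10 processes, that work is repeated 10 times.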

So we needed a better way of caching. Memcached was our first idea, but depending on yet another process didn't feel like a good idea. Since it's a local cache, my colleague Kalle came up with the idea of storing our cache data on tmpfs. Our application takes care of filling the cache (it serializes the ad metadata) and of reading from it. Cache invalidation is now done through a git hook that simply removes from tmpfs the cache file of any item that has been updated or deleted.
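A minimal sketch of the idea, assuming one file per cache entry and Marshal for serialization (the post doesn't say which format we use). TmpfsCache, the file naming and the /dev/shm location are assumptions for illustration. The point is that cached data lives outside the Ruby heap, so the GC no longer has to walk it, and that invalidation is just deleting a file.

```ruby
require 'fileutils'
require 'tmpdir'

# Sketch of a file-per-entry cache on tmpfs. The directory, the key-to-file
# mapping and the use of Marshal for serialization are all assumptions.
class TmpfsCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Return the cached value for key, or compute it with the block, store it
  # in a file and return it.
  def fetch(key)
    path = path_for(key)
    begin
      return Marshal.load(File.binread(path))
    rescue Errno::ENOENT
      # Cache miss (or the entry was just invalidated): compute it below.
    end

    value = yield
    # Write to a temporary file and rename it, so other processes never
    # read a half-written entry (rename is atomic within one filesystem).
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, Marshal.dump(value))
    File.rename(tmp, path)
    value
  end

  def cached?(key)
    File.exist?(path_for(key))
  end

  # What the git hook does on update or delete: just remove the file.
  def invalidate(key)
    FileUtils.rm_f(path_for(key))
  end

  private

  # Map a key to a safe file name inside the cache directory.
  def path_for(key)
    File.join(@dir, key.to_s.gsub(/[^\w.-]/, '_'))
  end
end

# /dev/shm is a tmpfs mount on many Linux systems; fall back to the system
# temp directory so the sketch also runs elsewhere.
shm_ok = File.directory?('/dev/shm') && File.writable?('/dev/shm')
base = shm_ok ? '/dev/shm' : Dir.tmpdir
cache_dir = File.join(base, "ad_cache_#{Process.pid}")
cache = TmpfsCache.new(cache_dir)

ad  = cache.fetch('ad-42') { { id: 42, title: 'Spring sale' } }
hit = cache.fetch('ad-42') { raise 'should have been a cache hit' }
cache.invalidate('ad-42')
gone = !cache.cached?('ad-42')

FileUtils.rm_rf(cache_dir)
```

Since all Unicorn processes read the same files, the data is stored once instead of once per process.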

All of this led to a tenth of the memory consumption and much lower, steady CPU usage.

Ruby SOAP – the story continues..

As I wrote before, I've been struggling with Ruby and SOAP. Apart from the fact that I really don't understand why the world likes to use SOAP, I've run into a couple of issues I'd like to share for future generations:

  • soap4r and ActionWebService don't play nicely together. The SOAP implementation included in the standard Ruby distribution isn't very nice at all (it has issues validating WSDLs), so I tried using soap4r to make SOAP requests to remote services. Standalone scripts eventually worked, but when I included this in a Rails project that also acts as a SOAP server (using AWS), things broke badly. Not wanting to spend too much time on it, I decided to make the SOAP requests from external scripts that I call with exec().
  • soap4r doesn't set the xsi:nil attribute on elements that allow it when the content is nil and the element is a ComplexType. The solution here was to construct the SOAP elements manually (using SOAP::SOAPElement).
  • On my development machine (OS X with Ruby 1.8.6) things finally worked fine, but after deploying to production (Linux with Ruby 1.8.7), calling "id" on a SOAP result object printed "warning: Object#id will be deprecated; use Object#object_id", and instead of the id from the SOAP result I got the object_id (Ruby's internal id for the object, which is utterly useless for my purposes). The solution was to not call methods on the result object but to treat it as a Hash: result.id becomes result['id'].
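The third issue comes from how dynamically built result objects typically work: field accessors are provided through method_missing, which Ruby only calls when no real method with that name exists. The sketch below uses a hypothetical DynamicResult class (not soap4r's actual mapping class) to show the clash. Since modern Ruby no longer has Object#id, it uses a field named "hash", which collides with Object#hash the same way "id" collided with Object#id on Ruby 1.8.

```ruby
# Hypothetical stand-in for a dynamically built SOAP result object: fields
# are stored in a hash and exposed through method_missing. This is NOT
# soap4r's actual class, just an illustration of the pitfall.
class DynamicResult
  def initialize(fields)
    @fields = fields
  end

  # Hash-style access always reads the field.
  def [](name)
    @fields[name.to_s]
  end

  # Method-style access only reaches method_missing when no real method
  # with that name exists on the object.
  def method_missing(name, *args)
    key = name.to_s
    return @fields[key] if args.empty? && @fields.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @fields.key?(name.to_s) || super
  end
end

result = DynamicResult.new('id' => 7, 'hash' => 'abc123')

by_method = result.hash    # Object#hash wins: an Integer, not the field
by_key    = result['hash'] # the actual field value: "abc123"
plain_id  = result.id      # 7 on Ruby 1.9+, where Object#id no longer exists
```

Hash-style access sidesteps the problem entirely, because it never competes with methods that Object already defines.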

Rails and SOAP part II

As I wrote in my previous post, I've been busy with Rails and SOAP. I've been figuring out how to pass a hash to my remote calls instead of an argument list.

The default API requires something like:

api_method :get_my_thingy, :expects => [{:id => :int}, {:name => :string}], :returns => [Thingy.struct]

When calling this (e.g. from a Ruby SOAP client), it looks something like:

soap.getMyThingy(2, "foo")

But I would like to do:

soap.getMyThingy(:id => 2, :name => "foo")

For this to happen, I introduced an OptionStruct:

module ActionWebService
  class OptionStruct < ActionWebService::Struct
    member :id, :int
    member :name, :string
  end
end

After this, the API should be like:

api_method :get_my_thingy, :expects => [ActionWebService::OptionStruct], :returns => [Thingy.struct]

See my previous post for the definition of Thingy.struct.
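The idea behind OptionStruct can be shown in plain Ruby without ActionWebService: the method takes a single structured argument with named members, so a caller can build it from a hash. This is only an analogue, using Ruby's own Struct with keyword_init: (Ruby 2.5+); get_my_thingy and call_with_hash are hypothetical names.

```ruby
# Plain-Ruby analogue of OptionStruct (no ActionWebService involved):
# named members instead of a positional argument list.
OptionStruct = Struct.new(:id, :name, keyword_init: true)

# Hypothetical server-side method that receives a single struct argument.
def get_my_thingy(options)
  "thingy #{options.id} (#{options.name})"
end

# The caller builds the struct from a hash; unknown keys raise ArgumentError.
def call_with_hash(hash)
  get_my_thingy(OptionStruct.new(**hash))
end

puts call_with_hash(id: 2, name: "foo") # prints "thingy 2 (foo)"
```

A nice side effect of named members is that a misspelled option name fails loudly instead of silently shifting positional arguments.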

Rails and SOAP

For a project, I'm currently implementing a SOAP API in Rails. The Rails application will function as a SOAP server, and I'm using the datanoise ActionWebService plugin for this. Implementing a SOAP API with it is pretty easy; however, I ran into some problems with returning complete model objects.

I have a model called Customer that I want to expose via my API. After setting up the API controller, I specified the following in /app/services/customer_api.rb:

class CustomerAPI < ActionWebService::API::Base
  api_method :get, :expects => [:int], :returns => [Customer]
end

Implemented with:

def get(id)
  return Customer.find(id)
end

Requesting the WSDL nicely returns the definition of Customer, but a request for a specific customer failed with the message "Cannot map Customer to SOAP/OM.". It appears that boolean values (represented in Ruby as true/false) and empty values (represented in Ruby as nil) were the problem: SOAP wants them as "true", "false" and "".

I solved this by giving my models a to_struct() method that returns all attributes in a Struct. I added to_struct() as an instance method to ActiveRecord::Base, so it is available in every model, and I extended the ActiveRecord::Base class with a struct() class method that returns the Struct class itself. The latter is needed because the SOAP plugin wants to know the attributes and types of the Struct without actually creating one. Here's what I did:

To extend ActiveRecord::Base with class methods, create a module (for example in lib/model_extensions.rb) and put the following in your /config/environment.rb:

require 'model_extensions'
ActiveRecord::Base.send(:extend, ModelExtensions)

Then in the module:

module ModelExtensions
  # Returns the DataStruct class for this model, defining it on first use
  def struct
    # Check if the DataStruct is already defined
    begin
      eval("#{self}::DataStruct")
    rescue NameError
      # Define the DataStruct class and fill it with
      # members that resemble the model's columns
      class_eval <<-EOF
        class DataStruct < ActionWebService::Struct
          #{self}.columns.each { |c| member c.name.to_s, c.type.to_s }
        end
      EOF
    end
    return eval("#{self}::DataStruct")
  end
end

Next, to turn a model instance into a DataStruct, we reopen ActiveRecord::Base and add an instance method:

module ActiveRecord
  class Base
    def to_struct
      # Get a DataStruct and instantiate it
      struct = self.class.struct.new
      # Set the attributes after clean-up: SOAP wants "" for nil
      # and the strings "true"/"false" for booleans
      self.attributes.each do |k, v|
        v = case v
            when nil   then ""
            when true  then "true"
            when false then "false"
            else v
            end
        struct.send("#{k}=", v)
      end
      return struct
    end
  end
end
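The clean-up step in to_struct() can be tried outside Rails. This sketch uses Ruby's own Struct in place of ActionWebService::Struct and a hard-coded, hypothetical column list in place of Customer.columns; soap_value and attributes_to_struct are illustrative names, not part of any library.

```ruby
# Hypothetical column list standing in for Customer.columns; Ruby's own
# Struct stands in for ActionWebService::Struct.
COLUMNS = %w[id name active notes]
CustomerData = Struct.new(*COLUMNS.map(&:to_sym))

# Same clean-up rule as to_struct above: nil -> "", true/false -> strings.
def soap_value(v)
  case v
  when nil   then ""
  when true  then "true"
  when false then "false"
  else v
  end
end

# Copy an attributes hash into a fresh struct, cleaning up each value.
def attributes_to_struct(attributes)
  struct = CustomerData.new
  attributes.each { |k, v| struct.send("#{k}=", soap_value(v)) }
  struct
end

s = attributes_to_struct("id" => 1, "name" => "Acme", "active" => true, "notes" => nil)
p s.to_a # prints [1, "Acme", "true", ""]
```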

To finish off, we change the API implementation a bit to the following:

class CustomerAPI < ActionWebService::API::Base
  api_method :get, :expects => [:int], :returns => [Customer.struct]
end

Implemented with:

def get(id)
  return Customer.find(id).to_struct
end

Rails will_paginate plugin

I know, it has been a long time since I last blogged, and people have actually complained about it. Well, in short, I'm fine. A lot has happened since my last post, but I'll write about that when I find the time in the upcoming weeks. Since I've been coding a lot with Rails lately, I just wanted to share some geek stuff with the rest of the world today.

In one of my projects, I'm using the will_paginate plug-in, which is pretty cool. Without much effort, it gives you pagination in your views, and it also works for generic Arrays. There is one problem, though. The project I'm working on can contain a lot of records per table. will_paginate doesn't act like a named scope (or at least not in this case, since we're not calling paginate() directly on an AssociationProxy, but on an Array), so it requires the whole result set. This is because it needs the size of the array to calculate the total number of pages and records in the collection. Obviously, this can be quite expensive if you have a lot of records in your table: it retrieves all records, creates objects for them in the Array and finally cuts off just a very small part to display. Because of this, we came up with the following solution:

# The amount of objects per page we want to show
limit = $OBJECTS_PER_PAGE

# If the page was set to 0 (shouldn't happen), or nil (yeah, nil.to_i == 0),
# set the offset to 0
offset_page = (page.to_i > 0) ? page.to_i - 1 : 0
offset = limit * offset_page

# Create args hash with which we count (without :limit and :offset)
count_args = args.dup

# Add the limit and the offset to the arguments hash
args[:limit] = limit unless args[:limit]
args[:offset] = offset unless args[:offset]

# Count all objects that would be there (same receiver as find() below;
# `objects` is not assigned yet at this point)
total_count = count(:all, count_args)

# Find all objects (this is limited)
objects = find(:all, args)

# Create an array filled with nil values, the size of the total collection
objects_array = Array.new(total_count, nil)

# Insert the found objects at the right place in the array; replacing
# objects.size elements (not limit) keeps the array length at total_count
objects_array[offset, objects.size] = objects

# Call paginate() on the array with limit and page
return objects_array.paginate(:per_page => limit, :page => page)

This only retrieves the records that are needed, using :limit and :offset in find(), plus an additional count() without :limit and :offset. To make sure that will_paginate gets a collection the size of the total collection, we create an Array of nil values with the size returned by count(). Then we insert the found records into this array.
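Stripped of the database and will_paginate, the padding trick looks like the sketch below. TABLE stands in for the table and fetch_page mimics a query with :limit and :offset; both are hypothetical. The padded array reports the full total to the pagination math, while only one page of real records is ever loaded.

```ruby
# Hypothetical stand-in for a table with 23 rows.
TABLE = (1..23).map { |i| "record #{i}" }

# Mimics find(:all, :limit => limit, :offset => offset).
def fetch_page(limit, offset)
  TABLE[offset, limit] || []
end

# Build an array the size of the whole table that only holds one real page.
def padded_page(page, per_page)
  offset  = per_page * ([page.to_i, 1].max - 1) # page 0 or nil -> first page
  records = fetch_page(per_page, offset)
  padded  = Array.new(TABLE.size, nil)          # like Array.new(count, nil)
  padded[offset, records.size] = records
  padded
end

padded = padded_page(3, 10)
padded.size         # => 23, the pagination math still sees the full total
padded[20, 10]      # => ["record 21", "record 22", "record 23"]
padded.compact.size # => 3, only one page was actually loaded
```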

The only problem is that the whole operation isn't atomic: if records are added or removed between the count() and the find(), the count can differ from the actual objects array.

I haven't run any benchmarks, but the Rails logs tell me that only a very small subset of the complete table is selected.

Update: As Habbie pointed out, SQL OFFSET is very slow for large offsets, because the database still has to step over all the skipped rows. Back to the drawing board.

Euruko 2008, Prague and people

It's almost over for me. Euruko08 was fantastic! I took a plane on Friday and got kind of lucky. The plane was apparently overbooked and there was already someone in my seat, so I had to sit somewhere else. The nice stewardess told me to sit in the front of the plane, where the people who get the special treatment normally sit. Because of this, I also got some extra services, like a newspaper and great food. The flight took only 90 minutes and was comfortable. After customs and baggage claim, I took the bus and metro to my hostel. The hostel was pretty basic, but OK.
After strolling around the city center a bit, I took a tram to the opening party. On the way, I crossed one of Prague's many bridges, and because of the great view over the river and the light (the sun was just going down, which gave spectacular lighting), I stopped after the bridge, walked back and took some great pictures (I'll upload them later on). Afterwards I went on to the party.
There were already some people there, and I quickly got into conversations. In the end, there were around a hundred people. I talked to so many people from so many different countries; awesome. After the party, I took a taxi to the city center with Thomas and Niklas (from Germany) and Alex from Greece. We had some food (a midnight snack) and then I went back to the hostel.
The next day, I went to the conference and spent all day listening to talks, talking to people and so on. I had lunch with Thomas and Niklas, and after the conference we had dinner together at a nice pizza place. After that, party again. There were more people than the day before and again I had lots of conversations. After the party, I took a taxi back to the hostel.
The following morning, I got up and walked to the conference along the river. Beautiful view! The conference again had nice talks, and we had lunch together. It ended with 2.5 hours of lightning talks, where I was the last speaker. The talk went alright. When the conference was really over, I met Marek, a Dutch guy I worked with a couple of years ago but hadn't seen since. Together with him, some colleagues, Niklas and Alex, we went to a Mexican bar/restaurant to drink and eat. There we talked a lot and even tried out some cool code Alex came up with. When the bar closed, we left. I walked back to my hostel through the city center. Prague is such a beautiful city, especially at night!
This morning, I woke up at 8:30 am, had breakfast and checked out of my hostel. Then I roamed the streets, looking at shops and so on. Now I'm in a small restaurant and will leave in a couple of minutes. I'll be doing the tourist thing a bit more and will go to the airport around 3 pm to get my flight back home at 5 pm.
All in all, the weekend was absolutely fantastic! I feel great and met so many nice and interesting people. The only thing is that my voice is absolute crap now, because of all the talking and the liters of beer I drank, but that will be fine. Anyway, I'll be at Euruko 09 next year! Definitely!

Geeks and (spoken) language interest

As I wrote before, I'm currently in Prague for the Euruko 2008 Ruby conference. Yesterday evening, the conference was officially opened with a party at a very nice club. There I spoke with a lot of people from all over Europe. I had expected most conversations to be about Ruby or Rails, but that didn't happen. What I found very interesting is that most conversations I had (and overheard) were about spoken languages. Apparently, geeks are fascinated by spoken languages. I myself was terrible at French and German in high school (as I was in several other subjects), but that was mainly because I lacked interest in learning in general. Since I started programming, I've really developed an interest in languages in general: comparing languages and the structures within a language. Maybe because of this, I looked for a real challenge and started learning Russian sometime last year.
The nice thing is that it's not just me who's really fond of languages. I really wouldn't have expected to find so many people who share this interest at a programming language conference.

Speaking at EURUKO 2008

Friday, I'll be flying to Prague for the EURUKO 2008 Ruby conference and will be doing a 10 minute "lightning talk" about Capistrano and Webistrano (stuff for deploying (Rails) applications in bigger environments).

EURUKO 2008

I just booked a trip to the EURUKO 2008 Ruby conference in Prague, March 29th and 30th! Apparently, I was the first attendee to register, and I also submitted a proposal for a talk. If it's accepted, more on this later.
I've been to Prague before and I really liked the city. Being there with fellow Rubyists will probably be even better.

IntelliJ IDEA 7 with Ruby (on Rails)

Only today, I noticed that IntelliJ IDEA 7 was released some weeks ago and I saw that it comes with Ruby and Rails support. I decided to check it out and give it a go. For Rails development, I currently use Aptana RadRails as an Eclipse plugin. Even though I like RadRails, it still lacks some features and coherency in my opinion. Also, setting up RadRails as an Eclipse plugin can be a pain in the ass when dealing with the latest beta versions or nightly builds.

So, I downloaded IntelliJ to my Mac, dragged it to my Applications folder and started it. After installing the Ruby plugin (2 clicks and a restart), I was good to go. I played with it for an hour or so and I must say that I'm quite impressed. The interface is clean (better than NetBeans) and stuff is where I look for it. Code completion also works very well (unlike in RadRails). Hopping through code (declarations, views, models, etc.) is nice.

I'll be playing with it for the next couple of days, trying out some other stuff to see if it's worth buying a license.