Making ActiveResource 34x faster: QActiveResource

One of the things that really hurts us when we’re pulling data from ActiveResource (as we do with Shopify and a couple of internal projects) is how slow Rails’ ActiveResource is. Using the Nokogiri backend helps a lot, but it’s still painfully slow; so slow, in fact, that the bottleneck is parsing the data rather than transferring it.

So yesterday I set off to rewrite our importer in C++ using Qt (and libcurl to fetch the data). The result is a nice Qt-ified ActiveResource consumer that puts the records into a structure of QVariants, hashes and lists.
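To make that shape concrete without pulling in Qt, here is a minimal, Qt-free C++17 sketch of such a tree. Value, leaf, list, hash, get and str are hypothetical names chosen for illustration, not QActiveResource’s API; in the real code QVariant and its hash and list types fill this role.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Qt-free stand-in for a QVariant tree: each node is a string
// leaf, a list of nodes, or a hash of named nodes.
struct Value {
    enum class Kind { String, List, Hash };

    Kind kind = Kind::String;
    std::string text;                                  // used when kind == String
    std::vector<Value> items;                          // used when kind == List
    std::vector<std::pair<std::string, Value>> fields; // used when kind == Hash, in document order

    static Value leaf(std::string s) {
        Value v;
        v.text = std::move(s);
        return v;
    }
    static Value list() {
        Value v;
        v.kind = Kind::List;
        return v;
    }
    static Value hash() {
        Value v;
        v.kind = Kind::Hash;
        return v;
    }

    // Returns the named field of a hash, or nullptr if this node is not a
    // hash or the field is absent.
    const Value *get(const std::string &key) const {
        if (kind != Kind::Hash) return nullptr;
        for (const auto &field : fields)
            if (field.first == key) return &field.second;
        return nullptr;
    }

    // Text of a leaf; calling this on a list or hash is a programming error.
    const std::string &str() const {
        assert(kind == Kind::String);
        return text;
    }
};
```

A parsed product list would then be a list of hashes, one per product, and reading a title is a get("title") followed by str().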

Once I had that done, I wondered, “What would the performance look like if I wrote a Ruby wrapper for the C++ implementation?” Fortunately, the answer was “Great!”, which means the application logic can stay in Ruby while QActiveResource does the heavy lifting.

It’s still relatively light on features (so far it only supports the find method), but the numbers speak for themselves:

The first column is the default pure-Ruby ActiveResource implementation (REXML), and the second is the same but with the Nokogiri backend, which is implemented in C. The third uses my C++ backend directly, and the fourth is that backend bound to Ruby objects.

Methodology:

The data set is the set of products in our test shop on Shopify: 17 products, making up a 36 KB XML file. Each test reads that set 100 times. To remove other bottlenecks, I copied the file to a web server on localhost and queried that directly.
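The measurement itself amounts to a wall-clock timer around a loop. Here is a minimal sketch with std::chrono, assuming a hypothetical fetchAndParse callable that stands in for one request-and-parse cycle against the localhost server with a given backend; it is not the benchmark script from the repository.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs one fetch-and-parse cycle `iterations` times and returns the total
// wall-clock time in seconds. `fetchAndParse` is a hypothetical stand-in for
// requesting the product list from the local web server and parsing it.
double timeIterations(const std::function<void()> &fetchAndParse, int iterations = 100)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        fetchAndParse();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

steady_clock is used rather than system_clock so that clock adjustments during a run cannot skew the result.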

In total, each backend reads 1,700 records over 100 requests. The average times were:

  • Ruby / ActiveResource / REXML: 34.60 seconds
  • Ruby / ActiveResource / Nokogiri: 12.87 seconds
  • C++ / QActiveResource: 0.79 seconds
  • Ruby / QActiveResource: 1.01 seconds
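The 34x in the title matches the REXML time divided by the Ruby / QActiveResource time (34.60 / 1.01 ≈ 34.3). A small C++ sketch of that arithmetic follows; speedup and msPerRecord are illustrative helpers, and the per-record figure assumes each time covers all 100 requests (1,700 records).

```cpp
#include <cassert>

// How many times faster the candidate is than the baseline, given two run times.
double speedup(double baselineSeconds, double candidateSeconds)
{
    assert(candidateSeconds > 0.0);
    return baselineSeconds / candidateSeconds;
}

// Average milliseconds per record, assuming `totalSeconds` covers all
// `records` records (17 products x 100 requests = 1700 in this benchmark).
double msPerRecord(double totalSeconds, int records = 17 * 100)
{
    assert(records > 0);
    return totalSeconds * 1000.0 / records;
}
```

With these, speedup(12.87, 1.01) is about 12.7, so the Ruby binding is still roughly 12.7x faster than the Nokogiri backend, and msPerRecord(34.60) versus msPerRecord(1.01) works out to roughly 20.4 ms versus 0.59 ms per record.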

All of the code is up on GitHub, including the test data and the raw results.

API in Ruby and C++:

The Ruby API is similar to a limited version of ActiveResource and supports things like this:

require 'QAR'
resource = QAR::Resource.new(ENV['AR_BASE'], ENV['AR_RESOURCE'])
resource.find(:all, :params => { :page => 1 }).each { |record| puts record[:title] }
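Under the hood, a find like this boils down to an HTTP GET on the resource’s collection URL. Assuming QAR follows ActiveResource’s usual XML convention (collection path site/resource.xml, with :params appended as a query string), here is a minimal C++ sketch of how that URL is put together; collectionUrl and urlEncode are hypothetical helpers, not part of QAR.

```cpp
#include <cassert>
#include <map>
#include <string>

// Percent-encodes every byte except the RFC 3986 "unreserved" characters.
std::string urlEncode(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                                c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: the URL an ActiveResource-style find(:all, :params => ...)
// requests. Assumes the XML format and that `resource` is the plural
// collection name (e.g. "products").
std::string collectionUrl(std::string base, const std::string &resource,
                          const std::map<std::string, std::string> &params = {})
{
    assert(!resource.empty());
    if (base.empty() || base.back() != '/')
        base += '/';
    std::string url = base + resource + ".xml";
    char separator = '?';
    for (const auto &param : params) {  // std::map iterates in key order
        url += separator;
        url += urlEncode(param.first) + '=' + urlEncode(param.second);
        separator = '&';
    }
    return url;
}
```

For example, with AR_BASE set to http://localhost/ and AR_RESOURCE set to products (example values), the find above would correspond to collectionUrl("http://localhost/", "products", {{"page", "1"}}), i.e. http://localhost/products.xml?page=1.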

The C++ implementation also follows the same basic conventions, e.g.

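#include <QActiveResource.h>  // header name assumed; check the repository
#include <QDebug>             // qDebug()
#include <QUrl>
#include <cstdlib>            // getenv()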
 
int main()
{
    QActiveResource::Resource resource(QUrl(getenv("AR_BASE")), getenv("AR_RESOURCE"));
 
    foreach(QActiveResource::Record record, resource.find())
    {
        qDebug() << record["title"].toString();
    }
 
    return 0;
}

At present this requires Qt and libcurl to build. If there’s general interest, I may consider building a version that doesn’t require Qt (we already use Qt in the C++ bits of our code, so there was no extra cost to schlurping it in).

There are more examples in the Ruby directory on GitHub. Once it matures a wee bit we’ll get it packaged up as a Gem.

Edit:

The API has already been going through some changes; it can now easily be used as a mixin, à la:

require 'rubygems'
require 'active_resource'
require 'QAR'
 
class Product < ActiveResource::Base
  extend QAR
  self.site = 'http://localhost/'
end

12 Comments

  1. joshua:

    I think you mean “Ruby / ActiveRecord / Nokogiri: 12.87 seconds” in your methodology section, not REXML.

  2. Scott Wheeler:

    Erm, yep. And it should have been ActiveResource too. I still always mix that up. Fixed!

  3. Edward Ocampo-Gooding:

    Oh wow! Nice work Scott.

    Any plans to open source this code?

  4. Scott Wheeler:

    @edward – The GitHub links are in the post. :-)

  5. Cody Fauser:

    Scott,

    Looks impressive! Are you taking advantage of gzip compression as well? I found that this reduces the size of API responses significantly: fetching 250 products with Accept-Encoding: gzip is 7.5% of the size of the non-gzipped response.

  6. Scott Wheeler:

    Hey Cody – nope, this wasn’t with gzip, but the profiling was with files served from localhost, so I’d assume it’d only slow them down? Honestly I don’t know how to force AR to use gzip/zlib, but it’s easy enough from libcurl. :-)

  7. Vito Botta:

    Hi Scott,

    those figures look stunning! I am working on both an API and a couple of client applications which make heavy use of ActiveResource to communicate with each other.

    However, I am using JSON as serialisation format rather than XML. Would QAR work using JSON? And, if yes, would I still see that huge performance boost?

    Thanks!

  8. Scott Wheeler:

    Hey Vito —

    At the moment QAR just does XML parsing (since that’s where the use case was for us — consuming AR APIs that we don’t have control of). Some of the speedup is in object creation and using libcurl instead of Ruby’s network stack, which you would get even with a JSON consumer, but a lot of the speedup is in XML parsing as well, so how much of a speedup you’d get there really depends on how bad Ruby’s JSON parsers are. :-)

    In principle it wouldn’t be hard to extend QAR to JSON. Just the reader function would need to be replaced and the fetch function would need to be made a little smarter to decide which format to use.

  9. Scott Wheeler:

    It’s quite similar in spirit to MonkeySupport, though attacking another part of the Rails stack. Honestly, I think Rails is grown up enough at this point that it’d make sense for it to have C implementations of a lot of its hot spots, and for them not to be outcasts in the project.

  10. Vito Botta:

    Thanks for the speedy reply Scott – will definitely keep an eye on QAR ;)

  11. Gitfeed, Git to RSS and the growing collection of small hacks at Scott Wheeler:

    […] for reading data — basically a C++ implementation of the find method. Details in the company blog, or naturally on […]
