The Blog

Posts from March 2010

Mar 30

All Parsers Are Not Created Equal

By David Czarnecki

This is a post about XML and JSON.

We recently came across a bottleneck in one of our applications that grabs data from a content repository. A large part of the bottleneck had to do with caching of data from the content repository, or lack thereof. A smaller part had to do with parsing: the application grabs XML feeds from the content repository and parses them to display data within the application. Novel idea, using XML as a communication mechanism between 2 applications? You betcha.

In any event, boring machine name aside, I decided to benchmark the data parsing via JSON, Hpricot, and Simple-RSS.
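The original benchmarking script isn't shown here, but the JSON leg of it looks roughly like this (a minimal sketch: the variable names are mine, and the document below is a small generated stand-in for the real ~160 KB feed):

```ruby
require 'benchmark'
require 'json'

# Generated stand-in for the sample JSON document used in the real run.
ITERATIONS = 20
json_data = { 'items' => (1..500).map { |i| { 'id' => i, 'title' => "Photo #{i}" } } }.to_json

# Parse the same document repeatedly and report the average wall-clock time.
elapsed = Benchmark.realtime do
  ITERATIONS.times { JSON.parse(json_data) }
end
average = elapsed / ITERATIONS
puts "Benchmarked JSON parse for #{ITERATIONS}x (average): #{average}"
```

The Hpricot and Simple-RSS legs follow the same shape, swapping in `Hpricot.XML(data)` and `SimpleRSS.parse(data)` respectively.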

David-Czarneckis-iMac:application-playground dczarnecki$ ruby parsing-benchmarking.rb
"Benchmarking JSON"
"    JSON document: /Users/dczarnecki/projects/parsing-benchmarking/sample_json_document.json, Size: 161574"
"    JSON::Ext::ParserJSON::Ext::Generator"
"    Benchmarked JSON parse for 20x (average): 0.0121459603309631"
"Benchmarking Hpricot"
"    XML document: /Users/dczarnecki/projects/parsing-benchmarking/sample_photo_rss_document.rss, Size: 162644"
"    Benchmarked XML parse for 20x (average): 0.00232400894165039"
"Benchmarking Simple-RSS"
"    XML document: /Users/dczarnecki/projects/parsing-benchmarking/sample_photo_rss_document.rss, Size: 162644"
"    Benchmarked XML parse for 20x (average): 1.78787769079208"
David-Czarneckis-iMac:application-playground dczarnecki$

The numbers are interesting. Here is my opinion on the results:

  • Hpricot is the fastest at parsing a big XML document. You have to use XPath to get at the elements in the document, though, and XPath is, for me, non-intuitive; you sacrifice a bit of code readability.

  • JSON is the second fastest at parsing a big JSON document. Also, you get a four-letter-acronym technology integrated into your application, and let’s finally admit it, size matters. For me, you gain intuitive access to the underlying data and more readable code.

  • Simple-RSS is slow.

  • XML output from our content repository doesn’t give us paged access to the content, but the JSON API in the content repository does. With the XML output, we have to suck down the firehose and slice it up ourselves, leading to a bunch of hackery around caching the parsed document.
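To illustrate the readability point: with parsed JSON you walk plain Ruby hashes and arrays, where Hpricot would need an XPath-style search such as `(doc/"channel/item/title")` for the same data. The feed structure and field names below are hypothetical, not taken from our actual feeds:

```ruby
require 'json'

# Hypothetical feed document, for illustration only.
doc = JSON.parse('{"channel":{"items":[{"title":"First photo"},{"title":"Second photo"}]}}')

# Plain hash/array access -- no query language needed.
titles = doc['channel']['items'].map { |item| item['title'] }
puts titles.first  # → First photo
```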

If you’re looking for a better read, check out the following book: “Paint It Black: A Guide to Gothic Homemaking”.


Mar 11

vBulletin and NGINX

By Jason LaPorte

It’s no secret that Agorian systems folk favor NGINX for our web serving needs. We’ve written about it a lot before. Therefore, it should be no surprise that we end up making a lot of things designed to work on Apache work on NGINX. (We’ve also written about that before, come to think of it…)

One example is vBulletin. A number of Agora’s sites are powered by the forum software, which comes with rewriting rules for Apache’s mod_rewrite and IIS… but not NGINX.

So, if you’re interested in setting up vBulletin behind NGINX (and are using the advanced URL rewriting, like we are), you can find a sample configuration for doing so here.
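If you just want the shape of such a setup before digging into the linked file, the general pattern for fronting a PHP application like vBulletin with NGINX looks roughly like this. This is a hedged sketch only, not the linked sample configuration: the root path, server name, and PHP-FPM socket are assumptions, and vBulletin’s actual rewrite rules are more specific than the generic fallback shown here.

```nginx
server {
    listen 80;
    server_name forums.example.com;     # assumption
    root /var/www/vbulletin;            # assumption
    index index.php;

    # Friendly URLs that don't match a real file or directory fall
    # through to the front controller.
    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    # Hand PHP scripts to PHP-FPM (socket path is an assumption).
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;
    }
}
```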

Let us know if you have any questions!

Mar 8

Bzr to Git Migration

By An Engineer

When I joined Agora, one of the first things I did was talk up git and how it’ll cure cancer, end AIDS, and bring about world peace. All at once. What that means for me is that I’ve basically been tasked with migrating anything that’s not git to git.

For some things, these kinds of migrations are first-class citizens. Conveniently, SVN, our old VCS, is one of those. One of my newer migrations was, less conveniently, from Bazaar. Now, we have nothing against Bazaar at Agora; my main personal open source project, Exaile, uses Bazaar. But we agreed we would rather have only one VCS in house.

After looking around and trying some fancy tools that didn’t work (read: tailor), I stumbled on a really quick solution that seems like it does everything necessary. Both Git and Bazaar (via plugins) support the fast-import/export format. I’m not sure about the mystic ways of how this format works but I do know it made my Bazaar repository a Git repository, and that makes me pleased.

Getting the bzr plugin

The first step is to get the fast-import plugin for Bazaar from the Launchpad mirror:

mkdir -p ~/.bazaar/plugins
cd ~/.bazaar/plugins
bzr clone lp:bzr-fastimport fastimport

You can make sure it installed properly by running bzr fast-export --help and ensuring that it doesn’t complain.

Copy the repository

Now that we have all the tools, it’s time to copy it over:

mkdir ~/project.git
cd ~/project.git
git init
bzr fast-export --plain ~/path/to/bzr/branch | git fast-import
git checkout master # only needed for a non-bare repository, like I made above

Wait a little while (or a long while if, like me, you’re testing the above on a netbook for some reason), and that should be it.

I’m not sure how well this works with multiple Bazaar branches. There may be some crazy --flags on each side to make it work, but running the code above on a full repo makes fast-export complain that I’m not pointing it at a valid branch. Please give us your comments if you know how to do this. :)

Update: Found out the plugin directory is ~/.bazaar, not ~/.bzr. My bad.

Mar 8

Devon in print

By Devon Smith

Our very own Devon Smith (Agora’s multi-talented QA Lead) had an article published in this month’s T.E.S.T Magazine. You can see her article online (http://www.testmagazine.co.uk/2010/03/keep-the-user-in-mind/) or buy it in print later this month.