Related Pages on Wikipedia via our web services API.

So, we think this is pretty cool beans.  When we did our demo with a mashup of Wikipedia’s content we knew that we wanted something that potential customers could quickly look at and get a feel for what our recommendation engine is capable of, and we got a lot of good feedback about that in our recent technology preview.  On the other hand, we knew that we weren’t going to get the masses to switch over to user our Wikipedia interface.

One of the open questions for us as we pushed out the first bits of our web-services API  last week was, “Can we get this content to show up in Wikipedia proper?”

Last night after an extended hacking session where I tried a number of strategies for doing DOM scripting to pull in external content (and some misadventures in trying to do cross-site XMLHttpRequests) I managed to come up with a simple way of pulling in content from our web service via JSONP, and added support for JSON output to our web service along the way.  For Wikipedians that are logged in, it only requires adding one line to your monobook.js file and I’ve created a short how-to here.  The source code, for interested hackers is here.

Here’s what it looks like:

When we launched our demo a few people didn’t seem to get quite what it does that our engine is doing — we’re not just analyzing the current page and pulling in a few important links; we’re jumping out a few levels in the link structure and analyzing and ranking usually several thousand links in the neighborhood of the target page.  Often those pages are linked from the target page, but that’s hardly a surprise.  I come from a background of doing research in web-like search, so it’s no coincidence that our approach to finding related pages takes some hints from PageRank and other link-based approaches to sorting out the web.

We’d invite people to try this out and of course to keep playing with our mashup; we’ve gotten so used to having related pages that it’s hard to go back to the vanilla Wikipedia — having the related pages there makes it really easy to sort out things like, “What are the important related topics?” or “Well, I know about X, what are the main alternatives?”  And so on.  We’ve got some other exciting stuff up our collective sleeves that we’ll be rolling out in the next couple of weeks, so stay tuned!

3 Comments

  1. YoNoSoyTu:

    I have turned your script into a userscript for Greasemonkey Firefox extension, so you don’t have to have a Wikipedia account to use it.

    The userscript is here: http://userscripts.org/scripts/show/32440

  2. Scott Wheeler:

    Ah, great. We’ll mention that in a follow up post. Thanks!

    By the way, per the comments on the script site, we’ll soon be adding support for other major european languages, so then you’ll be able to access e.g. the Spanish wikipedia articles via our API as well.

  3. Valentin:

    Awesome, thanks for the Greasemonkey script! Just installed it, works like a charm.

Leave a comment