
Friday, December 18, 2009

What I learned from the xkcd effect

It is a well-known phenomenon by now that whenever Randall Munroe mentions anything obscure in xkcd, searches for it spike tremendously. To this point, as far as I'm aware, he hasn't wielded this fact for evil, but still... the power that Randall Munroe holds over the internet is terrifying. I was reminded of this when one of my friends referenced one of these comics on Twitter. A bit of Googling failed to turn up a good list of examples of the xkcd effect, so I decided to create one.

Of course, I decided to write a script to do this - there are 676 comics as of this writing, and fortunately, transcriptions are available via OhNoRobot, so I don't even have to deal with the images. After a little poking around, I found a post on the xkcd forums (thanks, philip!) that gave the URL http://www.ohnorobot.com/transcribe.pl?comicid=apKHvCCc66NMg&url=http:%2F%2Fxkcd.com%2F[comic number]%2F to get the transcription for a given comic. Perfect! I of course lean towards Perl for these kinds of things, though I was tempted to go with Python because of the images. But then I remembered that a simple gnome-open [image file] would do more than I needed, so Perl it was.
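
For anyone who wants to play along, here is a minimal sketch of filling in that URL template for a given comic number. It is in Python rather than the Perl I actually used, and the function and constant names are made up for illustration; only the base URL and the comicid value come from the forum post.

```python
from urllib.parse import quote

# Base URL and comicid value are copied from the forum post's template.
OHNOROBOT_BASE = "http://www.ohnorobot.com/transcribe.pl"
XKCD_COMIC_ID = "apKHvCCc66NMg"


def transcript_url(comic_number: int) -> str:
    """Return the OhNoRobot transcription URL for one xkcd comic.

    Hypothetical helper: the name and signature are illustrative only.
    """
    comic_url = f"http://xkcd.com/{comic_number}/"
    # Keep ":" literal and percent-encode "/" so the result has the
    # same shape as the template (http:%2F%2Fxkcd.com%2F...%2F).
    encoded = quote(comic_url, safe=":")
    return f"{OHNOROBOT_BASE}?comicid={XKCD_COMIC_ID}&url={encoded}"


if __name__ == "__main__":
    print(transcript_url(676))
```

Whether that page still answers requests in this form is not something the sketch checks; it only builds the string.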

Additionally, I remembered that I had already written an xkcd download script that I had stashed away in my Gmail, so I had a leg up already. So, after a mere four hours of hacking, I present the xkcd effectalyzer. It's pretty self-explanatory; the only parameter is "-r", if you want to go through the comics in reverse. The script goes through your specified comics, presents you with the OhNoRobot text, and gives you the option to view the image of the comic. Once that's all done, it asks you for a phrase to search on Google Trends. It then (with the credentials you provided at the beginning) grabs the necessary CSVs from Google Trends to get the trend data for the 5 days around and including the publication date of the comic (which it gets from xkcd's archives page). It then tells you what the indexes are, and lets you decide whether or not to save that result to the output CSV file. It continues to ask you for phrases until you don't give it one, and will write the first three to the CSV file.
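
The date bookkeeping is the fiddly part, so here is a rough sketch of it in Python (not the actual Perl). It assumes the five days are centered on the publication date, two days either side, and that trend data has to be fetched one calendar month at a time; both are assumptions for illustration, and the function names are hypothetical.

```python
from datetime import date, timedelta


def five_day_window(published: date) -> list[date]:
    """Return the five days centered on the publication date.

    Assumption: the window is two days before through two days after.
    """
    return [published + timedelta(days=offset) for offset in range(-2, 3)]


def months_needed(window: list[date]) -> list[tuple[int, int]]:
    """Return the distinct (year, month) pairs the window touches.

    More than one pair means a neighboring month has to be fetched too.
    """
    return sorted({(day.year, day.month) for day in window})


if __name__ == "__main__":
    # A comic published on the last day of a month spills into the next one.
    window = five_day_window(date(2009, 9, 30))
    print(months_needed(window))
```

A window that touches two months is the case where numbers from separate downloads end up side by side, so it deserves extra care.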

It might do more than that, but like I said - it should be pretty self-explanatory. It's got all kinds of nifty features, like saving your Google session and grabbing neighboring months if necessary, which are mostly what took so long.

But anyway, I went through backwards from the current comic (676) back to 600, and besides revisiting some fantastic comics (including my favorite of the more recent ones), I found over 60 noticeable spikes in Google searches because of an episode of xkcd - that's in just 75 comics. Some comics spiked for multiple phrases, of course, and some for none at all. But this also includes 20 terms that hit Google's Hot Trends page.

I put them together in a graph that shows the spikes collected around the release of the comic, and there are a couple of interesting anomalies. Obviously, there are a couple of points off to the right that need explaining. The two that are shifted one to the right are both from "Locke and Demosthenes", which was released on Friday, September 11 of this year. So why the discrepancy? Well, my script gathers the dates from the recommended source - the alt-text on the xkcd archive page. But for "Locke and Demosthenes", the alt-text is off by one day, and says "9-10-09". Since the previous comics were published 9-7-09 and 9-9-09, and Randall only publishes on Thursdays in the event of a five-day series - not to mention the Google results - I'm willing to bet that the archives page is in error, and it was actually published on Friday.

The other anomaly on the right-hand side is just because the fifth day is in the next month, which screws up the relative numbers. It disappears in this chart based on the fixed data, which also, handily enough, highlights an anomaly on the right side: "github" spiked on the day that Munroe put out Branding, but was climbing in popularity before that. Why that is, I haven't the foggiest.
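
A check like the following would have flagged the "Locke and Demosthenes" date before it ever reached the graph. It is a Python sketch, not part of the actual script: it assumes the archive dates come in the month-day-year form quoted above ("9-10-09") and that regular comics go up on Mondays, Wednesdays, and Fridays. The names are hypothetical.

```python
from datetime import date, datetime

# Assumption: regular comics are published Monday, Wednesday, Friday.
# date.weekday() numbers Monday as 0.
PUBLISHING_WEEKDAYS = {0, 2, 4}


def parse_archive_date(text: str) -> date:
    """Parse a month-day-year string such as "9-10-09" (assumed format)."""
    return datetime.strptime(text, "%m-%d-%y").date()


def looks_off_schedule(text: str) -> bool:
    """Return True when the date does not fall on a usual publishing day."""
    return parse_archive_date(text).weekday() not in PUBLISHING_WEEKDAYS


if __name__ == "__main__":
    for archive_date in ("9-7-09", "9-9-09", "9-10-09"):
        status = "check this one" if looks_off_schedule(archive_date) else "ok"
        print(archive_date, status)
```

A flagged date is only a hint, since a five-day series legitimately lands on Tuesdays and Thursdays, so it still needs a human to look at it.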

Now, pretty charts and things aside, it's also interesting, of course, to look at which terms spiked the most. So, here's a list of all the terms, sorted by the severity of their spikes:

As I looked through these, the thing I was most surprised by was some of the things that people Googled, presumably because they didn't know what they were about. I mean, some, like SMBC, Hofstad, Peter Wiggin, Demosthenes - I understand those. But classic stuff like "The only winning move is not to play", "the cake is a lie", and stuff like sampling bias, Q.E.D., Carl Sagan, 2038, or the debacle with the brontosaurus, demonstrates that xkcd readership does indeed include many who are not part of the normal geek crowd - such as liberal-arts majors. Also, Stephen Douglas? The Bull Moose party? What have history classes been teaching that people had to look those up?

But the takeaway from this, I think, is that Randall Munroe, as of late, anyway, has a better than 50% chance (41 times out of 76) of noticeably affecting the Google searches for whatever he happens to mention in his comic. It takes a whole lot of readers (which of course we know xkcd has) to do that with a single webcomic, and this illustrates quite clearly that Randall has them.

The other thing this demonstrates is that I have too much time on my hands, but I just finished finals, and it's Christmas break, so I don't want to hear about it.

10 comments:

MissBaobob said...

Nice.. I've definitely wondered how far this played out...

Anonymous said...

> Also, Stephen Douglas? The Bull Moose party? What have history classes been teaching that people had to look those up?

Don't forget that a significant fraction of the xkcd readership is not from the USA...

John S. Wilkins said...

US shared knowledge is not so shared as you might expect. Even in the US...

Anonymous said...

The increase in 'tautology' from 'Honor Societies' is pretty impressive.

Jagat said...

I'm sure malamonteau beats all of these.

D said...

This is kind of brilliant.

parmeisan said...

> Also, Stephen Douglas? The Bull Moose party? What have history classes been teaching that people had to look those up?

Yeah... not being American, I don't feel the least bit ashamed about not knowing these.

Interesting article, though.

Anonymous said...

As said above, xkcd is read and loved outside of the US. Plenty of "shared knowledge" only exists in a cultural context.

I search for xkcd references all the time, and often it's because:

A - it's something that is only common knowledge in the US (e.g. US history)
B - it's some trendy thing in the US, but not outside
C - I know the stuff but didn't know you write it or consider it that way in English

Anonymous said...

Your effectualiser link (http://myhome.spu.edu/bradsj/stuff/xkcdeffect.pl.txt) is broken.

Anonymous said...

For comic #369, "blogging accident" spiked from almost nothing. In the comic, it mentioned it had two results. Now it has 9180 results, even in quotation marks.