Clark pointed out in my last post on the Washington Post's statistical analysis of the Iranian election results that we can't conclude from an event with a 4 in 1000 probability of occurring that there is a 996 in 1000 probability that the event was faked. This is true, of course, and I should have been clearer in my remarks.
The information we want is the probability that the Iranian election results were legitimate, given the data we see. What the WaPo has given us, however, is the probability of seeing the data that we see, given a legitimate election. They sound like the same thing, but they're not.
The Wikipedia article on conditional probability gives the classic example of why this is the case. Suppose you have a test for a disease that is 99% accurate--it returns a false result 1% of the time. That sounds pretty good, but if you use it to screen for a disease that affects only 1% of the population, then the probability that a person actually has the disease, given a positive test result, drops to 50%. So the probability of seeing a positive test result, given the presence of the disease (99%) is not the same as the probability of having the disease, given a positive test result (50%).
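To see where the 50% comes from, it helps to count actual people. Here is a rough sketch that screens an imaginary population of 10,000; the population size and the function name are just things I made up for illustration, and I'm assuming "99% accurate" means the test is right 99% of the time for sick and healthy people alike.

```python
from fractions import Fraction

def screening_counts(population, prevalence, accuracy):
    """Return (true_positives, false_positives) for a screened population."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy            # sick people correctly flagged
    false_positives = healthy * (1 - accuracy)  # healthy people wrongly flagged
    return true_positives, false_positives

# Assumed numbers: 10,000 people, 1% prevalence, 99% accuracy.
true_pos, false_pos = screening_counts(10_000, Fraction(1, 100), Fraction(99, 100))

# Of everyone who tests positive, what share is actually sick?
p_disease_given_positive = true_pos / (true_pos + false_pos)
print(true_pos, false_pos, p_disease_given_positive)  # prints: 99 99 1/2
```

The 99 sick people who test positive are matched by 99 healthy people who also test positive, so a positive result is a coin flip.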
The Prosecutor's Fallacy describes the application of this logic in the real world. Here, the minuscule probability of certain evidence emerging, given the innocence of the accused, is used to argue that the probability of the accused being innocent must be just as tiny. This is what the WaPo appears to be engaging in with their statistical analysis.
Or is it? The thing to remember here is that the gap between the two conditional probabilities comes from the prior probabilities they are built on. For example, in our disease scenario, the test becomes as accurate as it appears if 50% of the population is affected by the disease. In that case, the probability of seeing a positive test result, given the presence of the disease, and the probability of having the disease, given a positive test result, are both 99%. It's the fact that the disease is so rare to begin with that makes all the difference.
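If you want to play with the prior yourself, here is a minimal sketch of Bayes' rule applied to the disease test. The function and parameter names are my own, and the 10% row is an extra data point I added just to show the trend.

```python
def posterior(prior, p_pos_given_disease=0.99, p_pos_given_healthy=0.01):
    """P(disease | positive test) by Bayes' rule, for a given prior P(disease)."""
    numerator = p_pos_given_disease * prior
    # Total probability of a positive test: true positives plus false positives.
    evidence = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / evidence

# Same test each time; only the prior (how common the disease is) changes.
for prior in (0.01, 0.10, 0.50):
    print(f"prior {prior:.0%} -> P(disease | positive) = {posterior(prior):.1%}")
```

With these inputs the answer climbs from about 50% to about 92% to 99%, even though the test itself never changes.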
So with the Iranian election, if we already knew that there was a 50% chance the election results were faked, then the analysis described in the WaPo would be strong evidence indicating shenanigans. I'm not saying we know that, but there does seem to be a lot of evidence of irregularities beyond the statistical analysis alone.
One more example to illustrate what I mean. Suppose 99 out of every 100 swans are white, and one is black. If you're walking along in the park and you see a black swan, you've just witnessed a rare event, with a probability of 1%. But this does not mean that you can be 99% sure that this swan is fake. That's because the odds of seeing a fake swan are based on something else entirely, and are usually far from 50%.
Suppose, however, that you heard on the news this morning that people have been out spray-painting white swans black. Suppose further that half of the white swans have been spray-painted. Now, when you see a black swan, the odds of it being fake have climbed to about 98%, because you know the odds of any given swan being fake are roughly 50% to begin with.
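The swan numbers can be checked the same way. This is only a sketch under my own reading of the scenario: "fake" means a white swan painted black, half of the white swans get painted, and every painted swan looks black. The function name is made up.

```python
from fractions import Fraction

def p_fake_given_black(natural_black, painted_fraction):
    """P(swan is painted | swan looks black).

    natural_black    -- share of swans that are genuinely black
    painted_fraction -- share of white swans that have been painted black
    """
    natural_white = 1 - natural_black
    painted = natural_white * painted_fraction  # fake black swans
    return painted / (painted + natural_black)

# 1 in 100 swans is naturally black; half of the white ones get painted.
p = p_fake_given_black(Fraction(1, 100), Fraction(1, 2))
print(p, float(p))  # 99/101, a little over 98%
```

Set painted_fraction to something tiny instead and the answer collapses, which is the whole point: the rare event by itself tells you very little without the prior.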
Regina Spektor has a new album out today. I bought the version with the DVD. Everyone seems to be sticking DVDs in their CDs these days. I guess they feel like they need to give you a reason to buy the CD in the store, what with iTunes and everything.
Speaking of iTunes, I realize I'm in the minority here, but I like albums. Even if I only want one song, I prefer to own the entire album that comes with it. I don't know, it just feels like it belongs to something bigger. Maybe the rest of the album gives context to the song. I can't explain why, but it is incredibly annoying to me whenever I scroll through the songs on someone's iPod and there are all these solitary songs without their full albums. They're like socks without mates or something. Songs belong in albums.
So the whole iTunes business model has never appealed to me. I have never felt ripped off because I bought an album for one song and never listened to the rest of it, I guess. If anything, it's the opposite with me--there might be one song on the disc that I skip. So why pay $13 for an iTunes album when I can buy the same thing at Target for $10 and get the physical media and liner notes and everything?
So yeah, get off my lawn.
Hello, loyal readers. My apologies for not blogging in over three months.
My excuse is that I got a new job, and this new job eats up my time in several ways. The commute is very long and the work itself gives me little opportunity to surf the Internet. So I have less time for actual blogging, and less time to find interesting things to actually blog about.
But hopefully as I settle in I can find a way to work this blog back into my routine.
To that end, check out this statistical analysis of the election results in Iran. Just by looking at the various vote tallies, it can be shown that there is a 99.5% chance that these tallies were made up by a human rather than generated randomly, as would be the case in a clean election. Fascinating.