Monday, May 01, 2006

Agatha Christie is for Chumps

Okay, I'll admit it. I watch CSI. I only watch the original show; CSI: Miami and CSI: NY both leave me cold. I also love Quincy and Monk. I detest Columbo because it usually reveals the killer in the opening moments of the episode. I can't tolerate such nonsense. If I already know the identity of the killer, who cares if Columbo figures it out or not?

Along a similar line, I have zero respect for the Agatha Christie stories I've read or seen on TV. In most cases, one person is murdered by any one of a small number of people, perhaps as many as ten. Even if Poirot is a complete blunderer, outside of Murder on the Orient Express he has a one in ten chance of being correct, assuming only one killer. This is admittedly a critical assumption. If we instead assume that any combination of only five possible murderers is equally likely, there are 2^5 - 1 = 31 possible combinations, so Poirot's chances of determining the correct one are much lower than 1 in 5, in the neighborhood of 1 in 30.
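For anyone who wants to check the "1 in 30" figure, here is a minimal sketch that simply lists every non-empty group of five suspects. The suspect labels are placeholders I made up, and "equally likely" is the same assumption as in the paragraph above.

```python
from itertools import combinations

# Hypothetical placeholder names for the five possible murderers.
suspects = ["A", "B", "C", "D", "E"]

# Every non-empty group of suspects counts as one candidate solution.
groups = [
    group
    for size in range(1, len(suspects) + 1)
    for group in combinations(suspects, size)
]

n_groups = len(groups)  # same as 2**5 - 1

print(f"Single killer assumed: 1 in {len(suspects)}")
print(f"Any combination allowed: 1 in {n_groups}")
```

The count comes out to 31, which is where the "neighborhood of 1 in 30" comes from.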

The real problem with Agatha Christie stories is that they fabricate a closed system to limit the scope of the problem to just a few possible criminals. With only ten people on the train, on the boat, or in the provincial English manor, only those few people may have contributed to the criminal event. In the real world, a crime is not a closed system, and many more than ten people might have contributed to the evidence we find at a crime scene.

However, our much maligned author can contribute a useful idea to our toolbox: the notion of Prior Odds. I think about prior odds in the following fashion:

What is our chance of being "correct" if we choose a solution at random?

Prior odds are commonly used in mass disaster cases such as the World Trade Center disaster, Hurricane Katrina, and plane crashes. For example, if 301 people board a plane that later crashes, our chance of correctly identifying any one set of remains by guessing wildly at random is about 1 in 301. The chance of identifying any one set of remains incorrectly is 300 in 301. So how do we get to prior odds from these two figures? The math is quite simple.

The odds of an event are the probability of the event divided by the probability of the non event. From above, our Prior Odds would be
(probability of correct ID)/(probability of incorrect ID).

So our prior odds are: (1/301)/(300/301) = 1/300, or 1 to 300.
To prevent nasty lawsuits, we want our scientific data that identifies remains as belonging to a passenger to have better odds than 300 to 1, maybe much better than 300 to 1. It'd be a good idea to divide our statistics by 300 to see how well our data overcomes the 1 to 300 odds of being right.
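The same arithmetic can be sketched in a few lines of Python. The function name prior_odds is my own, and the sketch assumes every passenger is equally likely to be the source of a given set of remains.

```python
from fractions import Fraction

def prior_odds(n_candidates: int) -> Fraction:
    """Odds of a random guess being right: P(correct) / P(incorrect)."""
    p_correct = Fraction(1, n_candidates)
    p_incorrect = 1 - p_correct
    return p_correct / p_incorrect

# The plane example: 301 passengers.
print(prior_odds(301))  # 1/300
```

Using Fraction keeps the result exact, so the 301s cancel just as they do by hand.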

What this means is that if our scientific analysis finds data that is 100 times more likely to belong to passenger #57 than a random person, the odds of being correct are only 1 to 3 (about a 25% chance), because there are 300 chances to be wrong. If there were only 10 people on the plane, the odds would instead be around 11 to 1 in favor of being right.
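Here is a rough sketch of that calculation: multiply the strength of the evidence (the "100 times more likely" figure) by the prior odds, then convert the odds to a probability. The function names are hypothetical, and the equal-chance prior is the same assumption used above.

```python
from fractions import Fraction

def posterior_odds(likelihood_ratio: int, n_candidates: int) -> Fraction:
    """Evidence strength times the prior odds of 1 to (n_candidates - 1)."""
    prior = Fraction(1, n_candidates - 1)
    return likelihood_ratio * prior

def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds in favor into a probability of being right."""
    return odds / (1 + odds)

for passengers in (301, 10):
    odds = posterior_odds(100, passengers)
    chance = float(odds_to_probability(odds))
    print(f"{passengers} passengers: odds {odds}, chance of being right {chance:.1%}")
```

With 301 passengers the odds are 1/3 (a 25% chance of being right); with 10 passengers they are 100/9, a bit over 11 to 1.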

A typical DNA profile matching an individual might be one trillion times more likely to be from that person than from any one unrelated individual. However, when we are thinking about a large-scale mass disaster like the 2004 Indian Ocean Tsunami, where over one hundred thousand people were killed, prior odds that are at most 1/100,000 reduce our result to "only" ten million times more likely. If we imagine cases where the missing person's toothbrush or other DNA-laden item is not available, we'd only have DNA from family members to identify a missing person. Typical DNA results might produce odds of one million to one, which when multiplied by our 1/100,000 number leaves only 10 to 1.

Sounds pretty good, right? Here's the problem. If we have 10 to 1 odds of being right, that means we'll actually be wrong 1 time in 11, or about 9% of the time. That means out of 100,000 dead people, we'd have on average about 9,000 wrong identifications. I suppose that might be acceptable to anyone other than the 9,000 families with misidentified loved ones.
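The tsunami numbers can be run the same way. This is only a sketch under the rounded figures used above: prior odds of 1/100,000, a one-trillion-to-one direct DNA match, a one-million-to-one family match, and every identification resting on the family-only result.

```python
from fractions import Fraction

victims = 100_000
prior = Fraction(1, victims)  # rounded prior odds from the text

direct_dna_odds = 10**12 * prior  # toothbrush or similar item available
family_dna_odds = 10**6 * prior   # only relatives' DNA available

# Odds of 10 to 1 in favor mean 1 wrong call out of every 11.
p_wrong = 1 / (1 + family_dna_odds)
expected_wrong = victims * p_wrong

print(f"Direct DNA odds: {float(direct_dna_odds):,.0f} to 1")
print(f"Family DNA odds: {float(family_dna_odds):,.0f} to 1")
print(f"Chance of a wrong ID: {float(p_wrong):.1%}")
print(f"Expected wrong IDs: {float(expected_wrong):,.0f}")
```

The exact figure is 1 in 11, or roughly 9,091 wrong identifications, which rounds to the 9,000 quoted above.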

What follows next is some math to explain a little more of what I mean.