Sunday, May 14, 2006

Idle chatter about global warming

The global warming debate rages on. Many folks think that it's a tempest in a teacup; others think that it's a real sign of human impact on the environment. I don't have much of an opinion on the subject, as I'm woefully ignorant of the science. However, the occasional debate catches my interest.

A recent discussion on a different message board included this statement:

Water vapor has always been the primary green house gas.

Let's assume this statement is true. The implication is that if water vapor is the primary greenhouse gas, all of the CO2 we could possibly produce would never amount to much. Therefore, global warming is not due to human activity, and attempts to limit it are futile.

Maybe. It depends on what we mean when we say "primary". By primary greenhouse gas, do we mean the gas with the largest amount in the atmosphere or the gas with the largest contribution to the warming effect?

The difference is essential.

A hypothetical example: assume water is 90% of the greenhouse gas by amount, CO2 is 5%, and all of the other gases are 5%. What if CO2 has 100 times the greenhouse gas activity of water?
Relative activity of water = 1 x 90 = 90

Relative activity of CO2 = 100 x 5 = 500
That 5% by volume would have over five times the net impact of water despite being 1/18th the relative amount.
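Here is a minimal sketch of that arithmetic in Python. The shares and activity values are the made-up numbers from the hypothetical example, not measured data, and the names `shares`, `activity`, and `relative_impact` are just labels chosen for illustration.

```python
# Hypothetical figures from the example above; they are not measured values.
shares = {"water": 90, "CO2": 5}       # percent of greenhouse gas by amount
activity = {"water": 1, "CO2": 100}    # assumed warming activity per unit amount


def relative_impact(gas):
    """Relative activity = per-unit activity x amount."""
    return activity[gas] * shares[gas]


for gas in shares:
    print(gas, relative_impact(gas))

# How many times larger the CO2 contribution is than the water contribution.
ratio = relative_impact("CO2") / relative_impact("water")
print(f"CO2 impact is {ratio:.2f} times that of water")
```

Swap in different activity values and the "primary" gas by impact can change even though the amounts stay the same.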

Revisiting Prior Odds

Remember our earlier examples of deciding whether we have a six or twenty sided die?

When I glibly divide 1/6 by 1/20 I assume that I am equally likely to choose a six sided or a twenty sided die. Let's instead assume a case where 101 dice are placed in a bag. One is a six sided die. The remainder are 20 sided. The odds of pulling a six sided die out of the bag are 1 in 100:
P of six sided die: 1/101

P of 20 sided die: 100/101

odds = (1/101)/(100/101) = 1 in 100.
We pull one die out of the bag and start rolling it to see what results we get and to determine how many sides it has. To be confident that we've pulled a six sided die out of our bag, we'd want the result of our rolls to overcome our 100 to one odds of being wrong. For one roll where the number is six or less, our calculation would be:
1/100 * 1/(6/20) = 1/100 * 3.333 = 0.03333

Which is a roughly 1 in 30 chance or 3.2%
For two rolls:
1/100 * 20/6 * 20/6 = 1/100 * 11.111 = 0.11111

About a 1 in 10 chance or 10%.
For three rolls:
1/100 * 20/6 * 20/6 * 20/6 = 1/100 * 37.037 = 0.37037

slightly less than 1 in 3 or 27%.
For four rolls:
1/100 * 20/6 * 20/6 * 20/6 * 20/6 = 1/100*123.456 = 1.23456

About 1.23 to one or about 55%.
For five rolls:
1/100 * (20/6)*(20/6)*(20/6)*(20/6)*(20/6) = 4.11

About 4 to 1 or 80%.
So in five rolls with numbers of six or less we could be 80% certain that we had a six sided die. If we rolled even one seven, we would be 0% certain that we had a six sided die.

If there were equal numbers of six and twenty sided dice in the bag, our numbers would be different. The prior probability would be 1/2, so the prior odds would be 1 to 1. Therefore our five-roll result above would be about 411 to one, with a probability of about 99.76%.
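The roll-by-roll numbers above can be reproduced with a short Python sketch. It assumes every roll shows six or less, and the function names (`prior_odds`, `posterior_odds`, `odds_to_probability`) are my own labels rather than anything standard.

```python
from fractions import Fraction


def prior_odds(n_six_sided, n_twenty_sided):
    """Odds of drawing a six sided die from the bag."""
    return Fraction(n_six_sided, n_twenty_sided)


def posterior_odds(prior, rolls):
    """Odds of a six sided die after `rolls` results that are all six or less."""
    # P(six or less | six sided) / P(six or less | twenty sided) = 1 / (6/20)
    likelihood_ratio = Fraction(1) / Fraction(6, 20)
    return prior * likelihood_ratio ** rolls


def odds_to_probability(odds):
    return odds / (1 + odds)


bags = [
    ("1 six sided, 100 twenty sided", prior_odds(1, 100)),
    ("equal numbers of each", prior_odds(1, 1)),
]
for label, prior in bags:
    print(label)
    for rolls in range(1, 6):
        odds = posterior_odds(prior, rolls)
        prob = odds_to_probability(odds)
        print(f"  {rolls} rolls: odds {float(odds):.4f}, probability {float(prob):.2%}")
```

Using `Fraction` keeps the arithmetic exact, so any rounding only happens when the results are printed.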

No matter how compelling our scientific data, the particulars of our situation change the resulting probability.

Sunday, May 07, 2006

My Math is Wrong and I Don't Care

I'm not a mathematician. In fact, I would most likely irritate any mathematician worth his or her salt. This is because I am sampling from a mathematical buffet but am far too lazy to clean up after myself or, heaven forbid, prepare any of the buffet items.

A great deal of my thinking is influenced by something called Bayes's Theorem. Bayes's theorem and Bayesian Inference are often described as a guide on how to update or revise beliefs in light of new evidence.

I think that's a fair statement of what I hope to accomplish.

I'm not too rigorous with the math, which means my calculations are most likely wrong at some level. I am cleverly selecting examples that I understand to illustrate my point before traveling into uncharted territory. This is much like a child who uses water wings before learning to swim. What is true is that my math is close enough so that my results make sense.

My earlier math appears wrong if you think about it carefully. Remember how we calculated odds?
Odds = P/(1-P)
Yet I was talking about odds of 3.3333 to one when dividing 1/6 by 1/20. The probability of not rolling a six on a six sided die is 5/6, not 1/20. So what gives?

It turns out that when I'm comparing the 1/6 and 1/20 probabilities, there is a Prior Odds term hiding in the calculation that makes the math work. When I divide 1/6 by 1/20, I am assuming that I'd be equally likely to select a six or twenty sided die. This implies a prior odds of 1. It's there, we just don't notice it because it does not affect the calculation.

What that means is
Prior odds = P(selecting a six sided die)/P(selecting a non-six sided die)
Which is another way of saying
Prior Odds = P/(1-P)
Since
P = (1-P)
Therefore
Prior Odds = P/(1-P) = 1.
Therefore the only thing affecting the calculation is the evidence itself: the die roll here, or the DNA evidence in a criminal case.

In the earlier cases, where I merely divide 1/6 by 1/20, I'm assuming a prior odds of 1. Therefore even though 1/6 divided by 1/20 is not an odds calculation, because the prior odds are present the math will work out. Every calculation I've made without a Prior Odds in fact had one. The real calculation was
Odds of a six sided die = Prior Odds of Choosing a Six Sided Die x (Probability of rolling a six with a six sided die) / (Prob. of Rolling a six with a 20 sided die)
or
Odds of a six sided die = 1 x (1/6)/(1/20) = 20/6 = 3.3333
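As a quick check, here is a small Python sketch of the hidden prior odds. It assumes, as the text does, that either die is equally likely to be picked; the variable names are only illustrative.

```python
from fractions import Fraction


def odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)


# Assumption: a six sided and a twenty sided die are equally likely to be chosen.
p_pick_six_sided = Fraction(1, 2)
prior = odds(p_pick_six_sided)

# Probability of rolling a six with each die.
likelihood_ratio = Fraction(1, 6) / Fraction(1, 20)

posterior = prior * likelihood_ratio
print("prior odds:", prior)
print("likelihood ratio:", likelihood_ratio)
print("odds of a six sided die:", float(posterior))
```

Because the prior odds come out to exactly 1, multiplying by them leaves the 20/6 ratio unchanged, which is why the shortcut gave the right answer.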

Monday, May 01, 2006

Odds, Schmodds.

I've tried to avoid large amounts of arithmetic, but at this point it seems inescapable. The "odds" math I was talking about earlier can be expressed a little more formally.
Odds = (probability of an event)/(probability of non-event)

If we call the probability P, then the odds become:

Odds = P/(1-P)
And therefore
P = Odds/(1 + Odds)
Let's revisit an earlier example where we rolled a die without knowing how many sides it has. Our earlier comparison looked like this:
Probability of rolling a six assuming a six-sided die: 1/6

Probability of rolling a six assuming a twenty-sided die: 1/20

or odds of about 3.333 to one.
But what do odds of about 3.333 to one mean, really? Perhaps if we could express this in terms of a probability, like 20%, 50%, or 90%, we'd feel like we understood it better.

We already know the odds, we want to know the probability. Thanks to the awesome power of algebra, we can compute the probability from the odds. Our calculations go like this:
3.3333/(1+3.3333) = 3.3333/4.3333 = 0.7692
Therefore, odds of 3.333 to one mean a probability of about 77% that we have a six sided die.

Let's look at our earlier examples and convert them from odds to probabilities.

The "no DNA in the Duke rape case" example: odds of 1.3333 to one: 57.1%

A standard rape case with one trillion to one odds: 99.9999999999%

Our trick die that rolled six sixes in a row: 99.9978%

Our example of rolling numbers six or less six times in a row on a six sided die versus a twenty sided die: 97.4%

This also gives us a handy guide for thinking about odds and probability. If someone says the odds are 10 to one, we're talking roughly 91%. If we switch to 20 to one, we're talking about roughly 95%. If someone says 100 to one, 99%. One thousand to one, 99.9%, ten thousand to one, 99.99%, and so on.
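The two conversion formulas from this post fit in a few lines of Python. This is only a sketch; the function names are mine.

```python
def probability_to_odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)


def odds_to_probability(odds):
    """P = Odds / (1 + Odds)."""
    return odds / (1 + odds)


# The die example plus the rule-of-thumb values from the paragraph above.
for odds in (3.3333, 10, 20, 100, 1_000, 10_000):
    print(f"{odds:>8} to one -> {odds_to_probability(odds):.4%}")
```

The two functions undo each other, so converting a probability to odds and back returns the original probability (up to floating-point rounding).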

Agatha Christie is for Chumps

Okay, I'll admit it. I watch CSI. I only watch the original show; CSI:Miami and CSI:NY both leave me cold. I also love Quincy and Monk. I detest Columbo because they usually reveal the killer in the opening moments of the episode. I can't tolerate such nonsense. If I already know the identity of the killer, who cares if Columbo figures it out or not?

Along a similar line, I have zero respect for the Agatha Christie stories I've read or seen on TV. In most cases, one person is murdered by any one of a small number of people, perhaps as many as ten. Even if Poirot is a complete blunderer, outside of The Orient Express he has a one in ten chance of being correct assuming only one killer. This is admittedly a critical assumption. If we assume that any combination of only five possible murderers is equally likely, Poirot's chances of determining the correct combination are much lower than 1 in 5, in the neighborhood of 1 in 30.
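The "1 in 30" figure comes from counting every possible group of killers. A small Python sketch, using five placeholder suspect labels that I made up:

```python
from itertools import combinations

suspects = ["A", "B", "C", "D", "E"]  # hypothetical suspect labels

# Every non-empty group of suspects who could have done it together.
groups = [
    group
    for size in range(1, len(suspects) + 1)
    for group in combinations(suspects, size)
]

print(len(groups))      # 2**5 - 1 = 31 possible combinations
print(1 / len(groups))  # chance of guessing the right combination at random
```

With five suspects there are 31 non-empty combinations, so a blind guess is right about 1 time in 31.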

The real problem with Agatha Christie stories is that they fabricate a closed system to limit the scope of the problem to just a few possible criminals. With only ten people on the train, boat, or in the provincial English manor, only those few people may have contributed to the criminal event. In the real world, a crime is not a closed system, and many more than ten people might have contributed the evidence we find in a crime scene.

However, our much maligned author can contribute a useful idea to our toolbox: the notion of Prior Odds. I think about prior odds in the following fashion:

What is our chance of being "correct" if we choose a solution at random?

Prior odds are commonly utilized in mass disaster cases such as the World Trade Center disaster, Hurricane Katrina, and plane crashes. For example, if 301 people board a plane that later crashes, our expected chances of correctly identifying any one set of remains if we wildly guessed at random is about 1 in 301. The chances of identifying any one set of remains incorrectly is 300 in 301. So how do we get to prior odds from these two figures? The math is quite simple.

The odds of an event are the probability of the event divided by the probability of the non event. From above, our Prior Odds would be
(probability of correct ID)/(probability of incorrect ID).

So our prior odds are: (1/301)/(300/301) = 1 in 300.
To prevent nasty lawsuits, we want our scientific data that identifies remains as belonging to a passenger to have better odds than 300 to 1, maybe much better than 300 to one. It'd be a good idea to divide our statistics by 300 to see how well our data overcomes the 1 in 300 odds of being right.

What this means is that if our scientific analysis finds data that is 100 times more likely to belong to passenger #57 than a random person, there are only 1 to 3 odds of being correct because there are nearly 300 chances to be wrong. If there were only 10 people on the plane, the odds are instead around 10 to 1 in favor of being right.
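Here is a sketch of the plane crash arithmetic in Python. The passenger counts and the factor of 100 are the ones from the example; the function names are only illustrative.

```python
from fractions import Fraction


def prior_odds(passengers):
    """Odds that a random guess identifies one set of remains correctly."""
    p_correct = Fraction(1, passengers)
    return p_correct / (1 - p_correct)


def posterior_odds(passengers, likelihood_ratio):
    """Prior odds multiplied by how strongly the data favors one passenger."""
    return prior_odds(passengers) * likelihood_ratio


print(prior_odds(301))                  # 1/300
print(posterior_odds(301, 100))         # 1/3
print(float(posterior_odds(10, 100)))   # a little over 11 to 1
```

The same evidence that looks weak on a full plane looks strong on a small one, purely because the prior odds changed.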

A typical DNA profile matching an individual might be one trillion times more likely to be from that person than any one unrelated individual. However, when we are thinking about a large scale mass disaster like the 2004 Indian Ocean Tsunami where over one hundred thousand people were killed, prior odds that are at most 1/100,000 reduce our result to "only" ten million times more likely. If we imagine cases where the missing person's toothbrush or other DNA-laden item is not available, we'd only have DNA from family members to identify a missing person. Typical DNA results might produce odds of one million to one, which when multiplied by our 1/100,000 number leaves only 10 to one.

Sounds pretty good, right? Here's the problem. If we have 10 to 1 odds of being right, that means we'll actually be wrong about 9% of the time. That means out of 100,000 dead people, we'd have on average 9000 wrong identifications. I suppose that might be acceptable to anyone other than the 9000 families with misidentified loved ones.
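The misidentification estimate can be checked with a short Python sketch. It uses the round numbers from the text (100,000 victims, family DNA worth one million to one) and treats every identification as having the same 10 to 1 odds, which is a simplifying assumption.

```python
from fractions import Fraction


def odds_to_probability(odds):
    return odds / (1 + odds)


victims = 100_000
prior = Fraction(1, victims)                 # the text's "at most 1/100,000"
family_dna_likelihood_ratio = 1_000_000      # "one million to one"

posterior = prior * family_dna_likelihood_ratio
p_wrong = 1 - odds_to_probability(posterior)
expected_wrong = victims * p_wrong

print("odds of being right:", posterior)
print(f"chance of being wrong: {float(p_wrong):.1%}")
print("expected wrong identifications:", round(expected_wrong))
```

The exact figure is a bit over 9,000, which matches the rough "about 9%" estimate above.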

What follows next is some math to explain a little more of what I mean.