Tuesday, July 07, 2009

What's wrong with being average?

Yahoo Finance has its own list of guest columnists who write opinion pieces on money and investing. They vary greatly in quality; by far the worst is "Rich Dad, Poor Dad" author Robert Kiyosaki. Mr. Kiyosaki's mendacity is pretty well documented, so I won't comment on it here.

What is abundantly clear from his July 7 article is that Mr. Kiyosaki hasn't the faintest understanding of statistics. He tells investors "not to be average" as if average investors chose mediocrity.

No one chooses to be average.

If you try to make money from your investments, you are most likely to obtain average returns. This is because the average return is calculated based on the returns of people just like you. They're all shooting for the big returns, but most of them end up being average. Even worse, for every one of the few who get big returns, roughly one other gets big losses instead.

Being a pro is no protection against average returns either. How many "pros" have lost so big that they're in a pickle now?

Why fight average returns anyways? You can, with a minimum of effort, obtain average returns from a well diversified portfolio. It has low transaction costs, low taxes, and typically does better than a non-diversified portfolio. It did far better during our current crisis unless you were one of the lucky few pros or amateurs to be in the right asset class at the right time.

Average returns, because they're a more likely result, are something you can plan around. You can plan your future on them, including knowing how much money to save and how to invest it. Not so for big returns - they're far less likely.

If you want to be above average, you could instead follow Kiyosaki's advice and work like a dog for those above average returns. Most likely, you'll still get average returns. You'll pay higher taxes and transaction costs, and depending on your strategy, have none of the benefits of diversification.

In his article, Kiyosaki talks about how his greater than average strategy led him to be homeless and broke in 1985. He left his wife with $2 while he went off to Australia on a business trip. His net income that year was $1500. I have no idea why anyone would want to replicate such a life. An average person will make more from average returns based on average savings from an average salary, without the homelessness and bone-crushing stress that comes with it.

One way you can become "above average" is to invest more than the average amount. You'll make more money because you're getting an average return from an above-average sized investment.

Sunday, November 12, 2006

Politics and the "Dumb Electorate" Myth

Around every election season, we routinely hear grumblings from folks that go something like this:
"Man, people are stupid! If only people weren't so ignorant, the election would have gone the other way. What we really need to do is make certain that we have an educated voting public rather than the stupid one we have now."
I've even heard people insist that we need a literacy test to keep the brainless electorate from ruining our lives.

I don't buy it. The idea that things would go my way if only people weren't such dopes is a very seductive concept, but ultimately it is a self-serving one. I've had many a strong political disagreement with well-educated colleagues, and the problem was definitely not a lack of education or a misunderstanding of the issues.

It's difficult to determine how well-informed voters are on specific election issues, although I am working on finding out what I can. We can get a sense of how stupid our "dumb electorate" is by examining their educational status. Fortunately, our wonderful data-compiling government collects demographic statistics on those who register and those who vote, including their educational status.

You can pull up tables of who registered and who voted. I calculated percentages of the educational status of those who actually voted in the 2004 Presidential Election. What I found was:
11% have an advanced degree

21% have a bachelor's degree

31% have some college or an associate's degree

28% have a high school diploma
You can then use these numbers to calculate some startling results.
91% of those who vote have at least completed K-12 successfully.

32% have either a bachelor's or advanced degree.

63% have some college, an associate's degree, a bachelor's degree, or an advanced degree.

They dwarf the 28% who have only a high school diploma and the remaining 9% who have less than a K-12 education and still vote.
So, is this an uneducated voting populace?
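For anyone who wants to check the addition, here is a minimal Python sketch of how the combined figures follow from the four percentages quoted above. The variable names are made up for illustration, and the 9% figure is simply whatever is left over after the four listed categories.

```python
# Shares of 2004 voters by highest education level, as quoted above (percent).
shares = {
    "advanced degree": 11,
    "bachelor's degree": 21,
    "some college or associate's degree": 31,
    "high school diploma": 28,
}

# Everyone in the four listed categories finished at least K-12.
at_least_high_school = sum(shares.values())

bachelors_or_higher = shares["advanced degree"] + shares["bachelor's degree"]
some_college_or_higher = (
    bachelors_or_higher + shares["some college or associate's degree"]
)

# Assumption: whoever is not in the four listed categories has less than K-12.
less_than_high_school = 100 - at_least_high_school

print(at_least_high_school, bachelors_or_higher,
      some_college_or_higher, less_than_high_school)
```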

To bring this full circle, I don't think some kind of barrier aimed at disenfranchising the less-educated is going to change much, even if it weren't howled out of the room for the ridiculous imposition upon the rights of the governed that it really is. Even if I assume that these people are a big problem, they're a very small percentage of the voters - a wee 9% of the vote.

I place the "Dumb Electorate" myth in the same category as the Eugenics hysteria of the first half of the 20th Century. The belief was that our country was being overrun by idiot immigrants because they hadn't passed an allegedly education- and language-independent I.Q. test. Things got so out of hand that people were actually sterilized to stamp out their allegedly stupid genes. More on this topic is discussed in Stephen Jay Gould's The Mismeasure of Man.

To quote Mark Twain: "There is something worse than ignorance, and that's knowing what ain't so."

Sunday, May 14, 2006

Idle chatter about global warming

The global warming debate rages on. Many folks think that it's a tempest in a teacup; others think that it's a real sign of human impact on the environment. I don't have much of an opinion on the subject, as I'm woefully ignorant of the science. However, the occasional debate catches my interest.

A recent discussion on a different message board included this statement:

Water vapor has always been the primary green house gas.

Let's assume this statement is true. The implication is that if water vapor is the primary greenhouse gas, all of the CO2 we could possibly produce would never amount to much. Therefore, global warming is not due to human activity, and attempts to limit it are futile.

Maybe. It depends on what we mean when we say "primary". By "primary greenhouse gas", do we mean the gas with the largest amount in the atmosphere or the gas with the largest contribution to the warming effect?

The difference is essential.

A hypothetical example: assume water is 90% of the greenhouse gas by amount. CO2 is 5%, and all of the other gases are 5%. What if CO2 has 100 times the greenhouse gas activity of water?
Relative activity of water = 1 x 90 = 90

Relative activity of CO2 = 100 x 5 = 500
That 5% by volume would have over five times the net impact of water despite being 1/18th the relative amount.
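The same back-of-the-envelope comparison can be sketched in a few lines of Python. The shares and the "activity" multipliers are the made-up numbers from the hypothetical above, not real measurements, and the dictionary layout is just one convenient way to hold them.

```python
# Hypothetical numbers from the example above; NOT real atmospheric data.
gases = {
    "water": {"share": 90, "activity": 1},
    "CO2": {"share": 5, "activity": 100},
}

# Relative impact = share of the atmosphere's greenhouse gas x per-unit activity.
impact = {name: g["share"] * g["activity"] for name, g in gases.items()}

impact_ratio = impact["CO2"] / impact["water"]           # CO2 impact vs. water
amount_ratio = gases["CO2"]["share"] / gases["water"]["share"]  # CO2 amount vs. water

print(impact, round(impact_ratio, 2), round(amount_ratio, 4))
```

The point of the sketch is only that "largest amount" and "largest contribution" can come apart once a per-unit multiplier enters the picture.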

Revisiting Prior Odds

Remember our earlier examples of deciding whether we have a six or twenty sided die?

When I glibly divide 1/6 by 1/20 I assume that I am equally likely to choose a six sided or a twenty sided die. Let's instead assume a case where 101 dice are placed in a bag. 1 is a six sided die. The remainder are 20 sided. The odds of pulling a six sided die out of the bag are 1 in 100:
P of six sided die: 1/101

P of 20 sided die: 100/101

odds = (1/101)/(100/101) = 1 in 100.
We pull one die out of the bag and start rolling it to see what results we get and to determine how many sides it has. To be confident that we've pulled a six sided die out of our bag, we'd want the result of our rolls to overcome our 100 to one odds of being wrong. For one roll where the number is six or less, our calculation would be:
1/100 * 1/(6/20) = 1/100 * 3.333 = 0.03333

Which is a roughly 1 in 30 chance or 3.2%
For two rolls:
1/100 * 20/6 * 20/6 = 1/100 * 11.111 = 0.11111

About a 1 in 10 chance or 10%.
For three rolls:
1/100 * 20/6 * 20/6 * 20/6 = 1/100 * 37.037 = 0.37037

slightly less than 1 in 3 or 27%.
For four rolls:
1/100 * 20/6 * 20/6 * 20/6 * 20/6 = 1/100*123.456 = 1.23456

About 1.23 to one or about 55%.
For five rolls:
1/100 * (20/6)*(20/6)*(20/6)*(20/6)*(20/6) = 4.11

About 4 to 1 or 80%.
So in five rolls with numbers of six or less we could be 80% certain that we had a six sided die. If we rolled even one seven, we would be 0% certain that we had a six sided die.

If there were equal numbers of six and twenty sided dice in the bag, our numbers would be different. The prior probability would be 1/2, the prior odds would be 1 to 1. Therefore our result above would be about 411 to one, with a probability of about 99.76%.
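The roll-by-roll arithmetic above can be replayed with a short Python sketch. It assumes exactly the setup described: prior odds taken from the bag, and every roll coming up six or less. The function names are my own, and exact fractions are used only to keep rounding out of the picture.

```python
from fractions import Fraction


def odds_to_probability(odds):
    """Convert odds (in favor, to one) into a probability."""
    return odds / (1 + odds)


def posterior_odds(prior_odds, likelihood_ratio, n_rolls):
    """Multiply the prior odds by the likelihood ratio once per roll."""
    return prior_odds * likelihood_ratio ** n_rolls


# One six sided die among 101 dice: odds of (1/101)/(100/101).
bag_prior = Fraction(1, 101) / Fraction(100, 101)
# Equal numbers of each kind of die: odds of 1.
even_prior = Fraction(1, 1)
# A roll of six or less is certain on a six sided die, 6/20 on a 20 sided die.
ratio_per_roll = Fraction(1) / Fraction(6, 20)

for n in range(1, 6):
    odds = posterior_odds(bag_prior, ratio_per_roll, n)
    print(n, round(float(odds), 5), f"{float(odds_to_probability(odds)):.1%}")

even_odds = posterior_odds(even_prior, ratio_per_roll, 5)
print(round(float(even_odds), 1), f"{float(odds_to_probability(even_odds)):.2%}")
```

Swapping in a different prior is the only change needed to see how the same five rolls lead to very different conclusions.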

No matter how compelling our scientific data, the particulars of our situation change the resulting probability.

Sunday, May 07, 2006

My Math is Wrong and I Don't Care

I'm not a mathematician. In fact, I would most likely irritate any mathematician worth his or her salt. This is because I am sampling from a mathematical buffet but am far too lazy to clean up after myself or, heaven forbid, prepare any of the buffet items.

A great deal of my thinking is influenced by something called Bayes's Theorem. Bayes's theorem and Bayesian Inference are often described as a guide on how to update or revise beliefs in light of new evidence.

I think that's a fair statement of what I hope to accomplish.

I'm not too rigorous with the math, which means my calculations are most likely wrong at some level. I am cleverly selecting examples that I understand to illustrate my point before traveling into uncharted territory. This is much like a child who uses water wings before learning to swim. What is true is that my math is close enough so that my results make sense.

My earlier math appears wrong if you think about it carefully. Remember how we calculated odds?
Odds = P/(1-P)
Yet I was talking about odds of 3.3333 to one when dividing 1/6 by 1/20. The probability of not rolling a six on a six sided die is 5/6, not 1/20. So what gives?

It turns out that when I'm comparing the 1/6 and 1/20 probabilities, there is a Prior Odds term hiding in the calculation that makes the math work. When I divide 1/6 by 1/20, I am assuming that I'd be equally likely to select a six or twenty sided die. This implies a prior odds of 1. It's there, we just don't notice it because it does not affect the calculation.

What that means is
Prior odds = P(selecting a six sided die)/P(selecting a non-six sided die)
Which is another way of saying
Prior Odds = P/(1-P)
Since we're equally likely to select either die, P = 1/2, so
P = (1-P)
Prior Odds = P/(1-P) = 1.
Therefore the only thing affecting the calculation is the evidence itself: the die roll here, or the DNA evidence in a real case.

In the earlier cases, where I merely divide 1/6 by 1/20, I'm assuming a prior odds of 1. Therefore even though 1/6 divided by 1/20 is not an odds calculation, because the prior odds are present the math will work out. Every calculation I've made without a Prior Odds in fact had one. The real calculation was
Odds of a six sided die = Prior Odds of Choosing a Six Sided Die x (Probability of rolling a six with a six sided die) / (Prob. of Rolling a six with a 20 sided die)
Odds of a six sided die = 1 x (1/6)/(1/20) = 20/6 = 3.3333
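To make the hidden prior visible, here is a small Python sketch that always writes the prior odds out explicitly. The function and argument names are invented for illustration; the second call uses a lopsided prior (one six sided die among 101 dice) purely as a contrast.

```python
from fractions import Fraction


def odds_for_six_sided(p_six_sided, p_roll_given_six, p_roll_given_twenty):
    """Prior odds of picking the six sided die times the likelihood ratio."""
    prior_odds = p_six_sided / (1 - p_six_sided)
    likelihood_ratio = p_roll_given_six / p_roll_given_twenty
    return prior_odds * likelihood_ratio


# Equally likely to pick either die: the prior odds are 1 and drop out.
even_case = odds_for_six_sided(Fraction(1, 2), Fraction(1, 6), Fraction(1, 20))

# Contrast case (an assumption for illustration): 1 six sided die among 101.
lopsided_case = odds_for_six_sided(Fraction(1, 101), Fraction(1, 6), Fraction(1, 20))

print(float(even_case), float(lopsided_case))
```

With an even prior the result is the same 20/6 as the bare division; with any other prior the two diverge, which is why the prior is worth writing down even when it equals 1.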

Monday, May 01, 2006

Odds, Schmodds.

I've tried to avoid large amounts of arithmetic, but at this point it seems inescapable. The "odds" math I was talking about earlier can be expressed a little more formally.
Odds = (probability of an event)/(probability of non-event)

if we call probability P then odds becomes:

Odds = P/(1-P)
And therefore
P = Odds/(1 + Odds)
Let's revisit an earlier example where we rolled a die without knowing how many sides it has. Our earlier comparison looked like this:
Probability of rolling a six assuming a six-sided die: 1/6

Probability of rolling a six assuming a twenty-sided die: 1/20

or odds of about 3.333 to one.
But what do odds of about 3.333 to one mean, really? Perhaps if we could express this in terms of a probability, like 20%, 50%, or 90%, we'd feel like we understood it better.

We already know the odds, we want to know the probability. Thanks to the awesome power of algebra, we can compute the probability from the odds. Our calculations go like this:
3.3333/(1+3.3333) = 3.3333/4.3333 = 0.7692
Therefore, odds of 3.333 to one mean a probability of about 77% that we have a six sided die.

Let's look at our earlier examples and convert them from odds to probabilities.

The "no DNA in the Duke rape case" example: odds of 1.3333 to one: 57.1%

A standard rape case with one trillion to one odds: 99.9999999999%

Our trick die that rolled six sixes in a row: 99.9978%

Our example of rolling numbers six or less six times in a row on a six sided die versus a twenty sided die: 97.4%

This also gives us a handy guide for thinking about odds and probability. If someone says the odds are 10 to one, we're talking roughly 91%. If we switch to 20 to one, we're talking about roughly 95%. If someone says 100 to one, 99%. One thousand to one, 99.9%, ten thousand to one, 99.99%, and so on.
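The two conversion formulas are easy to turn into a pair of Python helpers. This is just a sketch of Odds = P/(1-P) and P = Odds/(1 + Odds) as given above; the function names are my own.

```python
def odds_from_probability(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)


def probability_from_odds(odds):
    """P = Odds / (1 + Odds)."""
    return odds / (1 + odds)


# The six sided vs. twenty sided comparison: (1/6) / (1/20).
die_odds = (1 / 6) / (1 / 20)
print(f"{die_odds:.4f} to one -> {probability_from_odds(die_odds):.1%}")

# The rule-of-thumb table for round odds.
for odds in (10, 20, 100, 1_000, 10_000):
    print(f"{odds:>6} to one -> {probability_from_odds(odds):.4%}")
```

The two helpers undo each other, so converting a probability to odds and back returns the original probability (up to floating-point rounding).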

Agatha Christie is for Chumps

Okay, I'll admit it. I watch CSI. I only watch the original show; CSI:Miami and CSI:NY both leave me cold. I also love Quincy and Monk. I detest Columbo because they usually reveal the killer in the opening moments of the episode. I can't tolerate such nonsense. If I already know the identity of the killer, who cares if Columbo figures it out or not?

Along a similar line, I have zero respect for the Agatha Christie stories I've read or seen on TV. In most cases, one person is murdered by any one of a small number of people, perhaps as many as ten. Even if Poirot is a complete blunderer, outside of The Orient Express he has a one in ten chance of being correct assuming only one killer. This is admittedly a critical assumption. If we instead assume that any combination of only five possible murderers is equally likely, Poirot's chances of determining the correct combination are much lower than 1 in 5, in the neighborhood of 1 in 30.
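The "1 in 30" figure can be checked by brute force. The sketch below assumes that "any combination" means any non-empty group drawn from the five suspects, with every group equally likely; the single-letter suspect names are placeholders.

```python
from itertools import combinations

suspects = ["A", "B", "C", "D", "E"]  # hypothetical placeholder names

# Every non-empty group of suspects that could have done it together.
culprit_groups = [
    group
    for size in range(1, len(suspects) + 1)
    for group in combinations(suspects, size)
]

chance_of_random_guess = 1 / len(culprit_groups)
print(len(culprit_groups), round(chance_of_random_guess, 4))
```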

The real problem with Agatha Christie stories is that they fabricate a closed system to limit the scope of the problem to just a few possible criminals. With only ten people on the train, boat, or in the provincial English manor, only those few people may have contributed to the criminal event. In the real world, a crime is not a closed system, and many more than ten people might have contributed the evidence we find at a crime scene.

However, our much maligned author can contribute a useful idea to our toolbox: the notion of Prior Odds. I think about prior odds in the following fashion:

What is our chance of being "correct" if we choose a solution at random?

Prior odds are commonly utilized in mass disaster cases such as the World Trade Center disaster, Hurricane Katrina, and plane crashes. For example, if 301 people board a plane that later crashes, our expected chances of correctly identifying any one set of remains if we wildly guessed at random is about 1 in 301. The chances of identifying any one set of remains incorrectly is 300 in 301. So how do we get to prior odds from these two figures? The math is quite simple.

The odds of an event are the probability of the event divided by the probability of the non event. From above, our Prior Odds would be
(probability of correct ID)/(probability of incorrect ID).

So our prior odds are: (1/301)/(300/301) = 1 in 300.
To prevent nasty lawsuits, we want our scientific data that identifies remains as belonging to a passenger to have better odds than 300 to 1, maybe much better than 300 to one. It'd be a good idea to divide our statistics by 300 to see how well our data overcomes the 1 in 300 odds of being right.

What this means is that if our scientific analysis finds data that is 100 times more likely to belong to passenger #57 than a random person, the odds of being correct are only 1 to 3 because there are nearly 300 chances to be wrong. If there were only 10 people on the plane, the odds are instead around 10 to 1 in favor of being right.

A typical DNA profile matching an individual might be one trillion times more likely to be from that person than any one unrelated individual. However, when we are thinking about a large scale mass disaster like the 2004 Indian Ocean Tsunami where over one hundred thousand people were killed, prior odds that are at most 1/100,000 reduce our result to "only" ten million times more likely. If we imagine cases where the missing person's toothbrush or other DNA-laden item is not available, we'd only have DNA from family members to identify a missing person. Typical DNA results might produce odds of one million to one, which when multiplied by our 1/100,000 number leaves only 10 to one.

Sounds pretty good, right? Here's the problem. If we have 10 to 1 odds of being right, that means we'll actually be wrong about 9% of the time. That means out of 100,000 dead people, we'd have on average 9000 wrong identifications. I suppose that might be acceptable to anyone other than the 9000 families with misidentified loved ones.
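Here is a rough Python sketch of the mass disaster arithmetic from the last few paragraphs. It uses the same round numbers as the text (including 1/100,000 as the prior odds for the tsunami case), and the helper names are invented for illustration.

```python
def prior_odds_random_guess(n_candidates):
    """Odds of a correct ID by pure guessing: 1 right vs. (n - 1) wrong."""
    return 1 / (n_candidates - 1)


def probability_from_odds(odds):
    return odds / (1 + odds)


# Plane crash: evidence 100 times more likely, 301 passengers.
plane_odds = 100 * prior_odds_random_guess(301)

# Small plane: same evidence, only 10 passengers.
small_plane_odds = 100 * prior_odds_random_guess(10)

# Tsunami with family-reference DNA: one million to one evidence,
# prior odds rounded to 1/100,000 as in the text.
tsunami_odds = 1_000_000 * (1 / 100_000)
p_wrong = 1 - probability_from_odds(tsunami_odds)
expected_wrong_ids = 100_000 * p_wrong

print(round(plane_odds, 3), round(small_plane_odds, 1))
print(round(tsunami_odds, 1), f"{p_wrong:.1%}", round(expected_wrong_ids))
```

Even evidence that sounds overwhelming in a single case leaves thousands of expected errors once it has to work against a prior this lopsided.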

What follows next is some math to explain a little more of what I mean.

Sunday, April 16, 2006

The Best Uses for DNA: The Duke Case Continued

Another statement I disagree with from the original article:
"The best use of DNA is excluding someone as the source of a particular sample," said Mark Rabil, a Winston-Salem lawyer who represented Darryl Hunt, a man freed in part by DNA evidence after serving 18 years in prison for a 1984 rape and murder.
I recoil at the use of the term "best". If DNA, as it currently exists, were "best" used as an exclusionary technique, then we could get by with a simple test that either said "match" or "no match". All of the statistics and studies of DNA profile rarity would be unimportant.

I won't belittle the benefit DNA has had for the wrongly convicted, but we shouldn't throw out the baby with the bathwater either. The primary benefit of forensic DNA technology as it exists today is not finding out who doesn't match the evidence, but understanding what it means when someone does match the evidence. The statistics that tell us that a profile is so rare it is found in only one out of ten trillion Caucasians allow us to interpret our results with confidence and present them honestly. We can then make intelligent statements about what fits best with the evidence.

If you read up on the current dust-up surrounding the admissibility of fingerprints into court under the Daubert standard, you'll understand why knowing what a match means is so important.

Saturday, April 15, 2006

The Durham Lacrosse Team Rape Case

A rape allegation involving Duke University's Lacrosse Team has quickly become a high-profile case. Scores of articles can be uncovered by searching news.google.com for "duke lacrosse rape". I don't have an opinion as to what happened, as I was not there and I have not personally reviewed any of the evidence. However, some egregious misstatements in a recent article have attracted my interest.

One belongs to the "willing the evidence into existence" category:
Lawyers representing some of the 46 players tested said the tests found no matching DNA on or in the woman. They contend that the results prove that no rape or sexual intercourse took place.

But the prosecutor disagrees, and the case isn't settled.

According to a U.S. Department of Justice study, DNA evidence from an attacker is successfully recovered in less than a quarter of sexual assault cases.
A statement in the Boston Globe directly attributes this figure to the DA:
Nifong said prosecutors were awaiting a second set of DNA results, but did not say how those differed from the tests reported Monday. Nifong added that in 75 percent to 80 percent of sexual assaults, there is no DNA evidence to analyze.
I can't tell from the media reports if no semen and no DNA were recovered from the victim, or if semen and DNA were found and the DNA that was found did not match any of the Lacrosse team. The difference is critical, as one finding supports rape by someone else while excluding the Lacrosse players, while the other does not support rape by anyone. What I'm stuck with is this: if semen and/or DNA had been found at all, why make the excuse that it's rarely found in the first place?

I'm unable to find any study that supports the DA's contention that DNA is rarely recoverable from rape cases. I know of reports (see below) that state that DNA from rape cases is rarely submitted to crime labs, but none that DNA is rarely recoverable. These two statements are completely different. Historically DNA was simply not collected or even submitted to crime labs due to any number of reasons, including that the victim had no idea who the attacker was. In the days before convicted offender DNA databanks, there was simply no way to know who the attacker might be without an eyewitness identification of some kind. Suspectless rape cases were simply a dead end. DNA was available, but without a suspect for comparison the results appeared useless.

I suspect that the 25% statement is a misquoting of a National Institute of Justice Report entitled The Unrealized Potential of DNA Testing, which states that of all of the reported rapes, about 40% were investigated by police, 9% provided DNA evidence to crime labs, and in 6% of cases was the DNA actually analyzed. If we adjust the numbers to include only those investigated by police, about 22% had DNA submitted to crime labs, and around 16% actually had the DNA processed. I'll add that these numbers are nine years old, and pre-date most of the "Cold Hit" programs that analyzed DNA evidence from suspectless rape cases. I wonder what the figures would be as of 2006?

The critical point is that these numbers do not concern how much DNA is recoverable, but how much is actually recovered and analyzed.

Or maybe the figure came from somewhere else, since the quoted figure is 25% and not 20%. Maybe they're misreading this National Institute of Justice document entitled Convicted by Juries, Exonerated by Science: Case Studies in the Use of DNA Evidence to Establish Innocence After Trial, which states in part:

"Forensic DNA typing laboratories -- as numerous commentators have noted -- encounter rates of exclusion of suspected attackers in close to 25 percent of cases."

Which covers how often the initial suspect is the wrong man, not how often DNA is recovered from a case.

I cannot find any reference to a study that DNA is rarely recoverable from rapes. If anyone knows of the study, please let me know by posting to the comments of this article. I'll post a follow-up if I can find the study.

For the sake of giving the devil its due, let's do some math.

Not finding DNA in a rape case tells us plenty even when we take the 25% figure at face value. Proclaiming that "DNA is only recovered in 25% of rape cases" argues that a negative DNA result is essentially meaningless. This only makes sense if we refuse to examine alternative reasons for recovering no DNA. As before, let's assume that frequency and probability are the same.

Examined formally, this would be:

Probability of no DNA when rape did NOT occur: 100% or 1

Probability of no DNA when rape did occur: 100%-25% = 75% or .75

Relative probability of encountering no DNA when rape did not occur vs. rape did occur:

1/0.75 or 1.3333333... or about 1.3333 to one.

Therefore, not finding DNA is more likely if no rape occurred. The relative difference is small, but real.
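The comparison above is a one-line likelihood ratio, sketched here in Python. The 25% recovery figure is taken at face value purely for the sake of argument, and the final conversion to a probability additionally assumes prior odds of 1, which is an assumption of the sketch rather than anything established about the case.

```python
# Figures taken at face value from the argument above.
p_no_dna_if_no_rape = 1.0
p_dna_recovered_if_rape = 0.25
p_no_dna_if_rape = 1 - p_dna_recovered_if_rape

# How much more likely is "no DNA" if no rape occurred than if one did?
likelihood_ratio = p_no_dna_if_no_rape / p_no_dna_if_rape

# Assumption: prior odds of 1, so the posterior odds equal the ratio itself.
probability = likelihood_ratio / (1 + likelihood_ratio)

print(round(likelihood_ratio, 4), f"{probability:.1%}")
```

As in the text, the ratio describes how likely the "no DNA" finding is under each scenario, not how likely each scenario is on its own.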

Another way to think about this problem is in terms of reasonable doubt. Would the fact that no DNA was recovered make you cautious enough to have reasonable doubts? Or more appropriately, if we were the accused party, would we want the 1.33333 to one odds to play in our favor?

Even if DNA is recovered as rarely as 25% of the time in rape cases, the "no DNA" result is more likely if the rape did not occur. Before anyone gets angry with me, remember that I am not saying it's more likely that the rape did not occur, I'm saying that the fact that we have no DNA is more likely if the rape did not occur.