Lesson 67 – Bias

“Nooooo,” “No way…” “Ooooh,” “Yessss,” “I told ya,” “I knew it.”

These are some of the funny reactions from our high-school students when the true age is finally revealed, because that is also the moment the bias in their estimation is revealed.

Every year, I start off the data analysis boot camp with this game from Andrew Gelman’s bag of tricks for teaching statistics.

The students sit in groups of four, look at a rolling slideshow of pictures, and guess the ages of the people shown. Each group has its own formula (estimator) for the age!

What you saw in the opening image are the guessed ages of the former Chairman of the Federal Reserve, Janet Yellen, and the great economist Thomas Sowell.

Do you see anything peculiar?

Yes. The estimated age of Chairman Yellen is somewhat close to her true age (as per Wikipedia) and the estimated age of Dr. Sowell is not even close (again, as per Wikipedia).

Let’s compute the average of all the guesses, the expected value of the estimated age.

For Chairman Yellen it is 69.69 years, a difference of 1.31 years from her true age.

Pretty close to the truth. The students’ estimate has very little bias.

For Dr. Sowell, it is 61.88 years, a difference of 25.13 years from his true age.

Far from the truth. The students’ estimate is biased.

Let’s spend a little more time on the estimated ages.

There are 16 groups that guessed the ages based on a picture of the person. Let us use a notation \hat{\theta} for this estimate. So, after heavy thinking and vehement deliberation within the group, the estimate from group A is \hat{\theta}=73 for Chairman Yellen.

Group B is also filled with thinkers and negotiators, so their estimate for Chairman Yellen’s age is \hat{\theta}=70.

Likewise, the 16 groups provided a distribution of estimates. The expected value E[\hat{\theta}] of this distribution is 69.69 years.

The true age \theta is 71 years.

The bias is E[\hat{\theta}]-\theta = 69.69 - 71 = -1.31 years.

Look at this visual. I am taking the 16 estimates and plotting them as a histogram (or a probability density). The true age is also shown in the histogram. The estimates are distributed about the true age. The mean of this distribution is very close to 71.

Dr. Sowell, on the other hand, somehow seemed younger in the minds of the high-school students. The expected value of the estimate E[\hat{\theta}] is 61.88 years. The true age is 87 years, a bias (E[\hat{\theta}]-\theta) of around -25 years. The bias could be because of the picture I am using, his pleasant smile, or his aura!

This idea of measuring how close the expected value of the estimate is to the true parameter is called the bias of the estimate.

An estimator \hat{\theta} (i.e., a formulation to compute the true population parameter) is unbiased if, on average, it hits the truth exactly: E[\hat{\theta}]=\theta. In other words, if we have a distribution of the estimates, and its mean is centered on the truth, the estimator is unbiased.
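If you want to compute this yourself, here is a minimal sketch in Python. The guess values below are made-up placeholders standing in for the 16 group estimates, not the actual classroom numbers.

```python
import numpy as np

# Hypothetical guesses from the 16 groups for Chairman Yellen's age
# (made-up placeholder values, not the actual classroom numbers)
guesses = np.array([73, 70, 68, 72, 66, 71, 69, 70,
                    74, 67, 70, 68, 71, 69, 70, 67])

true_age = 71  # the true parameter (theta), as per Wikipedia

estimate = guesses.mean()   # approximates E[theta_hat]
bias = estimate - true_age  # bias = E[theta_hat] - theta

print(f"Average guess: {estimate:.2f} years, bias: {bias:.2f} years")
```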

♦Θ♦


In Lesson 63, we learned that the maximum likelihood estimator of p, the true probability of success for a Binomial distribution, is \hat{p}=\frac{r}{n}, where r is the number of successes in a sequence of n independent and identically distributed Bernoulli random variables. Do you think \hat{p} is unbiased?

Let’s find out.

The bias of \hat{p} is defined by E[\hat{p}]-p.

E[\hat{p}]=E[\frac{r}{n}]=\frac{1}{n}E[r]=\frac{1}{n}*np=p

The key point in the above derivation is that the expected value of the number of successes (r) for a binomial distribution is np.

The estimate \hat{p}=\frac{r}{n} is an unbiased estimate of the true probability of success.

The distribution of the estimate \hat{p} will be centered on the true probability p.
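We can verify this with a quick simulation. Here is a sketch in Python with NumPy; the particular values of p, n, and the number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

p, n = 0.3, 50     # true probability of success and trials per experiment
reps = 100_000     # number of repeated experiments

# Each experiment: count the successes r in n Bernoulli trials,
# then estimate p_hat = r / n
r = rng.binomial(n, p, size=reps)
p_hat = r / n

# The average of p_hat over many experiments approximates E[p_hat]
print(f"True p = {p}, average of p_hat = {p_hat.mean():.4f}")
```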

§⇔§

The maximum likelihood estimators for the Normal distribution are \hat{\mu} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}} and \hat{\sigma}^{2}=\frac{1}{n}{\displaystyle \sum_{i=1}^{n} (x_{i}-\hat{\mu})^{2}}. Do you think they are unbiased?

\hat{\mu} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}} is the sample mean. Let’s compute the bias of this estimate.

E[\hat{\mu}] = E[\frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}]

E[\hat{\mu}] = E[\frac{1}{n}(x_{1}+x_{2}+...+x_{n})]

E[\hat{\mu}] = \frac{1}{n}E[(x_{1}+x_{2}+...+x_{n})]

E[\hat{\mu}] = \frac{1}{n}(E[x_{1}]+E[x_{2}]+...+E[x_{n}])

Since x_{1}, x_{2}, ..., x_{n} are random samples from the population with true parameters \mu and \sigma, E[x_{i}]=\mu and V[x_{i}]=\sigma^{2}. In other words, since the x_{i}s are independent and identically distributed, they have the same expected value and variance as the population.

So, E[\hat{\mu}] = \frac{1}{n}(\mu+\mu+...+\mu)=\frac{1}{n}*n \mu=\mu

The sample mean \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}} is an unbiased estimate of the population mean \mu.
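A quick simulation confirms this as well; the values of \mu, \sigma, and n below are again arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

mu, sigma, n = 10.0, 2.0, 25   # true parameters and sample size
reps = 100_000                 # number of repeated samples

# Draw many samples of size n and compute each sample's mean
samples = rng.normal(mu, sigma, size=(reps, n))
mu_hat = samples.mean(axis=1)

# The average of mu_hat over many samples approximates E[mu_hat]
print(f"True mu = {mu}, average of mu_hat = {mu_hat.mean():.4f}")
```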

What about the variance estimator \hat{\sigma}^{2}=\frac{1}{n}{\displaystyle \sum_{i=1}^{n} (x_{i}-\hat{\mu})^{2}}?

You know the chore now. Compute the bias of this estimator and find out for yourself. Passive minds have to wait until next week.
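If you want a head start, here is a sketch you can run to check the bias empirically (assumed parameter values; note that the estimator plugs in the sample mean \hat{\mu}). What the printout tells you is for you to interpret.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

mu, sigma, n = 10.0, 2.0, 10   # true parameters and sample size
reps = 100_000                 # number of repeated samples

samples = rng.normal(mu, sigma, size=(reps, n))
mu_hat = samples.mean(axis=1, keepdims=True)

# Maximum likelihood estimator of the variance: divide by n,
# with the sample mean in place of the unknown mu
sigma2_hat = ((samples - mu_hat) ** 2).mean(axis=1)

# Compare the average of the estimates against the true variance
print(f"True sigma^2 = {sigma**2}, average of sigma2_hat = {sigma2_hat.mean():.4f}")
```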

♦Θ♦


Joe is a smart chap. The other day, he asked me if he could use any other estimator to get the population mean (truth). I probed him further, and he came up with three estimators to estimate the true mean \mu.

\hat{\mu_{1}} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}

\hat{\mu_{2}} = x_{[1]}+\frac{x_{[n]}-x_{[1]}}{2}

\hat{\mu_{3}} = median(x_{1},x_{2},...,x_{n})

The first estimator \hat{\mu_{1}} is known to all of us. It is the sample mean.

The second estimator \hat{\mu_{2}} computes the central tendency as the sum of the smallest value x_{[1]} and half the range of the data \frac{x_{[n]}-x_{[1]}}{2}, i.e., the midpoint between the smallest and the largest values. It is a reasonable way to tell where the center will be.

The third estimator \hat{\mu_{3}} is the median of the sample.

Can you help Joe choose the correct path?

Your immediate reaction would be, “compute the bias for each estimator and select the one that has the smallest bias.”

Fair enough. Let’s try that then.

We already showed that the first estimator \hat{\mu_{1}} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}} is unbiased.

Employing the bias computation for the second estimator, we can see that

E[\hat{\mu_{2}}] = E[x_{[1]}+\frac{x_{[n]}-x_{[1]}}{2}] = \frac{1}{2}(E[x_{[1]}]+E[x_{[n]}])

Unlike the x_{i}s, the order statistics x_{[1]} and x_{[n]} do not individually have an expected value of \mu. But for a distribution that is symmetric about \mu, like the Normal distribution, the minimum falls below \mu on average by exactly as much as the maximum rises above it: E[x_{[1]}]=\mu-c and E[x_{[n]}]=\mu+c for some constant c.

E[\hat{\mu_{2}}] = \frac{1}{2}((\mu-c)+(\mu+c)) = \mu

For symmetric distributions, the second estimator is also unbiased.

Now, depending on whether there is an even or an odd number of samples, you can derive that the third estimator, the median, is also unbiased for symmetric distributions like the Normal.
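A simulation bears this out. The sketch below (with arbitrary parameter choices) computes all three of Joe’s estimators over many Normal samples and shows that each one averages out to \mu.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

mu, sigma, n = 10.0, 2.0, 15   # true parameters and sample size
reps = 100_000                 # number of repeated samples

samples = rng.normal(mu, sigma, size=(reps, n))

mu1 = samples.mean(axis=1)                               # sample mean
mu2 = 0.5 * (samples.min(axis=1) + samples.max(axis=1))  # midrange: x_[1] + (x_[n] - x_[1]) / 2
mu3 = np.median(samples, axis=1)                         # sample median

for name, est in [("mean", mu1), ("midrange", mu2), ("median", mu3)]:
    print(f"{name:8s}: average estimate = {est.mean():.4f} (true mu = {mu})")
```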

How then, can Joe select the best estimator?

To be continued…

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
