Lesson 68 – Bias joins Variance

In last week’s “guess the age” contest, Madam Chairman Yellen was the clear winner. The expected value of the estimated age was 69.69 years, a nominal bias of 1.31 years from her true age.

However, the great Thomas Sowell is the clear winner of our hearts. May he always remain a generation young, like his ideas.

Look at the table once again.

Do you notice the variation in the estimates across the groups?

Yes, Chairman Yellen’s estimated age ranges from 62 years to 78 years, while Dr. Sowell’s estimated age ranges from 55 years to 70 years.

Based on the data from the 16 groups, I computed the variance (and the standard deviation) of the estimated ages.

For Chairman Yellen, the variance of the estimated age is 22.10 and the standard deviation of the estimated age is \sqrt{22.10}=4.7 years.

For Dr. Sowell, the variance is 23.98 and the standard deviation is \sqrt{23.98}=4.9 years.
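
If you want to reproduce these numbers in RStudio, here is a minimal sketch. The vector below contains placeholder values, not the actual table entries; substitute the 16 recorded guesses for each person.

# Estimated ages from the 16 groups (placeholder values; use the table's actual guesses)
yellen_guesses <- c(62, 66, 67, 68, 69, 69, 70, 70, 70, 71, 71, 72, 72, 73, 75, 78)

mean(yellen_guesses)  # expected value of the estimated age
var(yellen_guesses)   # variance of the estimated age
sd(yellen_guesses)    # standard deviation, i.e., the standard error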

The variance of the estimator is an important property. It tells us about the spread of the estimate, i.e., how widely the estimate is distributed.

For any estimator \hat{\theta}, we compute the expected value E[\hat{\theta}] as a measure of central tendency, telling us how far it could be from the truth (the bias), and V[\hat{\theta}] as a measure of the spread of the distribution.

An estimator should be understood as a probability distribution.

The standard deviation of an estimator is called its standard error. In our age-guessing experiment, the standard error of Dr. Sowell’s estimated age is 4.9 years.

The standard error of \hat{\theta} is \sqrt{V[\hat{\theta}]}.


In lesson 67, while learning the concept of bias, we deduced that the sample mean \hat{\mu} = \bar{x} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}} is an unbiased estimator of the population mean \mu.

In other words, we derived that E[\hat{\mu}] = \mu .

Now, let’s derive the variance of the sample mean V[\hat{\mu}] or V[\bar{x}].

For this, suppose we have a random sample x_{1}, x_{2}, ..., x_{n} of size n from a population with mean \mu and variance \sigma^{2}.

Since x_{1}, x_{2}, ..., x_{n} are independent and identically distributed draws from the population with true parameters \mu and \sigma^{2}, each of them has the same expected value and variance as the population: E[x_{i}]=\mu and V[x_{i}]=\sigma^{2}.

\hat{\mu} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}} is the sample mean.

V[\hat{\mu}] = V[\frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}]

V[\hat{\mu}] = V[\frac{1}{n}(x_{1}+x_{2}+...+x_{n})]

V[\hat{\mu}] = \frac{1}{n^{2}}V[(x_{1}+x_{2}+...+x_{n})]

Where did that come from? Was it lesson 27? V[aX]=a^{2}V[X]

V[\hat{\mu}] = \frac{1}{n^{2}}(V[x_{1}]+V[x_{2}]+...+V[x_{n}])

Because the samples are independent, the covariance terms are all zero, so the variance of the sum is the sum of the variances.

V[\hat{\mu}] = \frac{1}{n^{2}}(\sigma^{2}+\sigma^{2}+...+\sigma^{2})=\frac{1}{n^{2}} \times n\sigma^{2}=\frac{\sigma^{2}}{n}

The variance of the sample mean V[\hat{\mu}] = V[\bar{x}] =\frac{\sigma^{2}}{n}.

The standard error of the sample mean is \frac{\sigma}{\sqrt{n}}. It is a measure of the precision of the estimator.
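
Before we move on, a quick simulation sketch can confirm this result. Assume, purely for illustration, a normal population with \mu = 70 and \sigma = 5, and samples of size n = 25, so that \sigma^{2}/n = 1.

# Verify V[mu_hat] = sigma^2/n using repeated sampling
set.seed(68)
mu <- 70; sigma <- 5; n <- 25

# 10,000 samples of size n; compute the sample mean of each
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

var(sample_means)  # close to sigma^2/n = 1
sd(sample_means)   # close to sigma/sqrt(n) = 1, the standard error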


Joe has a question

Do you remember last week’s dilemma?

Joe wants to choose a path: an estimator for the true population mean.

We tried to help him by computing the bias of each of the three estimators. The idea was to choose the one that has no bias. But it turned out that all three estimators are unbiased.

How then, can we select the best estimator?

Look at the estimators again.

\hat{\mu_{1}} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}

\hat{\mu_{2}} = x_{[1]}+\frac{x_{[n]}-x_{[1]}}{2}

\hat{\mu_{3}} = median(x_{1},x_{2},...,x_{n})

I told you that estimators should be understood as probability distributions and that we can compute the expected value E[\hat{\theta}] for the bias, and V[\hat{\theta}] as a measure of the spread of the distribution.

Let’s compute the variance of these three estimators.

V[\hat{\mu_{1}}] = V[\frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}]

\hat{\mu_{1}} is the sample mean. We just derived its variance as V[\hat{\mu_{1}}] = \frac{\sigma^{2}}{n}.

So let’s work on the second estimator.

V[\hat{\mu_{2}}] = V[x_{[1]}+\frac{x_{[n]}-x_{[1]}}{2}]

V[\hat{\mu_{2}}] = V[ x_{[1]} + \frac{1}{2}x_{[n]} - \frac{1}{2}x_{[1]}] = V[\frac{1}{2}x_{[1]} + \frac{1}{2}x_{[n]}]

Treating x_{[1]} and x_{[n]} as independent draws, each with variance \sigma^{2}, we apply the same rule as before to get

V[\hat{\mu_{2}}] = \frac{1}{4}\sigma^{2} + \frac{1}{4}\sigma^{2}

V[\hat{\mu_{2}}] = \frac{1}{2}\sigma^{2}

The variance of the third estimator \hat{\mu_{3}} will be \sigma^{2} if the sample size is odd (the median is then a single observation), or \frac{1}{2}\sigma^{2} if the sample size is even (the median is then the average of the two middle observations).

So, the three estimators \hat{\mu_{1}}, \hat{\mu_{2}}, \hat{\mu_{3}} are all unbiased. Their variances are as follows:

V[\hat{\mu_{1}}]=\frac{\sigma^{2}}{n}

V[\hat{\mu_{2}}]=\frac{\sigma^{2}}{2}

V[\hat{\mu_{3}}]=\sigma^{2} or \frac{\sigma^{2}}{2}

The three estimator probability distributions are centered on the truth (\mu), but their spreads are different. If the sample size n is greater than 2 (n>2), estimator \hat{\mu_{1}} has the lowest variance.

They will look like this visually.
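
If you want to draw this picture yourself, here is a small R sketch. It plots the three distributions as normal curves centered on \mu, using the variances above, with \mu = 0, \sigma = 1, and n = 10 assumed for illustration.

# The three estimator distributions, centered on the truth (mu)
mu <- 0; sigma <- 1; n <- 10

curve(dnorm(x, mu, sigma/sqrt(n)), from = -3, to = 3,
      xlab = "estimate", ylab = "density")                 # mu_hat_1: variance sigma^2/n
curve(dnorm(x, mu, sqrt(sigma^2/2)), add = TRUE, lty = 2)  # mu_hat_2: variance sigma^2/2
curve(dnorm(x, mu, sigma), add = TRUE, lty = 3)            # mu_hat_3: variance sigma^2 (odd n)
abline(v = mu)  # the truth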

Among all the estimators that are unbiased, we choose the one which has minimum variance. \hat{\mu_{1}} is the minimum variance unbiased estimator. It is more likely to produce an estimate close to \mu, the truth.

Among all estimators which are unbiased, choose the one which has minimum variance. This chosen \hat{\theta} is called the minimum variance unbiased estimator (MVUE) of \theta, and it is most likely among all unbiased estimators to produce an estimate close to the truth. This principle is called the principle of minimum variance unbiased estimation.


You must be wondering: what if an estimator has low variance, but is biased? Like this.

Isn’t \hat{\theta_{1}} more likely to produce an estimate close to the truth?

Perhaps.

So there should be a way to combine bias and variance into one measure to help us with the selection.

The mean squared error is that measure. Let’s see how to derive this measure.

We have an estimator \hat{\theta} for the true parameter \theta.

The error of this estimator from the truth is \hat{\theta}-\theta.

The squared error is (\hat{\theta}-\theta)^{2}.

The mean squared error (MSE) is the expected value of the squared error.

E[(\hat{\theta}-\theta)^{2}]

MSE = E[(\hat{\theta}-\theta)^{2}]

MSE = V[\hat{\theta}-\theta] + (E[\hat{\theta}-\theta])^{2}

Can you tell how we wrote the above expression? Recall that for any random variable X, E[X^{2}] = V[X] + (E[X])^{2}; here, X = \hat{\theta}-\theta.

MSE = V[\hat{\theta}] + V[\theta] + (E[\hat{\theta}] - E[\theta])^{2}

MSE = V[\hat{\theta}] + (E[\hat{\theta}] - \theta)^{2}

This is because, E[\theta]=\theta and V[\theta]=0.

V[\hat{\theta}] is the variance of the estimator.

E[\hat{\theta}] - \theta is the bias of the estimator.

So, MSE = Variance + Bias^{2}

For unbiased estimators, the mean squared error is equal to the variance of the estimator.
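
You can check this identity with a short simulation. The sketch below uses a deliberately biased estimator, \bar{x} + 1, chosen only for illustration.

# Check MSE = Variance + Bias^2 for a deliberately biased estimator
set.seed(68)
mu <- 70; sigma <- 5; n <- 25
estimates <- replicate(10000, mean(rnorm(n, mu, sigma)) + 1)  # biased by +1

mean((estimates - mu)^2)                   # MSE, computed directly
var(estimates) + (mean(estimates) - mu)^2  # variance + bias^2; (nearly) the same number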

We can use this measure to compare two estimators. If MSE(\hat{\theta_{1}})<MSE(\hat{\theta_{2}}), we can say that \hat{\theta_{1}} is the better estimator, more efficient at producing estimates close to the truth.

The ratio \frac{MSE(\hat{\theta_{1}})}{MSE(\hat{\theta_{2}})} is called the relative efficiency of \hat{\theta_{2}} relative to \hat{\theta_{1}}.
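
For example, using the variances we derived above, \frac{MSE(\hat{\mu_{1}})}{MSE(\hat{\mu_{2}})} = \frac{\sigma^{2}/n}{\sigma^{2}/2} = \frac{2}{n}, which is less than 1 whenever n > 2. The sample mean is the more efficient of the two.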

If you can create an estimator that has lower variance but some bias, you can go for it, as long as the reduction in variance outweighs the increase in the squared bias.

In estimating true parameters, there is always a tradeoff between bias and variance.
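
Here is a toy sketch of that tradeoff in R. Shrinking the sample mean (\hat{\theta} = 0.8\bar{x}, a made-up shrinkage factor) introduces bias, but when \mu is small relative to \sigma/\sqrt{n}, the drop in variance more than pays for the squared bias.

# Bias-variance tradeoff: a shrunken mean can beat the unbiased mean on MSE
set.seed(68)
mu <- 1; sigma <- 5; n <- 25
xbar   <- replicate(10000, mean(rnorm(n, mu, sigma)))
shrunk <- 0.8 * xbar  # biased (expected value 0.8*mu), but lower variance

mean((xbar - mu)^2)    # MSE of the unbiased mean, about sigma^2/n = 1
mean((shrunk - mu)^2)  # smaller MSE, despite the bias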

To close off for the week, let me ask you three questions.

If \frac{MSE(\hat{\theta_{1}})}{MSE(\hat{\theta_{2}})} is the relative efficiency, what is consistency?

Is the estimator for variance \hat{\sigma^{2}}=\frac{1}{n}{\displaystyle \sum_{i=1}^{n} (x_{i}-\mu)^{2}} unbiased? Did you do your homework?

When was the last time you were on RStudio?

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
