Lesson 95 – The Two-Sample Hypothesis Test – Part IV

On the Difference in Means

using Welch’s t-Test

H_{0}: \mu_{1} - \mu_{2} = 0

H_{A}: \mu_{1} - \mu_{2} > 0

H_{A}: \mu_{1} - \mu_{2} < 0

H_{A}: \mu_{1} - \mu_{2} \neq 0

On the 24th day of January 2021, we examined Tom’s hypothesis on the Mohawk River’s arsenic levels.

After a lengthy exposition on the fundamentals behind hypothesis testing on the difference in means using a two-sample t-Test, we concluded that Tom could not reject the null hypothesis H_{0}: \mu_{1}-\mu_{2}=0.

He cannot back the theory that the factory is illegally dumping its untreated waste into the west branch Mohawk River until he finds more evidence.

However, Tom now has a new theory that the factory is illegally dumping its untreated waste into both the west and the east branches of the Mohawk River. So, he took Ron with him to collect data from the Utica River, a tributary of the Mohawk that branches off right before the factory.

If his new theory is correct, he should find that the mean arsenic concentration in either the west or the east branch is significantly greater than the mean arsenic concentration in the Utica River.

There are now three samples whose concentrations in parts per billion are:

West Branch: 3, 7, 25, 10, 15, 6, 12, 25, 15, 7
n_{1}=10 | \bar{x_{1}} = 12.5 | s_{1}^{2} = 58.28

East Branch: 4, 6, 24, 11, 14, 7, 11, 25, 13, 5
n_{2}=10 | \bar{x_{2}} = 12 | s_{2}^{2} = 54.89

Utica River: 4, 4, 6, 4, 5, 7, 8
n_{3}=7 | \bar{x_{3}} = 5.43 | s_{3}^{2} = 2.62
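If you want to verify these summary statistics, here is a quick sketch in Python (assuming numpy is available); var with ddof=1 gives the sample variance with the (n-1) denominator.

import numpy as np

west = np.array([3, 7, 25, 10, 15, 6, 12, 25, 15, 7])
east = np.array([4, 6, 24, 11, 14, 7, 11, 25, 13, 5])
utica = np.array([4, 4, 6, 4, 5, 7, 8])

for name, x in [("West Branch", west), ("East Branch", east), ("Utica River", utica)]:
    # sample size, sample mean, and sample variance (n-1 denominator)
    print(name, len(x), round(x.mean(), 2), round(x.var(ddof=1), 2))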

Were you able to help Tom with his new hypothesis?

In his first hypothesis test, since the sample variances were close to each other (58.28 and 54.89), we assumed that the population variances are equal and proceeded with a t-Test.

Under the proposition that the population variance of two random variables X_{1} and X_{2} are equal, i.e., \sigma_{1}^{2}=\sigma_{2}^{2}=\sigma^{2}, the test-statistic is t_{0}=\frac{\bar{x_{1}}-\bar{x_{2}}}{\sqrt{s^{2}(\frac{1}{n_{1}}+\frac{1}{n_{2}})}}, where s^{2}=(\frac{n_{1}-1}{n_{1}+n_{2}-2})s_{1}^{2}+(\frac{n_{2}-1}{n_{1}+n_{2}-2})s_{2}^{2} is the pooled variance. t_{0} follows a T-distribution with n_{1}+n_{2}-2 degrees of freedom.
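As a refresher, here is a minimal sketch of that pooled-variance computation in Python; the function name pooled_t_statistic is just an illustrative choice.

import numpy as np

def pooled_t_statistic(x1, x2):
    # Two-sample t-statistic under the assumption of equal population variances
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = np.var(x1, ddof=1), np.var(x2, ddof=1)
    # Pooled variance: a weighted average of the two sample variances
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t0 = (np.mean(x1) - np.mean(x2)) / np.sqrt(s_sq * (1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2  # the statistic and its degrees of freedom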

But can we make the same assumption in the new case? The Utica River sample has a sample variance s_{3}^{2} of 2.62. It would not be reasonable to assume that the population variances are equal.

How do we proceed when \sigma_{1}^{2} \neq \sigma_{2}^{2}?

Let’s go back a few steps and outline how we arrived at the test-statistic.

The hypothesis test is on the difference in means: \mu_{1} - \mu_{2}

A good estimator of the difference in population means is the difference in sample means: y = \bar{x_{1}} - \bar{x_{2}}

The expected value of y, E[y], is \mu_{1}-\mu_{2}, and the variance of y, V[y], is \frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}.

Since y \sim N(\mu_{1}-\mu_{2},\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}), its standardized version z = \frac{y-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}} is the starting point to deduce the test-statistic.

This statistic reduces to z = \frac{\bar{x_{1}}-\bar{x_{2}}}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}} under the null hypothesis that \mu_{1}-\mu_{2}=0.

Last week, we entertained the idea that we are comparing the difference in the means of two populations whose variances are equal and reasoned that the test-statistic follows a T-distribution with (n_{1}+n_{2}-2) degrees of freedom.

This is because the pooled population variance \sigma^{2} can be replaced by its unbiased estimator s^{2}, which, in turn, is related to a Chi-squared distribution with (n_{1}+n_{2}-2) degrees of freedom.
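Written out, since each \frac{(n_{i}-1)s_{i}^{2}}{\sigma^{2}} follows a Chi-squared distribution with (n_{i}-1) degrees of freedom and the two samples are independent, their sum follows a Chi-squared distribution with (n_{1}+n_{2}-2) degrees of freedom:

\frac{(n_{1}+n_{2}-2)s^{2}}{\sigma^{2}} = \frac{(n_{1}-1)s_{1}^{2}}{\sigma^{2}} + \frac{(n_{2}-1)s_{2}^{2}}{\sigma^{2}}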

When the population variances are not equal, i.e., when \sigma_{1}^{2} \neq \sigma_{2}^{2}, there is no pooled variance that can be related to the Chi-square distribution.

The best estimate of V[y] is obtained by replacing the individual population variances (\sigma_{1}^{2}, \sigma_{2}^{2}) by the sample variances (s_{1}^{2}, s_{2}^{2}).

Hence,

z = \frac{\bar{x_{1}}-\bar{x_{2}}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}

We should now identify the distribution of this estimate of V[y].

Bernard Lewis Welch, a British statistician, explained in his works of 1936, 1938, and 1947 that, with some adjustments, this estimate of V[y] can be approximated by a Chi-square distribution, and hence the test-statistic \frac{\bar{x_{1}}-\bar{x_{2}}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}} can be approximated by a T-distribution.

Let’s digest the salient points of his work.

Assume \lambda_{1} = \frac{1}{n_{1}} and \lambda_{2} = \frac{1}{n_{2}}

Assume f_{1} = (n_{1}-1) and f_{2} = (n_{2}-1)

An estimate for the variance of y is \lambda_{1}s_{1}^{2}+\lambda_{2}s_{2}^{2}

Now, in Lesson 73, we learned that \frac{(n-1)s^{2}}{\sigma^{2}} follows a Chi-square distribution with (n-1) degrees of freedom. Based on this logic, we can write

s^{2} = \frac{1}{n-1}\sigma^{2}*\chi^{2}

So, \lambda_{1}s_{1}^{2}+\lambda_{2}s_{2}^{2} is of the form,

\lambda_{1}\frac{1}{n_{1}-1}\sigma_{1}^{2}\chi_{1}^{2}+\lambda_{2}\frac{1}{n_{2}-1}\sigma_{2}^{2}\chi_{2}^{2}

or,

\lambda_{1}s_{1}^{2}+\lambda_{2}s_{2}^{2} = a\chi_{1}^{2}+b\chi_{2}^{2}

a = \frac{\lambda_{1}\sigma_{1}^{2}}{f_{1}} | b = \frac{\lambda_{2}\sigma_{2}^{2}}{f_{2}}

Welch showed that if z = a\chi_{1}^{2}+b\chi_{2}^{2}, then the distribution of z can be approximated using a Chi-square distribution with a random variable \chi=\frac{z}{g} and f degrees of freedom.

He found the constants f and g by equating the moments of z with the moments of this Chi-square distribution, i.e., the Chi-square distribution where the random variable \chi is \frac{z}{g}.

This is how he finds f and g:

The first moment of a general Chi-square distribution is the degrees of freedom (f), and its second central moment (the variance) is two times the degrees of freedom (2f).

The random variable we considered is \chi=\frac{z}{g}.

So,

E[\frac{z}{g}] = f | V[\frac{z}{g}] = 2f

Since g is a constant, we can pull it out of the expectation and the variance and rewrite these equations as

\frac{1}{g}E[z] = f | \frac{1}{g^{2}}V[z] = 2f

Hence,

E[z] = gf | V[z] = 2g^{2}f

Now, let’s take the equation z = a\chi_{1}^{2}+b\chi_{2}^{2} and find the expected value and the variance of z.

E[z] = aE[\chi_{1}^{2}]+bE[\chi_{2}^{2}]

E[z] = af_{1}+bf_{2}

V[z] = a^{2}V[\chi_{1}^{2}]+b^{2}V[\chi_{2}^{2}]

V[z] = a^{2}2f_{1}+b^{2}2f_{2}=2a^{2}f_{1}+2b^{2}f_{2}

Now, he equates the moments derived using the equation for z with the moments derived from the Chi-square distribution of \frac{z}{g}.

Equating the first moments: gf=af_{1}+bf_{2}

Equating the second moments: 2g^{2}f = 2a^{2}f_{1}+2b^{2}f_{2}

The above equation can be written as

g*(gf) = a^{2}f_{1}+b^{2}f_{2}

or,

g*(af_{1}+bf_{2}) = a^{2}f_{1}+b^{2}f_{2}

From here,

g = \frac{a^{2}f_{1}+b^{2}f_{2}}{af_{1}+bf_{2}}

Using this with gf=af_{1}+bf_{2}, we can obtain f.

f = \frac{(af_{1}+bf_{2})^{2}}{a^{2}f_{1}+b^{2}f_{2}}
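Before substituting the terms for a and b, here is a quick simulation sketch in Python to check this moment-matching idea; the values of a, b, f_{1}, and f_{2} below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(1)

a, b = 0.6, 0.05      # arbitrary illustrative constants
f1, f2 = 9, 6         # arbitrary illustrative degrees of freedom

# Simulate z = a*Chi-square(f1) + b*Chi-square(f2)
z = a * rng.chisquare(f1, size=200_000) + b * rng.chisquare(f2, size=200_000)

# Welch's matched constants
g = (a**2 * f1 + b**2 * f2) / (a * f1 + b * f2)
f = (a * f1 + b * f2) ** 2 / (a**2 * f1 + b**2 * f2)

# z/g should behave like a Chi-square with f degrees of freedom
print(np.mean(z / g), f)       # sample mean of z/g vs. f
print(np.var(z / g), 2 * f)    # sample variance of z/g vs. 2f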

Since we know the terms for a and b, we can say,

f = \frac{(\lambda_{1}\sigma_{1}^{2}+\lambda_{2}\sigma_{2}^{2})^{2}}{\frac{\lambda_{1}^{2}\sigma_{1}^{4}}{f_{1}}+\frac{\lambda_{2}^{2}\sigma_{2}^{4}}{f_{2}}}

Since the estimate of the variance of y, suitably scaled, approximately follows a Chi-square distribution with f degrees of freedom, we can say that the test-statistic \frac{\bar{x_{1}}-\bar{x_{2}}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}} follows an approximate T-distribution with f degrees of freedom.

Welch also showed that an unbiased estimate for f is

f = \frac{(\lambda_{1}s_{1}^{2}+\lambda_{2}s_{2}^{2})^{2}}{\frac{\lambda_{1}^{2}s_{1}^{4}}{f_{1}+2}+\frac{\lambda_{2}^{2}s_{2}^{4}}{f_{2}+2}} - 2

Essentially, he substitutes s_{1}^{2} for \sigma_{1}^{2} and s_{2}^{2} for \sigma_{2}^{2}, and, to correct for the bias, adds 2 to each degrees-of-freedom term in the denominator and subtracts an overall 2 from the fraction. He argues that this correction produces the best unbiased estimate of f.

Later authors, like Franklin E. Satterthwaite, showed that the bias correction might not be necessary and that it suffices to use s_{1}^{2} for \sigma_{1}^{2} and s_{2}^{2} for \sigma_{2}^{2} in the original equation, as in,

f = \frac{(\lambda_{1}s_{1}^{2}+\lambda_{2}s_{2}^{2})^{2}}{\frac{\lambda_{1}^{2}s_{1}^{4}}{f_{1}}+\frac{\lambda_{2}^{2}s_{2}^{4}}{f_{2}}}

Since we know that \lambda_{1} = \frac{1}{n_{1}}, \lambda_{2} = \frac{1}{n_{2}}, f_{1} = (n_{1}-1), and f_{2} = (n_{2}-1), we can finally say

When the population variances are not equal, i.e., when \sigma_{1}^{2} \neq \sigma_{2}^{2}, the test-statistic is t_{0}^{*}=\frac{\bar{x_{1}}-\bar{x_{2}}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}, and it follows an approximate T-distribution with f degrees of freedom.

The degrees of freedom can be estimated as

f = \frac{(\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}})^{2}}{\frac{(s_{1}^{2}/n_{1})^{2}}{(n_{1} - 1) + 2}+\frac{(s_{2}^{2}/n_{2})^{2}}{(n_{2}-1)+2}} - 2

or,

f = \frac{(\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}})^{2}}{\frac{(s_{1}^{2}/n_{1})^{2}}{(n_{1} - 1)}+\frac{(s_{2}^{2}/n_{2})^{2}}{(n_{2}-1)}}

This is now popularly known as Welch's t-Test.
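Putting the pieces together, here is a minimal sketch in Python (assuming numpy and scipy are available); welch_t_test is just an illustrative name, and the degrees of freedom below use Satterthwaite's version without the bias correction.

import numpy as np
from scipy import stats

def welch_t_test(x1, x2):
    # Welch's test-statistic for H_0: mu1 - mu2 = 0
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1) / n1, np.var(x2, ddof=1) / n2
    t0 = (np.mean(x1) - np.mean(x2)) / np.sqrt(v1 + v2)
    # Satterthwaite's approximate degrees of freedom
    f = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # One-sided p-value for H_A: mu1 - mu2 > 0
    p_value = stats.t.sf(t0, df=f)
    return t0, f, p_value

For a cross-check, scipy's built-in stats.ttest_ind(x1, x2, equal_var=False) computes the same statistic and degrees of freedom, but reports a two-sided p-value by default.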

Let’s now go back to Tom and help him with his new theory.

We will compare the west branch Mohawk River with the Utica River.

West Branch: 3, 7, 25, 10, 15, 6, 12, 25, 15, 7
n_{1}=10 | \bar{x_{1}} = 12.5 | s_{1}^{2} = 58.28

Utica River: 4, 4, 6, 4, 5, 7, 8
n_{3}=7 | \bar{x_{3}} = 5.43 | s_{3}^{2} = 2.62

Since we cannot assume that the population variances are equal, we will use Welch’s t-Test.

We compute the test-statistic and check how likely it is to see such a value in a T-distribution (the approximate null distribution) with the estimated degrees of freedom.

t_{0}^{*}=\frac{\bar{x_{1}}-\bar{x_{3}}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{3}^{2}}{n_{3}}}}

t_{0}^{*}=\frac{12.5-5.43}{\sqrt{\frac{58.28}{10}+\frac{2.62}{7}}}

t_{0}^{*}=2.84

Let’s compute the bias-corrected degrees of freedom suggested by Welch.

f = \frac{(\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}})^{2}}{\frac{(s_{1}^{2}/n_{1})^{2}}{(n_{1} - 1) + 2}+\frac{(s_{2}^{2}/n_{2})^{2}}{(n_{2}-1)+2}} - 2

f = \frac{(\frac{58.28}{10}+\frac{2.62}{7})^{2}}{\frac{(58.28/10)^{2}}{(10 - 1) + 2}+\frac{(2.62/7)^{2}}{(7-1)+2}} - 2

f = 10.38

We can round it down to 10 degrees of freedom.
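As a quick arithmetic check in Python (using the rounded summary statistics above, so the results match the hand computation up to rounding):

v1 = 58.28 / 10
v3 = 2.62 / 7

t0 = (12.5 - 5.43) / (v1 + v3) ** 0.5                                  # ~2.84
f = (v1 + v3) ** 2 / (v1**2 / (10 - 1 + 2) + v3**2 / (7 - 1 + 2)) - 2  # ~10.4, Welch's bias-corrected df

print(round(t0, 2), round(f, 2))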

The test-statistic is 2.84. Since the alternate hypothesis is that the difference is greater than zero, Tom has to verify how likely it is to see a value greater than 2.84 in the approximate null distribution. Tom has to reject the null hypothesis if this probability (p-value) is smaller than the selected rate of rejection. 

Look at this visual.

The distribution is an approximate T-distribution with 10 degrees of freedom. Since he opted for a rejection level of 10%, there is a cutoff on the distribution at 1.37.

1.37 is the quantile on the right tail corresponding to a 10% probability (rate of rejection) for a T-distribution with ten degrees of freedom.

If the test statistic (t_{0}^{*}) is greater than t_{critical}, which is 1.37, he will reject the null hypothesis. At that point (i.e., at values greater than 1.37), there would be sufficient confidence to say that the difference is significantly greater than zero.

It is equivalent to rejecting the null hypothesis if P(T > t_{0}^{*}) (the p-value) is less than \alpha

We can read t_{critical} off the standard T-table, or we can compute P(T > t_{0}^{*}) from the distribution.

At ten degrees of freedom (df = 10) and \alpha = 0.1, t_{critical} = 1.372 and P(T > t_{0}^{*}) = 0.009.
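If you prefer computing these two numbers over reading the T-table, each is a one-liner with scipy (assuming it is available):

from scipy import stats

t_critical = stats.t.ppf(1 - 0.1, df=10)   # ~1.372, the 10% right-tail cutoff
p_value = stats.t.sf(2.84, df=10)          # ~0.009, i.e., P(T > t0*)

print(round(t_critical, 3), round(p_value, 3))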

Since the test-statistic t_{0}^{*} is in the rejection region, or equivalently since the p-value < \alpha, Tom can reject the null hypothesis H_{0} that \mu_{1}-\mu_{3}=0.
He now has evidence beyond statistical doubt to claim that the factory is illegally dumping its untreated waste into the west branch Mohawk River.

Is it time for a lawsuit?

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

