Lesson 53 – Sum of squares: The language of Chi-square distribution

The conversations at the local bar last night were the usual local and national politics. I usually head to the pool table after a casual drink. Yesterday, at the request of an old friend, I toyed with the game of darts.

In my heyday, I used to be good at this, so without much trouble, I could hit the bullseye after getting my sights.

Maybe it was the draught that did the trick. I felt jubilant.

A couple of tequilas and a few minutes later, I hoped to beat my previous record.

“Did you change the darts?” “There’s something in my eye.” “Maybe they moved the target.” These thoughts were circling in my mind after my performance.

They did not move the target, they did not change the darts, but there was indeed something in my head. I decided to stop playing the next round and go home, lest someone else become the bullseye.

While heading back, I thought about how far off I was from the bullseye (origin). On the X-Y coordinate plane, the distance from origin is R=\sqrt{X^{2}+Y^{2}} .

The squared distance is R^{2}=X^{2}+Y^{2}.

It varies with each try. Random variable.

This morning, while catching up on my news feed, I noticed this interesting article.

The article starts with a conversation between fans.

People bet millions on heads vs. tails. 

“…the toss is totally biased.” It reminded me of the famous article by Karl Pearson, “Science and Monte Carlo,” published in The Fortnightly Review in 1894. He experimentally proved that the roulette wheels in Monte Carlo were biased.

He computed the error (observed – theoretical) (or squared error) and showed that the roulette outcomes in Monte Carlo could not happen by chance. He explained this based on two sets of 16,500 recorded throws of the ball in a roulette wheel during an eight-week period in the summer of 1892, manually, I may add. Imagine counting up 33,000 numbers and performing various sets of hand calculations.

He also writes that he spent his vacation tossing a shilling 25,000 times for a coin-toss bias experiment.

These experiments led to the development of the relative squared error metric, Pearson’s cumulative test statistic (the Chi-square test statistic).
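To make the statistic concrete, here is a minimal sketch in Python with made-up coin-toss counts (the numbers are hypothetical, not Pearson’s data); scipy.stats.chisquare computes \sum \frac{(observed - expected)^{2}}{expected}.

```python
from scipy.stats import chisquare

# Hypothetical (made-up) example: 6,300 heads in 10,000 tosses of a suspect coin
observed = [6300, 3700]          # heads and tails we counted
expected = [5000, 5000]          # what a fair coin predicts

# Pearson's cumulative test statistic: sum of (observed - expected)^2 / expected
statistic, p_value = chisquare(observed, f_exp=expected)
print(statistic, p_value)        # a large statistic and a near-zero p-value
```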

Some excerpts from his concluding paragraph.

“To sum up, then: Monte Carlo roulette, if judged by returns which are published apparently with the sanction of the Société, is, if the laws of chance rule, from the standpoint of exact science the most prodigious miracle of the nineteenth century.”

“…we are forced to accept as alternative that the random spinning of a roulette manufactured and daily readjusted with extraordinary care is not obedient to the laws of chance, but is chaotic in its manifestation!”

By now you might be thinking, “What’s the point here? Maybe it’s his hangover writing.”

Hmm, the square talk is for the Chi-square distribution.

Last week, we visited Mumble’s office. He had transformed the data: the log-normal distribution.

If Z follows a normal distribution, then X = e^{Z} follows a log-normal distribution because the log of X is normal.

Exponentiating a normal distribution will result in a log-normal distribution.

See for yourself how Z \sim N(0,1) transforms into X, a log-normal distribution. X is non-negative.

Similarly, squaring a standard normal distribution will result in a Chi-square distribution.

If Z follows a standard normal distribution, then \chi = Z^{2} follows a Chi-square distribution with one degree of freedom.

See how the same Z \sim N(0,1) transforms into \chi, a Chi-square distribution, again non-negative, since we are squaring.
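If you want to reproduce these transformations yourself, here is a minimal simulation sketch (NumPy; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(53)          # arbitrary seed
z = rng.standard_normal(100_000)         # Z ~ N(0, 1)

x = np.exp(z)                            # exponentiating -> log-normal
chi = z ** 2                             # squaring -> Chi-square, 1 degree of freedom

print(x.min() >= 0, chi.min() >= 0)      # both transformed variables are non-negative
```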

Let’s derive the probability density function for the Chi-square distribution using the fact that \chi = Z^{2}. We will assume Z is a standard normal random variable, Z \sim N(0,1).

For this, we will start with the cumulative distribution function F(\chi) and take its derivative to obtain the probability density function f(\chi), since f(\chi)=\frac{d}{d\chi}F(\chi).

\chi = Z^{2}

F(\chi) = P(\chi \le \chi) is the cumulative distribution function.

F_{\chi}(\chi) = P(Z^{2} \le \chi)

=P(-\sqrt{\chi} \le Z \le \sqrt{\chi})

=P(Z \le \sqrt{\chi}) - P(Z \le -\sqrt{\chi})

F_{\chi}(\chi) =F_{Z}(\sqrt{\chi}) - F_{Z}(-\sqrt{\chi})

f_{\chi}(\chi)=\frac{d}{d\chi}(F_{\chi}(\chi))

=\frac{d}{d\chi}(F_{Z}(\sqrt{\chi}) - F_{Z}(-\sqrt{\chi}))

Applying the fundamental theorem of calculus and chain rule together, we get,

=f_{Z}(\sqrt{\chi})*\frac{1}{2\sqrt{\chi}} + f_{Z}(-\sqrt{\chi})*\frac{1}{2\sqrt{\chi}}

=\frac{1}{2\sqrt{\chi}}*(f_{Z}(\sqrt{\chi})+f_{Z}(-\sqrt{\chi}))

By now, you are familiar with the probability density function for Z: f_{Z}(z) = \frac{1}{\sqrt{2\pi}}e^{\frac{-z^{2}}{2}}. Let’s use this.

f_{\chi}(\chi)=\frac{1}{2\sqrt{\chi}}*(\frac{1}{\sqrt{2\pi}}e^{\frac{-(\sqrt{\chi})^{2}}{2}}+\frac{1}{\sqrt{2\pi}}e^{\frac{-(-\sqrt{\chi})^{2}}{2}})

f_{\chi}(\chi)=\frac{1}{\sqrt{2 \pi \chi}}e^{\frac{-\chi}{2}} for \chi > 0.

As you can see, the function is only defined for \chi > 0. It is 0 otherwise.
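To convince yourself that this derived density is right, here is a quick numerical check (a sketch assuming NumPy and SciPy; the seed, sample size, and intervals are arbitrary). It compares the relative frequency of Z^{2} falling in a few intervals with the integral of the derived f(\chi) over the same intervals.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(53)
chi = rng.standard_normal(1_000_000) ** 2     # samples of Z^2

def f_chi(c):
    # the derived density: 1 / sqrt(2*pi*c) * exp(-c/2), for c > 0
    return np.exp(-c / 2) / np.sqrt(2 * np.pi * c)

for a, b in [(0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]:
    empirical = np.mean((chi >= a) & (chi < b))   # relative frequency of Z^2 in [a, b)
    derived, _ = quad(f_chi, a, b)                # integral of the derived density
    print(f"[{a}, {b}): empirical {empirical:.4f}, derived {derived:.4f}")
```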

With some careful observation, you can tell that this function is the Gamma density function with \lambda=\frac{1}{2} and r = \frac{1}{2}.

Yes, I know, it is not very obvious. Let me rearrange it for you, and you will see the pattern.

f(\chi) = \frac{1}{\sqrt{2 \pi \chi}}e^{-\frac{1}{2}\chi}

=\sqrt{\frac{1}{2}}*\sqrt{\frac{1}{\chi}}*\frac{1}{\sqrt{\pi}}*e^{-\frac{1}{2}\chi}

=\frac{(1/2)^{1/2}(\chi)^{-1/2}e^{-\frac{1}{2}\chi}}{\sqrt{\pi}}

Multiply and divide by 1/2.

=\frac{(1/2)*(1/2)^{1/2}*(\chi)^{-1/2}*e^{-\frac{1}{2}\chi}}{(1/2)\sqrt{\pi}}

=\frac{(1/2)*(1/2)^{-1/2}*(\chi)^{-1/2}*e^{-\frac{1}{2}\chi}}{\sqrt{\pi}}

If \lambda=1/2

f(\chi)=\frac{\lambda*(\lambda \chi)^{-1/2}*e^{-\lambda*\chi}}{\sqrt{\pi}}

Drawing from the factorial concepts, we can replace \sqrt{\pi} with (-\frac{1}{2})!, which is (r-1)! with r = \frac{1}{2}.

f(\chi)=\frac{\lambda*(\lambda \chi)^{r-1}*e^{-\lambda*\chi}}{(r-1)!}

This equation, as you know, is the density function for the Gamma distribution.

So, the Chi-square density function is a Gamma density function with \lambda=1/2 and r=1/2. It is a special case of the Gamma distribution.
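A quick way to see this special case numerically is to compare the two densities with SciPy (a minimal sketch). Note that scipy.stats.gamma is parameterized with a shape a = r and a scale = 1/\lambda, so \lambda = 1/2 and r = 1/2 map to a = 0.5 and scale = 2.

```python
import numpy as np
from scipy.stats import chi2, gamma

x = np.linspace(0.1, 10, 50)

pdf_chi2 = chi2.pdf(x, df=1)                 # Chi-square with one degree of freedom
pdf_gamma = gamma.pdf(x, a=0.5, scale=2)     # Gamma with r = 1/2 and lambda = 1/2

print(np.allclose(pdf_chi2, pdf_gamma))      # True: the two densities coincide
```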

Now let’s go up a level: the sum of squares of two standard normals, like our squared distance from the bullseye (X^{2}+Y^{2}).

\chi = Z_{1}^{2} + Z_{2}^{2}.

We know from lesson 46 on convolution that if X and Y are two independent random variables with probability density functions f_{X}(x) and f_{Y}(y), their sum Z = X + Y is a random variable with a probability density function f_{Z}(z) that is the convolution of f_{X}(x) and f_{Y}(y).

f(z) = f_{X}*f_{Y}(z) = \int_{-\infty}^{\infty}f_{X}(x)f_{Y}(z-x)dx

We can use the principles of convolution to derive the probability density function for \chi = Z_{1}^{2} + Z_{2}^{2}.

Let’s assume Z_{1}^{2}=k. Then, Z_{2}^{2}=\chi - k, and

f_{\chi}(\chi)=\int_{-\infty}^{\infty}f_{Z_{1}^{2}}(k)f_{Z_{2}^{2}}(\chi - k)dk

f_{Z_{1}^{2}}(k) = \frac{1}{\sqrt{2 \pi k}}e^{-\frac{k}{2}}. This is the same function we derived above for \chi = Z^{2}. Using this,

f_{\chi}(\chi)=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2 \pi k}}e^{-\frac{k}{2}}\frac{1}{\sqrt{2 \pi (\chi - k)}}e^{-\frac{(\chi - k)}{2}}dk

Since both densities are zero for negative arguments, the integrand is non-zero only when 0 \le k \le \chi, so the limits of integration reduce to 0 and \chi.

=\frac{1}{2\pi}\int_{0}^{\chi}k^{-\frac{1}{2}}e^{-\frac{k}{2}}(\chi - k)^{-\frac{1}{2}}e^{-\frac{(\chi - k)}{2}}dk

=\frac{1}{2\pi}e^{-\frac{\chi}{2}}\int_{0}^{\chi}k^{-\frac{1}{2}}(\chi - k)^{-\frac{1}{2}}dk

The integrand has the antiderivative 2\arcsin\left(\sqrt{\frac{k}{\chi}}\right), so the definite integral from 0 to \chi is \pi, since \arcsin(1) = \frac{\pi}{2} and \arcsin(0) = 0. Try it for yourself. Put your calculus classes to practice.
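If you would rather check it numerically than by hand, here is a small sketch using scipy.integrate.quad (the endpoint singularities are integrable, so the routine handles them; the values of \chi are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

def integrand(k, chi):
    # k^(-1/2) * (chi - k)^(-1/2); integrable singularities at both endpoints
    return 1.0 / np.sqrt(k * (chi - k))

for chi in (0.5, 1.0, 3.7):                  # arbitrary values of chi
    value, _ = quad(integrand, 0, chi, args=(chi,))
    print(chi, value, np.pi)                 # the integral is pi for every chi
```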

We are left with

f_{\chi}(\chi) = \frac{1}{2}e^{-\frac{\chi}{2}}, again for \chi > 0.

This function is a Gamma distribution with \lambda = \frac{1}{2} and r = 1.
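Here is a minimal simulation sketch (NumPy; arbitrary seed, sample size, and intervals) comparing the sum of two squared standard normals with this density:

```python
import numpy as np

rng = np.random.default_rng(53)
n = 1_000_000
chi = rng.standard_normal(n) ** 2 + rng.standard_normal(n) ** 2   # Z1^2 + Z2^2

# f(chi) = (1/2) * exp(-chi/2); its integral over [a, b) is exp(-a/2) - exp(-b/2)
for a, b in [(0.0, 1.0), (1.0, 3.0), (3.0, 6.0)]:
    empirical = np.mean((chi >= a) & (chi < b))
    derived = np.exp(-a / 2) - np.exp(-b / 2)
    print(f"[{a}, {b}): empirical {empirical:.4f}, derived {derived:.4f}")
```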

Generalization for n standard normal random variables

If there are n standard normal random variables, Z_{1}, Z_{2}, ..., Z_{n}, their sum of squares is a Chi-square distribution with n degrees of freedom.

\chi = Z_{1}^{2}+Z_{2}^{2}+ ... + Z_{n}^{2}

Its probability density function is a Gamma density function with \lambda=1/2 and r=n/2. You can derive it by induction.

f(\chi)=\frac{\frac{1}{2}*(\frac{1}{2} \chi)^{\frac{n}{2}-1}*e^{-\frac{1}{2}*\chi}}{(\frac{n}{2}-1)!} for \chi > 0 and 0 otherwise.
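As a numerical cross-check (a sketch using SciPy), the Chi-square density with n degrees of freedom matches the Gamma density with shape a = r = n/2 and scale = 1/\lambda = 2 for every n tried:

```python
import numpy as np
from scipy.stats import chi2, gamma

x = np.linspace(0.5, 30, 60)
for n in (1, 2, 5, 10, 30):
    # Chi-square with n degrees of freedom vs Gamma with r = n/2 and lambda = 1/2
    match = np.allclose(chi2.pdf(x, df=n), gamma.pdf(x, a=n / 2, scale=2))
    print(n, match)                          # True for every n
```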

Look at this animation for the Chi-square distribution with different degrees of freedom. See how it becomes more symmetric for large values of n (degrees of freedom).

We will revisit the Chi-square distribution when we learn hypothesis testing. Till then, the sum of squares and error square should remind you of Chi-square [Square – Square], and tequila square should remind you of …

