Lesson 48 – Normal is the limit

What is this? Is it normal?

The shape is limited to a bell. Is it normal?

It is the same for any variable. Is it normal?

Why is it normal?

What is the normal?

The binomial distribution describes the number of successes in n independent trials. That count is the sum of n Bernoulli random variables.

S_{n} = X_{1} + X_{2} + ... + X_{n}, where X_{i} \in \{0,1\} \quad \forall i
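To see this sum at work, here is a minimal Python sketch. The values n = 10 and p = 0.3, the seed, and the number of repetitions are illustrative assumptions, not part of the lesson. It adds up n Bernoulli draws many times and compares the observed frequencies with the binomial formula.

```python
import random
from math import comb

random.seed(48)          # arbitrary seed so the run is repeatable

n, p = 10, 0.3           # illustrative values, not from the lesson
trials = 20_000          # number of simulated sums (assumption)

def bernoulli(p):
    """One Bernoulli trial: 1 with probability p, otherwise 0."""
    return 1 if random.random() < p else 0

# Each S_n is the sum of n independent Bernoulli variables.
sums = [sum(bernoulli(p) for _ in range(n)) for _ in range(trials)]

for k in range(n + 1):
    simulated = sums.count(k) / trials
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"k={k:2d}  simulated={simulated:.4f}  binomial pmf={exact:.4f}")
```

The two columns should agree to within sampling noise, and the agreement tightens as the number of repetitions grows.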

When the number of trials n is large and the probability of success p is small, the binomial distribution is well approximated by the Poisson distribution. We know that the average number of events in an interval ( \lambda ) is the expected number of successes np.
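A quick numerical check of this approximation, as a sketch only. The values n = 1000 and p = 0.003 are assumptions chosen to represent "many trials, rare success". The code puts the binomial and Poisson probabilities side by side with \lambda = np.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events) when the average count is lam."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003       # illustrative values (assumption)
lam = n * p              # lambda = np

max_gap = 0.0
for k in range(11):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, lam)
    max_gap = max(max_gap, abs(b - q))
    print(f"k={k:2d}  binomial={b:.5f}  poisson={q:.5f}")

print(f"largest gap over k = 0..10: {max_gap:.5f}")
```

Try a larger p with the same n (say p = 0.3) and the gap widens, which is why the approximation is usually quoted for rare events.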

The wait time for the ‘r’th arrival follows a Gamma distribution. The Gamma distribution describes the sum of r independent exponential random variables.

 T_{r} = t_{1} + t_{2} + t_{3} + ... + t_{r}
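The same idea can be sketched for wait times. The rate lam = 2 arrivals per unit time, r = 5, and the seed are assumptions made for illustration. The code adds r exponential gaps and compares the simulated mean and variance with the Gamma values r/\lambda and r/\lambda^{2}.

```python
import random
from statistics import fmean, variance

random.seed(48)           # arbitrary seed so the run is repeatable

r, lam = 5, 2.0           # r-th arrival and arrival rate (both assumptions)
trials = 20_000

# T_r is the sum of r independent exponential inter-arrival times.
waits = [sum(random.expovariate(lam) for _ in range(r)) for _ in range(trials)]

print(f"simulated mean     = {fmean(waits):.3f}   Gamma mean r/lambda       = {r / lam:.3f}")
print(f"simulated variance = {variance(waits):.3f}   Gamma variance r/lambda^2 = {r / lam**2:.3f}")
```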

What you observed in the animations last week, and what you saw now for the Binomial, Poisson, and Gamma as examples, is that the sum of random variables is tending towards a particular shape (distribution function).

This observation is “central” to probability theory.

It is called the Central Limit Theorem.

If S_{n} is the sum of n independent, identically distributed random variables with finite variance, then for large n the distribution of S_{n} can be well-approximated by a continuous function known as the normal density function, given by

f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}

where \mu and \sigma^{2} are the expected value and variance of S_{n}. If each X_{i} has mean \mu_{X} and variance \sigma_{X}^{2}, then \mu = n\mu_{X} and \sigma^{2} = n\sigma_{X}^{2}.
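Here is a small simulation of the theorem, under assumptions of my choosing: the X_{i} are rolls of a fair die, n = 50, and the seed is arbitrary. The sum of the rolls is compared with a normal distribution whose mean and variance are those of the sum (n times the mean and variance of one roll).

```python
import random
from statistics import NormalDist

random.seed(48)                 # arbitrary seed so the run is repeatable

n = 50                          # number of dice added together (assumption)
trials = 10_000
mu_x, var_x = 3.5, 35 / 12      # mean and variance of one fair die
mu_s = n * mu_x                 # mean of the sum
sigma_s = (n * var_x) ** 0.5    # standard deviation of the sum

sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]
approx = NormalDist(mu_s, sigma_s)

gaps = []
for s in (160, 170, 175, 180, 190):
    empirical = sum(1 for v in sums if v <= s) / trials
    # The +0.5 is a continuity correction, since the sum only takes integer values.
    normal = approx.cdf(s + 0.5)
    gaps.append(abs(empirical - normal))
    print(f"P(S <= {s}): simulated={empirical:.4f}  normal={normal:.4f}")
```

A single die is as far from a bell as a distribution can be (it is flat), yet the sum of 50 of them already tracks the normal curve closely.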

It was first proved, for the binomial case, by the French-born mathematician Abraham de Moivre in the early 1700s. He showed this in one of the chapters of his book, The Doctrine of Chances. Page 243: “A Method of approximating the Sum of the Terms of the Binomial (a+b)^{n} expanded into a Series, from whence are deduced some practical Rules to estimate the Degree of Assent which is to be given to Experiments.”

An interesting observation from his book.

As you can see, he derived useful approximations to Binomial series. Imagine computing factorials for large values of n in those times.

It turns out that the binomial distribution can be estimated very accurately using the normal density function.

 f(x) = \frac{n!}{(n-x)!x!}p^{x}(1-p)^{n-x} \approx \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}, \quad \mu = np, \quad \sigma^{2} = np(1-p)
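To get a feel for how close the two sides are, here is a short sketch that evaluates both for n = 100 and p = 0.5 (illustrative values, not from the lesson), using \mu = np and \sigma^{2} = np(1-p).

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_density(x, mu, sigma2):
    """Normal density with mean mu and variance sigma2, evaluated at x."""
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2 * pi * sigma2)

n, p = 100, 0.5                       # illustrative values (assumption)
mu, sigma2 = n * p, n * p * (1 - p)   # mean and variance of the binomial

for x in range(35, 66, 5):
    b = binomial_pmf(x, n, p)
    g = normal_density(x, mu, sigma2)
    print(f"x={x:3d}  binomial={b:.5f}  normal={g:.5f}")

max_gap = max(abs(binomial_pmf(x, n, p) - normal_density(x, mu, sigma2))
              for x in range(n + 1))
print(f"largest gap over all x: {max_gap:.5f}")
```

No factorials of large numbers need to be worked out by hand on the right-hand side, which was the whole point in de Moivre's time.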

I compiled the modern version of this derivation. It is long with all the steps.

Please CLICK HERE to read and understand the details.

Follow it through to the end. You will feel good when you see how the Binomial converges in the limit to this symmetric distribution.

Most probability distributions are related in some way or other to independent Bernoulli trials (the root events). If you carefully look at the probability distribution functions for each of them and take it to the limit as n \rightarrow \infty, you will see how the normal distribution emerges as the limiting distribution.

That is why it is normal.

The intuition from convolution

A very intuitive and elegant way of understanding the Central Limit Theorem and why the bell shape emerges due to convergence in the center of the distribution is provided in Chapter 9 (Preasymptotic and Central Limit in the Real World) of Silent Risk, technical notes on probability by Nassim Nicholas Taleb.

The gist is as follows.

We are deriving the distribution function for the summation of the random variables.

S_{n} = X_{1} + X_{2} + ... + X_{n}

You know from lesson 46 that this is convolution.

If X and Y are two independent random variables with probability density functions f_{X}(x) and f_{Y}(y), their sum Z = X + Y is a random variable with a probability density function f_{Z}(z) that is the convolution of f_{X}(x) and f_{Y}(y).

 f_{Z}(z) = (f_{X}*f_{Y})(z) = \int_{-\infty}^{\infty}f_{X}(x)f_{Y}(z-x)dx
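The integral can be approximated numerically. The sketch below is one way to do it, assuming X and Y are both exponential with a hypothetical rate lam = 2, and assuming a simple midpoint rule on the range [0, 10]. For that choice the exact answer is known (a Gamma density with r = 2), so the two can be compared.

```python
from math import exp

def convolve_density(f_x, f_y, z, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f_x(x) * f_y(z - x) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += f_x(x) * f_y(z - x)
    return total * dx

lam = 2.0                        # hypothetical rate of both exponentials

def expo(t):
    """Exponential density; zero for negative arguments."""
    return lam * exp(-lam * t) if t >= 0 else 0.0

results = []
for z in (0.25, 0.5, 1.0, 2.0):
    # Both densities vanish below 0 and are negligible beyond 10.
    numeric = convolve_density(expo, expo, z, lo=0.0, hi=10.0)
    exact = lam**2 * z * exp(-lam * z)    # Gamma density with r = 2
    results.append((z, numeric, exact))
    print(f"z={z:4.2f}  numeric={numeric:.5f}  exact={exact:.5f}")
```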

Convolution is a weighted sum of products of the two functions. For each z, we multiply f_{X}(x) by f_{Y}(z-x) and integrate over all x.

The probability distribution (for example, f_{Y}(z-x)) is weighted using another function (for example, f_{X}(x)). When we repeat this through induction, we smooth out the function at the center till it gets a bell-like shape and the tails become thin.

In Chapter 9, he provides an example convolution of uniform random variables.

Convolution of two uniform random variables (S_{2} = X_{1} + X_{2} ) is a triangular function (piecewise linear).

Convolution of this triangular function with another uniform random variable (rectangular function) (S_{3} = S_{2} + X_{3} ) will now yield a piecewise quadratic function.

As the samples grow (n becomes large), these repeated convolutions yield a smooth, center-heavy, thin-tailed bell function: the normal density.
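This progression can be reproduced on a grid. The sketch below assumes Uniform(0, 1) variables and a grid spacing of 0.01, both my choices for illustration. It convolves the rectangle with itself repeatedly and compares the n = 5 result with a normal density of the same mean (n/2) and variance (n/12).

```python
from math import exp, pi, sqrt

dx = 0.01                        # grid spacing (assumption)
uniform = [1.0] * 100            # Uniform(0, 1) density on 100 cells of width dx

def convolve(f, g, dx):
    """Discrete approximation of the convolution integral on a common grid."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * dx
    return out

# densities[n] approximates the density of X_1 + ... + X_n.
densities = {1: uniform}
for n in range(2, 6):
    densities[n] = convolve(densities[n - 1], uniform, dx)

n = 5
mu, sigma2 = n / 2, n / 12       # mean and variance of the sum of n uniforms
max_gap = 0.0
for k, value in enumerate(densities[n]):
    z = (k + n / 2) * dx         # grid position represented by index k
    normal = exp(-0.5 * (z - mu) ** 2 / sigma2) / sqrt(2 * pi * sigma2)
    max_gap = max(max_gap, abs(value - normal))

print(f"peak of the n=2 density (triangle): {max(densities[2]):.3f}")
print(f"peak of the n=5 density:            {max(densities[5]):.3f}")
print(f"largest gap to the normal at n=5:   {max_gap:.3f}")
```

Each cell is treated as sitting at its midpoint, which is why index k of the n-fold result maps to (k + n/2) times dx.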

Look at this animation. See how the uniform rectangular function X_{1} becomes a triangle (X_{1} + X_{2}) and quickly converges to a normal density for n = 5.

This one is for the sum of Poisson random variables.

Notice that even for n = 10, the distribution is not fully normal. It is not converging fast. And that is important to remember.
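One way to put numbers on this slow convergence, as a sketch under my own assumptions: each Poisson variable has a hypothetical rate lam = 0.1 (the animation's rate is not stated here). The sum of n independent Poisson variables is again Poisson with rate n times lam, and its skewness is 1/\sqrt{n\lambda}, which shrinks slowly. The code also reports the largest gap between the Poisson probabilities and a normal density with the same mean and variance.

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_pmf(k, lam):
    """Poisson probability, computed in log space to stay stable for large k."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def normal_density(x, mu, sigma2):
    """Normal density with mean mu and variance sigma2, evaluated at x."""
    return exp(-0.5 * (x - mu) ** 2 / sigma2) / sqrt(2 * pi * sigma2)

lam = 0.1                            # hypothetical rate of each variable

rows = []
for n in (1, 10, 100, 1000):
    total = n * lam                  # the sum is Poisson with rate n * lam
    skew = 1 / sqrt(total)           # skewness of a Poisson with that rate
    upper = int(total + 10 * sqrt(total)) + 10
    gap = max(abs(poisson_pmf(k, total) - normal_density(k, total, total))
              for k in range(upper + 1))
    rows.append((n, skew, gap))
    print(f"n={n:5d}  skewness={skew:.3f}  largest gap to normal={gap:.4f}")
```

With this rate the sum at n = 10 is still visibly lopsided, and it takes far more terms than the uniform case before the bell shape settles in.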

While the Central Limit Theorem is central to probability theory, and it is a fundamental assumption for many concepts we will learn later, we should know that some distributions converge quickly and some do not.

We will learn about the normal distribution in detail in the next few lessons. As you prepare for the normal distribution, I will leave you with a comment posted last week about the normal distribution by a good friend who works in the insurance industry.

“As has been shown time and again, there is no such thing as a ‘normal’ distribution in the real world -;)”

Well, what can I say, he works on real world risk with real dollars at stake. Always take the word of a person who has skin in the game.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
