Lesson 50 – The Standard Normal

Tom works at FedEx. He is responsible for weighing parcels and estimating shipping costs. He reviewed the previous data on daily packages and found that the probability distribution resembles a normal distribution with a mean of 12 lbs and a standard deviation of 3.5 lbs. There is an extra charge if a package weighs more than 20 lbs. Tom wants to know the probability that a parcel he weighs is within 20 lbs.

 P(X \le 20) = \int_{-\infty}^{20}\frac{1}{\sqrt{2 \pi 3.5^{2}}} e^{\frac{-1}{2}(\frac{x-12}{3.5})^{2}}dx

Perhaps you can help Tom. Did you solve the integral of the normal density function from last week?

Dick is excited to start his first job as a construction site engineer for H&H Constructions. He was told that the pH of the soil on this site follows a normal distribution with a mean pH of 6 and a standard deviation of 0.1. Yesterday he collected a soil sample with the following question in mind: “What is the probability that the pH of this soil sample is between 5.90 and 6.15?”

 P(5.90 \le X \le 6.15) = \int_{5.90}^{6.15}\frac{1}{\sqrt{2 \pi 0.1^{2}}} e^{\frac{-1}{2}(\frac{x-6}{0.1})^{2}}dx

Can you answer Dick’s question? Did you solve the integral of the normal density function from last week?

Harry, a regular reader of our blog, is new at the Hamilton Grange. He has an idea to attract customers on Saturday. He suggested offering a free drink to anyone not seated within  x minutes. Mr. Hamilton thought this was a cool idea and agreed to offer free drinks to 1% of the customers.

Can you work with Harry to compute x, the wait time, after which the customers get a free drink?

Before you help Tom, Dick, and Harry, what is your experience trying to solve the integral  P(X \le x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}dx ?

Could you not solve it?

Hmm, don’t feel bad. A closed-form solution for this integral does not exist.

How then can we compute these probabilities?

I am sure you are now thinking of some form of numerical integration for the definite integral. Maybe the trapezoidal method.

Maybe. That is the point. You can approximate the integral with reasonable accuracy.

But don’t you think it is a bit tedious to do this each time there is a normal distribution problem involved?
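Before we move on, here is a minimal sketch of that idea in Python, applied to Tom’s integral. The helper names are mine, and the lower limit of -\infty is truncated at \mu - 8\sigma, where the density is negligible:

```python
import math

def normal_pdf(x, mu, sigma):
    # Normal density f(x) = 1/sqrt(2*pi*sigma^2) * exp(-0.5*((x-mu)/sigma)^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

def trapezoid(f, a, b, n):
    # Trapezoidal rule with n intervals over [a, b]
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

# Tom's integral: P(X <= 20) for N(12, 3.5), truncating -infinity at mu - 8*sigma
p = trapezoid(lambda x: normal_pdf(x, 12, 3.5), 12 - 8 * 3.5, 20, 10000)
print(round(p, 4))
```

With 10,000 trapezoids the approximation agrees with the exact probability to several decimal places, which is the point made above: numerical integration works, it is just tedious to set up for every new \mu and \sigma.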

Enter Z, the Standard Normal Distribution

Let’s travel back to Lesson 28, where we learned about standardizing data.

We can move the distribution from its original scale to a new scale, the Z-scale.

Standardization is achieved by subtracting the mean of the data from the distribution and dividing by the standard deviation. We have seen that when we subtract the mean and divide by the standard deviation, the expected value and the variance of the new standardized variable are 0 and 1.

Z = \frac{X - \mu}{\sigma}

Z = \frac{X}{\sigma} - \frac{\mu}{\sigma}

Let’s find the expected value and the variance of this new random variable Z.

E[Z] = E[\frac{X}{\sigma}] - E[\frac{\mu}{\sigma}]

E[Z] = \frac{1}{\sigma}E[X] - E[\frac{\mu}{\sigma}]

E[Z] = \frac{\mu}{\sigma} - \frac{\mu}{\sigma}

E[Z] = 0

The expected value of the new standardized variable Z is 0. In other words, it is centered on 0.

Now, for the variance.

V[Z] = V[\frac{X}{\sigma}] + V[\frac{\mu}{\sigma}]

V[Z] = \frac{1}{\sigma^{2}}V[X] + 0

V[Z] = \frac{1}{\sigma^{2}}\sigma^{2}

V[Z] = 1

The variance of the new standardized variable Z is 1. The standard deviation is 1.

The Standard Normal Z has a mean of 0 and a standard deviation of 1.

We just removed the influence of the location (center) and spread (standard deviation) from the original distribution. We are moving it from the original scale to the new Z-scale.
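A quick empirical check of this, using a simulated sample (the sample size and seed below are arbitrary): standardize any data by its mean and standard deviation, and the result has mean 0 and standard deviation 1.

```python
import random
from statistics import mean, pstdev

random.seed(42)
# Simulate parcel weights from N(12, 3.5) and standardize them
weights = [random.gauss(12, 3.5) for _ in range(10000)]
mu, sigma = mean(weights), pstdev(weights)
z = [(w - mu) / sigma for w in weights]

# Standardized data: mean 0 and standard deviation 1 (up to floating-point error)
print(round(mean(z), 6), round(pstdev(z), 6))
```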

What are the units of Z?

Tom’s distribution can be moved like this.

Dick’s and Harry’s normal distributions will also look like Z after the transformation.

Subtracting the mean will give anomalies, i.e., differences from the mean centered on zero. Positive anomalies are the values greater than the mean, and negative anomalies are the values less than the mean.

Dividing by the standard deviation will provide a scaling factor; unit standard deviation for Z.

A weight of 15.5 lbs will have a Z-score of 1; 12 + 1*(3.5).

A package with a weight of 8.5 lbs will have a Z-score of -1; 12 – 1*(3.5).

Hence, the standardized scores (Z-scores) are the distance measures between the original data value and the mean, the units being the standard deviation. A package with a weight of 15 lbs is 0.857 standard deviations right of the mean 12 lbs.
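In code, using the parameters from Tom’s problem (the helper name is mine):

```python
def z_score(x, mu=12.0, sigma=3.5):
    # Distance of x from the mean, in units of standard deviations
    return (x - mu) / sigma

for w in (15.5, 8.5, 15.0):
    print(w, "lbs ->", round(z_score(w), 3))
```

The three weights above map to Z-scores of 1, -1, and 0.857, matching the hand calculations.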

Now, look at Tom’s question. What is the probability that the parcels he weighs are within 20 lbs?

P(X \le 20)

Let’s subtract the mean from X and divide by its standard deviation.

P(X \le 20) = P(\frac{X-\mu}{\sigma} \le \frac{20-12}{3.5})

P(X \le 20) = P(Z \le 2.285)

Dick’s question. What is the probability that the pH of this soil sample is between 5.90 and 6.15?

P(5.90 \le X \le 6.15) = P(\frac{5.90-6}{0.1} \le \frac{X-\mu}{\sigma} \le \frac{6.15-6}{0.1})

P(5.90 \le X \le 6.15) = P(-1 \le Z \le 1.5)

P(5.90 \le X \le 6.15) = P(Z \le 1.5) - P(Z \le -1)

Harry’s problem. Compute x, the wait time, after which 1% of the customers get a free drink.

Harry knows that on a Saturday night, a customer’s wait time X for a table is normally distributed with a mean of 20 minutes and a standard deviation of 3.876 minutes.

P(X \ge x) = 1\% = 0.01

P(\frac{X-\mu}{\sigma} \ge \frac{x-20}{3.876}) = 0.01

P(Z \ge \frac{x-20}{3.876}) = 0.01

For Tom and Dick, we are computing the probability. In the case of Harry, we are computing the value for x that will give a specific probability.

You must have observed that the common thread in all the cases is the transformation to the standard normal (Z).

In all the cases, it is sufficient to approximate the integral of Z numerically.

P(X \le 20) = P(Z \le 2.285) = \int_{-\infty}^{2.285}\frac{1}{\sqrt{2 \pi}} e^{\frac{-1}{2}(z)^{2}}dz

The standard normal tables you find in most appendices of statistics textbooks or online are the numerical integral approximations of the standard normal distribution Z. An example is shown here.

The first column represents the standardized normal variable z, and the columns after that represent the probability  P(Z \le z). Notice that at z = 0, P(Z \le z) = 0.5, indicating that  P(Z \le 0) is 50%.

The probability ranges from 0 to 0.999 for z = -3.4 to z = 3.4 as we move from the left of the scale to the right. The first column is the random variable z, and the subsequent columns are the probabilities or areas corresponding to the values of z. This animation will make it clear.
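If you do not have a table handy, a few rows of it can be reproduced with Python’s standard library (NormalDist is available from Python 3.8 onward):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
# A few rows of the standard normal table: z and P(Z <= z)
for z in (-3.4, -1.0, 0.0, 1.0, 1.5, 3.4):
    print(f"z = {z:5.2f}   P(Z <= z) = {Z.cdf(z):.4f}")
```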

Look at this example of how to read the probability for a z-score of 2.28, Tom’s case.

P(X \le 20) = P(Z \le 2.285) = 0.9887 .
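You can verify this lookup in code; the exact CDF at z = 2.2857 gives 0.9889, a hair above the table’s 0.9887 for z = 2.28 (the table rounds z to two decimals):

```python
from statistics import NormalDist

# Two equivalent computations of P(X <= 20): original scale and Z-scale
p_original = NormalDist(mu=12, sigma=3.5).cdf(20)
p_zscale = NormalDist().cdf((20 - 12) / 3.5)
print(round(p_original, 4), round(p_zscale, 4))
```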

Dick’s question. What is the probability that the pH of this soil sample is between 5.90 and 6.15?

Answer: 0.7745. Did you check?
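If you want to double-check without the table, the same difference of probabilities can be computed directly:

```python
from statistics import NormalDist

Z = NormalDist()
# P(5.90 <= X <= 6.15) = P(-1 <= Z <= 1.5) = P(Z <= 1.5) - P(Z <= -1)
p = Z.cdf(1.5) - Z.cdf(-1.0)
print(round(p, 4))  # 0.7745
```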

Harry has to equate \frac{x-20}{3.876} to the Z-score that gives a probability of 0.99.

P(Z \ge \frac{x-20}{3.876}) = 1\% = 0.01

P(Z \le \frac{x-20}{3.876}) = 1 - 0.01 = 0.99

\frac{x-20}{3.876} = 2.33.

Read from the table. The probability for a Z-score of 2.33 is 0.9901.

x = 20 + 2.33*3.876 = 29.03
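The same inverse lookup in code; the exact inverse CDF uses z = 2.3263 rather than the table’s rounded 2.33, so it lands at 29.02 instead of 29.03:

```python
from statistics import NormalDist

# X ~ N(20, 3.876); find x with P(X >= x) = 0.01, i.e. P(X <= x) = 0.99
x = NormalDist(mu=20, sigma=3.876).inv_cdf(0.99)
print(round(x, 2))  # 29.02
```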

You get a free drink if your seating wait time is greater than 29 minutes. 😉 Get the drink and continue to wait. It’s Saturday night on Amsterdam Avenue.

While you are waiting, think about the 68-95-99.7 rule of the normal distribution. Also, guess how many page views and users we have on our blog. We’ve traveled for close to a year now. Thank you so so much for this wonderful journey 🙂

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 49 – Symmetry: The language of normal distribution

Hello.

I am the normal distribution.

I am one of the most important density functions in probability. People are so used to me, almost to the point of using me to approximate any data.

You have seen in Lesson 47 and Lesson 48 that I am a good approximation for the distribution function of the sum of independent random variables. The Central Limit Theorem.

My functional form is

 f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}

Did you notice I am merely a symmetric function e^{-x^{2}}, with some constants and parameters?

x can be positive or negative real numbers (continuous random variable)  -\infty < x < \infty .

\mu and \sigma are my control parameters.

Can you tell me in which lesson we learned that \mu is the mean or expected value and \sigma is the standard deviation of the distribution?

People use the notation N(\mu, \sigma) to say that I am a normal distribution with mean \mu and standard deviation \sigma.

\mu is my center. It can be positive or negative (-\infty < \mu < \infty), and the function is symmetric to its right and left. \sigma is how spread out I am from the center. It is positive (\sigma > 0).

Look at this example. I am centered on 60 while the spread changes: narrow and wide flanks. A larger standard deviation results in a wider distribution with more spread around the mean.

In this example, I have the same spread, but I am changing my location (center).

I told you before that I am symmetric around the mean. So the following properties hold for me.

P(X > \mu) = P(X < \mu) = 0.5

P(X > \mu + a) = P(X < \mu - a)

f(\mu + a) = f(\mu-a)
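These symmetry properties are easy to verify numerically. The center 60 and spread 10 below are just illustrative choices, echoing the earlier example:

```python
from statistics import NormalDist

mu, sigma, a = 60.0, 10.0, 5.0
X = NormalDist(mu, sigma)

# f(mu + a) = f(mu - a): the density is mirror-symmetric about the mean
print(abs(X.pdf(mu + a) - X.pdf(mu - a)) < 1e-12)
# P(X > mu + a) = P(X < mu - a): equal tail probabilities
print(abs((1 - X.cdf(mu + a)) - X.cdf(mu - a)) < 1e-12)
# P(X > mu) = P(X < mu) = 0.5: half the area on each side
print(abs(X.cdf(mu) - 0.5) < 1e-12)
```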

By the way, if I give you the values for \mu and \sigma, do you know how to compute P(X \le x)?

You guessed it correct. Take my probability density function and integrate it from -\infty to x.

P(X \le x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}dx


Is this the cumulative distribution function  F(x)?

Compute the closed-form solution of this integral.

P(X \le x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}dx

You can try the change-of-variables (substitution) method.

You can try integration by parts.

You can find the anti-derivative and use the fundamental theorem of calculus to solve.

I dare you to compute this integral to a closed form solution by next week.

We will continue our lessons after your tireless effort on this 😉


Lesson 48 – Normal is the limit

What is this? Is it normal?

The shape is limited to a bell. Is it normal?

It is the same for any variable. Is it normal?

Why is it normal?

What is the normal?

The binomial distribution is the number of successes in n trials. It is the sum of n Bernoulli random variables.

S_{n} = X_{1} + X_{2} + ... + X_{n}, where  X_{i} \in \{0,1\} \ \forall i

A mathematical approximation for the binomial distribution with a large number of trials is the Poisson distribution. We know that the average number of events in an interval ( \lambda ) is the expected number of successes np.

The wait time for the ‘r’th arrival follows a Gamma distribution. Gamma distribution is the sum of r exponential random variables.

 T_{r} = t_{1} + t_{2} + t_{3} + ... + t_{r}

What you observed in the animations last week, and what you saw now for the Binomial, Poisson, and Gamma as examples, is that the sum of random variables is tending towards a particular shape (distribution function).

This observation is “central” to probability theory.

It is called the Central Limit Theorem.

If S_{n} is the sum of  n independent and identically distributed random variables, then for large n the distribution function of  S_{n} can be well-approximated by a continuous function known as the normal density function, given by

f(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}

where  \mu and \sigma^{2} are the expected value and variance of  S_{n}.

It was first proved by the French mathematician Abraham de Moivre in the early 1700s. He showed this in a chapter of his book, The Doctrine of Chances. Page 243: “A Method of approximating the Sum of the Terms of the Binomial  (a+b)^{n} expanded into a Series, from whence are deduced some practical Rules to estimate the Degree of Assent which is to be given to Experiments.”

An interesting observation from his book.

As you can see, he derived useful approximations to the Binomial series. Imagine computing factorials for large values of n in those times.

It turns out that the binomial distribution can be estimated very accurately using the normal density function.

 f(x) = \frac{n!}{(n-x)!x!}p^{x}(1-p)^{n-x} = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}
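You can see the quality of this approximation without working through the full derivation. A sketch comparing the exact binomial pmf with the matching normal density, for the illustrative choice n = 100 and p = 0.5 (so \mu = np = 50 and \sigma^{2} = np(1-p) = 25):

```python
import math

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

def binom_pmf(x):
    # Exact binomial probability P(X = x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_pdf(x):
    # Normal density with matching mean and variance
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

# The two agree closely near the mean
for x in (40, 50, 60):
    print(x, round(binom_pmf(x), 4), round(normal_pdf(x), 4))
```

The agreement is to three or four decimal places near the center, which is exactly what de Moivre’s approximation promises.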

I compiled the modern version of this derivation. It is long with all the steps.

Please CLICK HERE to read and understand the details.

Follow it through the end. You will feel good to see how Binomial converges in the limit to this symmetric distribution.

Most probability distributions are related in some way or another to independent Bernoulli trials (the root events). If you carefully look at their probability distribution functions and take the limit as n \rightarrow \infty, you will see how the normal distribution emerges as the limiting distribution.

That is why it is normal.

The intuition from convolution

A very intuitive and elegant way of understanding the Central Limit Theorem and why the bell shape emerges due to convergence in the center of the distribution is provided in Chapter 9 (Preasymptotic and Central Limit in the Real World) of Silent Risk, technical notes on probability by Nassim Nicholas Taleb.

The gist is as follows.

We are deriving the distribution function for the summation of the random variables.

S_{n} = X_{1} + X_{2} + ... + X_{n}

You know from lesson 46 that this is convolution.

If X and Y are two independent random variables with probability density functions f_{X}(x) and f_{Y}(y), their sum Z = X + Y is a random variable with a probability density function f_{Z}(z) that is the convolution of f_{X}(x) and f_{Y}(y).

 f(z) = f_{X}*f_{Y}(z) = \int_{-\infty}^{\infty}f_{X}(x)f_{Y}(z-x)dx

Convolution is, at its core, a multiply-and-integrate operation on functions.

The probability density (for example, f_{Y}(z-x)) is weighted by another density (for example, f_{X}(x)). When we repeat this through induction, the function is smoothed out at the center until it takes a bell-like shape and the tails become thin.

In Chapter 9, he provides an example convolution of uniform random variables.

Convolution of two uniform random variables (S_{n} = X_{1} + X_{2} ) is a triangular function (piecewise linear).

Convolution of this triangular function with another uniform random variable (rectangular function) (S_{n} = S_{n-1} + X_{3} ) will now yield a quadratic function.

As the sample size grows (n becomes large), these repeated function multiplications yield a smooth, center-heavy, thin-tailed bell function: the normal density.

Look at this animation. See how the uniform rectangular function X_{1} becomes a triangle (X_{1} + X_{2}) and quickly converges to a normal density for n = 5.
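You can reproduce the uniform example with a short simulation (the sample sizes and probability cutoff below are arbitrary illustrative choices). The sum of n Uniform(0, 1) variables has mean n/2 and variance n/12, so the CLT predicts approximately N(n/2, \sqrt{n/12}):

```python
import random
from statistics import NormalDist

random.seed(1)
n, trials = 5, 100000
# Simulate S_n = X_1 + ... + X_5 with X_i ~ Uniform(0, 1)
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

# Compare an empirical probability with the CLT's normal approximation
clt = NormalDist(mu=n / 2, sigma=(n / 12) ** 0.5)
empirical = sum(1 for s in sums if s <= 3.0) / trials
print(round(empirical, 3), round(clt.cdf(3.0), 3))
```

Even at n = 5 the two probabilities differ by less than a percentage point, which matches the fast convergence seen in the animation.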

This one is for the sum of Poisson random variables.

Notice that even for n = 10, the distribution is not fully normal. It is not converging fast. And that is important to remember.

While the Central Limit Theorem is central to probability theory and is a fundamental assumption behind many concepts we will learn later, we should remember that some distributions converge quickly and some do not.

We will learn about the normal distribution in detail in the next few lessons. As you prepare for the normal distribution, I will leave you with a comment posted last week about the normal distribution by a good friend who works in the insurance industry.

“As has been shown time and again, there is no such thing as a ‘normal’ distribution in the real world ;-)”

Well, what can I say; he works on real-world risk with real dollars at stake. Always take the word of a person who has skin in the game.


Lesson 47 – Is it normal?

The wait time for the first/next arrival follows an Exponential distribution. The wait time for the ‘r’th arrival ( T_{r} = t_{1} + t_{2} + t_{3} + ... + t_{r} ) follows a Gamma distribution.

The probability density function of the Gamma distribution is derived using the convolution of individual random variables  t_{1}, t_{2}, ... t_{r} .

 f(t) = \frac{\lambda e^{-\lambda t}(\lambda t)^{r-1}}{(r-1)!}
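The sum-of-exponentials construction can be checked by simulation. Assuming \lambda = 2 arrivals per unit time and r = 3 (arbitrary illustrative values), the simulated sums should have the Gamma distribution’s mean r/\lambda and variance r/\lambda^{2}:

```python
import random
from statistics import mean, pvariance

random.seed(7)
lam, r, trials = 2.0, 3, 100000
# T_r = t_1 + t_2 + t_3, each t_i ~ Exponential(lam)
waits = [sum(random.expovariate(lam) for _ in range(r)) for _ in range(trials)]

# Gamma(r, lam) has mean r/lam = 1.5 and variance r/lam^2 = 0.75
print(round(mean(waits), 2), round(pvariance(waits), 2))
```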

For increasing values of r, the distribution is like this.

It tends to look like a bell. Is it normal?

Nah, it may be a Gamma thing. Let me add uniform distributions.

 f(x) = 1, \quad 0 < x < 1

For increasing values of n, the distribution of the sum of the uniform random variables is like this.

It tends to look like a bell. Is it normal?

Hmm. I think it is just a coincidence. I will check the Poisson distribution for increasing values of \lambda. After all, it is a discrete distribution.

P(X=x) = \frac{e^{-\lambda}\lambda^{x}}{x!}; x = 0, 1, 2, ...

Tends to look like a bell. Is it normal?

Perhaps coincidence should concede to a consistent pattern. If this is a pattern, does it also show up in the Binomial distribution?

P(X=x) = {n \choose x}p^{x}(1-p)^{n-x}; x = 0, 1, 2, ... n

There it is again. It looks like a bell.

What is this? Is it normal?

The shape is limited to a bell. Is it normal?

It is the same for any variable. Is it normal?

Why is it normal?

What is the normal?

To be continued…

