## Lesson 46 – How convoluted can it be?

After last week’s conversation with Devine about the Gamma distribution, an inspired Joe wanted to derive the probability density function of the Gamma distribution from the exponential distribution using the idea of convolution.

But first, he has to understand convolution. So he called upon Devine for his usual dialog.

J: Hello D, I can wait no longer, nor can I move on to a different topic when this idea of convolution is not clear to me. I feel anxious to know at least the basics that relate to our lesson last week.

D: It is a good anxiety to have. Will keep you focused on the mission. Where do we start?

J: We are having a form of dialog since the time we met. Why don’t you provide the underlying reasoning, and I will knit the weave from there.

D: Sounds good to me. Let me start by reminding you of our conversation in Lesson 23 about probability distribution. You introduced me to the Chicago dice game where you throw a pair of dice to score the numbers 2 – 12 in the order of the rounds.

J: Yes, I remember.

D: Let’s assume that Z is that outcome, which is the sum of the numbers on each die, say X and Y.

Create a table of these outcomes and what combinations can give you those outcomes.

J: We did this too during lesson 23. Here is the table.

D: Now, take any one outcome for Z, let’s say Z = 3, and find out the probability that the random variable Z takes a value of 3, i.e., how do you compute P(Z = 3)?

J: There are two ways of getting a number 3, when X = 1 and Y = 2, or when X = 2 and Y = 1. The total combinations are 36, so P(Z = 3) = 2/36.

D: Excellent. Now let me walk you through another way of thinking. You said that there are two ways of getting a number 3.

X = 1 and Y = 2
or
X = 2 and Y = 1

What is the probability of the first combination?

J: P(X = 1 and Y = 2) = P(X = 1).P(Y = 2) since X and Y are independent.

D: What is the probability of the second combination?

J: P(X = 2 and Y = 1) = P(X = 2).P(Y = 1), again since X and Y are independent.

D: What is the probability of Z = 3 based on these combinations?

J: Ah, I see. Since either of these combinations can occur to get an outcome 3, P(Z = 3) is the union of these combinations.

P(Z = 3) = P(X = 1).P(Y = 2) + P(X = 2).P(Y = 1) = 2/36

D: Yes. If you represent these as their probability mass functions, you get

$$f_Z(3) = f_X(1)\,f_Y(2) + f_X(2)\,f_Y(1)$$

Let me generalize it to any function of X and Y so that it can help in your derivations later.

We are attempting to determine $f_Z(z)$, the probability mass function of Z, i.e., P(Z = z). If X = x, then for the summation Z = X + Y to be true, Y = z − x.

This means we can find $f_Z(z)$ by summing over all possible values of x as,

$$f_Z(z) = \sum_{x} f_X(x)\,f_Y(z - x)$$

This property is called the convolution of $f_X$ and $f_Y$.
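To make the summation concrete, here is a minimal Python sketch of the discrete convolution applied to the two dice. The function name `convolve_pmf` and the dictionary representation of a probability mass function are illustrative choices, not something from the lessons.

```python
from fractions import Fraction


def convolve_pmf(f_x, f_y):
    """Return the PMF of Z = X + Y for independent X and Y.

    f_x and f_y map each possible value to its probability.
    """
    # Every value z that some pair (x, y) can add up to.
    support = sorted({x + y for x in f_x for y in f_y})
    # f_Z(z) = sum over x of f_X(x) * f_Y(z - x); missing values have probability 0.
    return {
        z: sum(px * f_y.get(z - x, 0) for x, px in f_x.items())
        for z in support
    }


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmf(die, die)

print(two_dice[3])  # P(Z = 3), i.e. 2/36, printed in lowest terms as 1/18
```

Using exact fractions keeps the result directly comparable with the 2/36 that Joe counted by hand.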

J: Then, I suppose for the continuous distribution case it will be analogous. The summation will become an integration.

D: Yes. If X and Y are two independent random variables with probability density functions $f_X(x)$ and $f_Y(y)$, their sum Z = X + Y is a random variable with a probability density function $f_Z(z)$ that is the convolution of $f_X$ and $f_Y$.

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$$

The density of the sum of two independent random variables is the convolution of their densities.

The exact mathematical proof can also be derived, but maybe we leave that to a later conversation.

J: Understood. But like all basics, we saw this for two random variables. How then can we extend this to the sum of n random variables? I am beginning to make connections to the Gamma distribution case that has the sum of n exponential random variables.

D: That is a good point. Now let’s suppose $S_n = X_1 + X_2 + \dots + X_n$ is the sum of n independent random variables. We can always rewrite this as $S_n = S_{n-1} + X_n$ and find the probability distribution function of $S_n$ through induction.

J: Got it. It seems to follow the logic. Now, let me use this reasoning and walk through the derivation of the Gamma distribution.

D: Go for it. The floor is yours.

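J: I will start with the two variable case. Our second meeting happened at lesson 9, and the time to the second arrival from the origin is $T_2 = t_1 + t_2 = 9$.

The random variable $T_2$ is the sum of two random variables $t_1$ and $t_2$. I want to determine the probability density function of $T_2$. I will apply the convolution theory. For consistency with today’s notations, let me take $T_2 = Z$, $t_1 = X$, and $t_2 = Y$.

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$$

Both X and Y are bounded at 0. With $y = z - x$, $x \ge 0$ implies $y \le z$, and $y \ge 0$ implies $x \le z$. Either way, the limits of the integral are from 0 to z.

$$f_Z(z) = \int_{0}^{z} \lambda e^{-\lambda x}\,\lambda e^{-\lambda (z - x)}\,dx = \lambda^2 e^{-\lambda z} \int_{0}^{z} dx = \lambda^2 z e^{-\lambda z}$$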

D: Excellent. Let me show how this function looks.

Do you see how the Gamma distribution is evolving out of exponential distribution?

J: Yes. Very clear.
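A quick numerical check of the two variable case can help here. The sketch below approximates the convolution integral of two exponential densities with a midpoint rule and compares it with the closed form $\lambda^2 z e^{-\lambda z}$. The values of `lam` and `z` and the helper names are illustrative assumptions.

```python
import math


def exp_pdf(t, lam):
    """Exponential density: lam * exp(-lam * t) for t >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def convolve_at(z, lam, steps=10_000):
    """Approximate the integral of f_X(x) * f_Y(z - x) over x in [0, z] (midpoint rule)."""
    dx = z / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx  # midpoint of the i-th slice
        total += exp_pdf(x, lam) * exp_pdf(z - x, lam)
    return total * dx


lam = 0.2  # illustrative rate
z = 9.0    # illustrative total wait time

numeric = convolve_at(z, lam)
closed_form = lam**2 * z * math.exp(-lam * z)
print(numeric, closed_form)
```

The two numbers agree because the integrand $\lambda^2 e^{-\lambda z}$ does not depend on x, which is exactly why the integral collapses to a factor of z.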

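J: For a three variable case, I will assume P = X + Y + S. I can also write this as P = Z + S analogous to $S_n = S_{n-1} + X_n$.

Then I have the distribution function of P as

$$f_P(p) = \int_{0}^{p} f_Z(z)\,f_S(p - z)\,dz = \int_{0}^{p} \lambda^2 z e^{-\lambda z}\,\lambda e^{-\lambda (p - z)}\,dz = \lambda^3 e^{-\lambda p} \int_{0}^{p} z\,dz = \frac{\lambda^3 p^2 e^{-\lambda p}}{2}$$

We can rewrite this as

$$f_P(p) = \frac{\lambda^3 p^{3-1} e^{-\lambda p}}{(3-1)!}$$

so that the general Gamma distribution function for n variables becomes,

$$f(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}$$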

D: Joe, that is some well-thought-out derivation. You are really into this data analysis stuff now.
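One way to test the general formula is by simulation: add up n exponential wait times many times over and compare the histogram height near some point with the Gamma density. This is only a sketch. The rate, the number of variables, the bin, and the seed are illustrative assumptions.

```python
import math
import random


def gamma_pdf(t, n, lam):
    """Density of the sum of n independent exponential(lam) wait times."""
    return lam**n * t ** (n - 1) * math.exp(-lam * t) / math.factorial(n - 1)


def simulate_sums(n, lam, trials, seed=46):
    """Draw `trials` sums of n exponential(lam) wait times (seeded for repeatability)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]


n, lam = 3, 0.2
sums = simulate_sums(n, lam, trials=100_000)

# The share of sums falling in a narrow bin around t, divided by the bin
# width, approximates the density at t.
t, width = 10.0, 0.5
in_bin = sum(t - width / 2 <= s < t + width / 2 for s in sums)
empirical = in_bin / (len(sums) * width)

print(empirical, gamma_pdf(t, n, lam))
```

The simulated value will wobble a little from seed to seed, but it should sit close to the value given by the formula.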

J: 😎 😎 Do we have anything else to cover today?

D: Using the same logic, can you derive the distribution for the sum of normals?

J: Normals? 😕 😕 😕

Oops, I think that is for the next week.
Don’t you have to get ready for the new year parties? It may be the coldest New Year’s eve on record. So you better bundle up!

Happy New Year.

If you find this useful, please like, share and subscribe.

## Lesson 45 – Time to ‘r’th arrival: The language of Gamma distribution

###### Joe and Devine Meet Again — for the ‘r’th time

J: It’s been 13 lessons since we met last time. Thought I’d say hello. You did not show up last week. I kept waiting as you asked me to in lesson 44.

D: Hey Joe! Sorry for the wait. I had a tough choice between travel/work and the weekly lesson. Could only do one. It was not intentional, although where we left off kind of hinted toward the wait. End of the year is a busy time for all.

J: I noticed you covered exponential distribution and its memoryless property in the previous two lessons. Isn’t time to our meetings also exponential?

D: That is correct. The first time we met was in lesson 6. The wait time was 6. We met again in lesson 9. The wait time (or wait lessons) was 3. Between the last time we met and now, as you pointed out, the wait time is 13. In lesson 43, where we first discussed the exponential distribution, I showed how its probability density function is derived. Did you follow the logic there?

J: Yes, I did. We begin with the fact that the arrival time (to the first or next event) exceeds some value t only if there are no events in the interval [0, t].

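The probability that T > t is equal to the probability that there are 0 events in the period. P(N = 0) is computed from the Poisson distribution.

$$P(T > t) = P(N = 0) = \frac{e^{-\lambda t}(\lambda t)^{0}}{0!} = e^{-\lambda t}$$

Since $P(T > t) = e^{-\lambda t}$, $P(T \le t) = 1 - e^{-\lambda t}$.

$F(t) = P(T \le t) = 1 - e^{-\lambda t}$ is the cumulative distribution function for the exponential distribution.

We can get the probability density function f(t) by taking the derivative of F(t).

$$f(t) = \frac{d}{dt}F(t) = \lambda e^{-\lambda t}$$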

D: Well done. The inter-arrival time follows an exponential probability distribution.

J: Isn’t the exponential distribution like the Geometric distribution? I learned in lesson 33 that the random variable which measures the number of trials it takes to see the first success is Geometrically distributed.

D: That is a shrewd observation. Yes, the exponential distribution is the continuous analog of the discrete geometric distribution.

In geometric distribution, the shape is controlled by p, the parameter. The greater the value of p, the steeper the fall.

In exponential distribution, the shape is controlled by $\lambda$.

J: In that case, does the exponential distribution also have a related distribution that measures the wait time till the ‘r’th arrival?

D: Can you be more specific?

J: The geometric distribution has the Negative binomial distribution that measures the number of trials it takes to see the ‘r’th success. Remember lesson 35?

Just like the exponential distribution is the continuous analog of the discrete geometric distribution, is there a continuous analog for the discrete negative binomial distribution?

D: Yes, there is a related distribution that can be used to estimate the time to the ‘r’th arrival. It is called the Gamma distribution.

Look at our timeline chart for instance. The time to the first arrival is $t_1 = 6$. The time to the second arrival since the first arrival is $t_2 = 3$. But, our second meeting happened at lesson 9, so the time to the second arrival from the origin is $T_2 = t_1 + t_2 = 9$.

Similarly, the second time we meet again after lesson 9 is in lesson 16. So, the time to the second arrival since lesson 9 is 16 – 9 = 7. Put together, these times to second meeting follow a Gamma distribution. More generally,

the wait time for the ‘r’th arrival follows a Gamma distribution.

J: That seems to be a logical extension. I believe we can derive the probability density function for the Gamma distribution using the exponential distribution. They seem to be related. Can you help me with that?

D: Sure. If you noticed, I said that our second meeting happened at lesson 9, and the time to the second arrival from the origin is $T_2 = t_1 + t_2 = 9$.

J: Yes. That is because it is the total time — the first arrival and the second arrival since.

D: So the random variable $T_2$ is the sum of two random variables $t_1$ and $t_2$.

The time to ‘r’th arrival $T_r = t_1 + t_2 + \dots + t_r$.

We can derive the probability density function of $T_r$ using the convolution of the individual random variables $t_1, t_2, \dots, t_r$.

J: 😕 What is convolution?

D: It might require a full lesson to explain it from first principles and show some examples, but for now remember that convolution is the blending of two or more functions. If you have two continuous random variables X and Y with probability density functions $f_X(x)$ and $f_Y(y)$, then the probability density function of the new random variable Z = X + Y is

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$$

Employing this definition on r variables ($t_1, t_2, \dots, t_r$) using induction, we can get the probability density function of the Gamma distribution as

$$f(t) = \frac{\lambda e^{-\lambda t}(\lambda t)^{r-1}}{(r-1)!}$$

J: 😕 😕 😕   😕 😕 😕

D: Not to worry. We will learn some of the essential steps of convolution soon.

J: I have to say, the density function looks a little convoluted though. 😉

D: Ah, that’s a good one. Perhaps it is. Why don’t you check what happens to the equation when you choose r = 1, i.e., the arrival time for the first event.

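J: Let me try. For r = 1, $f(t) = \frac{\lambda e^{-\lambda t}(\lambda t)^{0}}{0!} = \lambda e^{-\lambda t}$. This is the density function for the exponential distribution. It has to be, because we measure the arrival time to the first event.

D: You are correct. The Gamma distribution has two control parameters. $\lambda$ is called the scale parameter because it controls the width of the distribution (strictly speaking, $\lambda$ is the rate and $1/\lambda$ is the scale), and r is called the shape parameter because it controls the shape of the distribution.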

J: Can we make some quick graphics to see how the distribution looks.

D: Yes, here it is. This one is for a $\lambda$ of 0.2 and r changes from 1 to 4, i.e., for us to meet the first time, second time, third time and the fourth time.

J: This is cool. I see that the tails are getting bigger as the value of r increases.

D: Good observation again. That is why Gamma distribution is also used to fit data with significant skewness. It is widely used for fitting rainfall data. Insurance agents also use it to model the claims.
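For readers who want numbers instead of a picture, the sketch below evaluates the Gamma density for a rate of 0.2 and r from 1 to 4 at a few wait times. The function name `gamma_pdf` and the chosen evaluation points are illustrative assumptions.

```python
import math


def gamma_pdf(t, r, lam):
    """Gamma density for the wait time to the r-th arrival at rate lam."""
    return lam * math.exp(-lam * t) * (lam * t) ** (r - 1) / math.factorial(r - 1)


lam = 0.2
times = (1, 5, 10, 20, 40)  # illustrative wait times, in lessons

for r in range(1, 5):
    values = [round(gamma_pdf(t, r, lam), 4) for t in times]
    print(f"r = {r}: {values}")
```

Reading across the rows at the larger wait times shows the density growing with r, which is the heavier tail Joe noticed. For r = 1 the function reduces to the exponential density.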

J: Understood. When do we meet again? We have to figure out the convolution stuff.

You now have all the tools to estimate this. Figure out the probability that the wait time is more than one week while we celebrate the emergence of the light from darkness.

Merry Christmas.


## Lesson 44 – Keep waiting: The memoryless property of exponential distribution

###### Bob Waits for the Bus

As the building entrance door closes behind, Bob glances at his post-it note. It has the directions and address of the car dealer. Bob is finally ready to buy his first (used) car. He walks to the nearby bus stop jubilantly thinking he will seldom use the bus again. Bob is tired of the waiting. Throughout these years the one thing he could establish is that the average wait time for his inbound 105 at the Cross St @ Main St is 15 minutes.

Bob may not care, but we know that his wait time follows an exponential distribution that has a probability density function $f(t) = \lambda e^{-\lambda t}$.

The random variable T, the wait time between buses, follows an exponential distribution with parameter $\lambda$. He waits 15 minutes on average. Some days he boards the bus earlier than 15 minutes, and some days he waits much longer.

Looking at the function $f(t) = \lambda e^{-\lambda t}$, and the typical information we have for exponential distribution, i.e., the average wait time, it will be useful to relate the parameter $\lambda$ to the average wait time.

The average wait time is the average of the distribution, the expected value E[.].

E[X] for a continuous distribution, as you know from lesson 24, is $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$.

Applying this using the limits of the exponential distribution, we can derive the following.

$$E[T] = \int_{0}^{\infty} t\,\lambda e^{-\lambda t}\,dt = \lambda \int_{0}^{\infty} t\,e^{-\lambda t}\,dt$$

The definite integral is $\int_{0}^{\infty} t\,e^{-\lambda t}\,dt = \frac{1}{\lambda^2}$.

So we have

$$E[T] = \lambda \cdot \frac{1}{\lambda^2} = \frac{1}{\lambda}$$

The parameter $\lambda$ is a positive real number ($\lambda > 0$), and represents the reciprocal of the expected value of T.

In Bob’s case, since the average wait time (E[T]) is 15 minutes, the parameter $\lambda$ is 1/15 ≈ 0.067 per minute.
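The relation between the parameter and the average wait can be checked numerically. The sketch below integrates t times the exponential density with a midpoint rule. The upper limit of 600 minutes stands in for infinity and, like the function name, is an assumption made for illustration.

```python
import math


def expected_wait(lam, upper, steps=200_000):
    """Approximate E[T] = integral of t * lam * exp(-lam * t) over [0, upper]."""
    dt = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt  # midpoint of the i-th slice
        total += t * lam * math.exp(-lam * t)
    return total * dt


lam = 1 / 15  # Bob's bus: one arrival per 15 minutes on average
mean_wait = expected_wait(lam, upper=600.0)
print(mean_wait)  # close to 1 / lam = 15 minutes
```

The density beyond 600 minutes is so small that cutting the integral there makes no visible difference to the result.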

Bob gets to the bus shelter, greets the person next to him and thinks to himself “Hope the wait will not exceed 10 minutes today.”

Please tell him the probability he waits more than 10 minutes is 0.5134.

Bob is visibly anxious. He turns his hand and looks at his wristwatch. “10 minutes. The wait won’t be much longer.”

Please tell him about the memoryless property of the exponential distribution. The probability that he waits for another ten minutes, given he already waited 10 minutes is also 0.5134.

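Let’s see how. We will assume t represents the first ten minutes and s represents the second ten minutes.

$$P(T > t + s \mid T > t) = \frac{P(T > t + s)}{P(T > t)} = \frac{e^{-\lambda (t + s)}}{e^{-\lambda t}} = e^{-\lambda s} = P(T > s)$$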

The probability distribution of the remaining time until the event occurs is always the same regardless of the time that passed.

There is no memory in the process. The history is not relevant. The time to next arrival is not influenced by when the last event or arrival occurred.

This property is unique to two distributions: the exponential among continuous distributions and the geometric among discrete ones.

The probability that Bob has to wait another s minutes (a total of t + s) given that he already waited t minutes is the same as the probability that he waits more than s minutes starting from zero. It is independent of the current wait time.
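Bob's numbers can be reproduced with a few lines of Python. The sketch below computes the probability of waiting more than 10 minutes, the conditional probability of waiting another 10 minutes, and a simulated version of the same conditional probability. The helper name `survival`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def survival(t, lam):
    """P(T > t) for an exponential wait time with rate lam."""
    return math.exp(-lam * t)


lam = 1 / 15       # average wait of 15 minutes
t, s = 10.0, 10.0  # minutes already waited, and additional minutes

p_unconditional = survival(s, lam)                       # P(T > s)
p_conditional = survival(t + s, lam) / survival(t, lam)  # P(T > t + s | T > t)

# Simulation: keep only the days on which Bob has already waited t minutes,
# then count how many of those waits run past t + s.
rng = random.Random(44)
waits = [rng.expovariate(lam) for _ in range(200_000)]
already_waited = [w for w in waits if w > t]
p_simulated = sum(w > t + s for w in already_waited) / len(already_waited)

print(round(p_unconditional, 4), round(p_conditional, 4), round(p_simulated, 4))
```

The first two values match exactly, and the simulated value should land close to them, which is the memoryless property seen from both the formula and the data.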

###### Bob Gets His First Chevy

Bob arrives at the dealers. He loves the look of the red 1997 Chevy. He looks over the window pane; “Ah, manual shift!” That made up his mind. He knows what he is getting. The price was reasonable. A good running engine is all he needed to drive it away.

The manager was young, a Harvard alum, as Bob identified from things in the room. “There is no guarantee these days with academic inflation, … the young lad is running a family business, … or his passion is to sell cars,” he thought to himself.

The manager tells him that the engine is in perfect running condition and the average breakdown time is four years. Bob does some estimates in his mind while checking out the car. He is happy with what he is getting and closes the deal.

Please tell Bob that there is a 22% likelihood that his Chevy manual shift will break down in the first year.

The number of years this car will run ~ exponential distribution with a rate ($\lambda$) of 1/4.

Since the average breakdown time (expected value E[T]) is four years, the parameter $\lambda$ = 1/4.

$$P(T \le 1) = 1 - e^{-\lambda \cdot 1} = 1 - e^{-1/4} \approx 0.22$$

Bob should also know that there is a 37% chance that his car will still be running fine after four years.

$$P(T > 4) = e^{-4\lambda} = e^{-1} \approx 0.37$$

###### Bob in Four Years

Bob used the car for four years now with regular servicing, standard oil changes, and tire rotations. The engine is great.

Since the average lifetime has passed, should he think about a new car? How long should we expect his car to continue without a breakdown? Another four years?

Since he used it for four years, what is the probability that there will be no breakdown until the next four years?

You guessed it, 37%.
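The three numbers quoted for the Chevy follow from the exponential distribution with a rate of 1/4 per year. Here is a small sketch, with variable names chosen only for illustration.

```python
import math

lam = 1 / 4  # breakdowns per year (average breakdown time of four years)

# Probability of a breakdown within the first year: P(T <= 1).
p_first_year = 1 - math.exp(-lam * 1)

# Probability the car is still running after four years: P(T > 4).
p_after_four = math.exp(-lam * 4)

# Probability of no breakdown for another four years, given four years of use:
# P(T > 8 | T > 4) = P(T > 8) / P(T > 4).
p_next_four = math.exp(-lam * 8) / math.exp(-lam * 4)

print(round(p_first_year, 2), round(p_after_four, 2), round(p_next_four, 2))
# prints 0.22 0.37 0.37
```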

#### Now let’s have a visual interpretation of this memoryless property.

The probability distribution of the wait time (engine breakdown) for $\lambda = 1/4$ looks like this.

Let us assume another random variable $T_2 = T - 4$, as the breakdown time after four years of usage. The lower bound for $T_2$ is 0 (since we measure from four years), and the upper bound is $\infty$.

For any values $T > 4$, the distribution is another exponential function; it is shifted by four years.

Watch this animation, you will understand it better.

The original distribution is represented using the black line. The conditional distribution is shown as a red line using links.

The same red line with links (truncated at 4) is shown as the shifted exponential distribution ($T_2 = T - 4$). So, the red line with links from t = 4 is the same as the original function from t = 0. It is just shifted.

The average value of $T$ is four years. The average value of $T_2$ is also four. They have the same distribution.

If Bob reads our lessons, he’d understand that his Chevy will serve him, on the average, another four years.

Just like the car dealer’s four-year liberal arts degree from Harvard is forgotten, Bob’s four-year car usage history is forgotten — Memoryless.

As the saying goes, some memories are best forgotten, but the lessons from our classroom are never forgotten.


## Lesson 43 – Wait time: The language of exponential distribution

###### Wednesday, no, the Waiting Day

November 29, 2017

6:00 AM

As the cool river breeze kisses my face, I hear the pleasant sound of the waves. “How delightful,” I think, as I drop into the abyss of eternal happiness. The sound of the waves continues to haunt me. I run away from the river; the waves run with me. I close my ears; the waves are still here.

It’s the time when your dream dims into reality. Ah, it’s the sound of the “waves” on my iPhone. Deeply disappointed, I hit the snooze and wait for my dream to come back.

6:54 AM

“Not again,” I screamed. I have 30 minutes to get ready and going. I-95 is already bustling. I can’t afford to wait long in the toll lane. Doctor’s check-in at 8 AM.

7:55 AM

“Come on, let’s go.” For the 48th time, waiting in the toll lane, I curse myself for not having gotten the EZ pass that week. “Let’s go, let’s go.” I maneuver my way while being rude to the nasty guy who tried to sneak in front of my car. Finally, I pay cash at the toll and swiftly drive off to my doctor’s.

8:15 AM

“Hi, I have an appointment this morning. Hope I am not late.” The pretty lady at the desk stared at me, gave me a folder and asked me to wait. Dr. D will be with you shortly. As I was waiting for my turn, I realized that the lady’s stare was for my stupid question. My appointment was at 8 AM after all.

8:50 AM

The doctor steps in; “Please come in” he said. A visibly displeased me walked-in instantly, all the way shaking my head for the delay. My boss will be waiting for me at the office. We are launching a new product today.

9:15 AM

“You are in perfect health. The HDLs and LDLs are normal. Continue the healthy eating and exercise practices you have. See you next time, but don’t wait too long for the next visit.”

9:25 AM

My wait continues, this time for the train. “The next downtown 1-train will arrive in 10 minutes,” said the man (pre-recorded).

10:00 AM

My boss expressed his displeasure at my delay in his usual sarcastic ways. “But, I told you I was going to be late today,” I said to myself. We get busy with work and the product launch.

1:00 PM

I am waiting in the teller line at the local bank. Essential bank formalities and some checks to deposit. There were already ten people before me; there is only one teller, and for some reason, she is taking her own sweet time to serve each customer.

The only other living being in the bank (bank employees of course) is the manager; she is busy helping a person with his mortgage. “Poor guy seems to be buying a house at the peak,” I thought as I start counting the time it is taking to serve each customer.

1:35 PM

“One extra-hot Cappuccino,” said Joe at Starbucks in his usual stern voice. The wait for my coffee was not as annoying. There’s something about coffee and me. Can wait forever! Or maybe it is Starbucks; I can’t say.

7:00 PM

After a long tiring, waiting day, I am still waiting for my train.

“I waited 22 minutes. The train surely has to come in the next minute,” I said to myself.

The clock ticks, my energy drops, still no train.

.

.

.

“The next uptown 1-train is now arriving. Please stand away from the platform edge.”

I step in and grab the one remaining seat. “Finally; no more waiting for the day,” I said to myself.

The wheels rattle, the brains muffle, and the eyes scuttle. Same beautiful abyss of happy, restful state from the morning.

8:00 PM

As I park my car and check my door mail, I realize that my day was filled with wait times. I said to myself, “Aren’t these the examples of exponential distribution that data analysis guy from college used to talk about? I finally understand it. You live and learn.”

9:00 PM

I start logging my Wednesday, no, the waiting day.

“Let me derive the necessary functions for the exponential distribution before I go to bed,” I said to myself.

The time between arrivals at service facilities, time to failure of systems, flood occurrence, etc., can be modeled as exponential distributions.

Since I want to measure the time between events, I should think of time T as a continuous random variable that takes values $t_1$, $t_2$, $t_3$, etc., like this.

That means, this distribution is positive only, as $T \ge 0$ (non-negative real numbers).

We can have a small wait time or a long wait time. It varies, and we are estimating the probability that T is less than or greater than a particular time, and between two times.

The distribution of the probability of these wait times is called the exponential distribution.

As I watch the events and wait times figure carefully, I can sense that there is a relation between the Poisson distribution and the Exponential distribution.

The Poisson distribution represents the number of events in an interval of time, and the exponential distribution represents the time between these events.

If N is the number of events during an interval (a span of time) t with an average rate of occurrence $\lambda$, then

$$P(N = k) = \frac{e^{-\lambda t}(\lambda t)^{k}}{k!}$$

If T is measured as the time to next occurrence or arrival, then it should follow an exponential distribution.

The time to arrival exceeds some value t, only if N = 0 within t, i.e., if there are no events in an interval [0, t].

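If $P(T > t) = P(N = 0) = e^{-\lambda t}$, then $P(T \le t) = 1 - e^{-\lambda t}$.

I know that $P(T \le t) = F(t)$ is the cumulative distribution function. It is the integral of the probability density function. $F(t) = \int_{0}^{t} f(t)\,dt$.

The probability density function f(t) can then be obtained by taking the derivative of F(t).

$$f(t) = \frac{d}{dt}F(t) = \lambda e^{-\lambda t}$$

The random variable T, the wait time between successive events, follows an exponential distribution with parameter $\lambda$.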
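The link between the two distributions can also be seen by simulation. The sketch below generates exponential gaps between arrivals, counts how many arrivals land in an interval, and compares the share of empty intervals with the Poisson probability of zero events. The rate, interval length, sample size, and seed are illustrative assumptions.

```python
import math
import random


def count_events(lam, horizon, rng):
    """Count arrivals in [0, horizon] when gaps between arrivals are exponential(lam)."""
    clock = rng.expovariate(lam)  # time of the first arrival
    count = 0
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(lam)  # add the wait until the next arrival
    return count


lam, horizon = 2.0, 1.0
rng = random.Random(43)
counts = [count_events(lam, horizon, rng) for _ in range(100_000)]

p_zero_simulated = counts.count(0) / len(counts)
p_zero_poisson = math.exp(-lam * horizon)  # P(N = 0) = P(T > horizon)
mean_count = sum(counts) / len(counts)     # should be close to lam * horizon

print(p_zero_simulated, p_zero_poisson, mean_count)
```

An interval is empty exactly when the first wait time exceeds its length, so the simulated share of empty intervals should sit near $e^{-\lambda t}$.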

Let me map this on to the experiences I had today.

If, on average, 25 vehicles pass the toll per hour, then $\lambda = 25$ per hour, and the wait time distribution for the next vehicle at the toll should look like this.

The probability that I will wait more than 5 minutes to pass the toll is $P(T > 5/60) = e^{-25 \times 5/60} \approx 0.125$.

So, the probability that my wait time will be less than 5 minutes is 0.875. Not bad. I should have known this before I swore at the guy who got in my way.

It is clear that the distribution will be flatter if $\lambda$ is smaller and steeper if $\lambda$ is larger.
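The toll numbers can be reproduced with a short calculation. The only care needed is units: the rate is per hour, so five minutes has to be expressed in hours. The variable names below are illustrative.

```python
import math

lam = 25.0  # vehicles per hour passing the toll
t = 5 / 60  # five minutes, expressed in hours

p_more_than_5 = math.exp(-lam * t)  # P(T > 5 minutes)
p_less_than_5 = 1 - p_more_than_5   # P(T <= 5 minutes)

print(round(p_more_than_5, 3), round(p_less_than_5, 3))

# The density at t = 0 equals lam, so a larger rate starts higher and falls faster.
for rate in (5.0, 25.0, 50.0):
    print(rate, rate * math.exp(-rate * 0.0), rate * math.exp(-rate * t))
```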

10:00 PM

I lay in my bed with a feeling of accomplishment. My waiting day was eventful; I checked off all boxes on my to-do list. I now have a clear understanding of exponential distribution.

10:05 PM

I am hoping that I get the same beautiful dream. My mind is still on exponential distribution with one question.

“I waited 22 minutes for the train in the evening, why did it not arrive in the next few minutes? Since I waited a long time, shouldn’t the train arrive immediately?”

A tired body always beats the mind.

It was time for the last thought to dissolve into the darkness. The SHIREBOURN river is “waiting” for me on the other side of the darkness.
