Lesson 45 – Time to ‘r’th arrival: The language of Gamma distribution

Joe and Devine Meet Again — for the ‘r’th time

J: It’s been 13 lessons since we met last time. Thought I’d say hello. You did not show up last week. I kept waiting as you asked me to in lesson 44.

D: Hey Joe! Sorry for the wait. I had a tough choice between travel/work and the weekly lesson. Could only do one. It was not intentional, although where we left off kind of hinted toward the wait. End of the year is a busy time for all.

J: I noticed you covered the exponential distribution and its memoryless property in the previous two lessons. Isn’t the time between our meetings also exponential?

D: That is correct. The first time we met was in lesson 6. The wait time  t_{1} was 6. We met again in lesson 9. The wait time (or wait lessons)  t_{2} was 3. Between the last time we met and now, as you pointed out, the wait time  t_{8} is 13. In lesson 43, where we first discussed the exponential distribution, I showed how its probability density function is derived. Did you follow the logic there?

J: Yes, I did. We begin with the fact that the arrival time (to the first or next event) exceeds some value t only if there are no events in the interval [0, t].

The probability that T > t is equal to the probability that there are 0 events in the period. P(N = 0) is computed from the Poisson distribution.

 P(T > t) = P(N = 0) = \frac{e^{-\lambda t}(\lambda t)^{0}}{0!} = e^{-\lambda t}

Since  P(T > t) = e^{-\lambda t} ,  P(T \le t) = 1 - e^{-\lambda t}.

 P(T \le t) is the cumulative distribution function  F(t) for the exponential distribution.

We can get the probability density function f(t) by taking the derivative of F(t).

 f(t) = \frac{d}{dt}F(t) = \frac{d}{dt}(1-e^{-\lambda t}) = \lambda e^{-\lambda t}

D: Well done. The inter-arrival time follows an exponential probability distribution.

J: Isn’t the exponential distribution like the Geometric distribution? I learned in lesson 33 that the random variable which measures the number of trials it takes to see the first success is Geometrically distributed.

D: That is a shrewd observation. Yes, the exponential distribution is the continuous analog of the discrete geometric distribution.

In the geometric distribution, the shape is controlled by the parameter p. The greater the value of p, the steeper the fall.

In the exponential distribution, the shape is controlled by  \lambda .

J: In that case, does the exponential distribution also have a related distribution that measures the wait time till the ‘r’th arrival?

D: Can you be more specific?

J: The geometric distribution has the Negative binomial distribution that measures the number of trials it takes to see the ‘r’th success. Remember lesson 35?

Just like the exponential distribution is the continuous analog of the discrete geometric distribution, is there a continuous analog for the discrete negative binomial distribution?

D: Yes, there is a related distribution that can be used to estimate the time to the ‘r’th arrival. It is called the Gamma distribution.

Look at our timeline chart for instance. The time to the first arrival is t_{1} = 6 . The time to the second arrival since the first arrival is t_{2} = 3 . But, our second meeting happened at lesson 9, so the time to the second arrival from the origin is  T_{2} = t_{1} + t_{2} = 9 .

Similarly, the second time we meet again after lesson 9 is in lesson 16. So, the time to the second arrival since lesson 9 is 16 – 9 = 7. Put together, these times to second meeting follow a Gamma distribution. More generally,

the wait time for the ‘r’th arrival follows a Gamma distribution.

J: That seems to be a logical extension. I believe we can derive the probability density function for the Gamma distribution using the exponential distribution. They seem to be related. Can you help me with that?

D: Sure. If you noticed, I said that our second meeting happened at lesson 9, and the time to the second arrival from the origin is  T_{2} = t_{1} + t_{2} = 9 .

J: Yes. That is because it is the total time — the first arrival and the second arrival since.

D: So the random variable  T_{2} is the sum of two random variables  t_{1} and  t_{2}

The time to ‘r’th arrival  T_{r} = t_{1} + t_{2} + t_{3} + ... + t_{r}.

We can derive the probability density function of  T_{r} using the convolution of the individual random variables t_{1}, t_{2}, ... t_{r} .

J: 😕 What is convolution?

D: It might require a full lesson to explain it from first principles and show some examples, but for now remember that convolution is the blending of two or more functions. If you have two continuous random variables X and Y with probability density functions f(x) and  g(y) , then, the probability density function of the new random variable Z = X + Y is

 (f*g)(z) = \int_{-\infty}^{\infty}f(z-y)g(y)dy

Employing this definition on r variables ( t_{1}, t_{2}, ..., t_{r}) using induction, we can get the probability density function of the Gamma distribution as

 f(t) = \frac{\lambda e^{-\lambda t}(\lambda t)^{r-1}}{(r-1)!}

J: 😕 😕 😕   😕 😕 😕

D: Not to worry. We will learn some of the essential steps of convolution soon.
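(A quick aside, in case you want to check this numerically before that lesson: the sketch below convolves two exponential densities in R and compares the result with the Gamma density for r = 2. The rate \lambda = 0.2 is an assumed value used only for illustration.)

# Minimal sketch: convolution of two exponential pdfs vs. the Gamma pdf for r = 2
lambda <- 0.2
f <- function(t) lambda * exp(-lambda * t)            # exponential pdf
# (f*f)(z) = integral from 0 to z of f(z - y) f(y) dy
conv2 <- function(z) integrate(function(y) f(z - y) * f(y), lower = 0, upper = z)$value
conv2(10)                                             # ~ 0.054
dgamma(10, shape = 2, rate = lambda)                  # Gamma pdf with r = 2; should match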

J: I have to say, the density function looks a little convoluted though. 😉

D: Ah, that’s a good one. Perhaps it is. Why don’t you check what happens to the equation when you choose r = 1, i.e., the arrival time for the first event.

J: Let me try.  f(t) = \lambda e^{-\lambda t} . This is the density function for the exponential distribution. It has to be, because we are measuring the arrival time to the first event.

D: You are correct. The Gamma distribution has two control parameters. \lambda is called the scale parameter because it controls the width of the distribution, and r is called the shape parameter because it controls the shape of the distribution.

J: Can we make some quick graphics to see how the distribution looks?

D: Yes, here it is. This one is for a  \lambda of 0.2 and r changes from 1 to 4, i.e., for us to meet the first time, second time, third time and the fourth time.
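(If you want to reproduce a chart like this yourself, here is a minimal R sketch using the built-in Gamma density, with \lambda = 0.2 and r = 1 to 4 as in the figure.)

lambda <- 0.2
t <- seq(0, 40, by = 0.1)
plot(t, dgamma(t, shape = 1, rate = lambda), type = "l", ylab = "f(t)",
     main = "Gamma densities for lambda = 0.2")
for (r in 2:4) lines(t, dgamma(t, shape = r, rate = lambda), col = r)   # r = 2, 3, 4
legend("topright", legend = paste("r =", 1:4), col = 1:4, lty = 1)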

J: This is cool. I see that the tails are getting bigger as the value of r increases.

D: Good observation again. That is why Gamma distribution is also used to fit data with significant skewness. It is widely used for fitting rainfall data. Insurance agents also use it to model the claims.

J: Understood. When do we meet again? We have to figure out the convolution stuff.

 

D: You now have all the tools to estimate this. Figure out the probability that the wait time is more than one week while we celebrate the emergence of the light from darkness.

Merry Christmas.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 44 – Keep waiting: The memoryless property of exponential distribution

Bob Waits for the Bus

As the building entrance door closes behind, Bob glances at his post-it note. It has the directions and address of the car dealer. Bob is finally ready to buy his first (used) car. He walks to the nearby bus stop jubilantly thinking he will seldom use the bus again. Bob is tired of the waiting. Throughout these years the one thing he could establish is that the average wait time for his inbound 105 at the Cross St @ Main St is 15 minutes.

Bob may not care, but we know that his wait time follows an exponential distribution that has a probability density function  f(t) = \lambda e^{-\lambda t} .

The random variable T, the wait time between buses is an exponential distribution with parameter  \lambda . He waits 15 minutes on average. Some days he boards the bus earlier than 15 minutes, and some days he waits much longer.

Looking at the function  f(t) = \lambda e^{-\lambda t} , and the typical information we have for exponential distribution, i.e., the average wait time, it will be useful to relate the parameter  \lambda to the average wait time.

The average wait time is the average of the distribution — the expected value E[.].

E[X] for a continuous distribution, as you know from lesson 24 is  E[X] = \int x f(x) dx.

Applying this using the limits of the exponential distribution, we can derive the following.

 E[T] = \int_{0}^{\infty} t f(t) dt

 E[T] = \int_{0}^{\infty} t \lambda e^{-\lambda t} dt

 E[T] = \lambda \int_{0}^{\infty} t e^{-\lambda t} dt

The definite integral is  \frac{1}{\lambda^{2}} .

So we have

E[T] = \frac{1}{\lambda}

The parameter  \lambda is a positive real number ( \lambda > 0), and represents the reciprocal of the expected value of T.

In Bob’s case, since the average wait time (E[T]) is 15 minutes, the parameter \lambda is 1/15 \approx 0.067.

Bob gets to the bus shelter, greets the person next to him and thinks to himself “Hope the wait will not exceed 10 minutes today.”

Please tell him the probability he waits more than 10 minutes is 0.5134.

 P(T > 10) = e^{-\lambda t} = e^{-10/15} = 0.5134
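(If you would rather check this with R’s built-in exponential functions than by hand, a minimal sketch:)

lambda <- 1 / 15                             # average wait of 15 minutes
pexp(10, rate = lambda, lower.tail = FALSE)  # P(T > 10) = 0.5134
1 - pexp(10, rate = lambda)                  # same thing, via the CDF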

Bob is visibly anxious. He turns his hand and looks at his wristwatch. “10 minutes. The wait won’t be much longer.”

Please tell him about the memoryless property of the exponential distribution. The probability that he waits for another ten minutes, given he already waited 10 minutes is also 0.5134.

Let’s see how. We will assume t represents the first ten minutes and s represents the second ten minutes.

 P(T > t + s \mid T > t) = \frac{P(T > t \cap T > t+s)}{P(T > t)}

\hspace{10cm} = \frac{P(T > t+s)}{P(T > t)}

\hspace{10cm} = \frac{e^{-\lambda (t+s)}}{e^{-\lambda t}}

\hspace{10cm} = \frac{e^{-\lambda t} e^{-\lambda s}}{e^{-\lambda t}}

\hspace{10cm} = e^{-\lambda s}

 P(T > 10 + 10 \mid T > 10) = e^{-10\lambda} = 0.5134

The probability distribution of the remaining time until the event occurs is always the same regardless of the time that passed.

There is no memory in the process. The history is not relevant. The time to next arrival is not influenced by when the last event or arrival occurred.

This property is unique to the exponential distribution (the continuous case) and the geometric distribution (the discrete case).

The probability that Bob has to wait another s minutes (t + s) given that he already waited t minutes is the same as the probability that Bob waited the first s minutes. It is independent of the current wait time.
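(A minimal numerical check of the memoryless property, again with the assumed bus-wait rate \lambda = 1/15:)

lambda <- 1 / 15
t <- 10; s <- 10
# conditional probability P(T > t + s | T > t) as a ratio of survival probabilities
pexp(t + s, lambda, lower.tail = FALSE) / pexp(t, lambda, lower.tail = FALSE)  # 0.5134
pexp(s, lambda, lower.tail = FALSE)                                            # also 0.5134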

Bob Gets His First Chevy

Bob arrives at the dealers. He loves the look of the red 1997 Chevy. He looks over the window pane; “Ah, manual shift!” That made up his mind. He knows what he is getting. The price was reasonable. A good running engine is all he needed to drive it away.

The manager was young, a Harvard alum, as Bob identified from things in the room. “There is no guarantee these days with academic inflation, … the young lad is running a family business, … or his passion is to sell cars,” he thought to himself.

The manager tells him that the engine is in perfect running condition and the average breakdown time is four years. Bob does some estimates ($$$$) in his mind while checking out the car. He is happy with what he is getting and closes the deal.

Please tell Bob that there is a 22% likelihood that his Chevy manual shift will break down in the first year.

The number of years this car will run ~ exponential distribution with a rate (\lambda) of 1/4.

Since the average breakdown time (expected value E[T]) is four years, the parameter \lambda = 1/4.

 P(T \le 1) = 1 - e^{-\lambda t} = 1 - e^{-(1/4)} = 0.22

Bob should also know that there is a 37% chance that his car will still be running fine after four years.

P(T > 4) = e^{-4\lambda} = e^{-4/4} = 0.37

Bob in Four Years

Bob used the car for four years now with regular servicing, standard oil changes, and tire rotations. The engine is great.

Since the average lifetime has passed, should he think about a new car? How long should we expect his car to continue without a breakdown? Another four years?

Since he used it for four years, what is the probability that there will be no breakdown until the next four years?

You guessed it, 37%.

 P(T > 8 \mid T > 4) = \frac{P(T > 8)}{P(T > 4)} = e^{-4\lambda} = 0.37
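(The same three numbers for the Chevy can be reproduced with a short R sketch; the rate of 1/4 per year comes from the four-year average breakdown time.)

rate <- 1 / 4
pexp(1, rate)                                                         # P(T <= 1) ~ 0.22
pexp(4, rate, lower.tail = FALSE)                                     # P(T > 4)  ~ 0.37
pexp(8, rate, lower.tail = FALSE) / pexp(4, rate, lower.tail = FALSE) # P(T > 8 | T > 4) ~ 0.37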

Now let’s have a visual interpretation of this memoryless property.

The probability distribution of the wait time (engine breakdown) for  \lambda = 1/4 looks like this.

Let us assume another random variable  T_{2} = T - 4, as the breakdown time after four years of usage. The lower bound for  T_{2} is 0 (since we measure from four years), and the upper bound is \infty.

For any values  T > 4 , the distribution is another exponential function — it is shifted by four years.

 f(t_{2}) = \lambda e^{-\lambda t_{2}} = \lambda e^{-\lambda (t-4)}

Watch this animation, you will understand it better.

The original distribution is represented using the black line. The conditional distribution  P(T > 4+s \mid T > 4) is shown as a red line using links.

The same red line with links (truncated at 4) is shown as the shifted exponential distribution (f(t_{2})=\lambda e^{-\lambda (t-4)}). So, the red line with links from t = 4 is the same as the original function from t = 0. It is just shifted.

The average value of T is four years. The average value of T_{2} = T - 4 is also four. They have the same distribution.

If Bob reads our lessons, he’d understand that his Chevy will serve him, on the average, another four years.

Just like the car dealer’s four-year liberal arts degree from Harvard is forgotten, Bob’s four-year car usage history is forgotten — Memoryless.

As the saying goes, some memories are best forgotten, but the lessons from our classroom are never forgotten.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 43 – Wait time: The language of exponential distribution

Wednesday, no, the Waiting Day

November 29, 2017

6:00 AM

As the cool river breeze kisses my face, I hear the pleasant sound of the waves. “How delightful,” I think, as I drop into the abyss of eternal happiness. The sound of the waves continues to haunt me. I run away from the river; the waves run with me. I close my ears; the waves are still here.

It’s the time when your dream dims into reality. Ah, it’s the sound of the “waves” on my iPhone. Deeply disappointed, I hit the snooze and wait for my dream to come back.

6:54 AM

“Not again,” I screamed. I have 30 minutes to get ready and going. I-95 is already bustling. I can’t afford to wait long in the toll lane. Doctor’s check-in at 8 AM.

7:55 AM

“Come on, let’s go.” For the 48th time, waiting in the toll lane, I curse myself for not having gotten the EZ pass that week. “Let’s go, let’s go.” I maneuver my way while being rude to the nasty guy who tried to sneak in front of my car. Finally, I pay cash at the toll and drive off in a swift to my doctor’s.

8:15 AM

“Hi, I have an appointment this morning. Hope I am not late.” The pretty lady at the desk stared at me, gave me a folder, and asked me to wait. “Dr. D will be with you shortly.” As I was waiting for my turn, I realized that the lady’s stare was for my stupid question. My appointment was at 8 AM after all.

8:50 AM

The doctor steps in; “Please come in,” he said. A visibly displeased me walked in instantly, shaking my head all the way for the delay. My boss will be waiting for me at the office. We are launching a new product today.

9:15 AM

“You are in perfect health. The HDLs and LDLs are normal. Continue the healthy eating and exercise practices you have. See you next time, but don’t wait too long for the next visit.”

9:25 AM

My wait continues, this time for the train. “The next downtown 1-train will arrive in 10 minutes,” said the man (pre-recorded).

10:00 AM

My boss expressed his displeasure at my delay in his usual sarcastic ways. “But, I told you I was going to be late today,” I said to myself. We get busy with work and the product launch.

1:00 PM

I am waiting in the teller line at the local bank. Essential bank formalities and some checks to deposit. There were already ten people before me; there is only one teller, and for some reason, she is taking her own sweet time to serve each customer.

The only other living being in the bank (among the bank employees, of course) is the manager; she is busy helping a person with his mortgage. “Poor guy seems to be buying a house at the peak,” I thought as I start counting the time it is taking to serve each customer.

1:35 PM

“One extra-hot Cappuccino,” said Joe at Starbucks in his usual stern voice. The wait for my coffee was not as annoying. There’s something about coffee and me. I can wait forever! Or maybe it is Starbucks; I can’t say.

7:00 PM

After a long tiring, waiting day, I am still waiting for my train.

“I waited 22 minutes. The train surely has to come in the next minute,” I said to myself.

The clock ticks, my energy drops, still no train.

.

.

.

“The next uptown 1-train is now arriving. Please stand away from the platform edge.”

I step in and grab the one remaining seat. “Finally; no more waiting for the day,” I said to myself.

The wheels rattle, the brains muffle, and the eyes scuttle. Same beautiful abyss of happy, restful state from the morning.

8:00 PM

As I park my car and check my door mail, I realize that my day was filled with wait times. I said to myself, “Aren’t these the examples of exponential distribution that data analysis guy from college used to talk about? I finally understand it. You live and learn.”

9:00 PM

I start logging my Wednesday, no, the waiting day.

“Let me derive the necessary functions for the exponential distribution before I go to bed,” I said to myself.

The time between arrivals at service facilities, the time to failure of systems, flood occurrences, etc., can be modeled as exponential distributions.

Since I want to measure the time between events, I should think of time T as a continuous random variable,  t_{1}, t_{2}, t_{3} , etc., like this.

That means, this distribution is positive only, as  T \ge 0 (non-negative real numbers).

We can have a small wait time or a long wait time. It varies, and we are estimating the probability that T is less than or greater than a particular time, and between two times.

The distribution of the probability of these wait times is called the exponential distribution.

As I watch the events and wait times figure carefully, I can sense that there is a relation between the Poisson distribution and the Exponential distribution.

The Poisson distribution represents the number of events in an interval of time, and the exponential distribution represents the time between these events.

If N is the number of events during an interval ( a span of time) with an average rate of occurrence \lambda,

P(N = k) = \frac{e^{-\lambda t}(\lambda t)^{k}}{k!}

If T is measured as the time to next occurrence or arrival, then it should follow an exponential distribution.

The time to arrival exceeds some value t, only if N = 0 within t, i.e., if there are no events in an interval [0, t].

 P(T > t) = P(N = 0) = \frac{e^{-\lambda t}(\lambda t)^{0}}{0!} = e^{-\lambda t}

If  P(T > t) = e^{-\lambda t} , then  P(T \le t) = 1 - e^{-\lambda t} .

I know that P(T \le t) = F(t) is the cumulative distribution function. It is the integral of the probability density function. F(t) = \int_{0}^{t}f(t)dt.

The probability density function f(t) can then be obtained by taking the derivative of F(t).

f(t) = \frac{d}{dt}F(t) = \frac{d}{dt}(1-e^{-\lambda t}) = \lambda e^{-\lambda t}

The random variable T, the wait time between successive events is an exponential distribution with parameter \lambda.

Let me map this on to the experiences I had today.

If on average, 25 vehicles pass the toll per hour, \lambda=25 per hour. Then the wait time distribution for the next vehicle at the toll should look like this.

The probability that I will wait more than 5 minutes to pass the toll is  P(T > 5) = e^{-\lambda t} = e^{-25*(5/60)} = 0.125.

So, the probability that my wait time will be less than 5 minutes is 0.875. Not bad. I should have known this before I swore at the guy who got in my way.

It is clear that the distribution will be flatter if \lambda is smaller and steeper if \lambda is larger.
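(For readers following along in R, a minimal sketch of these toll-lane numbers and of how the density changes with \lambda; the 10-per-hour rate in the comparison curve is an assumed value.)

lambda <- 25                                  # vehicles per hour
pexp(5 / 60, lambda, lower.tail = FALSE)      # P(T > 5 minutes) ~ 0.125
pexp(5 / 60, lambda)                          # P(T <= 5 minutes) ~ 0.875
curve(dexp(x, rate = 10), 0, 0.5, ylab = "f(t)", xlab = "hours")   # flatter for smaller lambda
curve(dexp(x, rate = 25), 0, 0.5, add = TRUE, col = "red")         # steeper for larger lambda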

10:00 PM

I lay in my bed with a feeling of accomplishment. My waiting day was eventful; I checked off all boxes on my to-do list. I now have a clear understanding of exponential distribution.

10:05 PM

I am hoping that I get the same beautiful dream. My mind is still on exponential distribution with one question.

“I waited 22 minutes for the train in the evening, why did it not arrive in the next few minutes? Since I waited a long time, shouldn’t the train arrive immediately?”

A tired body always beats the mind.

It was time for the last thought to dissolve into the darkness. The SHIREBOURN river is “waiting” for me on the other side of the darkness.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 42 – Bounded: The language of Beta distribution

Last week, in Lesson 41, we started toying with the idea of continuous probability distributions. When a random variable X is continuous (i.e., can be any Real number), we can compute the probability of X between any two values,  P(X \in (a,b)) = P(a \le X < b) using a continuous probability distribution function  f(x) .

The continuous probability density function (pdf) is the limiting shape of the frequency plot (histogram) of the data as the number of possible observations (n) goes to infinity. While the probability that the random variable X takes any specific value x is 0, the height of the smooth curve measures how dense the probability is at that point.

In the limit, as the number of observations approaches infinity (continuous), the proportion of observations that belong to an interval (a, b) is the probability that X is in this interval;  P(X \in (a,b)) = P(a \le X < b) .

This area under the curve is computed using the integral of the function over the range a to b.

 P(a < X < b) = \int_{a}^{b} f(x) dx

I left you with a few practice questions:

If X is a random variable with a probability distribution function defined as

 f(x) = 90x^{8}(1-x) for 0 < x < 1

  1. What is the probability that X is between 0.2 and 0.3?
  2. What is the probability that X will exceed 0.9?
  3. What is the median of X?

If you have solved these questions, more power to you. If you are waiting for the stars to align, this is that auspicious moment!

Let’s solve this problem step by step to understand the nuts and bolts of continuous distributions. At the end of the problem, I will lead you to our first type of continuous probability distribution functions, the Beta distribution and its special case, the uniform distribution. You will see why I selected this problem as the primer.

Our function is  f(x) = 90x^{8}(1-x) for 0 < x < 1.

The plot of this function reveals a bell-like shape. Notice that x is between 0 and 1 and the function is continuous.

Let’s take the first question: What is the probability that X is between 0.2 and 0.3?

 P(0.2 < X < 0.3) = \int_{0.2}^{0.3} f(x) dx

 = \int_{0.2}^{0.3} 90x^{8}(1-x) dx

 = \int_{0.2}^{0.3}90x^8 dx - \int_{0.2}^{0.3}90x^{9}dx

 = \frac{90}{9} x^{9}\Big|_{0.2}^{0.3} - \frac{90}{10} x^{10}\Big|_{0.2}^{0.3}

 = \frac{90}{9}(0.3^{9}-0.2^{9}) - \frac{90}{10}(0.3^{10}-0.2^{10})

 = 0.00014

Using the same procedure, we can solve for the probability that X will exceed 0.9?

 P(X > 0.9) = \int_{0.9}^{1} f(x) dx

 = \frac{90}{9} x^{9}\Big|_{0.9}^{1} - \frac{90}{10} x^{10}\Big|_{0.9}^{1}

 = \frac{90}{9}(1^{9}-0.9^{9}) - \frac{90}{10} (1^{10}-0.9^{10})

 = 0.264

We could have integrated the function from 0 to 0.9 and then subtracted this number from 1 because  P(X > x) = 1 - P(X \le x) and  P(X \le x) = \int_{0}^{x}f(x)dx in this case.

Also, remember that  P(X \le x) = F(x), the cumulative distribution function. We will be using the cumulative distribution function very often from now on.

Now, let us look at the third question: what is the median of X?

We know from order statistics that median is the 50th percentile, i.e., the value for which 50% of the values of X are below this number.

 P(X \le x_{median}) = F(x_{median}) = 0.5

\int_{0}^{x_{median}}f(x)dx = 0.5

 \int_{0}^{x_{median}}90x^{8}(1-x)dx = 0.5

This reduces to  10x_{median}^{9} - 9x_{median}^{10} = 0.5

We can use the Newton Raphson iterative method to find that the root of this equation is 0.84 when 0 < x < 1.

Hence,  x_{median} = 0.84
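(If you want to verify all three answers in R, here is a minimal sketch; it also uses the built-in Beta(9, 2) functions, which we meet formally in the next section.)

f <- function(x) 90 * x^8 * (1 - x)
integrate(f, 0.2, 0.3)$value                                  # P(0.2 < X < 0.3) ~ 0.00014
integrate(f, 0.9, 1)$value                                    # P(X > 0.9) ~ 0.264
uniroot(function(x) 10 * x^9 - 9 * x^10 - 0.5, c(0, 1))$root  # median ~ 0.84
# the same answers from the Beta(9, 2) distribution
pbeta(0.3, 9, 2) - pbeta(0.2, 9, 2)
pbeta(0.9, 9, 2, lower.tail = FALSE)
qbeta(0.5, 9, 2)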

Beta Distribution

Now, look at the function I gave you carefully.
 f(x) = 90x^{8}(1-x) for 0 < x < 1

It is bounded between 0 and 1.

It has some exponents for x and (1-x); 8 and 1 in this case.

It has a constant, 90, acting as a multiplier.

The function we solved is a Beta distribution. The standard form, i.e., the probability density function of a Beta distribution is

 f(x) = cx^{a-1}(1-x)^{b-1} for 0 < x < 1.

As you can see, it is defined only in the 0 to 1 range. The beta distribution is a bounded distribution. The function is 0 everywhere else.

a and b are the parameters that control the shape of the distribution. They can take any positive real numbers; a > 0 and b > 0. In our example, a = 9 and b = 2. The distorted bell shape we have for the function is because of these two values.

c is called the normalizing constant. It ensures that the pdf integrates to 1. Take, for example, our function  f(x) = 90x^{8}(1-x) .

If we integrate the function  x^{8}(1-x) between 0 and 1 (over the range of x), we will get

 \int_{0}^{1} x^{8}(1-x) dx = \frac{1}{90}

For the pdf f(x) to integrate to unity, we need to multiply it with a constant 90. Hence, we had 90 as the multiplier for our function.

The reciprocal of this normalizing constant, 1/c, is called the beta function; it is the area under the graph of  x^{a-1}(1-x)^{b-1} between 0 and 1.

 c = \frac{1}{\int_{0}^{1} x^{a-1}(1-x)^{b-1} dx}

For integer values of a and b, this constant c is defined using the generalized factorial function.

 c = \frac{(a + b - 1)!}{(a-1)!(b-1)!}

In our example, a = 9 and b =2. Applying these numbers will give

 c = \frac{(9+2-1)!}{(9-1)!(2-1)!} = \frac{10!}{8!1!} = 90.
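(A quick R check of this constant, both by numerical integration and by the factorial formula:)

1 / integrate(function(x) x^8 * (1 - x), 0, 1)$value           # 90
factorial(9 + 2 - 1) / (factorial(9 - 1) * factorial(2 - 1))   # 90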

Did you observe that we just need the values of a and b to get the Beta distribution?

We call it the beta family as the curve will have different shapes depending on the values of a and b.

Substitute a = 1 and b = 1 in the standard function and see what you get.

 c = \frac{(1 + 1 - 1)!}{(1-1)!(1-1)!} = 1

 f(x) = 1x^{1-1}(1-x)^{1-1} = 1

A constant value 1 for all x. This flat function is called the uniform distribution. It is a special case of the beta distribution when a and b are 1. It looks like a rectangle or a flat line.

Now substitute a = 2 and b = 2 and see.

a = 0.5 and b = 0.5 will be a u-shape with asymptotic ends.

I want you to experiment with different values of a and b and visualize how the shape changes, like in the opening animation. Try it this week. Don’t wait till the R lesson. You have come this far, and you are already a good coder in R.
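A minimal R sketch to get you started; replace the assumed parameter pairs below with any values of a and b you like.

curve(dbeta(x, 1, 1), 0, 1, ylim = c(0, 4), ylab = "f(x)")       # a = b = 1: uniform
curve(dbeta(x, 2, 2), add = TRUE, col = "blue")                   # a = b = 2: symmetric bell
curve(dbeta(x, 0.5, 0.5), 0.01, 0.99, add = TRUE, col = "red")    # a = b = 0.5: u-shape
curve(dbeta(x, 9, 2), add = TRUE, col = "darkgreen")              # a = 9, b = 2: our example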

As you see here, the beta distribution is flexible to take on different shapes.

The uniform distribution is used for simulating data from different probability distributions. Again, meditate on this idea before we see it in an R lesson.

The beta distribution is also used as a probability distribution for the probability p of an outcome. The probability of the probability 😉
In other words, if we want to estimate the probability p of an outcome, we assume prior to having any data, that p follows a beta distribution (0 < p < 1). Once we have the data, we can update this knowledge using the Bayes rule.

The beta distribution is also typically used in project management when we want to estimate the probability of completing the project ahead of schedule. The duration of each job is a random variable that can be approximated using a beta distribution as it is bounded between the worst completion time (pessimistic) and best completion time (optimistic).

Knowing this about project management, I set out to complete several pending tasks during this Thanksgiving break. My initial estimated probability of completion was 0.91. After a somewhat lazy turkey day, I now realize that my lower bound (best completion time) should have been my upper bound (worst completion time). The fix is in. The probability of completing the pending tasks’ project in the Christmas break is 0.91.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 41 – Struck by a smooth function

Review lesson 32.

If you assume X is a random variable that represents the number of successes in a Bernoulli sequence of n trials, then this X follows a binomial distribution. The probability that this random variable X takes any value k, i.e., the probability of exactly k successes in n trials is:

 P(X = k) = \binom{n}{k}p^{k}(1-p)^{n-k}

Review lesson 33.

If we consider independent Bernoulli trials of 0s and 1s with some probability of occurrence p and assume X to be a random variable that measures the number of trials it takes to see the first success, then X is said to be Geometrically distributed. The probability of first success in the kth trial is:

 P(X = k) = (1-p)^{k-1}p

Review lesson 36.

The number of times an event occurs (counts) in an interval follows a Poisson distribution. The probability that X can take any particular value P(X = k) is:

 P(X = k) = \frac{e^{-\lambda t}(\lambda t)^{k}}{k!}

The characteristic feature in all these distributions is that the random variable X is discrete. The possible outcomes are distinct numbers, which is why we called them discrete probability distributions.

Have you asked yourself, “what if the random variable X is continuous?” What is the probability that X can take any particular value x on the real number line which has infinite possibilities?

Let’s start with the most cliched thought experiment.

Yes, your guess is correct.

I am going to ask you to draw a ball at random from a box of ten balls. I am also going to ask you “what is the probability of selecting any particular ball?”

Your answer will include, “not again,” and “since the balls are all identical, and there are ten in the box, the probability of selecting any particular ball is one-tenth (1/10); P(X = any ball out of ten balls) = 1/10.”

As I am about to ask my next question, you will interrupt me and give me the answer. “And if there are 20 balls, the probability will be 1/20.” You might also say, “spare your next question, because the answer is 1/100, and the visual for increasing number of balls looks like this.”

I am sure you have recognized the pattern here. As the sample size (n) becomes large, the probability of any one value approaches zero. For a continuous random variable, the number of possible outcomes is infinite, hence,

P(X = x) = 0.

For continuous random variables, the probability is defined in an interval between two values. It is computed using continuous probability distribution functions.

If you go back to lesson 15, you will recall how we made frequency plots. We partitioned the real number space into intervals or groups, recorded the number of observations (values) that fall into each group and used this grouping to build stacks.

Based on the number of observations in each interval, we can compute the probability that the random variable will occur in that interval. For example, if there are ten observations out of 100 observations in a group, we estimate the probability that the variable occurs in this group as 10/100.

For continuous random variables, the proportion of observations in the group approaches the probability of being in the group, and the size of the group (interval range) approaches zero. For a large n, we can imagine a large number of very small intervals.

Is it too abstract?

If so, let’s take some data and observe this behavior.

We will use the same data that we used last week — daily temperature data for New York City. We have this data from 1869 to 2017, a large sample of 54227 values. We can assume that temperature data is a continuous random variable that has infinite possible values on the real number line.

I will take 500 data points at a time and place them on the number line. If there are two or more observations with the same temperature value, I will stack them. Recall that this is how we create histograms; the only difference is that I am not grouping. Each value is independent.

Observe this animation.

As the number of data points (sample size) increases, the stacks get denser and denser with overlaps. The final compact histogram can be approximated using a smooth function – a continuous probability distribution function.
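(If you want to see this on your own machine, here is a minimal sketch; it uses simulated values as a stand-in for the temperature record, which is not reproduced here.)

set.seed(1)
x <- rnorm(50000, mean = 55, sd = 18)        # hypothetical stand-in for daily temperatures
hist(x, breaks = 100, freq = FALSE, main = "Histogram and its limiting smooth function")
lines(density(x), col = "red", lwd = 2)       # the smooth density approximation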

Since for continuous random variables, the proportion of observations in the group approaches the probability of being in the group, the area of the interval block or the area under the curve of the smooth function is the probability that X is in that interval.

Finally, from calculus, you can see that the probability of a continuous variable in an interval (a, b) is:

 P(a < X < b) = \int_{a}^{b} f(x) dx

An example area computation between -1 and 2 is shown.

The continuous probability distribution functions should obey the property of unit probability:

 \int_{-\infty}^{\infty} f(x) dx = 1

The limits of the integral are negative to positive infinity.

Now, we can integrate this function f(x) up to any value x to get the cumulative distribution function F(x).

Since the cumulative distribution function is the area under the curve up to a value of x, we are essentially computing  F(x) = P(X \le x) = \int_{-\infty}^{x} f(u) du .

Having this cumulative function is handy for computing the percentiles of the random variable.

Do you remember the concept of percentiles?

We learned in lesson 14 that percentiles are order statistics that can be used to summarize the data. A 75th percentile is that value of x which has 75% of the data less than this number. In other words, F(x) = 0.75.

Can you see how the cumulative distribution function,  F(x) = P(X \le x) , can be used to compute the percentiles?

Over the next few weeks, we will learn some special types of continuous distribution functions. Since you’ve been struck by smooth functions today, I will invite you to solve this.

If X is a random variable with a probability distribution function defined as

 f(x) = 90x^{8}(1-x) for 0 < x < 1

  • What is the median of X?
  • What is the probability that X is between 0.2 and 0.3?
  • What is the probability that X will exceed 0.9?

Post your answers in the comments section. The first correct answer will be highlighted next week.

No medal for solving it, but you don’t have to feel bad if you cannot. Many “modern engineering Ph.D. students” cannot solve this. Basic mathematics is no longer a requirement in the new world with diverse backgrounds.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 38 – Correct guesses: The language of Hypergeometric distribution

Now that John prefers Pepsi, let’s put him on the spot and ask him to choose it correctly. We will tell him that there is one Pepsi drink. We’ll ask him to choose it correctly when given three drinks.

These are his possible outcomes. He will pick the first, second or third can as Pepsi. If he chooses the first can, his guess is correct. So the probability that John is correctly guessing the one Pepsi can is 1/3. The probability that his guess is wrong is 2/3.

Since he is only choosing one correct Pepsi can, his outcomes are 0 or 1 right guess.

P(X = 0) = 2/3
P(X = 1) = 1/3

Let’s say that he liked Pepsi so much that he guessed it correctly in our first test.

We will put him through another test. This time, we will ask him to choose Pepsi correctly, when there are two cans in three.

He is to choose two cans, 1-2, 1-3, or 2-3. His guess outcomes are selecting a combination of two cans that will yield one Pepsi or both Pepsis. The only way to correctly guess both Pepsis is to choose 1-2. The probability is 1/3 (one option among three combinations).

There are two other ways (1-3, 2-3) for him to choose one Pepsi correctly. So the probability is 2/3.

P(X = 1) = 2/3
P(X = 2) = 1/3

John is good at this. He correctly guessed both the Pepsi cans. Let’s troll him by saying he got lucky so that he will take on the next test.

This time, he has to choose Pepsi correctly where there are two cans in four.

His picks can be 1-2, 1-3, 1-4, 2-3, 2-4, and 3-4. Six possibilities, since he is choosing two from four – 4C2. Based on his pick, he can get zero Pepsi, one Pepsi or both Pepsis. X = 0, 1, 2.

There is one possibility of both Pepsis; the combination 1-2. The probability that he chooses this combination among all others is 1/6.

P(X = 2) = 1/6

Similarly, there is one possibility of no Pepsi, the combination 3-4. The probability that he chooses these cans among all his options is 1/6.

P(X = 0) = 1/6

The other guess is choosing one Pepsi correctly. There are two Pepsi cans; he can pick either the first can or the second can as his choice, and then his second can will be the third or the fourth can.

1 with 3
1 with 4

2 with 3
2 with 4

There are 2C1 ways of choosing one Pepsi can among the two Pepsi cans. There are 2C1 ways of picking one Coke can from the two Coke cans.
The total possibilities are 2C1*2C1.

P(X = 1) = 4/6

Fortune favors the brave. John again correctly picked 1-2. Ask him to take one more test, and he will ask us to get out.

If you are like me, you want to play out one more scenario to understand the patterns and where we are going with this. So, let’s leave John alone and do this thought experiment ourselves. I will pretend you are John.

I will ask you to correctly pick Pepsis when there are three Pepsi cans among five.

When you pick three cans out of five, only one of them can be Pepsi, two of them can be Pepsi, or all three of them can be Pepsi. So X = 1, 2, 3.
You have a total of 10 combinations; pick three correctly from five. 5C3 ways as shown in this table.

The probability of picking all three Pepsis is 1/10.

P(X = 3) = 1/10

Now, let’s work out how many options we have for picking exactly one Pepsi in three cans. It has to be one Pepsi (one correct guess) with two Cokes (two wrong guesses). One Pepsi from three Pepsi cans can be selected in 3C1 ways. Two Cokes from two Cokes can be selected in 2C2 ways making it a total of 3C1*2C2 = 3 options. We can identify these options from our table as 1-4-5, 2-4-5 and 3-4-5. So,

P(X = 1) = 3/10

By the same logic, picking exactly two Pepsis in three cans can be done in 3C2*2C1 ways. Two Pepsi cans selected from three and one Coke chosen from two Coke cans. Six ways. Can you identify them in the table?

P(X = 2) = 6/10

Let us generalize.

If there are R Pepsi cans in a total of N cans (N-R Cokes) and we are asked to identify them correctly, in our choice selection of R Pepsis, we can get k = 0, 1, 2, … R Pepsis. The probability of correctly selecting k Pepsis is

 P(X = k) = \frac{\binom{R}{k}\binom{N-R}{R-k}}{\binom{N}{R}}

X, the number of correct guesses (0, 1, 2, …, R) assumes a hypergeometric distribution.

The control parameters of the hypergeometric distribution are N and R.

The probability distribution for N = 5, and R = 2 is given.

When N =5 and R = 3.

In more generalized terms, if there are R Pepsi cans in a total of N cans (N-R Cokes) and we randomly select n cans from this lot of N and define X to be the number of Pepsis in our sample of n, then the distribution of X is the hypergeometric distribution. P(X = k) in this sample is then:

 P(X = k) = \frac{\binom{R}{k}\binom{N-R}{n-k}}{\binom{N}{n}}

The denominator NCn is the number of ways you can select n cans out of a total of N. A sample of n from N. The first term in the numerator is selecting k (correct guesses) out of R Pepsis. The second term is selecting (n-k) (wrong guesses) from the remaining (N-R) Cokes.

I want you to start visualizing how the probability distribution changes for different values of N, R, and n.
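R has this distribution built in as dhyper(); a minimal sketch for the N = 5, R = 3 case we just worked out (note that dhyper() labels the number of Pepsis m, the number of Cokes n, and the number of cans drawn k):

N <- 5; R <- 3; n_draw <- 3
x <- 0:3
dhyper(x, m = R, n = N - R, k = n_draw)   # 0.0, 0.3, 0.6, 0.1, i.e., 3/10, 6/10, 1/10
barplot(dhyper(x, m = R, n = N - R, k = n_draw), names.arg = x,
        xlab = "correct guesses", ylab = "P(X = x)")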

Next week, we will learn how to work with all discrete distributions in our computer programming tool R.

Hypergeometric distribution is typically used in quality control analysis for estimating the probability of defective items out of a selected lot.

The Pepsi-Coke marketing analysis is another example application. Companies can analyze the preferences of one product to other among a subset of customers in their region.

Now think of this:

There are R Republican leaning voters in a population of N. For simplicity (and for all practical purposes since LP and GP will not win); the other N-R voters are leaning Democratic. If you select a random sample of n voters, what is the probability that you will have more than k of them voting Republican? You know that the control parameters for this “election forecast” model are N, R, and n. What if you underestimated the number of R leaning voters? What if your sample of n voters is not random?

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 37 – Still counting: Poisson distribution

The conference table was arranged neatly with a notebook and pen at each chair. Mumble’s Macbook Air is hooked up to the projector at the other end.

He peeps over the misty window. A hazy and rainy outside did not elevate his senses. He will meet Lani from the risk team of California builders. Able invited him to New York to discuss a potential deal on earthquake insurance products.

Mumble goes over the talking points in his mind. He is to discuss the technical details with Able and Lani in the morning. Later, he will present his analysis of product lines and market studies to the executive officers.

Still facing the window, Mumble takes a deep breath, feels confident about his opening jokes, and turns around to greet Able and Lani who just entered the conference room.

Lani initiates the conversation and talks about teamwork and success. Mumble was not impressed. He has heard this teamwork mantra many a time now. Lani came across as an all talk no action guy.

Lani nods and picks up the pen to take some notes.

Able’s eyes rolled over towards Lani. He expected him to interject. After all, Lani’s bio says that he is a mathematician.

Lani did not interrupt. He was busy writing down points on his notepad.

Able was again expecting Lani to latch onto the equations.

“This is very good. Spending time on these details is essential. I am looking forward to our successful collaboration Mumble. I love Math too.”

Able and Mumble glanced at each other and politely concealed their emotions.

“Definitely. California Builders are at the forefront in Earthquake risk for housing projects. We can collect data for tremors in California. We work on several models for the benefit of our clients. Our team consists of many mathematicians and engineers. Mumble, I agree with Mr. Able. You did a fantastic job. We can work together on this project to produce risk models.”

Mumble has now confirmed that Lani is all talk no substance. He is just looking to delegate work and take credit.

Able has little patience for these platitudes. He is beginning to realize that Lani is a paper mathematician who may have taken one extra Math course in college and just calls himself a “mathematician” since it sounds “Einsteinian.” The deal with California Builders just became “no deal.”

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 36 – Counts: The language of Poisson distribution

I want you to meet my friends, Mr. Able and Mr. Mumble.

Unlike me who talks about risk and risk management for a living, they take and manage risk and make a killing.

True to his name, Mr. Mumble is soft-spoken and humble. He is the go-to person for numbers, data, and models. Like many of his contemporaries, he checked off the bucket list given to him and stepped on the ladder to success.

Some of you may be familiar with this bucket list.

Mumble was a “High School STEM intern,” was part of a “High School to College Bridge Program for STEM,” has a “STEM degree,” and completed his “STEM CAREER Development Program.” Heck, he even was called a “STEM Coder” during his brief stint with a computer programming learning center. A few more “STEMs,” and he’d be ready for stem cell research.

These “STEM initiatives” have landed him a risk officer job for an assets insurance company in the financial district.

On the contrary, Mr. Able is your quintessential American pride who has a standard high school education, was able to pay for his college through odd jobs, worked for a real estate company for some time, learned the trade and branched off to start his own business that insures properties against catastrophic risk.

He is very astute and understands the technical aspects involved in his business. Let’s say you cannot throw in some lingo into your presentation and get away without his questions. He is not your “I don’t get the equations stuff” person. He is the BOSS.

Last week, while you were learning the negative binomial distribution, Able and Mumble were planning a new hurricane insurance product. Their company would sell insurance against hurricane damages. Property owners will pay an annual premium to collect payouts in case of damages.

As you know, the planning phase involves discussion about available data and hurricane and damage probabilities. The meeting is in their 61st-floor conference room that oversees the Brooklyn Bridge.

Mumble, is there an update on the hurricane data? Do you have any thoughts on how we can compute the probabilities of a certain number of hurricanes per year?

 

Mr. Able, the National Oceanic and Atmospheric Administration’s (NOAA) National Hurricane Center archives the data on hurricanes and tropical storms. I could find historical information on each storm, their track history, meteorological statistics like wind speeds, pressures, etc. They also have information on the casualties and damages.

That is excellent. A good starting point. Have you crunched the numbers yet? There must be a lot of these hurricanes this year. I keep hearing they are unprecedented.

 

Counting Ophelia, we had ten hurricanes this year. Take a look at this table from their website. I am counting hurricanes of all categories. I recall from our last meeting that Hurin will cover all categories. By the way, I never liked the name Hurin for hurricane insurance. It sounds like aspirin.

Don’t worry about the name. Our marketing team has it covered. Funny name branding has its influence. You will learn when you rotate through the sales and marketing team. Tell me about the counts for 2016, 2015, etc. Did you count the number of hurricanes for all the previous years?

 

Yes. Here are the table and a plot showing the counts for each year from 1996 to 2017.

 

Based on this 22-year data, we see that the lowest number per year is two hurricanes and the highest number is 15 hurricanes. When we are designing the payout structure, we should have this in mind. Our claim applications will be a function of the number of hurricanes. Can we compute the probability of having more than 15 hurricanes in a year using some distribution?

Absolutely. If we assume hurricane events are independent (the occurrence of one event does not affect the probability that a second event will occur), then the counts per year can be treated as a random variable that follows a probability distribution. Counts, i.e., the number of times an event occurs in an interval, follow a Poisson distribution. In our case, we are counting events that occur in time, and the interval is one year.

Let’s say the random variable is X and it can be any value zero hurricanes, one hurricane, two hurricanes, ….. What is the probability that X can take any particular value P(X = k)? What are the control parameters?

 

The Poisson distribution has one control parameter, \lambda.

It is the rate of occurrence; the average number of hurricanes per year. Based on our data, \lambda is 7.18 hurricanes per year. The probability P(X = k) for a unit time interval t is

 P(X = k) = \frac{e^{-\lambda t}(\lambda t)^{k}}{k!}

The expected value and the variance of this distribution are both equal to  \lambda t  (just  \lambda  for a one-year interval).

We can compute the probability of having more than 15 hurricanes in a year by adding P(X = 16) + P(X = 17) + P(X = 18) and so on. Since 15 happens to be at the extreme, the probability will be small, but our risk planning should include it. Extreme events will create catastrophic damage. I see you have more slides on your deck. Do you also have the probability distribution plotted?

 

Yes, I have them. I computed the P(X = k) for k = 0, 1, 2, …, 20 and plotted the distribution. It looks like this for \lambda = 7.18.

Let me show you one more probability distribution. This one is for storms originating in the western Pacific. They are reported to the Joint Typhoon Warning Center. Since we also insure assets in Asia, this data and the probability estimates will be useful to design premiums and payouts there. The rate of events is higher in Asia; an average of 14.95 typhoons per year. The maximum number of typhoons is 21.
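(For readers who want to reproduce these numbers and plots, a minimal R sketch using the built-in Poisson functions:)

lambda <- 7.18
ppois(15, lambda, lower.tail = FALSE)                  # P(X > 15), roughly 0.003
k <- 0:20
plot(k, dpois(k, lambda), type = "h", xlab = "hurricanes per year", ylab = "P(X = k)")
plot(0:30, dpois(0:30, 14.95), type = "h", xlab = "typhoons per year", ylab = "P(X = k)")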

 

Very impressive Mumble. You have the foresight to consider different scenarios.

 

As the meeting comes to closure, Mr. Able is busy checking his emails on the phone. A visibly jubilant Mumble sits in his chair and collects the papers from the table. He is happy for having completed a meeting with Mr. Able without many questions. He is already thinking of his evening drink.

The next meeting is in one week. Just as Mr. Able gets up to leave the conference room, he pauses and looks at Mumble.

“Why is it called Poisson distribution? How is this probability distribution different from the Binomial distribution? Didn’t you say in a previous meeting that exactly one landfall in the next four hurricanes is binomial?”

Mumble gets cold feet. His mind already switched over to the drinks after the last slide; he couldn’t come up with a quick answer. As he begins to mumble, Able gets sidetracked with a phone call. “See you next week Mumble.” He leaves the room.

Mumble gets up and watches over the window — bright sunny afternoon. He refills his coffee mug, takes a sip and reflects on the meeting and the question.

To be continued…

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 35 – Trials to ‘r’th success: The language of Negative Binomial distribution

We all know that the trials to the first success is a Geometric distribution. It can take one trial, two trials, three trials, etc., to see the first success. These trials are assumed a random variable X = {1, 2, 3, … }; they have a probability, i.e., P(X = 1), P(X = 2), P(X = 3), and so on.

However, Mr. Gardner needs more success. He has already sold his first “time machine” (aka bone density scanner). He’d have to sell more. He is looking for his second success, third success and so forth to get through.

The number of trials it takes to see the second success is Negative Binomial distribution.

The number of trials it takes to see the third success is Negative Binomial distribution.

The number of trials it takes to see the ‘r’th success is Negative Binomial distribution.

Are you paying attention to the pattern here? Negative Binomial distribution is Geometric distribution if r is 1 (trials to first success).

The Geometric distribution has one control parameter, p, the probability of success.

Since we are interested in more than the first success, r is another parameter in the Negative Binomial distribution. Together, p and r determine how the distribution looks.

Let’s take the example of Mr. Gardner. “I’d have to sell one more.” Each of his visits to a doctor’s office is a trial. They will buy the bone density scanner or show him the door. So Mr. Gardner is working with a probability of success, p. Let’s say that p is 0.25, i.e., there is a 25% chance that he will succeed in selling it.

Let’s assume that he sold one machine. He is looking for his second sale. r = 2.

His second success can occur in the second trial, the third trial, the fourth trial, etc. When r = 2, X, the random variable of the number of trials will be X = {2, 3, 4, …}.

When r = 3, X = {3, 4, 5, …}.

You correctly guessed my next line. When r = 1, i.e., for Geometric distribution, X = {1, 2, 3, …}.

There is a pattern. We are learning a distribution which is an extension of Geometric distribution.

Now let us compute the probability that X can take any integer value.

P(X = 2) is the probability that he makes his second sale (second success) on the second trial.

Remember the trials are independent. The second doctor’s decision is not dependent on what happened before. He buys or not with a 0.25 probability.

So, P(X = 2) is 0.25*0.25 = 0.0625. The first trial is a success and the second trial is a success.

P(X = 3) is the probability that he makes his second sale on the third trial. It means he must have made his first sale in either the first trial or the second trial, and then he makes his second sale on the third trial.

The probability of succeeding second time on the third trial is the probability of succeeding once in two trials, and the third trial is a success.

P(1 success in 2 trials) * P(3rd is a success)

The probability of the one success in two trials is computed using the Binomial distribution.

2C1 * p^{1} * (1-p)^{2-1}

This probability is multiplied by p, the probability of success in the third trial.

P(X = 3) = 2C1 * p^{1} * (1-p)^{2-1} * p

If this is your expression now, 😕 let’s take another case to clear up the concept.

Suppose we want P(X = 6), the probability of making the second sale on the sixth trial.

This will happen in the following way.

Mr. Gardner has to sell one machine in five trials. P(1 in 5), one success in five trials → Binomial.

5C1 * p^{1} * (1-p)^{5-1}

Then, the sixth trial is a sale. So we multiply the above binomial probability by p, the success in the sixth trial.

You should have noticed the origin of the name → Negative Binomial.

To generalize, the probability that the ‘r’th success occurs on the ‘x’th trial is

 P(X = x) = \binom{x-1}{r-1}p^{r}(1-p)^{x-r}

I computed these probabilities for X ranging from 2 to 50. Here is the probability distribution. Remember r = 2 and p = 0.25.
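(R’s dnbinom() counts the failures before the ‘r’th success rather than the total trials, so a small shift is needed; a minimal sketch for r = 2 and p = 0.25:)

r <- 2; p <- 0.25
x <- r:50                                       # total trials to the r'th sale
probs <- dnbinom(x - r, size = r, prob = p)     # shift: failures = trials - r
probs[1]                                        # P(X = 2) = 0.0625
plot(x, probs, type = "h", xlab = "trials to the 2nd sale", ylab = "P(X = x)")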

Now I change the value for r to 3 and 5 to see how the Negative Binomial distribution looks. r = 3 means Mr. Gardner will sell his third machine.

r = 3

r = 5

Notice how the probability distribution shifts with increasing values of r.

The control parameters are r and p.

You can try different values of p and see what happens. For a fixed value of p and changing values of r, the tail is getting bigger and bigger. What does it mean regarding the number of trials and their probability for Mr. Gardner?

Think about this. If he sets a target of selling three machines in the day, what is the probability that he can achieve his goal within 20 doctor visits?

What if he needs to sell five machines within these 20 visits to make ends meet? Can he make it?

If you try out the probability distribution plots for p = 0.5, you will see that his chance of selling the fifth machine within 20 visits goes up tremendously. So perhaps he should learn the six principles of influence and persuasion to increase p, the probability of saying yes by the doctors.
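(You can check these targets with the cumulative function pnbinom(), again shifting from trials to failures:)

p <- 0.25
pnbinom(20 - 3, size = 3, prob = p)      # 3rd sale within 20 visits, ~ 0.91
pnbinom(20 - 5, size = 5, prob = p)      # 5th sale within 20 visits, ~ 0.59
pnbinom(20 - 5, size = 5, prob = 0.5)    # the same target if p were 0.5, ~ 0.99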

People mostly prefer to say yes to the request of someone they know and like. I want our blog to have the 9000th user within nine months. See, I am requesting in the language of Negative Binomial distribution.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 34 – I’ll be back: The language of Return Period

The average time between Arnold Schwarzenegger’s being back is the return period of his stunt.

I have to wait 5 minutes for my next bus. Some days I wait for 1 minute; some days, I wait for 15 minutes. The wait time is variable → random variable 😉 The average of these wait times is the return period of my bus.

Your recent vocabulary may include “100-year event” (happening more often), (drainage system designed for) “10-year storm,” and so on, courtesy mainstream media and news outlets.

Houston drainage grid ‘so obsolete it’s just unbelievable’

What exactly is this return period business?

Does a 10-year return period event occur diligently every ten years?

Can a 100-year event occur three times in a row?

THE LOGIC

Let’s visit annual maximum rainfall for Houston. If we take the daily rainfall data for each year from January 1 to December 31 and choose the maximum rainfall among these days, we call it annual maximum rainfall for that year. So this is the rainfall for the wettest day of the year. Likewise, if we do this for all the years that we have data for, we get a data series (also called time series since we are recording this in time units).

You can get the data from here if you like. You may have to register using your email, but it’s free. We have 79 years of recorded data from 1939 to 2017. 79 data points, one number per year as the rainfall for the wettest day in that year. You will see that five years are missing between 1942 and 1946.

I want you to understand that these numbers represent a random variable X. Each number (outcome) is assumed to be independent, i.e., the occurrence of one event in one year does not influence the occurrence of the subsequent event. In other words, 2017 rainfall does not depend on 2016 rainfall.

Now, I want you to see Brays Bayou, the lake that detains excess rainfall in Houston. Let us assume that it can store up to eight inches of rainfall on any day. If it rains more than eight inches in a day, the Bayou will overflow and cause a flood — as we saw in Houston during Hurricane Harvey.

So, if the rainfall is greater than eight inches, we define this as an event. Let us call him Bob. The first time we see Bob was in 1949. We started recording data in 1939. Bob happened after 11 years. The wait time for Bob (1949) is 11 years.

Then we get on with our lives, 11 years passed, Bob is not back, 22 years passed, no sign of Bob. Suddenly, after 30 years of waiting from 1949, Bob Strikes Back (1979).

Two years after this event happened, Bob wanted to greet the Millennials, so he came back in 1981. This time, the waiting period is only two years.

Then, in 1989, for no particular reason, Bob returns. The return of Bob (1989) is after eight years.

You must be thinking: “I don’t see any pattern here.” Yes, that is because there is none.

Years pass, Bob seems to be resting. At the turn of the century, Bob decided to come back. So Bob Meets the 21st Century in 2001 after 12 years since his prior occurrence. Bob re-occurs. Recurrence.

During the first decade of the 21st century, Bob re-occurs two times, once in 2006 as the Restless Bob (5-year wait time) and again in 2008 as Miss Me Yet, Bob (2-year wait time).

We all know what happened after that. Vengeant Bob (2017), aka Harvey, happened after nine years.

Now, let’s summarize all Bobs along with their recurrence times. We started with the assumption that the maximum rainfall events represent a random variable X. Let us define T as another random variable that measures the time between the event Bob (wait time or time to the next event or time to the first event since the previous event).

The return period of the event Bob, (X > 8 inches) is the expected value of T, i.e., E[T], its average measured over a large number of such occurrences.

As you can see here, in the table, the return period of Bob is approximately ten years. Bob is a 10-year return period event.

Another way of thinking about this: since there are eight Bob events in 79 years, they occur on average once every 79/8 years, approximately once in 10 years. Hence originated the 10-year event concept.

Remember, they don’t happen cyclically every ten years. If we average the wait times of a lot of events, we will get approximately ten years.

Just like when you wait for the bus, some days you wait a short time and some days a long time, but you think of the average time you wait for a bus every day. Similarly, events can happen in a cluster or spaced out, but they all average out to an n-year return period.

The relation to Geometric distribution

Last week when we learned Geometric distribution, I told you that we would relate the expected value of the Geometric distribution to return period of an event. Let’s see how Bob relates to Geometric distribution.

I want you to convert the maximum rainfall data series into a series of independent Bernoulli trials of 0s and 1s. 0 if the rainfall is < eight inches (No Bob), 1 if the rainfall is > eight inches (Yes Bob). The 1s can occur with some probability of occurrence p. In our example, since we have 8 Bobs in 79 years the probability of occurrence p = 8/79 = 0.101.

Now, assume T to be a random variable that measures the number of trials (years) it takes to see the first success (event), or the next event from each such event. For the first event, Bob (1949), it took 11 years to occur. The probability that T = 11, P(T = 11) is (1 – p)^10*p. Similarly, the next Bob happened after 30 years and so on. T is the time to first success (next success) → Geometrically distributed.

We can derive the expected value of T using the expectation operation we learned in lesson 24.

 E[T] = \sum_{k=1}^{\infty} k(1-p)^{k-1}p = p\left(1 + 2(1-p) + 3(1-p)^{2} + ... \right)

Now, recall from your math classes that the expression inside the parenthesis looks like a power series. Ponder over it and confirm that the whole expression will reduce to

E[T] = 1/p

The expected value of the wait time that is Geometrically distributed is the inverse of the probability of the event. Since the probability of Bob is 0.101, the return period (expected value of the wait times) is 1/0.101 ~ ten years. A 10-year return period event.
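(A minimal simulation check of this result in R; rgeom() counts the failures before the event, so we add one to get the wait in years:)

p <- 8 / 79
set.seed(7)
waits <- rgeom(100000, prob = p) + 1   # simulated wait times between Bob events
mean(waits)                            # close to 9.9 years
1 / p                                  # 9.875, the ~10-year return period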

The Question

We measured the probability over 79 years; n = 79. We assumed that the probability is constant over all the trials.

In other words, we are assuming that we know p and it does not change.

If I were writing this lesson last year, the probability would have been 7/78 = 0.089. Since Harvey (The Vengeant Bob), the probability became 8/79 = 0.101. There are also five missing years.

Perhaps we do not know the true value of p, and perhaps it is not constant.

How then, can you estimate the risk of anything? How then, can you predict anything? How then, can you design anything? 

If I haven’t confused you enough, let me end with one of my favorite quotes from Nicholas Taleb’s book Antifragile: Things that gain from disorder.

“It is hard to explain to naive data-driven people that risk is in the future, not in the past.”

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
