Lesson 34 – I’ll be back: The language of Return Period

The average time between Arnold Schwarzenegger’s being back is the return period of his stunt.

I have to wait 5 minutes for my next bus. Some days I wait for 1 minute; some days, I wait for 15 minutes. The wait time is variable → random variable 😉 The average of these wait times is the return period of my bus.

Courtesy of mainstream media and news outlets, your recent vocabulary may include phrases like “100-year event” (now happening more often) and (a drainage system designed for a) “10-year storm.”

Houston drainage grid ‘so obsolete it’s just unbelievable’

What exactly is this return period business?

Does a 10-year return period event occur diligently every ten years?

Can a 100-year event occur three times in a row?

THE LOGIC

Let’s visit annual maximum rainfall for Houston. If we take the daily rainfall data for each year from January 1 to December 31 and choose the maximum rainfall among these days, we call it annual maximum rainfall for that year. So this is the rainfall for the wettest day of the year. Likewise, if we do this for all the years that we have data for, we get a data series (also called time series since we are recording this in time units).
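If you like to see it in code, here is a minimal sketch of that step, assuming the daily record is a list of (date, rainfall in inches) pairs. The helper name `annual_maxima` and the tiny `daily` sample are made up for illustration; they are not the Houston data.

```python
from collections import defaultdict
from datetime import date


def annual_maxima(daily_records):
    """Return {year: largest daily rainfall recorded in that year}."""
    by_year = defaultdict(list)
    for day, rain_inches in daily_records:
        by_year[day.year].append(rain_inches)
    # One number per year: the rainfall on the wettest day of that year.
    return {year: max(values) for year, values in sorted(by_year.items())}


# Hypothetical toy record, NOT real Houston data.
daily = [
    (date(2000, 3, 1), 0.4),
    (date(2000, 6, 9), 3.1),
    (date(2001, 6, 8), 9.2),
    (date(2001, 9, 2), 1.0),
]

print(annual_maxima(daily))
```

Each key is a year and each value is that year's wettest-day rainfall, which is exactly the series we work with in the rest of the lesson.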

You can get the data from here if you like. You may have to register using your email, but it's free. We have 79 years of recorded data from 1939 to 2017. 79 data points, one number per year as the rainfall for the wettest day in that year. You will see that five years are missing between 1942 and 1946.

I want you to understand that these numbers represent a random variable X. Each number (outcome) is assumed to be independent, i.e., the occurrence of one event in one year does not influence the occurrence of the subsequent event. In other words, 2017 rainfall does not depend on 2016 rainfall.

Now, I want you to see Brays Bayou, the lake that detains excess rainfall in Houston. Let us assume that it can store up to eight inches of rainfall on any day. If it rains more than eight inches in a day, the Bayou will overflow and cause flooding, as we saw in Houston during Hurricane Harvey.

So, if the rainfall is greater than eight inches, we define this as an event. Let us call him Bob. The first time we see Bob is in 1949. We started recording data in 1939, so, counting 1939 as year one, Bob happened in the 11th year. The wait time for Bob (1949) is 11 years.

Then we get on with our lives. 11 years pass, and Bob is not back. 22 years pass, and there is no sign of Bob. Suddenly, after 30 years of waiting from 1949, Bob Strikes Back (1979).

Two years after this event happened, Bob wanted to greet the Millennials, so he came back in 1981. This time, the waiting period is only two years.

Then, in 1989, for no particular reason, Bob returns. The return of Bob (1989) is after eight years.

You must be thinking: “I don’t see any pattern here.” Yes, that is because there is none.

Years pass, Bob seems to be resting. At the turn of the century, Bob decided to come back. So Bob Meets the 21st Century in 2001 after 12 years since his prior occurrence. Bob re-occurs. Recurrence.

During the first decade of the 21st century, Bob re-occurs two times, once in 2006 as the Restless Bob (5-year wait time) and again in 2008 as Miss Me Yet, Bob (2-year wait time).

We all know what happened after that. Vengeant Bob (2017), aka Harvey, happened after nine years.

Now, let’s summarize all Bobs along with their recurrence times. We started with the assumption that the maximum rainfall events represent a random variable X. Let us define T as another random variable that measures the time between successive Bob events (the wait time: the time to the first event, or to the next event since the previous one).

The return period of the event Bob (X > 8 inches) is the expected value of T, E[T]: the average wait time measured over a large number of such occurrences.

As the table shows, the eight wait times (11, 30, 2, 8, 12, 5, 2, and 9 years) average 79/8 ≈ 9.9 years, so the return period of Bob is approximately ten years. Bob is a 10-year return period event.
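The summary can be rebuilt from the years quoted in the story. Here is a small sketch; the names `BOB_YEARS`, `START_YEAR`, and `wait_times` are mine, and the first wait follows the lesson's convention of counting 1939 as year one.

```python
START_YEAR = 1939  # first year of the record
BOB_YEARS = [1949, 1979, 1981, 1989, 2001, 2006, 2008, 2017]


def wait_times(event_years, start_year):
    """Wait (in years) before each event, counting start_year as year 1."""
    waits = [event_years[0] - start_year + 1]
    # Every later wait is the gap between consecutive events.
    waits += [b - a for a, b in zip(event_years, event_years[1:])]
    return waits


bob_waits = wait_times(BOB_YEARS, START_YEAR)
mean_wait = sum(bob_waits) / len(bob_waits)

for year, wait in zip(BOB_YEARS, bob_waits):
    print(f"Bob ({year}): waited {wait} years")
print(f"Average wait: {mean_wait} years")
```

The waits add up to 79 years across eight events, so the average is 9.875 years.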

Another way of thinking about this: since there are eight Bob events in 79 years, they occur on average once every 79/8 ≈ 9.9 years. Approximately, once in 10 years. Hence originated the 10-year event concept.

Remember, they don’t happen cyclically every ten years. If we average the wait times of a lot of events, we will get approximately ten years.

It is just like waiting for the bus. Some days you wait a short time, some days a long time, but you think of the average time you wait for a bus every day. In the same way, events can happen in a cluster or be spaced out, but their wait times all average to an n-year return period.
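To see the clustering and the spacing for yourself, here is a rough simulation, assuming every year independently has the same 1-in-10 chance of an event. The function name, the seed, and the number of simulated years are arbitrary choices of mine.

```python
import random


def simulate_wait_times(p, n_years, seed=0):
    """Simulate independent yearly trials; return the wait before each event."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    waits, years_since_last = [], 0
    for _ in range(n_years):
        years_since_last += 1
        if rng.random() < p:  # an event happens this year
            waits.append(years_since_last)
            years_since_last = 0
    return waits


sim_waits = simulate_wait_times(p=0.1, n_years=200_000)
print("First few waits:", sim_waits[:10])
print("Average wait:", sum(sim_waits) / len(sim_waits))
```

The individual waits jump around with no pattern, yet the average settles close to ten years.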

The relation to Geometric distribution

Last week, when we learned the Geometric distribution, I told you that we would relate its expected value to the return period of an event. Let’s see how Bob relates to the Geometric distribution.

I want you to convert the maximum rainfall data series into a series of independent Bernoulli trials of 0s and 1s: 0 if the rainfall is at most eight inches (No Bob), 1 if the rainfall is greater than eight inches (Yes Bob). The 1s occur with some probability of occurrence p. In our example, since we have 8 Bobs in 79 years, the probability of occurrence is p = 8/79 = 0.101.
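Here is a sketch of that conversion. Since I only quote the Bob years in this lesson, the code marks those years as 1s and every other year as 0, following the lesson's count of 79 trials from 1939 to 2017. The variable names are illustrative.

```python
YEARS = range(1939, 2018)  # 1939 through 2017, i.e. 79 yearly trials
BOB_YEARS = {1949, 1979, 1981, 1989, 2001, 2006, 2008, 2017}

# 1 = Yes Bob (more than eight inches), 0 = No Bob.
trials = [1 if year in BOB_YEARS else 0 for year in YEARS]

# Estimated probability of occurrence: share of trials that are 1s.
p_hat = sum(trials) / len(trials)

print(len(trials), "trials,", sum(trials), "Bobs")
print("Estimated p =", round(p_hat, 3))
```

With the actual rainfall series, the same list could be built by comparing each annual maximum against the eight-inch threshold.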

Now, assume T to be a random variable that measures the number of trials (years) it takes to see the first success (event), or the next event from each such event. For the first event, Bob (1949), it took 11 years to occur. The probability that T = 11, P(T = 11), is (1 – p)^10*p: ten years without Bob followed by one year with Bob. Similarly, the next Bob happened after 30 years, and so on. T is the time to first success (next success) → Geometrically distributed.
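If you want to put a number on it, here is a minimal sketch of the Geometric probability, using p = 8/79 from our example. The function name `geometric_pmf` is my own.

```python
def geometric_pmf(t, p):
    """P(T = t): (t - 1) years without the event, then the event in year t."""
    if t < 1:
        raise ValueError("t must be a positive whole number of years")
    return (1 - p) ** (t - 1) * p


p = 8 / 79
print("P(T = 11) =", geometric_pmf(11, p))
print("P(T = 30) =", geometric_pmf(30, p))
```

Any single wait time, short or long, has a small probability of its own; that is why no pattern shows up in Bob's returns.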

We can derive the expected value of T using the expectation operation we learned in lesson 24.
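Written out as a sketch, using the Geometric probabilities P(T = t) = (1 – p)^(t–1)*p from above, the expectation is:

```latex
\begin{aligned}
E[T] &= \sum_{t=1}^{\infty} t \, P(T = t) \\
     &= \sum_{t=1}^{\infty} t \, (1-p)^{t-1} \, p \\
     &= p \left( 1 + 2(1-p) + 3(1-p)^2 + 4(1-p)^3 + \cdots \right)
\end{aligned}
```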

Now, recall from your math classes that the expression inside the parentheses is a power series. Writing q = 1 – p, it is 1 + 2q + 3q^2 + …, which sums to 1/(1 – q)^2 = 1/p^2 when q is less than 1. Ponder over it and confirm that the whole expression will reduce to

E[T] = 1/p

The expected value of a wait time that is Geometrically distributed is the inverse of the probability of the event. Since the probability of Bob is 0.101, the return period (expected value of the wait times) is 1/0.101 ≈ 9.9, or about ten years. A 10-year return period event.
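If you would rather check this numerically than ponder the series, here is a sketch that adds up the first couple of thousand terms of the expectation and compares the total with 1/p. The function name and the cut-off of 2000 terms are my own choices; the leftover terms are negligibly small for our p.

```python
def expected_wait(p, max_terms=2000):
    """Truncated sum of t * P(T = t) for a Geometric wait time."""
    return sum(t * (1 - p) ** (t - 1) * p for t in range(1, max_terms + 1))


p = 8 / 79
print("Series sum:", expected_wait(p))
print("1 / p     :", 1 / p)
```

Both numbers agree at 79/8 = 9.875 years, to within floating-point error.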

The Question

We measured the probability over 79 years; n = 79. We assumed that the probability is constant over all the trials.

In other words, we are assuming that we know p and it does not change.

If I were writing this lesson last year, the probability would have been 7/78 = 0.090. Since Harvey (The Vengeant Bob), the probability became 8/79 = 0.101. There are also five missing years.
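A quick sketch shows how much one extra year of data moves the answer. The function name `return_period_years` is illustrative, and the estimate is simply the number of years divided by the number of events.

```python
def return_period_years(n_events, n_years):
    """Return period as 1/p, with p estimated as n_events / n_years."""
    return n_years / n_events


before_harvey = return_period_years(7, 78)  # record through 2016
after_harvey = return_period_years(8, 79)   # record through 2017

print("Before Harvey:", round(before_harvey, 2), "years")
print("After Harvey :", round(after_harvey, 2), "years")
```

A single event shifts the estimated return period from about 11.1 years to about 9.9 years.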

Perhaps we do not know the true value of p, and perhaps it is not constant.

How then, can you estimate the risk of anything? How then, can you predict anything? How then, can you design anything?

If I haven’t confused you enough, let me end with one of my favorite quotes from Nassim Nicholas Taleb’s book Antifragile: Things That Gain from Disorder.

“It is hard to explain to naive data-driven people that risk is in the future, not in the past.”

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
