Lesson 58 – Max (Min): The language of extreme value distribution

Mumble, Joe, and Devine meet to discuss extreme values and extreme value distribution. Extremus Distributus.

J: Extreme value distribution? Are we talking about the pictures we saw last week, where the maximum of a distribution follows its own specific probability distribution? How does my driving experience this morning fit into the discussion?

M: Can you describe a bit more about your experience?

J: Yes. I was waiting at the merging lane near the GW Bridge. The times between successive vehicles were very short; I was left there stranded.

M: We can assume that the arrival times between successive vehicles have an exponential distribution. As you have shown in your graphic, let’s call this random variable X.

X \sim Exp(\lambda)

Let’s say there are n+1 vehicles so that there are n arrival times between successive vehicles, which we can represent as a set of random variables.

X_{1}, X_{2}, X_{3}, ..., X_{n}

This means your wait time to merge between successive vehicles could have been any of these n random variables. The maximum wait time is the maximum of these random variables. Let’s call this maximum time Y.

Y = max(X_{1}, X_{2}, X_{3}, ..., X_{n})

Y follows an extreme value distribution and it is related to the distribution of X_{1}, X_{2}, X_{3}, ..., X_{n}.
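Here is a minimal sketch in R of this setup; the values of \lambda and n are illustrative choices, not numbers from Joe’s commute.

# One realization of the merging example: n exponential arrival gaps
# and their maximum (lambda and n are assumed values for illustration).
lambda <- 0.5                 # on average, one vehicle every 2 time units
n      <- 10                  # number of gaps between n + 1 vehicles

x <- rexp(n, rate = lambda)   # X_1, X_2, ..., X_n ~ Exp(lambda)
y <- max(x)                   # Y = max(X_1, X_2, ..., X_n)
y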

D: Since, in your illustration, the parent distribution of the wait times is exponential, Y, the maximum of the X’s, will have a distribution related to the exponential distribution. We can derive the exact function of Y based on what we know about X.

M: In fact, we use the exponential distribution as the parent distribution to model the maximum wait time between earthquakes, floods, droughts, etc., to price our insurance products.

J: How can we derive the probability function of Y from X? Can we go through that?

D: Sure. Mumble showed before that Y = max(X_{1}, X_{2}, X_{3}, ..., X_{n}).

The cumulative distribution function is F_{Y}(y)=P(Y \le y). Because Y is the maximum of X_{1}, X_{2}, X_{3}, ..., X_{n}, the event Y \le y occurs only when all of these values are less than or equal to y. In other words,

F_{Y}(y)=P(Y \le y) = P(X_{1} \le y \cap X_{2} \le y \cap ... \cap X_{n} \le y)

We assume that X_{1}, X_{2}, X_{3}, ..., X_{n} are statistically independent; hence, the cumulative distribution function of Y is the product of the cumulative distribution functions of the individual random variables.

F_{Y}(y)=P(Y \le y) = P(X_{1} \le y)P(X_{2} \le y)...P(X_{n} \le y) = [F_{X}(y)]^{n}

From this cumulative function, we can derive the probability function.

f_{Y}(y) = \frac{d}{dy}F_{Y}(y) = \frac{d}{dy}[F_{X}(y)]^{n}

f_{Y}(y) = n[F_{X}(y)]^{n-1}f_{X}(y)

M: If we know the distribution of X (parent), we can substitute that in our equations here and get the distribution of Y. Do you want to try it for the exponential wait times?

J: Sure. The cumulative function for an exponential distribution is F_{X}(x) = 1 - e^{-\lambda x}. The probability function is f_{X}(x)=\lambda e^{-\lambda x}. For Y it will be

F_{Y}(y) = [1 - e^{-\lambda y}]^{n}

f_{Y}(y) = n[1 - e^{-\lambda y}]^{n-1}\lambda e^{-\lambda y}
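We can check Joe’s derived functions against a simulation. Here is a hedged sketch in R: the histogram of many simulated maxima should sit on top of f_{Y}(y) (again, \lambda and n are illustrative choices).

# Simulate many maxima of n exponential gaps and overlay the derived
# density f_Y(y) = n * (1 - exp(-lambda*y))^(n-1) * lambda * exp(-lambda*y).
lambda <- 0.5
n      <- 10
nsim   <- 10000

y_sim <- replicate(nsim, max(rexp(n, rate = lambda)))

hist(y_sim, breaks = 50, freq = FALSE, xlab = "y",
     main = "Simulated maxima vs. derived density")
curve(n * (1 - exp(-lambda * x))^(n - 1) * lambda * exp(-lambda * x),
      from = 0, to = max(y_sim), add = TRUE, lwd = 2)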

D: Excellent. Did you also notice that the cumulative and probability functions depend on n, the number of independent random variables X_{1}, X_{2}, X_{3}, ..., X_{n}?

Here is how they vary for an exponential parent distribution. They shift to the right with increasing values of n.
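A short R sketch reproduces this behavior; \lambda = 0.5 and the values of n are illustrative choices.

# F_Y(y) = (1 - exp(-lambda*y))^n shifts to the right as n increases.
lambda <- 0.5
y <- seq(0, 30, by = 0.1)

plot(y, (1 - exp(-lambda * y))^5, type = "l", xlab = "y", ylab = "F_Y(y)")
for (n in c(10, 25, 50, 100)) {
  lines(y, (1 - exp(-lambda * y))^n)
}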

J: We can derive the functions for the minimum in a similar way, I suppose.

M: That is correct. If Y_{min} is the minimum of the random variables, then Y_{min} exceeds y_{min} only when each of the independent X_{1}, X_{2}, X_{3}, ..., X_{n} is greater than y_{min}.

P(X_{1}>y_{min}, X_{2}>y_{min}, ..., X_{n}>y_{min}) = 1 - F_{Y_{min}}(y_{min})

By independence, this probability is [1-F_{X}(y_{min})]^{n}. Using this, we can derive F_{Y_{min}}(y_{min}) = 1 - [1-F_{X}(y_{min})]^{n} and f_{Y_{min}}(y_{min})=n[1-F_{X}(y_{min})]^{n-1}f_{X}(y_{min}).
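As a quick check with our exponential parent, 1 - F_{X}(y_{min}) = e^{-\lambda y_{min}}, so

F_{Y_{min}}(y_{min}) = 1 - [e^{-\lambda y_{min}}]^{n} = 1 - e^{-n\lambda y_{min}}

In other words, the minimum of n independent Exp(\lambda) wait times is itself exponential, with rate n\lambda.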

J: This is cool. All we need to know is the function for X, and we know everything about the extremes.

M: That is true to an extent. But what do we do if we do not know the form of the parent distribution with certainty? Remember Lesson 51? Did you check what happens to the functions with increasing values of n, i.e., as n tends to \infty?

J: Let me plot the cumulative function for increasing values of n and see.
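A hedged sketch in R of Joe’s check, evaluating the cumulative function at a fixed y for growing n (\lambda = 0.5 and y = 10 are illustrative values):

# For any fixed y, F_X(y) < 1, so F_Y(y) = (F_X(y))^n collapses to 0.
lambda  <- 0.5
y_fixed <- 10
Fx <- 1 - exp(-lambda * y_fixed)   # about 0.9933

for (n in c(10, 100, 1000, 10000)) {
  cat("n =", n, "  F_Y(y) =", Fx^n, "\n")
}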

D: Joe, as you can see, for any fixed y, F_{Y}(y) = [F_{X}(y)]^{n} tends to 0 as n tends to \infty because F_{X}(y) < 1. The function degenerates to a point. It is unstable.

J: Hmm. How do we stabilize it then?

M: Good question. We can work with a normalized version of Y instead of Y.

If there are two normalizing constants a_{n}>0 and b_{n}, we can create a normalized version of Y.

Y^{*} = \frac{Y-b_{n}}{a_{n}}

Proper values of these two normalizing constants a_{n}>0 and b_{n} will stabilize Y for increasing values of n. We need to find the distribution that Y^{*} can take in the limit as n tends to \infty.

D: Look at this visual to understand what stabilizing (norming) for increasing values of n means. It is for an exponential parent, our example. I am assuming a_{n} = 1 and b_{n} varies with n.

The normalized function does not degenerate for large values of n.
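Here is a sketch in R of this norming. For an exponential parent, a standard choice is a_{n} = 1 and b_{n} = \frac{\ln n}{\lambda}; the normalized cumulative functions then settle on a single non-degenerate curve (\lambda = 0.5 is an illustrative value).

# Normalized maxima: P(Y - b_n <= z) = (1 - exp(-lambda*(z + b_n)))^n
# with b_n = log(n)/lambda. The dashed curves approach the solid limit.
lambda <- 0.5
z <- seq(-2, 10, by = 0.05)

plot(z, exp(-exp(-lambda * z)), type = "l", lwd = 3,
     xlab = "z", ylab = "P(Y - b_n <= z)")   # the limiting curve
for (n in c(10, 100, 1000, 10000)) {
  b_n <- log(n) / lambda
  lines(z, (1 - exp(-lambda * (z + b_n)))^n, lty = 2)
}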

M: It turns out that the limit distribution of Y^{*}, the normalized version of Y, can be one of three types: Type I, Type II, and Type III.

If there exist normalizing constants a_{n}>0 and b_{n}, then,

P(\frac{Y - b_{n}}{a_{n}} \le z) \to G(z) as n \to \infty.

G(z) is a non-degenerate cumulative distribution function.

Type I (Gumbel Distribution): G(z) = e^{-e^{-\frac{z-\alpha}{\beta}}}. It is a double exponential distribution.

Type II (Frechet Distribution): G(z) = e^{-(\frac{z-\alpha}{\beta})^{-\gamma}} for z > \alpha and 0 for z \le \alpha. It is a single exponential function.

Type III (Weibull Distribution): G(z) = e^{-[-\frac{z-\alpha}{\beta}]^{\gamma}} for z < \alpha and 1 for z \ge \alpha. It is also a single exponential function.

The normalized version of Y converges to one of these three types, together called the family of extreme value distributions. It is said that the distribution F_{Y}(y) is in the domain of attraction of G(z). \alpha, \beta, and \gamma are the location (central tendency), scale, and shape controlling parameters.
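For our exponential example, we can verify the Type I limit directly. With a_{n} = 1 and b_{n} = \frac{\ln n}{\lambda} as before,

P(Y - b_{n} \le z) = [1 - e^{-\lambda(z + \frac{\ln n}{\lambda})}]^{n} = [1 - \frac{e^{-\lambda z}}{n}]^{n} \to e^{-e^{-\lambda z}} as n \to \infty

which is the Gumbel form with \alpha = 0 and \beta = \frac{1}{\lambda}.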

J: This convergence theorem sounds like the central limit theorem.

D: Yes. Just as the sum of independent random variables converges to the normal distribution in the limit as n \to \infty, the maximum of independent random variables converges to one of these three types, Gumbel, Frechet, or Weibull, depending on the parent distribution. Generally, parents with exponential tails, for example the exponential and normal distributions, tend to the Gumbel, and parents with polynomial tails tend to the Frechet distribution.

You have already seen how the Gumbel looks when the maxima of exponential wait times converge. Here are example Frechet and Weibull distributions.
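Here is a hedged sketch in R of the three types using the G(z) formulas above; \alpha = 0, \beta = 1, and \gamma = 2 are illustrative parameter choices.

# The three limit types from the formulas above.
z <- seq(-3, 5, by = 0.01)

gumbel  <- exp(-exp(-z))                     # Type I, all z
frechet <- ifelse(z > 0, exp(-z^(-2)), 0)    # Type II, 0 for z <= alpha
weibull <- ifelse(z < 0, exp(-(-z)^2), 1)    # Type III, 1 for z >= alpha

plot(z, gumbel, type = "l", xlab = "z", ylab = "G(z)")
lines(z, frechet, lty = 2)
lines(z, weibull, lty = 3)
legend("bottomright", legend = c("Gumbel", "Frechet", "Weibull"), lty = 1:3)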

Fisher and Tippett derived these three limiting distributions in the late 1920s. Emil Julius Gumbel, in chapter 5 of his book “Statistics of Extremes,” explains the fundamentals of deriving the distributions.

J: This is plentiful. Are there any properties of the three types? Can we guess which distribution is used where? The convergence of extremes to these three types seems very powerful. We can model outliers without much trouble. Some examples in R might help understand these better.

M: You already have a handful to digest. Let’s continue next week after you have processed it.

Be cautious. Coming up with the values for \alpha, \beta, \gamma in the real world involves uncertainty. The extreme value distribution need not be an elixir for outliers.

 

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
