Lesson 62 – Knowing the unknown: Inferentia

Think about your recent wine tasting retreat. While you were relaxing and refreshing, you were also tasting/testing wine samples and guessing the quality of the product. You were knowing the unknown using samples and the process of induction. In other words, you were inferring the true quality of the product from the taste of the wine sample. A method of generalization from observations.

 

Think about your recent visit to the theater after being impressed by the trailer. You must have judged how good the movie would be based on the sample they showed you. If you are like me, you would have told yourself that this is the last time you are watching a movie based on the trailer. But you keep going back. Here again, you are inferring the quality of the film from the sample teaser. Knowing the unknown via inference.

 

Do those robocalls for election surveys or ‘get out the vote’ campaigns bother you? Be nice to them. They are taking a sample survey to estimate/guess the election results. They are trying to know the unknown. Inferring the true outcome from the sample data.

 

Abraham Wald in 1942 used the sample of fighter planes that returned with bullet holes to judge/infer where to put the armor plates: where the holes aren’t. The truth is that the planes with bullet holes in the wings returned, while the others did not.

Statistical Inference

Deriving general conclusions from what we observe (sample data) is central to increasing our knowledge about the world. Statistical inference helps us do this.

Imagine we wish to know the incomes or IQ levels of the adult population in the US. The ideal thing to do is to test every adult for IQ and record their income levels; in other words, get the data for the entire population. But this is not practical. We can, however, take a sample that represents the population and measure the IQ and incomes of this sample. We can then make a general statement about the population from the sample. The statement could be “the average IQ of adults in the US is 100.” We get this based on a large enough representative sample. The average IQ from the sample is an estimate of the true IQ of the population. Since we are basing it on a sample, the estimate will be sensitive to how much data (the sample size) we have.
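Here is a minimal sketch of that last point, entirely simulated (the "population" of IQ scores below is a hypothetical normal with mean 100 and standard deviation 15, and the numbers are for illustration only). Larger samples give estimates closer to the truth.

# A hypothetical population of IQ scores (simulated, not real data)
set.seed(1)
population = rnorm(1000000, mean = 100, sd = 15)

# Estimates of the population mean from samples of increasing size
mean(sample(population, 10))     # small sample: the estimate can be far from 100
mean(sample(population, 100))    # moderate sample: closer
mean(sample(population, 10000))  # large sample: very close to 100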

Imagine we wish to know about the climate of a region we want to relocate to, i.e., what the temperature distribution is, how much it rains, and how many hurricanes it gets per year. Proper measurements of rainfall and temperature are a recent phenomenon, maybe the last 80 years with good quality. We don’t have an observational record of what happened before that. But we can take the data we have (a sample of climate observations), estimate the temperature distribution and the hurricane counts, and make a statement about the overall climate in this place so you can relocate. The generalization can be “the average temperature of this place is 60F with four hurricanes per year.” Don’t complain after you move that the area is getting more hurricanes. The sample may not have represented the climate history. Our inference is based on what we had. We can always update it.

Come 2020, the media and polling companies will predict the outcome of the presidential election. They will make voter calls or take online surveys to infer which way the election will swing. Ask for the sample breakdowns and the survey questions. You will know what biases are built into them.

In the recent lessons on the generalized extreme value distribution, we assumed that the data on extreme wind speed (block maxima) would most likely follow an extreme value distribution, since maximum values converge to an extreme value distribution in the limit. So the data we observed is a sample that represents the true GEV functional form:

G(z) = e^{-[1+\xi(\frac{z-\mu}{\sigma})]^{-1/\xi}_{+}}

This compact GEV function is the population. A true form of all extreme wind values in this place. The data we observed for a few years is a sample that represents this function (population). We can use the sample to draw conclusions (inference) about this GEV function.

The true location, scale, and shape parameters of this GEV function are \mu, \sigma and \xi. Based on the data we observed, the estimates (good guesses) of the location, scale, and shape parameters are 47.44, 5.03, and 0. We are drawing inference about the true function from the sample data we have at hand.

Hence, the objective of inference is to use a ‘sample’ to compute a number which represents a good guess for the true value of the parameter. The true population parameter is unknown, and what we estimate from the sample will enable us to obtain the closest answer in some sense.

The general rule, or a formula which is used to get the estimate, is called an estimator.

For example, the mean (\bar{x}) and variance (s^{2}) you compute from the data are good guesses (estimates) of the mean (\mu) and variance (\sigma^{2}) of the population; the formulas used to compute them are the estimators.

A different sample will yield a different estimate for the parameter. So we have to think of the estimate as a range of values, an interval or a probability distribution instead of a single value. The truth may be in this interval if we have good representative samples.
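To see this, here is a minimal sketch (again simulated with hypothetical numbers): draw many samples of the same size from the same population and look at the spread of the sample means.

# Repeated sampling: each sample of 100 gives a slightly different estimate
set.seed(2)
n_rep = 1000
xbar = matrix(NA, nrow = n_rep, ncol = 1)
for (i in 1:n_rep)
{
 x = rnorm(100, mean = 100, sd = 15)  # a fresh sample of 100 each time
 xbar[i] = mean(x)
}

hist(xbar, main = "", xlab = "Sample Means from Repeated Samples", font = 2, font.lab = 2)
quantile(xbar, c(0.025, 0.975))  # an interval of estimates that likely covers the truth (100)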

Over the next few lessons, we will learn some methods of estimation from data, how reasonable the estimates are, or the criteria for being good, their range or intervals, and many more concepts.

In summary, statistical inference is about understanding the entire population from samples and sample statistics.

The sample has to be representative of the population for good inference.

In other words, the sample distribution must be similar to the population distribution. Extreme wind data, for example, would be well represented by an extreme value distribution as the population, not by a normal distribution.

 

 

Joe and his girlfriend broke up. They inferred their future based on the sample dates. Is it a good inference?

 

 

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 61 – Resto


It is a journey, a long haul. Let’s take an exit, stretch our legs, get a coffee, and trace our tracks.

Seeing Data and Making Friends with R

Lesson 1: When you see something, say data

Lesson 2: R is your companion

Sets and Properties

Lesson 3: Sets

Lesson 4: Visualizing Sets

Probability, Axioms, and Rules

Lesson 6: Defining Probability

Lesson 7: Axioms of Probability

Lesson 9: Conditional Probability

Lesson 10: Independent Events

Lesson 12: Total Probability

Lesson 13: Bayes Rule

More R: Shortcuts, for, if-else, and functions
Lesson 5: Let us 'R'eview

Lesson 8: The search 'for' w'if'i

Lesson 11: Fran, the functionin' R-bot
Exploratory Data Analysis

Lesson 14: Order Statistics

Lesson 15: Frequency Plots

Lesson 16: Average

Lesson 17: Variance

Lesson 18: Skewness

Lesson 19: Outliers

Lesson 20: Coefficient of Variation

Random Variables, Expectation and others

Lesson 22: Random Variable

Lesson 23: Probability Distribution

Lesson 24: Expectation Operation

Lesson 25: More Expectation

Lesson 26: Variance Operation

Lesson 27: More Variance

Lesson 28: Standardization

More R: data exploration and large number games
Lesson 21: Beginners guide to summarize data in R

Lesson 29: Large number games in R
Discrete Probability Distributions

Lesson 31: Bernoulli Trials

Lesson 32: Binomial Distribution

Lesson 33: Geometric Distribution

Lesson 34: Return Period

Lesson 35: Negative Binomial Distribution

Lesson 36: Poisson Distribution

Lesson 37: Still Poisson Distribution

Lesson 38: Hypergeometric Distribution

Continuous Probability Distributions

Lesson 41: Continuous Functions

Lesson 42: Beta Distribution

Lesson 43: Exponential Distribution

Lesson 44: Memoryless Property

Lesson 45: Gamma Distribution

Lesson 46: Convolution

Lesson 47: Central Limit Theorem Visuals

Lesson 48: Central Limit Theorem Derivation

Lesson 49: Normal Distribution

Lesson 50: The Standard Normal Distribution

Lesson 51: Guest Blog: Prof. Upmanu Lall from Columbia University

Lesson 52: Lognormal Distribution

Lesson 53: Chi-square Distribution

Lesson 54: The Connection of Probability Distributions to Bernoulli

More R: Discrete and Continuous Distributions
Lesson 39: Discrete Distributions in R Part I

Lesson 40: Discrete Distributions in R Part II

Lesson 55: Continuous Distributions in R Part I

Lesson 56: Continuous Distributions in R Part II
Extreme Value Distributions

Lesson 57: Visual Understanding of Extreme Values

Lesson 58: Basics of Extreme Value Distributions

Lesson 59: Generalized Extreme Value Distribution

More R: Extreme Value Distributions
Lesson 60: Extreme Value Distributions in R
Inference

This is a classroom – Use it.

This is a resource – Take it.

This is a free source – Bookmark it.

This is knowledge – Share it.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

 

Lesson 60 – Extreme value distributions in R

“In theory, there is no difference between theory and practice. In practice, there is.” — ??

By now, you recognize the pattern in this classroom. We follow up theory with practice. A set of lessons with a common theme will culminate with some experience in R. Being true to our trend, today, we leap into the world of extRemes. Put on your hard hat and bring your tools and machinery. Don’t forget your drink of the day.

On the seventeenth day of March, two thousand eighteen, we met Maximus Extremus Distributus. He was taking different forms depending on the origin (parent) distribution. You can create him too using a few simple lines of code in R.

Here is the creation code for normal origins. We generate N = 1000 normally distributed random variables with a zero mean and unit standard deviation, select the maximum value out of these 1000 values, and repeat the process 1000 times to get 1000 maximum values. These maximum values converge to the Type I extreme value distribution – Gumbel (e^{-e^{-y}}). The code runs like an animation. You can control the speed by changing the number in Sys.sleep(). There is a plot that shows the convergence of the average values to a Normal distribution. Do you know why?

library(locfit)
library(logspline)

# Normal Origins #
par(mfrow=c(3,1))
x = rnorm(10000,0,1)
plot(logspline(x),xlim=c(-5,5),ylim=c(0,0.5),font=2,cex.lab=2,font.lab=2,xlab="Normal Distribution")

N = 1000
ave = matrix(NA,nrow=N,ncol=1)
max1 = matrix(NA,nrow=N,ncol=1)

for (i in 1:N)
{
 x = rnorm(N,0,1)
 
 lines(locfit(~x),col="grey")
 points(mean(x),0,col="blue",pch=17)
 points(max(x),0,col="red",pch=17)
 ave[i] = mean(x)
 max1[i] = max(x)
 Sys.sleep(0.01) 
}

Sys.sleep(1)
plot(locfit(~ave),xlim=c(-5,5),ylim=c(0,9),ylab="",cex.lab=2,font=2,font.lab=2,xlab="Normal Distribution",col="white")
lines(locfit(~ave),lwd=2,col="blue")

Sys.sleep(1)
plot(locfit(~max1),xlim=c(-5,5),font=2,font.lab=2,ylab="",xlab="Extreme Value Distribution (Gumbel)",cex.lab=2,col="white")
lines(locfit(~max1),lwd=2,col="red")


 

The creation code for exponential origins follows the same procedure. We generate N = 1000 exponentially distributed random variables with \lambda = 0.5 as the parent. The maximum values of an exponential distribution again converge to the Gumbel distribution.

## EVD from Exponential## 
par(mfrow=c(3,1))
x = rexp(10000,0.5)
hist(x,prob=T,xlim=c(0,25),ylab="",main="",font=2,cex.lab=2,font.lab=2,xlab="Exponential Distribution")
plot(logspline(x),add=T)

N = 1000

ave = matrix(NA,nrow=N,ncol=1)
max1 = matrix(NA,nrow=N,ncol=1)

for (i in 1:N)
{
 x = rexp(N,0.5)
 
 lines(locfit(~x),col="grey")
 points(mean(x),0,col="blue",pch=17)
 points(max(x),0,col="red",pch=17)
 ave[i] = mean(x)
 max1[i] = max(x)
 Sys.sleep(0) 
}

Sys.sleep(1)
plot(locfit(~ave),xlim=c(0,25),ylim=c(0,4),ylab="",cex.lab=2,font=2,font.lab=2,xlab="Normal Distribution",col="white")
lines(locfit(~ave),lwd=2,col="blue")

Sys.sleep(1)
plot(locfit(~max1),xlim=c(0,25),font=2,font.lab=2,ylab="",xlab="Extreme Value Distribution (Gumbel)",cex.lab=2,col="white")
lines(locfit(~max1),lwd=2,col="red")

 

Here is the creation code for uniform origins. This time, the maximum values from a uniform distribution converge to a different type of extreme value distribution, the Type III Weibull distribution (e^{-(-y)^{\gamma}}).

# Uniform Origins #
x = runif(100000,0,1)
#hist(x,prob=T,xlim=c(0,1),ylab="",main="",font=2,cex.lab=2,font.lab=2,xlab="Uniform Distribution")
plot(density(x),xlim=c(0,1),ylab="",main="",font=2,cex.lab=2,font.lab=2,xlab="Uniform Distribution")

N = 200

ave = matrix(NA,nrow=N,ncol=1)
max1 = matrix(NA,nrow=N,ncol=1)

for (i in 1:N)
{
 x = runif(N,0,1)
 
 # lines(locfit(~x),col="grey")
 points(mean(x),0,col="blue",pch=17)
 points(max(x),0,col="red",pch=17)
 ave[i] = mean(x)
 max1[i] = max(x)
 Sys.sleep(0) 
}

Sys.sleep(1)
plot(locfit(~ave),xlim=c(0,1),ylim=c(0,30),ylab="",cex.lab=2,font=2,font.lab=2,xlab="Normal Distribution",col="white")
lines(locfit(~ave),lwd=2,col="blue")

Sys.sleep(1)
plot(locfit(~max1),xlim=c(0,1),ylim=c(0,65),font=2,font.lab=2,ylab="",xlab="Extreme Value Distribution (Weibull)",cex.lab=2,col="white")
lines(density(max1),lwd=2,col="red")

I want you to experiment with other types of origin distributions.
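For instance, here is one possible experiment (not part of the original set) with a Cauchy parent. Lesson 59 notes that extremes of the heavy-tailed Cauchy distribution converge to the Frechet type; because the raw maxima are enormous, the sketch below plots them on a log scale.

# Cauchy Origins (heavy tail) -- maxima should lean toward the Frechet type #
N = 1000
max1 = matrix(NA,nrow=N,ncol=1)
for (i in 1:N)
{
 x = rcauchy(N,location=0,scale=1)
 max1[i] = max(x)
}
hist(log(max1),prob=T,main="",font=2,font.lab=2,xlab="log of Maxima from Cauchy Origins")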

The three Types (Gumbel, Frechet, Weibull) and the GENERALIZED EXTREME VALUE DISTRIBUTION (GEV)

The three types of extreme value distributions have double exponential and single exponential forms. The maxima of independent random variables converge (in the limit when n \to \infty) to one of the three types, Gumbel (e^{-e^{-y}}), Frechet (e^{-y^{-\gamma}}) or Weibull (e^{-(-y)^{\gamma}}) depending on the parent distribution.

We saw last week that these three types could be combined into a single function called the generalized extreme value distribution (GEV).

G(z) = e^{-[1+\xi(\frac{z-\mu}{\sigma})]^{-1/\xi}_{+}}

\mu is the location parameter. \sigma > 0 is the scale parameter. \xi controls the shape of the distribution (shape parameter).

When \xi \to 0, GEV tends to a Gumbel distribution.

When \xi > 0, GEV tends to the Frechet distribution.

When \xi < 0, GEV tends to the Weibull distribution.

It’s okay if you don’t know the origin distribution for an extreme dataset. GEV folds all three types into one form, and the parameters \mu, \sigma, and \xi can be estimated from the data. The function has a closed form solution to compute the quantiles and probabilities.

GEV IN R

Let’s play with some data and use GEV in R. We will use two datasets: NYC temperature and cycles to fatigue of steel. The NYC temperature data is the same file we used in Lesson 56. The cycles to fatigue data is from our labs, where we measured the maximum number of cycles before failure due to fatigue for ten steel specimens.

Read the files into the workspace using the following lines.

# fatigue data
fatigue = read.table("cycles_fatigue.txt",header=F)

# temperature data 
temperature_data = read.csv("cp_temperature.csv",header=T)

ExtRemes Package

We need to install a package in R called extRemes. Type the following lines in your code.

install.packages("extRemes")
library(extRemes)

This package has functions built for GEV distribution. Let’s try a few simple things first by generating random variables of the three types. Type the following lines in your code.

x = revd(10000,loc=0,scale=1,shape=0)

This command (revd) will generate 10000 GEV random variables with a location of 0, scale of 1 and shape of 0. So, this is a Gumbel distribution.

Type these lines and see what you get.

# Generate Gumbel Random numbers
x = revd(10000,loc=0,scale=1,shape=0)
hist(x,prob=T,xlab="Random Variables from Gumbel (location = 0,scale = 1, shape =0)",main="Gumbel Distribution",ylab="f(x)",font=2,family="serif",font.lab=2,cex.lab=1.5)
plot(logspline(x),add=T)

Now hold the shape parameter constant at 0 and alter the location and scale parameters. A change in the location parameter will shift the distribution; a change in the scale parameter will stretch or shrink the distribution.
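For example, here is a quick sketch along those lines (the specific location and scale values are just for illustration, and the extRemes and logspline packages are assumed to be loaded as above).

# Hold shape = 0; shift with location, stretch with scale
x1 = revd(10000,loc=0,scale=1,shape=0)
x2 = revd(10000,loc=3,scale=1,shape=0)   # larger location: shifted to the right
x3 = revd(10000,loc=0,scale=2,shape=0)   # larger scale: stretched wider

plot(logspline(x1),xlim=c(-5,15),ylim=c(0,0.4),font=2,font.lab=2,xlab="Gumbel with different location and scale parameters",ylab="f(x)")
plot(logspline(x2),add=T,col="blue")
plot(logspline(x3),add=T,col="red")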

I guess you know the chores now.

Yes, we experiment with positive and negative shape parameters to generate the Frechet and Weibull distributions.

Generate Frechet Random numbers

# Frechet distribution plot
y = revd(10000,loc=0,scale=1,shape=0.2)
hist(y,prob=T,ylim=c(0,0.4),xlab="Random Variables from Frechet (location = 0, scale = 1, shape = 0.2)",main="Frechet Distribution",ylab="f(x)",font=2,family="serif",font.lab=2,cex.lab=1.5)
plot(logspline(y),add=T,col="red")
plot(logspline(x),add=T)

Generate Weibull Random numbers

## Weibull Distribution Plot 
z = revd(10000,loc=0,scale=1,shape=-0.6)
hist(z,prob=T,ylim=c(0,0.5),xlab="Random Variables from Weibull (location = 0, scale = 1, shape = -0.6)",main="Weibull Distribution",ylab="f(x)",font=2,family="serif",font.lab=2,cex.lab=1.5)
plot(logspline(z),add=T,col="red")

If you are comfortable with this, it is time to get your hands dirty with real data.

Fitting GEV distribution to data

Let’s examine the maximum cycles to fatigue data. We do not know which extreme value distribution it follows. If we fit a GEV and look at the shape parameter, we can say with some confidence whether the data follows a Type I, Type II, or Type III distribution.

For this, we can use the fevd command. Let’s use the fevd function to fit a GEV to the cycles to fatigue data.

fatigue_cycles = as.matrix(fatigue)

fit = fevd(fatigue_cycles,type="GEV")
summary(fit)

[1] "Estimation Method used: MLE"

Negative Log-Likelihood Value: 132.8045

Estimated parameters:
 location scale shape 
 5.765016e+05 1.680629e+05 -5.231015e-01

Standard Error Estimates:
 location scale shape 
6.276363e+04 6.709826e+04 4.199849e-01

Estimated parameter covariance matrix.
 location scale shape
location 3.939273e+09 -527662537.18 -9.654182e+03
scale -5.276625e+08 4502177071.08 -2.306339e+04
shape -9.654182e+03 -23063.39 1.763873e-01

AIC = 271.6089

BIC = 272.5167

Pay attention to the estimated parameters for now. The shape parameter is -0.52 (\xi < 0). So, a Weibull distribution fits the data with high likelihood.

Block Maxima

Let’s examine the temperature data. Assume we are interested in analyzing the data for maximum temperature each year. Remember we only care about the extremes. For this, we should first extract the annual maximum temperature values. This idea is called the block maxima. Each year is a block, and we get the maximum for each year.

Type the following lines in your code to get the annual maximum temperature values from 1951 to 2017.

# Annual Maximum Ave Temperature #
yr1 = 1951
yr2 = 2017
n = yr2 - yr1 + 1

annmax = matrix(NA,nrow=n,ncol=1)
for (i in 1:n)
{
 yr = 1950 + i
 index = which((temperature_data[,1] == yr))
 temperature_yr = temperature_data[index,4]
 annmax[i,1] = max(temperature_yr,na.rm=T)
}

hist(annmax,xlab="Annual Maximum Temperature",font=2,main="",font.lab=2)

There is one value very different and far away from all the other values. Such behavior is a feature of extreme values.

Rare events never seen before can occur. Using a model (e.g., GEV function) for these “unknowns” comes with uncertainty.

Now let’s fit GEV to the temperature data and look at a few things.

fit_temperature = fevd(annmax,type="GEV")

The fevd function will fit a GEV distribution to the data. The location, scale and shape parameters of the function are estimated based on the data.

Now type the following line in your code.

plot(fit_temperature)

You should see the following figure appear in the plot window.

For now, focus on the bottom two images. The one in the bottom left is showing how well the GEV function (blue line) is matching the observed data (black line). Pretty reasonable.

Can you tell me what is the image on the bottom right?

Okay, let me ask you this. What is the probability that the annual maximum temperature will be less than or equal to 92 degrees F?

Yes, you can compute this from the cumulative distribution function of the GEV distribution.

G(z) = P(Z \le z) = e^{-[1+\xi(\frac{z-\mu}{\sigma})]^{-1/\xi}_{+}}

Substitute z = 92, \mu = 86.15, \sigma = 2.65, \xi = -0.03 in the equation and you get P(Z \le z) = 0.903.

I am sure you know why I substituted  \mu = 86.15, \sigma = 2.65, \xi = -0.03. These are the estimated parameters you get from the summary of the fit.

So, the probability that the annual maximum temperature will be less than or equal to 92 degrees F is 0.9.

Can I say that the probability of exceeding 92 degrees F is 0.1?

P(Z > 92) = 0.1

A temperature of 92 degrees F is exceeded 10% of the time.

What do you know about exceedance probability and return period? If a random variable is exceeded with 10% probability, what is the frequency of its occurrence? Lesson 34? The language of return period?

Return Period T = \frac{1}{P(exceedance)}

So if 92 degrees F is the temperature that is exceeded 10% of the time, its return period is ten years.

On average, a daily temperature as high as 92 degrees F will occur once every ten years in New York City.

In equation form, Return Period of a quantile z is T = \frac{1}{1 - G(z)}.

Now, can you answer this?

What temperature (z) occurs once in 50 years?

I know you are thinking of inverting the return period equation. That is the way.

G(z) = 1 - \frac{1}{T}

e^{-[1+\xi(\frac{z-\mu}{\sigma})]^{-1/\xi}} = 1 - \frac{1}{T}

[1+\xi(\frac{z-\mu}{\sigma})]^{-1/\xi} = -ln(1 - \frac{1}{T})

1+\xi(\frac{z-\mu}{\sigma}) = (-ln(1 - \frac{1}{T}))^{-\xi}

z = \mu + \frac{\sigma}{\xi}((-ln(1 - \frac{1}{T}))^{-\xi} - 1)

Now if you substitute T = 50 years and  \mu = 86.15, \sigma = 2.65, \xi = -0.03, you get z = 95.95.

The 50 year return period temperature is 95.95 degrees F.

This is what is computed internally within the extRemes package to create the plot you see. For the quantile z, the extRemes package has the qevd() function, where you input the probability p and the other parameters. For the probability P(Z \le z), it is pevd(), where you input the quantile z and the other parameters.
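For instance, here is a minimal sketch using the rounded parameters quoted above (the answers will therefore differ slightly in the later decimals).

# P(Z <= 92): non-exceedance probability of 92 degrees F
pevd(92,loc=86.15,scale=2.65,shape=-0.03,type="GEV")
# ~ 0.90

# 50-year return level: the quantile with non-exceedance probability 1 - 1/50
qevd(1-1/50,loc=86.15,scale=2.65,shape=-0.03,type="GEV")
# ~ 95.9 degrees F

# The same return level, directly from the fitted object
return.level(fit_temperature,return.period=50)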

We can get their relations from the plot.

What are those dashed lines in the return period plot?

The 500 year return period temperature is 101.3 degrees F. What does it even mean when they say the return period is 500 years?

How can something happen once in 500 years when we do not have 500 years of data?

Do you want to believe it? How confident are you in this estimate?

If this does not spin your head, let me add more. Look at the summary of the model once again.

summary(fit_temperature)

fevd(x = annmax, type = "GEV")

[1] "Estimation Method used: MLE"

 Negative Log-Likelihood Value: 169.1602

 Estimated parameters:
 location scale shape 
86.14530247 2.64886498 -0.02706872

Standard Error Estimates:
 location scale shape 
0.3552756 0.2500867 0.0663774

Estimated parameter covariance matrix.
 location scale shape
location 0.126220730 0.028413268 -0.006875094
scale 0.028413268 0.062543342 -0.003859995
shape -0.006875094 -0.003859995 0.004405960

AIC = 344.3204

BIC = 350.9344

The shape parameter is negative but very close to 0. Can we say that the GEV is a Gumbel distribution? Why not Weibull?

Did you ask yourself how these parameters are estimated from the data? How much data is required for this?

Did you ask yourself whether the shape parameter value is significantly different from 0 to say it is not Gumbel?

You can see so many new terms, “Estimation Method used: MLE,” standard error, and so on. What are these? What about the other two images in the fevd plot? Aren’t you curious about them? Well, hold on to that suspense for a few weeks. Very soon, we will start a new journey of inference.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 59 – The Generalized extreme value distribution

Recap

Two weeks ago, we met Maximus Extremus Distributus. He was taking different forms depending on the origin (parent) distribution.

Last week, Mumble, Joe, and Devine met to discuss the central ideas behind extreme value distribution. They derived the exact function for it.

F_{Y}(y)=P(Y \le y) = P(X_{1} \le y)P( X_{2} \le y) ...P(X_{n} \le y) = [F_{X}(y)]^{n}.

From this cumulative function, we can derive the probability function.

f_{Y}(y) = n[F_{X}(y)]^{n-1}f_{X}(y)
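If you want to check this exact function by simulation, here is a quick sketch assuming a standard normal parent (the block size and the evaluation point are arbitrary choices for illustration).

# Check F_Y(y) = [F_X(y)]^n by simulation, with X ~ Normal(0,1)
set.seed(3)
n = 50        # number of X's in each block
nsim = 10000  # number of simulated maxima

y = replicate(nsim, max(rnorm(n)))

mean(y <= 2)  # empirical P(Y <= 2)
pnorm(2)^n    # exact [F_X(2)]^n -- both should be around 0.32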

Mumble pointed out the issue of degeneration of the exact function as n \to \infty and introduced the idea of normalizing constants to stabilize the function.

If there are two normalizing constants a_{n}>0 and b_{n}, we can create a normalized version of Y as Y^{*} = \frac{Y-b_{n}}{a_{n}}.

Before we closed off for the week, we knew that Y^{*} converges to one of three types: Type I, Type II, or Type III extreme value distributions.

If there exist normalizing constants a_{n}>0 and b_{n}, then,

P(\frac{Y - b_{n}}{a_{n}} \le z) \to G(z) as n \to \infty.

G(z) is the non-degenerate cumulative distribution function.

Type I (Gumbel Distribution): G(z) = e^{-e^{-\frac{z-\alpha}{\beta}}}. Double exponential distribution.

Type II (Frechet Distribution): G(z) = e^{-(\frac{z-\alpha}{\beta})^{-\gamma}} for z > \alpha and 0 for z \le \alpha. This is a single exponential function.

Type III (Weibull Distribution): G(z) = e^{-[-\frac{z-\alpha}{\beta}]^{\gamma}} for z < \alpha and 1 for z \ge \alpha. This is also a single exponential distribution.

The convergence

Let’s get some intuition on why the parent distributions converge to these three types. The mathematical foundation is much more in-depth. We will skip that for some later lessons and get a simple intuition here.

Exponential origin: Let’s take Joe’s wait time example from last week. We assume that the arrival time between successive vehicles has an exponential distribution. Let’s call this random variable X.

X \sim Exp(\lambda)

There are n arrival times between successive vehicles that can be shown as a set of random variables.

X_{1}, X_{2}, X_{3}, ..., X_{n}

The maximum wait time is the max of these numbers. Let’s call this maximum time, Y.

Y = max(X_{1}, X_{2}, X_{3}, ..., X_{n})

The cumulative function for an exponential distribution is F_{X}(x) = 1 - e^{-\lambda x}. Hence, for Y (maximum wait time), it will be

F_{Y}(y) = [1 - e^{-\lambda y}]^{n}

For simplicity, let’s assume a value of 1 for \lambda and take the binomial series expansion for F_{Y}(y) = [1 - e^{-\lambda y}]^{n}.

F_{Y}(y) = 1 - ne^{-y} + \frac{n(n-1)}{2!}e^{-2y} - ...

As n \to \infty, this series converges to e^{-ne^{-y}}, an asymptotic double exponential function.

Norming Constants: Now let’s derive the double exponential function with the idea of norming constants. We know F_{Y}(y) = [1 - e^{-y}]^{n} for \lambda = 1.

Let’s introduce a variable z = y + ln(n) and evaluate the function F_{Y}(y) at z. We are adding a constant ln(n) to y.

F_{Y}(y+ln(n)) = [1 - e^{-(y+ln(n))}]^{n}

F_{Y}(y+ln(n)) = [1 - \frac{1}{e^{ln(n)}}e^{-y}]^{n}

F_{Y}(y+ln(n)) = [1 - \frac{1}{n}e^{-y}]^{n}

As n \to \infty, F_{Y}(y+ln(n)) converges to e^{-e^{-y}}, a double exponential function. If you observe the equation carefully, it is of the form [1+\frac{1}{n}x]^{n}, which in the limit is e^{x}. In our case, x = -e^{-y}. Hence,

F_{Y}(y+ln(n)) =e^{-e^{-y}}

If we replace y = z - ln(n), we get,

F_{Y}(z) =e^{-e^{-(z-ln(n))}}

So, with appropriate scaling (stabilization/norming), we see a double exponential function when the origin is an exponential function.
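A small simulation sketch of this norming idea (with assumed values n = 1000 and \lambda = 1): subtract log(n) from each maximum and compare the result with the double exponential e^{-e^{-y}}.

# Maxima of n exponential(1) variables, shifted by the norming constant log(n)
set.seed(4)
n = 1000
nsim = 10000

ystar = replicate(nsim, max(rexp(n, rate = 1))) - log(n)

mean(ystar <= 1)  # empirical P(Y* <= 1)
exp(-exp(-1))     # Gumbel CDF at 1 -- both should be around 0.69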

Power law origin: Let’s try one more parent function. This time, it is a power law function with a cumulative distribution function F_{X}(x)=1 - x^{-a}. We did not learn this distribution, but it is like a decay function, with a controlling the degree of decay. The distribution of the maximum value Y of this function is

F_{Y}(y) = [1 - y^{-a}]^{n}

We will assume a new variable z = n^{1/a}y and evaluate the function at z.

F_{Y}(n^{1/a}y) = [1 - (n^{1/a}y)^{-a}]^{n}

=[1 - n^{-1}y^{-a}]^{n}

=[1 + \frac{1}{n}(-y^{-a})]^{n}

In the limit, as n \to \infty, F_{Y}(n^{1/a}y) = e^{-y^{-a}}

Hence, F_{Y}(z)=e^{-(\frac{z}{n^{1/a}})^{-a}}

So, parent functions of the power law type converge to the single exponential Type II Frechet distribution. Similar norming constants can be found for other distributions that converge to the Type III Weibull distribution.

To summarize, the three types are e^{-e^{-\frac{z-\alpha}{\beta}}}, e^{-(\frac{z-\alpha}{\beta})^{-\gamma}}, e^{-[-\frac{z-\alpha}{\beta}]^{\gamma}}.

They are of the form e^{-e^{-y}}, e^{-y^{-\gamma}}, e^{-(-y)^{\gamma}}.

Here’s a visual of how these three distributions look.

If the right tail is of exponential type, the extreme value distribution is a Gumbel distribution. Here the parent distribution (or the distribution of X) is unbounded on the right tail. Extremes of most common exponential type distributions such as normal, lognormal, exponential and gamma distributions converge to the double exponential Gumbel distribution. It is most commonly used to model maximum streamflow, maximum rainfall, earthquake occurrence and in some cases, maximum wind speed.

The Frechet distribution, like the Gumbel distribution, is unbounded on the right tail and is much fatter. Extremes from Pareto distribution (Power Law) and Cauchy distributions converge to Frechet Distribution. Rainfall and streamflow extremes, air pollution and economic impacts can be modeled using this type. Notice how the red line (Frechet distribution) has a heavy tail and is bigger than the black line (Gumbel distribution).

If the right tail converges to a finite endpoint, it is a Weibull distribution. The parent distribution is also bounded on the right. Remember how the extremes of the uniform distribution, which is bounded, converged to a Weibull. It is most widely used for the minima of the strength of materials and in fatigue analysis. It is also used in modeling temperature extremes and sea level.

The Generalized Extreme Value Distribution (GEV)

The three types of extreme value distributions can be combined into a single function called the generalized extreme value distribution (GEV). Richard von Mises and Jenkinson independently showed this.

G(z) = e^{-[1+\xi(\frac{z-\mu}{\sigma})]^{-1/\xi}_{+}}

\mu is the location parameter. \sigma > 0 is the scale parameter. \xi is the shape parameter.

When \xi \to 0, GEV tends to a Gumbel distribution. It is the same form [1+\frac{1}{n}x]^{n}, which in the limit is e^{x}. For GEV, the exponential term goes to e^{-(\frac{z-\mu}{\sigma})} hence yielding the double exponential G(z) = e^{-e^{-(\frac{z-\mu}{\sigma})}}.

When \xi > 0, GEV tends to the Frechet distribution. Replace \xi = 1 and see for yourself what you get.

When \xi < 0, GEV tends to the Weibull distribution. Replace \xi = -1 and check.

\mu and \sigma are the surrogate norming variables and \xi controls the shape.

GEV folds all the three types into one form. The parameters \mu, \sigma and  \xi can be estimated from the data, hence negating the necessity to know which Type a parent distribution or data converges to. The function has a closed form solution to compute the quantiles and probabilities. GEV also has the max stable property about which we will learn in some later lessons.

All these concepts will become more concrete once we play with some data. Aren’t you itching to do some GEV in R? You know what is coming next week.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.