Lesson 66 – Spare a moment

Hello! Do you have a moment?

What is the first thing that comes to your mind when you hear the word “moment?”

For some of you, it is “let me capture and share the moment.”

For some of you, it is the coffee cup clichés — “live in the moment,” “enjoy every moment,” and “be happy for this moment.”

Some of you may actually believe in the “power of the present moment” or “don’t wait for the perfect moment, take the moment and make it perfect” stuff!

How many of you remember torque from your high school physics lessons?

The torque of a force, or moment of a force about the axis of rotation?

\Gamma = \textbf{r} * \textbf{F}

The moment of a force is the product of the force and its perpendicular distance from the axis. It is a measure of the tendency of the force to cause a body to rotate.

I am sure you are thinking that the moment you spared for me is not worth it. You don’t want to recollect those moments from the horrid high school days.

Bear with me for a few more moments. You will see why we are talking moments in data analysis.

You know by now that we can use a sample to estimate the true value of the parameter of the population. The formulation used to get that estimate is called an estimator, and there are methods of estimation of these parameters. In Lessons 63, 64, and 65, we dabbled with the method of Maximum Likelihood Estimation (MLE).

Another method of estimating population parameters from the sample is the Method of Moments.

Let’s see how the idea of the moment from physics is related to the Method of Moments.

Assume that we have a sample of five data points x_{1}, x_{2}, x_{3}, x_{4}, x_{5} from a population with a probability density function f_{X}(x). Let’s place them on a number line like this.

Imagine that each number is represented by an equal-weighted ball. Since the data points are independent and identically distributed, i.e., equally likely, it is reasonable to take the probability of each individual occurrence, \frac{1}{n}, as the mass of the data point.

For any data point, the torque, or the moment about the axis of rotation is \Gamma = \textbf{r} * \textbf{F}.

For example, a data point at a distance of x_{i} from the axis has a moment of x_{i}*\frac{1}{n}. I am dropping the standard gravity term, and you will see why shortly.

Since more than one force acts on the sample number line, we can compute the total torque as \Gamma = {\displaystyle \sum_{i=1}^{n}r_{i} * F_{i}}.

In other words, the total moment of the sample data about the axis of rotation is {\displaystyle \sum_{i=1}^{n}x_{i}*\frac{1}{n}}=\frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}.

Did you notice that this is the equation for the sample mean? \bar{x}=\frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}. The centroid of the data points.
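If you would like to see this on a computer, here is a minimal sketch in Python (the five data points and their values are made up for illustration): the total moment of the equal-mass points is exactly the sample mean.

```python
import numpy as np

# Five made-up data points, each carrying an equal "mass" of 1/n
x = np.array([2.0, 3.5, 5.0, 7.5, 9.0])
n = len(x)

# Total moment about the origin = sum of (distance * mass)
total_moment = np.sum(x * (1.0 / n))

print(total_moment)   # 5.4
print(np.mean(x))     # same value: the sample mean, the centroid of the points
```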

The population has a probability density function f_{X}(x).

We saw in Lesson 64 how to divide the space into equal intervals of length dx. The probability that a given sample data point x_{i} falls in the range dx is f_{X}(x_{i})dx for an infinitesimally small range dx. As in the sample case, we can assume this is the mass of any subdivision. Hence, the moment of this infinitesimal body about the rotation axis is x_{i}*f_{X}(x_{i})dx.

For the entire population, the total moment is {\displaystyle \int_{- \infty}^{\infty}x*f_{X}(x)dx}.

I know. You will tell me that this is the expected value E[X], or the centroid of the probability density function.

E[X] = {\displaystyle \int_{- \infty}^{\infty}x*f_{X}(x)dx}

If the five data points x_{1}, x_{2}, x_{3}, x_{4}, x_{5} originate from a population with a probability density function f_{X}(x), it is reasonable to equate the total moment of the sample data to the total moment of the population data. The centroid of the sample is equal to the centroid of the probability density function.

{\displaystyle \int_{- \infty}^{\infty}x*f_{X}(x)dx} =\frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}


E[X] = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}


Let me pester you again!

What is the moment of inertia?

No worries. Let’s activate your deep memory neurons. Imagine that the sample data points are rotating on the plane about the axis with a velocity v = r\omega, where \omega is the constant angular velocity of the plane. In our example of five data points, the velocity of any given point will be v_{i} = x_{i}\omega.

If the body is rotating, it will be subject to a tangential force of m\frac{dv}{dt}. For our sample point which is at a radial distance of x_{i} from the axis, the tangential force will be \frac{1}{n}\frac{dv}{dt} = \frac{1}{n}x_{i}\frac{d \omega}{dt}.

To simplify life, let’s call \frac{d \omega}{dt}=\alpha, the angular acceleration of the point.

The tangential force is \frac{1}{n}x_{i}\alpha, and the torque of the tangential force applied on one point is x_{i}*\frac{1}{n}x_{i}\alpha = \frac{1}{n}x_{i}^{2}\alpha.

The total torque of all the tangential forces = {\displaystyle \sum_{i=1}^{n}\frac{1}{n}x_{i}^{2}\alpha}.

The term {\displaystyle \sum_{i=1}^{n}\frac{1}{n}x_{i}^{2}} is called the moment of inertia of the points.

Just looking at the equation, you can say that the moment of inertia of points far from the axis is larger than the moment of inertia of points closer to the axis.

This moment of inertia is called the second moment and it gives us a sense of how spread out the data points are about the axis.

Sample moment of inertia = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{2}}.
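Continuing the same made-up example, here is a quick sketch of the second sample moment; the second set of points is deliberately placed farther from the axis to show that its moment of inertia is larger.

```python
import numpy as np

x = np.array([2.0, 3.5, 5.0, 7.5, 9.0])          # same made-up points as before
x_spread = np.array([0.5, 2.0, 5.4, 9.0, 10.1])  # made-up points farther from the axis

# Second sample moment = (1/n) * sum(x_i^2), the moment of inertia of the points
print(np.mean(x ** 2))         # smaller
print(np.mean(x_spread ** 2))  # larger: points far from the axis contribute more
```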

By extension to the population, we can write the population moment of inertia, or the population second moment as

E[X^{2}]={\displaystyle \int_{- \infty}^{\infty}x^{2}f_{X}(x)dx}.

Notice the generalization E[X^{2}] for the second moment, just like E[X]={\displaystyle \int_{- \infty}^{\infty}x*f_{X}(x)dx} is the first moment.

Population moments are the expected values of the powers of the random variable X.

I can sense that you are charged up by the word “spread.” Yes, the second moment will be used to get the second parameter, the variance of the distribution.

We can equate the sample moment of inertia to the population moment of inertia.


E[X^{2}] = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{2}}


This form of equating sample moments to the population moments is called the Method of Moments for parameter estimation.

It was developed by Karl Pearson in the 1890s. Pearson demonstrated this method by fitting different forms of frequency curves (probability density functions) to data, calculating as many moments from the sample data as there are parameters of the probability density function.

A generalized representation of the Method of Moments can be written like this.


E[X^{k}] = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{k}}


To estimate the k parameters of the probability density function, we can equate the first k population moments to the first k sample moments.

If the probability density function has one parameter, the first moment equation is sufficient. If the function has two parameters, we have the first and the second moment equations — two unknowns, two equations to solve. If the function has three parameters, we go for three moment equations.
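As a small illustration, the k-th sample moment is a one-liner in Python; the function name sample_moment is just my own label, not a standard library routine.

```python
import numpy as np

def sample_moment(x, k):
    """k-th sample moment: (1/n) * sum of x_i^k."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** k)

# For a density with k parameters, set up one such equation for each
# k = 1, 2, ... and solve the system for the parameters.
x = [2.0, 3.5, 5.0, 7.5, 9.0]   # made-up data
print(sample_moment(x, 1))      # first sample moment (the sample mean)
print(sample_moment(x, 2))      # second sample moment
```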

Time for some action

Let’s solve for the parameters of the normal distribution.

There is a sample of n data points x_{1}, x_{2}, x_{3}, ..., x_{n} that we think originated from a normal distribution (population) with a probability density function f_{X}(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}}.

x_{1}, x_{2}, x_{3}, ..., x_{n} \sim N(\mu, \sigma)

\mu and \sigma are the parameters of the normal probability density function. Let’s apply the Method of Moments to estimate these parameters from the sample. In other words, let’s formulate the equations for \mu and \sigma.

The first population moment is E[X^{1}]. The first sample moment is \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{1}}.

For the normal distribution, we know that the expected value E[X] is the mean \mu.

\mu = E[X] = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}}

The second population moment is E[X^{2}]. The second sample moment is \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{2}}.

E[X^{2}]. Hmm. You have to go down memory lane, to Lesson 26, to recall that E[X^{2}]=V[X] + (E[X])^{2}.

.
.
.
Did you see that? Yes, so E[X^{2}]=\sigma^{2} + \mu^{2}. Equating this to the sample second moment, we get

\sigma^{2} + \mu^{2} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{2}}

\sigma^{2} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{2}} - \mu^{2}

\sigma^{2} = \frac{1}{n}{\displaystyle \sum_{i=1}^{n}x_{i}^{2}} - (\bar{x})^{2}

\sigma^{2} = \frac{1}{n} \Big ( \sum x_{i}^{2} - n\bar{x}^{2}\Big )

\sigma^{2} = \frac{1}{n} \Big ( \sum x_{i}^{2} - 2n\bar{x}^{2} + n \bar{x}^{2} \Big )

\sigma^{2} = \frac{1}{n} \Big ( \sum x_{i}^{2} - 2\bar{x}n\bar{x} + n \bar{x}^{2} \Big )

\sigma^{2} = \frac{1}{n} \Big ( \sum x_{i}^{2} - 2\bar{x}\sum x_{i} + n \bar{x}^{2} \Big )

\sigma^{2} = \frac{1}{n} \Big ( \sum x_{i}^{2} - \sum 2\bar{x} x_{i} + n \bar{x}^{2} \Big )

\sigma^{2} = \frac{1}{n} \Big ( \sum x_{i}^{2} - \sum 2\bar{x} x_{i} + \sum \bar{x}^{2} \Big )

\sigma^{2} = \frac{1}{n} \sum (x_{i}^{2} - 2\bar{x} x_{i} + \bar{x}^{2})

\sigma^{2} = \frac{1}{n} \sum (x_{i} - \bar{x})^{2}
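If you want to verify these two estimators numerically, here is a small sketch using numpy; the true values \mu = 10 and \sigma = 2 and the random seed are made up, and for a large sample the method of moments estimates should land close to them.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Made-up "population": a normal distribution with mu = 10 and sigma = 2
mu_true, sigma_true = 10.0, 2.0
x = rng.normal(mu_true, sigma_true, size=10000)

# Method of moments estimates
mu_hat = np.mean(x)                          # first sample moment
sigma2_hat = np.mean(x ** 2) - mu_hat ** 2   # second sample moment minus the mean squared

print(mu_hat, np.sqrt(sigma2_hat))           # close to 10 and 2
```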


Let’s spend a few more moments and solve for the parameters of the Gamma distribution.

The probability density function of the Gamma distribution is

f(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{r-1}}{(r-1)!}

It has two parameters, r and \lambda.

\lambda is the rate parameter (its reciprocal, 1/\lambda, is the scale); it controls the width of the distribution. r is called the shape parameter because it controls the shape of the distribution.

We know that the expected value of the Gamma distribution E[X] = \frac{r}{\lambda}. If you forgot this, solve the integral \int xf(x)dx for the Gamma distribution. Equating this first population moment to the sample moment, we can see that

\frac{r}{\lambda} = \frac{1}{n}\sum x_{i} = \bar{x}

r = \lambda \bar{x}

Then, we know that V[X] for the Gamma distribution is \frac{r}{\lambda^{2}}.

Using this with the second population moment, E[X^{2}]=V[X] + (E[X])^{2} = \frac{r}{\lambda^{2}} + \frac{r^{2}}{\lambda^{2}} = \frac{r(r+1)}{\lambda^{2}}.

We can now equate the second population moment to the second sample moment.

E[X^{2}] = \frac{1}{n}\sum x_{i}^{2}

\frac{r(r+1)}{\lambda^{2}} = \frac{1}{n}\sum x_{i}^{2}

\frac{\lambda \bar{x} (\lambda \bar{x}+1)}{\lambda^{2}} = \frac{1}{n}\sum x_{i}^{2}

\frac{\bar{x}(\lambda \bar{x}+1)}{\lambda} = \frac{1}{n}\sum x_{i}^{2}

 \bar{x} + \lambda \bar{x}^{2} = \frac{\lambda}{n}\sum x_{i}^{2}

\lambda (\frac{1}{n}\sum x_{i}^{2} - \bar{x}^{2}) = \bar{x}

\lambda = \frac{\bar{x}}{\frac{1}{n}\sum(x_{i}-\bar{x})^{2}}

So,

r = \frac{\bar{x}^{2}}{\frac{1}{n}\sum(x_{i}-\bar{x})^{2}}
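Here is the same kind of numerical check for the Gamma distribution; the true values r = 3 and \lambda = 2 are made up, and note that numpy's gamma generator is parameterized with the shape r and the scale 1/\lambda.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Made-up "population": Gamma with shape r = 3 and rate lambda = 2 (scale = 1/lambda)
r_true, lam_true = 3.0, 2.0
x = rng.gamma(shape=r_true, scale=1.0 / lam_true, size=10000)

xbar = np.mean(x)
s2 = np.mean((x - xbar) ** 2)    # (1/n) * sum of (x_i - xbar)^2

# Method of moments estimates
lam_hat = xbar / s2
r_hat = xbar ** 2 / s2

print(lam_hat, r_hat)            # close to 2 and 3
```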

Very tiring, I know. You will rarely find all of this algebra in textbooks. So, the moments you spent here are worth it.

I asked you to spare a moment. Now you are going “this is quite a moment.” What do you think? Would you remember the Method of Moments next time you spare a moment?

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
