Lesson 42 – Bounded: The language of Beta distribution

Last week, in Lesson 41, we started toying with the idea of continuous probability distributions. When a random variable X is continuous (i.e., can take any real value), we can compute the probability that X lies between any two values, $P(X \in (a,b)) = P(a < X < b)$, using a continuous probability density function $f(x)$.

The continuous probability density function (pdf) is the limiting shape of the frequency plot (histogram) of the data as the number of possible observations (n) goes to infinity. While the probability that the random variable X takes any specific value x is 0, the height of the smooth curve measures how dense the probability is at that point.

In the limit, as the number of observations approaches infinity (the continuous case), the proportion of observations that fall in an interval (a, b) is the probability that X is in this interval, $P(X \in (a,b)) = P(a < X < b)$.

This probability is the area under the curve between a and b, computed as the integral of the function over that range.

$P(a < X < b) = \int_{a}^{b} f(x) dx$

I left you with a few practice questions:

If X is a random variable with a probability density function defined as

$f(x) = 90x^{8}(1-x)$ for 0 < x < 1

What is the probability that X is between 0.2 and 0.3?
What is the probability that X will exceed 0.9?
What is the median of X?

If you have solved these questions, more power to you. If you are waiting for the stars to align, this is that auspicious moment!

Let’s solve this problem step by step to understand the nuts and bolts of continuous distributions. At the end of the problem, I will lead you to our first type of continuous probability distribution functions, the Beta distribution and its special case, the uniform distribution. You will see why I selected this problem as the primer.

Our function is $f(x) = 90x^{8}(1-x)$ for 0 < x < 1.

The plot of this function reveals a lopsided, bell-like shape that peaks near x = 0.89. Notice that x is between 0 and 1 and the function is continuous.

Let’s take the first question: What is the probability that X is between 0.2 and 0.3?

$P(0.2 < X < 0.3) = \int_{0.2}^{0.3} f(x) dx$

$= \int_{0.2}^{0.3} 90x^{8}(1-x) dx$

$= \int_{0.2}^{0.3}90x^8 dx - \int_{0.2}^{0.3}90x^{9}dx$

$= \frac{90}{9} x^{9}\Big|_{0.2}^{0.3} - \frac{90}{10} x^{10}\Big|_{0.2}^{0.3}$

$= \frac{90}{9}(0.3^{9}-0.2^{9}) - \frac{90}{10}(0.3^{10}-0.2^{10})$

$= 0.00014$
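
If you want to check this arithmetic on a computer, here is a minimal sketch in Python (my choice for illustration; the same steps translate to R). It evaluates the antiderivative from the steps above and compares it with a simple midpoint-rule approximation of the area. The function names `pdf`, `cdf` and `midpoint_integral` are made up for this example.

```python
def pdf(x):
    """Density f(x) = 90 x^8 (1 - x) on 0 < x < 1, and 0 elsewhere."""
    return 90 * x**8 * (1 - x) if 0 < x < 1 else 0.0


def cdf(x):
    """Antiderivative from the worked steps: F(x) = 10 x^9 - 9 x^10 on [0, 1]."""
    x = min(max(x, 0.0), 1.0)  # clamp so values outside [0, 1] give 0 or 1
    return 10 * x**9 - 9 * x**10


def midpoint_integral(f, lo, hi, n=10_000):
    """Approximate the area under f between lo and hi with n midpoint rectangles."""
    width = (hi - lo) / n
    return width * sum(f(lo + (i + 0.5) * width) for i in range(n))


exact = cdf(0.3) - cdf(0.2)                # closed-form answer
approx = midpoint_integral(pdf, 0.2, 0.3)  # numerical check of the same area
print(round(exact, 5), round(approx, 5))
```

Both numbers should agree with the 0.00014 we derived by hand.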

Using the same procedure, we can find the probability that X will exceed 0.9.

$P(X > 0.9) = \int_{0.9}^{1} f(x) dx$

$= \frac{90}{9} x^{9}\Big|_{0.9}^{1} - \frac{90}{10} x^{10}\Big|_{0.9}^{1}$

$= \frac{90}{9}(1^{9}-0.9^{9}) - \frac{90}{10} (1^{10}-0.9^{10})$

$= 0.264$

We could have integrated the function from 0 to 0.9 and then subtracted this number from 1 because $P(X > x) = 1 - P(X \le x)$ and $P(X \le x) = \int_{0}^{x}f(t)dt$ in this case.

Also, remember that $P(X \le x) = F(x)$ , the cumulative distribution function. We will be using the cumulative distribution function very often from now on.
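
As a quick illustration of the complement route, the sketch below (Python, with a hypothetical helper named `cdf`) computes F(0.9) for our function and subtracts it from 1.

```python
def cdf(x):
    """F(x) = P(X <= x) = 10 x^9 - 9 x^10 for the density 90 x^8 (1 - x)."""
    x = min(max(x, 0.0), 1.0)  # F is 0 below the range and 1 above it
    return 10 * x**9 - 9 * x**10


upper_tail = 1 - cdf(0.9)  # P(X > 0.9) = 1 - F(0.9)
print(round(upper_tail, 3))
```

This should reproduce the 0.264 obtained by integrating from 0.9 to 1 directly.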

Now, let us look at the third question: what is the median of X?

We know from order statistics that the median is the 50th percentile, i.e., the value below which 50% of the values of X fall.

$P(X \le x_{median}) = F(x_{median}) = 0.5$

$\int_{0}^{x_{median}}f(x)dx = 0.5$

$\int_{0}^{x_{median}}90x^{8}(1-x)dx = 0.5$

This reduces to $10x_{median}^{9} - 9x_{median}^{10} = 0.5$

We can use the Newton-Raphson iterative method to find that the root of this equation in 0 < x < 1 is approximately 0.84.

Hence, $x_{median} = 0.84$
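
For those curious about the iteration itself, here is a minimal Newton-Raphson sketch in Python. The starting guess of 0.8, the tolerance and the function name `newton_median` are my assumptions for illustration. Conveniently, the derivative of $10x^{9} - 9x^{10}$ is the density $90x^{8}(1-x)$ itself.

```python
def newton_median(x0=0.8, tol=1e-10, max_iter=50):
    """Solve 10 x^9 - 9 x^10 = 0.5 by Newton-Raphson, starting from x0."""
    x = x0
    for _ in range(max_iter):
        g = 10 * x**9 - 9 * x**10 - 0.5  # function whose root we want
        slope = 90 * x**8 * (1 - x)      # derivative of g, i.e., the pdf
        step = g / slope
        x -= step                        # Newton-Raphson update
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


median = newton_median()
print(round(median, 2))
```

A starting guess too close to 0 or 1 makes the slope nearly zero and the steps unstable, so pick a guess inside the bulk of the curve.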

Beta Distribution

Now, look carefully at the function I gave you.
$f(x) = 90x^{8}(1-x)$ for 0 < x < 1

It is bounded between 0 and 1.

It has some exponents for x and (1-x); 8 and 1 in this case.

It has a constant, 90, acting as a multiplier.

The function we solved is a Beta distribution. The standard form, i.e., the probability density function of a Beta distribution is

$f(x) = cx^{a-1}(1-x)^{b-1}$ for 0 < x < 1.

As you can see, it is defined only in the 0 to 1 range. The beta distribution is a bounded distribution. The function is 0 everywhere else.

a and b are the parameters that control the shape of the distribution. They can take any positive real numbers; a > 0 and b > 0. In our example, a = 9 and b = 2. The distorted bell shape we have for the function is because of these two values.

c is called the normalizing constant. It ensures that the pdf integrates to 1. Take, for example, our function $f(x) = 90x^{8}(1-x)$ .

If we integrate the function $x^{8}(1-x)$ between 0 to 1 (over the range of x), we will get

$\int_{0}^{1} x^{8}(1-x)dx = \frac{1}{9} - \frac{1}{10} = \frac{1}{90}$

For the pdf f(x) to integrate to unity, we need to multiply $x^{8}(1-x)$ by the constant 90. Hence, we had 90 as the multiplier for our function.

This normalizing constant c is the reciprocal of the beta function, which is defined as the area under the graph of $x^{a-1}(1-x)^{b-1}$ between 0 and 1.

$c = \frac{1}{\int_{0}^{1} x^{a-1}(1-x)^{b-1} dx}$

For integer values of a and b, this constant c can be written using factorials. (For non-integer values, the gamma function, a generalized factorial, plays the same role.)

$c = \frac{(a + b - 1)!}{(a-1)!(b-1)!}$

In our example, a = 9 and b = 2. Applying these numbers will give

$c = \frac{(9+2-1)!}{(9-1)!(2-1)!} = \frac{10!}{8!1!} = 90$ .
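
Here is a small sketch of that calculation in Python. The factorial version follows the formula above. The gamma version is my addition: it relies on the standard identity $\Gamma(n) = (n-1)!$ so that the same constant can be computed for non-integer a and b. The names `c_factorial` and `c_gamma` are made up for this example.

```python
from math import factorial, gamma


def c_factorial(a, b):
    """Normalizing constant for positive integers a and b."""
    return factorial(a + b - 1) // (factorial(a - 1) * factorial(b - 1))


def c_gamma(a, b):
    """Same constant via the gamma function; works for any a > 0 and b > 0."""
    return gamma(a + b) / (gamma(a) * gamma(b))


print(c_factorial(9, 2))   # constant for our example
print(c_gamma(9, 2))       # same value from the gamma form
print(c_gamma(0.5, 0.5))   # a non-integer case the factorial form cannot handle
```

For a = 9 and b = 2, both functions should return the 90 we found by integration.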

Did you observe that we just need the values of a and b to get the Beta distribution?

We call it the beta family as the curve will have different shapes depending on the values of a and b.

Substitute a = 1 and b = 1 in the standard function and see what you get.

$c = \frac{(1 + 1 - 1)!}{(1-1)!(1-1)!} = 1$

$f(x) = 1x^{1-1}(1-x)^{1-1} = 1$

A constant value of 1 for all x between 0 and 1. This flat function is called the uniform distribution. It is a special case of the beta distribution when a and b are both 1. It looks like a rectangle or a flat line.

Now substitute a = 2 and b = 2 and see.

a = 0.5 and b = 0.5 will give a U-shape whose ends rise without bound as x approaches 0 or 1.

I want you to experiment with different values of a and b and visualize how the shape changes, like in the opening animation. Try it this week. Don’t wait till the R lesson. You have come this far, and you are already a good coder in R.
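
If you want a head start before plotting, here is a minimal sketch (in Python for illustration; in R, the built-in `dbeta` function computes the same density) that prints the height of the curve at a few points for several pairs of a and b. The helper `beta_pdf`, the grid of x values and the chosen pairs are assumptions for this example.

```python
from math import gamma


def beta_pdf(x, a, b):
    """Beta density c * x^(a-1) * (1-x)^(b-1) on 0 < x < 1, and 0 elsewhere."""
    if not 0 < x < 1:
        return 0.0
    c = gamma(a + b) / (gamma(a) * gamma(b))  # normalizing constant
    return c * x ** (a - 1) * (1 - x) ** (b - 1)


grid = [0.01, 0.25, 0.5, 0.75, 0.99]
for a, b in [(1, 1), (2, 2), (0.5, 0.5), (9, 2)]:
    heights = [round(beta_pdf(x, a, b), 3) for x in grid]
    print(f"a={a}, b={b}: {heights}")
```

Reading across each row gives a rough picture of the shape: flat for (1, 1), a symmetric hump for (2, 2), high at the ends for (0.5, 0.5), and piled up near 1 for (9, 2).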

As you see here, the beta distribution is flexible enough to take on many different shapes.

The uniform distribution is used for simulating data from different probability distributions. Again, meditate on this idea before we see it in an R lesson.
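
One common way this works is inverse transform sampling: draw U from the uniform distribution on (0, 1), then find the x for which F(x) = U. The sketch below applies that idea to our function, using bisection to invert the cumulative distribution function. It is only an illustration under my own assumptions (Python, a fixed seed of 42, 10,000 draws, hypothetical helper names), and a later lesson may well take a different route.

```python
import random


def cdf(x):
    """F(x) = 10 x^9 - 9 x^10 for the density 90 x^8 (1 - x) on [0, 1]."""
    return 10 * x**9 - 9 * x**10


def inverse_cdf(u, tol=1e-10):
    """Find x in [0, 1] with F(x) = u by bisection (F is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = sorted(inverse_cdf(rng.random()) for _ in range(10_000))
sample_median = samples[len(samples) // 2]
print(round(sample_median, 2))
```

The median of the simulated values should land close to the 0.84 we solved for earlier, which is a handy sanity check on the simulation.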

The beta distribution is also used as a probability distribution for the probability p of an outcome. The probability of the probability 😉
In other words, if we want to estimate the probability p of an outcome, we assume, prior to having any data, that p follows a beta distribution (0 < p < 1). Once we have the data, we can update this knowledge using Bayes' rule.
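
To make the updating idea concrete, here is a minimal sketch under assumptions that go beyond this lesson: the data are counts of successes and failures from independent trials, in which case a Beta(a, b) prior updates to Beta(a + successes, b + failures). The counts (7 and 3) are made up, the function names are hypothetical, and the mean a / (a + b) is a standard property of the beta distribution.

```python
def update_beta(a, b, successes, failures):
    """Posterior parameters after observing success/failure counts (assumed model)."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


prior = (1, 1)  # uniform prior: every value of p equally plausible
posterior = update_beta(*prior, successes=7, failures=3)  # made-up data
print(posterior, round(beta_mean(*posterior), 3))
```

Starting from the flat a = 1, b = 1 prior, the data pull the distribution of p toward the observed success rate.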

The beta distribution is also typically used in project management when we want to estimate the probability of completing the project ahead of schedule. The duration of each job is a random variable that can be approximated using a beta distribution as it is bounded between the worst completion time (pessimistic) and best completion time (optimistic).

Knowing this about project management, I set out to complete several pending tasks during this Thanksgiving break. My initial estimated probability of completion was 0.91. After a somewhat lazy turkey day, I now realize that my lower bound (best completion time) should have been my upper bound (worst completion time). The fix is in. The probability of completing the pending tasks’ project in the Christmas break is 0.91.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.
