Lesson 32 – Exactly k successes: The language of Binomial distribution


I may be in one of those transit buses. Since I moved to New Jersey, I have been going through this mess every day.



Well, you wanted to enjoy Manhattan skyline. It has a price tag.



D, glad you are here. It’s been a while. In our last meeting, we discussed the variance operation and its properties. I continue to read your lessons every week. As I paused and reflected on all the lessons, I noticed that there is a systematic approach to them. You started with the basics of sets and probability, introduced lessons on visualizing, summarizing and comparing data using various statistics, and then extended those ideas into random variables and probability distributions. The readership seems to have grown considerably, and people are tweeting about our classroom. Have you reached 25,000 pageviews yet?

We are at 24350 pageviews now. We will certainly hit the 25k mark today 😉 I am thankful to all the readers for their time. Special thanks to all those who are spreading the word. Our classroom is a great resource for anyone starting data analysis.



So, what’s on the show today?



As you correctly pointed out, we are now slowly getting into various types of probability distributions. I mentioned in lesson 31 that we would learn several discrete probability distributions that are based on Bernoulli trials. We start this adventure with the binomial distribution.


Great. Let me refresh my memory of probability distributions before we get started. We discussed the basics of probability distributions in lesson 23. Let’s assume X is a random variable, and P(X = x) is the probability that this random variable takes a particular value x (i.e., an outcome). Then the distribution of these probabilities over the possible values on a number line, i.e., the probability graph, is called the probability distribution function f(x) of the random variable. We are now looking at various mathematical forms for this f(x).


Fantastic. Now imagine you have a Bernoulli sequence of yes or no outcomes.



Sure. It is a sequence of 0s and 1s, where each trial yields a 1 with probability p: 0 if the trial yields a no (failure, or event not happening) and 1 if the trial yields a yes (success, or event happening). Something like this: 00101001101


If you count the number of successes (1s) in n trials of this sequence, that count follows a binomial distribution. In other words, if X is a random variable that represents the number of successes in a Bernoulli sequence of n trials, then X follows a binomial distribution. The probability that X takes a value k, i.e., the probability of exactly k successes in n trials, is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

The expected value of this random variable, E[X] = np, and the variance V[X] = np(1-p).
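To make the lingo concrete, here is a minimal Python sketch (an illustration, not the lesson’s own code) of the probability of exactly k successes, nCk * p^k * (1-p)^(n-k). It sums the probabilities to check that the mean and variance come out as np and np(1-p). The function names `binomial_pmf` and `binomial_mean_var` are hypothetical helpers, and n = 10, p = 0.3 are arbitrary example values.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed directly from the pmf (to compare with np and np(1-p))."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var

n, p = 10, 0.3
mean, var = binomial_mean_var(n, p)
print(round(mean, 6), round(var, 6))  # 3.0 2.1, i.e., np and np(1-p)
```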



😯 Wow, that’s a fastball. Can we parse through the lingo?



Oops… Okay, let us take the example of your daily commute. Imagine buses and cars pass through the tunnel each morning. Can you guesstimate the probability of seeing a bus?



Yeah, I usually see more buses than cars in the morning. Let’s say the likelihood of seeing a bus is p=0.7.



Now let us imagine that buses and cars come in a Bernoulli sequence. Assign a 1 if it is a bus, and 0 if it is a car.


That is reasonable. The vehicle passage is usually random. If we take that as a Bernoulli sequence, there will be some 1s and some 0s, with each vehicle having a 0.7 probability of being a bus (a 1). In the long run, you will see about 70% buses and 30% cars, in any order.
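A quick simulation can show the long-run 70/30 split. This is a sketch that assumes each vehicle is independently a bus with probability 0.7. The helper `bernoulli_sequence` and the seed value are illustrative choices, not part of the lesson.

```python
import random

def bernoulli_sequence(n: int, p: float, seed: int = 42) -> list[int]:
    """Return n independent draws: 1 (bus) with probability p, else 0 (car)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

seq = bernoulli_sequence(20, 0.7)
print("".join(map(str, seq)))  # a random-looking string of 1s and 0s

long_run = bernoulli_sequence(100_000, 0.7)
print(sum(long_run) / len(long_run))  # close to 0.7
```

A short sequence can stray from 70% buses; only the long run settles near p.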



Correct. Now think about this. In the next four vehicles that pass through the tunnel, how many of them will be buses?



Since there is randomness in the sequence, in the next four vehicles, all of them may be buses, none of them may be buses, or any number in between.


Exactly. The number of buses in a sequence of 4 vehicles can be 0, 1, 2, 3 or 4. These are the possible values of the random variable X. In other words, if X is the number of buses in 4 vehicles coming at random, then X can take 0, 1, 2, 3 or 4 as outcomes. The probability distribution of X is binomial.


I understand how we came up with X. Why is the probability distribution of X called binomial?


It originates from the idea of the binomial coefficient that you may have learned in an elementary math/combinations class. Let us continue with our logical deduction to see how the probability is derived, and you will see why.


Sure. We have X as 0, 1, 2, 3 and 4. We should calculate the probability P(X = 0), P(X = 1), P(X = 2), P(X = 3) and P(X = 4). This will give us the distribution of the probabilities.


Take one outcome, say 2, and compute P(X = 2), the probability of seeing exactly two buses in 4 vehicles. This is a case of exactly k successes in n trials, with k = 2 and n = 4. If the buses and cars come in a Bernoulli sequence (1 for a bus and 0 for a car) with probability p, in how many ways can you see two buses out of 4 vehicles?


Ah, I see where we are going with this. Let me list out the possibilities. Two buses in four vehicles can occur in six ways: 0011, 0101, 1100, 1010, 1001, 0110. In each of these six possible sequences, there are exactly two buses among four vehicles. I remember from my combinations class that this is four choose two: four factorial divided by the product of two factorial and (four minus two) factorial. 4C2 = 4!/(2!(4-2)!) = 24/(2*2) = 6.
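Listing the sequences by brute force is a good way to confirm the count. This sketch (an illustration using only the standard library) enumerates all 16 sequences of four vehicles and keeps those with exactly two 1s.

```python
from itertools import product
from math import comb, factorial

# Every sequence of 4 vehicles (1 = bus, 0 = car) that contains exactly two buses
two_buses = ["".join(s) for s in product("01", repeat=4) if s.count("1") == 2]
print(two_buses)       # ['0011', '0101', '0110', '1001', '1010', '1100']
print(len(two_buses))  # 6

# The count matches "four choose two", computed two ways
print(comb(4, 2), factorial(4) // (factorial(2) * factorial(4 - 2)))  # 6 6
```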

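For each possibility, the probability of that sequence can also be written down. Let me make a table like this:

Possibility | Sequence | Probability of the sequence
1 | 0011 | (1-p)*(1-p)*p*p = p^2*(1-p)^2
2 | 0101 | (1-p)*p*(1-p)*p = p^2*(1-p)^2
3 | 1100 | p*p*(1-p)*(1-p) = p^2*(1-p)^2
4 | 1010 | p*(1-p)*p*(1-p) = p^2*(1-p)^2
5 | 1001 | p*(1-p)*(1-p)*p = p^2*(1-p)^2
6 | 0110 | (1-p)*p*p*(1-p) = p^2*(1-p)^2
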
You can see from the table that there are six possibilities. Any one of the possibilities 1, 2, 3, 4, 5 or 6 can occur, and they are mutually exclusive (only one sequence actually happens). Hence, the probability of seeing exactly two buses in four vehicles is the sum of these probabilities. Remember, P(A or B) = P(A) + P(B) for mutually exclusive events. If you follow this through, you will get 6*p*p*(1-p)*(1-p) = 6*p^2*(1-p)^(4-2). Now replace 4 with n and 2 with k. Can you see where the formula for the binomial distribution comes from?
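The addition step can be checked numerically. This sketch assumes p = 0.7 (the bus probability from the commute example). It multiplies out each sequence’s probability, adds up the six mutually exclusive sequences, and compares the total with 4C2 * p^2 * (1-p)^2. The helper `sequence_probability` is a hypothetical name used only here.

```python
from itertools import product
from math import comb

def sequence_probability(seq: str, p: float) -> float:
    """Probability of one specific 0/1 sequence of independent trials."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == "1" else (1 - p)
    return prob

p = 0.7
two_buses = ["".join(s) for s in product("01", repeat=4) if s.count("1") == 2]

# The sequences are mutually exclusive, so their probabilities add up
total = sum(sequence_probability(s, p) for s in two_buses)
formula = comb(4, 2) * p**2 * (1 - p) ** (4 - 2)
print(round(total, 4), round(formula, 4))  # 0.2646 0.2646
```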




Absolutely. For each outcome of X, i.e., 0, 1, 2, 3 and 4, we should apply this logic/formula and compute the probability of the outcome. Let me finish it and make a plot.
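Applying the same formula to every outcome 0 through 4 gives the full distribution for n = 4 and p = 0.7. This sketch prints a crude text bar chart in place of a plotted figure. The one-'#'-per-0.01 scale is an arbitrary display choice.

```python
from math import comb

n, p = 4, 0.7
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Expected probabilities: 0.0081, 0.0756, 0.2646, 0.4116, 0.2401
# Text "plot": one '#' per 0.01 of probability
for k, prob in pmf.items():
    print(f"P(X = {k}) = {prob:.4f} {'#' * round(prob * 100)}")
```

Three buses out of four is the most likely outcome when p = 0.7, and the five probabilities add up to 1.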


Very nicely done. Let me jump in here and show you another plot with a different n and p. If p = 0.5 (equal probability) and n = 100, this is what the binomial distribution looks like.


Nice. It looks like a bell centered around 50.



Yeah. The distribution is centered around 50, which is its expected value. Remember, E[X] is the central tendency of the distribution. For the binomial, you can derive it as np = 100(0.5) = 50. In the same way, the variance, i.e., the spread of the function around this center, is np(1-p) = 100(0.5)(0.5) = 25, so the standard deviation is 5. You can see that almost all of the probability lies within three standard deviations of the center, i.e., between 35 and 65. Can you now imagine what the distribution will look like for p = 0.3 or p = 0.7?
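The three-standard-deviation claim can be checked directly. This sketch sums the binomial probabilities for n = 100, p = 0.5 over 35 ≤ X ≤ 65. The exact printed value is left for you to run; it comes out very close to 1.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.5
mean = n * p                # 50.0
sd = sqrt(n * p * (1 - p))  # 5.0

# Probability mass within three standard deviations of the mean (35 <= X <= 65)
lo, hi = int(mean - 3 * sd), int(mean + 3 * sd)
within = sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))
print(mean, sd, lo, hi, round(within, 4))
```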


Following the same logic, those distributions will be centered on 100*0.3 = 30 and 100*0.7 = 70, and both will have a variance of 100*0.3*0.7 = 21. Now it all makes sense.
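A small loop over p shows how the center moves while the spread stays similar. This sketch (an illustration with n = 100) also finds the most likely count k for each p. That peak lands at the mean for these values.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 100
most_likely = {}
for p in (0.3, 0.5, 0.7):
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    most_likely[p] = max(range(n + 1), key=lambda k: probs[k])  # peak of the distribution
    print(f"p = {p}: mean = {n * p:.0f}, variance = {n * p * (1 - p):.0f}, peak at k = {most_likely[p]}")
```

The distributions for p = 0.3 and p = 0.7 are mirror images of each other, which is why they share the variance of 21.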


You see how easy it is when you go through the logic. We started with a Bernoulli sequence. When we are interested in the random variable that counts the number of successes in n trials, it follows a binomial distribution. “Exactly k successes” is the language of the binomial distribution. Can you think of any other examples that can be modeled with a binomial distribution?


Probability that Derek Jeter, with a batting average of 0.3, gets three hits out of the three times he comes to bat 😆  This is fun. I am glad I learned some useful concepts out of the messy commute experience. By the way, exactly one landfall in the next four hurricanes is also binomial. With Jose coming up, I wonder if we can compute the probability of damage for New York City based on the probability of landfall.
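Both examples plug straight into the binomial formula. This sketch assumes each at-bat is an independent trial with p = 0.3, which is a simplification. The lesson gives no landfall probability, so the values of q below are purely hypothetical placeholders.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 hits in 3 at-bats with a 0.3 batting average
# (assumes independent at-bats with constant probability)
jeter = binomial_pmf(3, 3, 0.3)
print(round(jeter, 4))  # 0.027

# Exactly one landfall in the next four hurricanes, for HYPOTHETICAL
# landfall probabilities q (not values from the lesson)
for q in (0.1, 0.2):
    print(q, round(binomial_pmf(1, 4, q), 4))
```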


Don’t worry Joe. Our Mayor is graciously implementing his comprehensive $20 billion resiliency plan. NYC is safe now. Forget probability of damage. You need to worry about the probability of bankruptcy.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 31 – Yes or No: The language of Bernoulli trials

Downtown Miami will be flooded due to hurricane Irma.

Your vehicle will pass the inspection test this year.          

Each toss of a coin results in either a head or a tail.          

Did you notice that I am looking for an answer, an outcome that is “yes” or “no”? We often summarize data as the occurrence (or non-occurrence) of an event in a sequence of trials. For example, if you are designing dikes for flood control in Miami, you may want to look at the sequence of floods over several years to analyze the number of events and the rate at which they occur.

There are two possibilities: a hit (event occurred, a success) or a miss (event did not occur, a failure). A yes or a no. These events can be represented as a sequence of 0’s and 1’s (0001100101001000), called Bernoulli trials, with a probability of occurrence p. This probability is constant over all the trials, and the trials themselves are assumed to be independent, i.e., the occurrence of one event does not influence the occurrence of the subsequent event.

Now, imagine these outcomes, 0’s or 1’s, can be represented using a random variable X. In other words, X is a random variable that takes the value 1 with probability p and 0 with probability 1 - p. If in Miami there were ten extreme flood events in the last 100 years, the sequence will have 90 0’s and 10 1’s in some order, and the estimated probability of the event is 10/100 = 0.1. If the probability is 0.5, then in a sequence of 100 trials (coin tosses, for example), you will see 50 heads on average. We can derive the expected value of X and the variance of X as follows:

$$E[X] = 1 \cdot p + 0 \cdot (1-p) = p$$

$$V[X] = E[X^2] - (E[X])^2 = \left(1^2 \cdot p + 0^2 \cdot (1-p)\right) - p^2 = p(1-p)$$
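For a Bernoulli variable, E[X] = p and V[X] = p(1-p). This sketch computes both exactly for the Miami flood example (p = 0.1) and compares them with a long simulated sequence. The simulation assumes independent years with a constant flood probability, and the seed is an arbitrary choice.

```python
import random

p = 0.1  # flood probability estimated from 10 events in 100 years

# Exact moments from the two-point distribution: X = 1 w.p. p, X = 0 w.p. 1 - p
mean = 1 * p + 0 * (1 - p)
var = (1 - mean) ** 2 * p + (0 - mean) ** 2 * (1 - p)
print(mean, round(var, 4))  # 0.1 0.09

# Check against a long simulated sequence of independent trials
rng = random.Random(7)
xs = [1 if rng.random() < p else 0 for _ in range(200_000)]
sample_mean = sum(xs) / len(xs)
sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)
print(round(sample_mean, 3), round(sample_var, 3))  # close to 0.1 and 0.09
```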

Since the Bernoulli trials are independent, the probability of a sequence of events happening is equal to the product of the probabilities of the individual events. For instance, the probability of observing a sequence of No Flood, No Flood, No Flood and Flood over the last four years is 0.9*0.9*0.9*0.1 = 0.0729 (assuming p = 0.1).
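The product rule for independent trials fits in a one-line function. This is a sketch; `sequence_probability` is a hypothetical helper name, and `math.prod` needs Python 3.8 or later.

```python
from math import prod

def sequence_probability(outcomes: list[int], p: float) -> float:
    """Probability of a specific sequence of independent Bernoulli(p) outcomes."""
    return prod(p if x == 1 else 1 - p for x in outcomes)

# No Flood, No Flood, No Flood, Flood with p = 0.1
print(round(sequence_probability([0, 0, 0, 1], 0.1), 4))  # 0.0729
```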

Bernoulli trials form the basis for deriving several discrete probability distributions that we will learn over the next few weeks.

While you ponder over what these distributions are, their mathematical forms, and how they represent the variation in the data, I will leave you with this image of the daily rainfall data from Miami International Airport. Approximately 6.38 inches of rain (~160 mm/day) is forecast for Sunday. Notice how you can remap the data into a sequence of 0’s (if the daily rain is less than 160 mm) and 1’s (if it is greater than 160 mm).
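The remapping step is a simple threshold. This sketch assumes rainfall is recorded in mm/day and treats a day exactly at 160 mm as a 0. The rainfall values below are made up for illustration; they are not the Miami International Airport record.

```python
def to_bernoulli(rain_mm: list[float], threshold: float = 160.0) -> list[int]:
    """Map daily rainfall (mm) to 1 if it exceeds the threshold, else 0."""
    return [1 if r > threshold else 0 for r in rain_mm]

print(round(6.38 * 25.4, 1))  # 162.1 mm, roughly the 160 mm/day threshold

# Hypothetical daily rainfall values in mm (illustrative only, not real data)
daily_rain = [0.0, 12.5, 3.2, 171.0, 0.0, 45.7, 162.3]
events = to_bernoulli(daily_rain)
print(events)                     # [0, 0, 0, 1, 0, 0, 1]
print(sum(events) / len(events))  # fraction of days above the threshold
```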

After tomorrow, when you hear “unprecedented rains” in the news, keep in mind that we seek the historical sequence data like this precisely because our memory is weak.
