Lesson 13 – Dear Mr. Bayes

April 29, 2017

Thomas Bayes

Dear Mr. Bayes,

I am writing this letter to thank you for coming up with Bayes' Theorem. It uses logic and evidence to understand and update our knowledge. We can revise our belief with new data, and we have been doing that for centuries now. You have provided the formal probability rules to combine data with prior knowledge to get a more precise understanding. This way of thinking is inherent in our decision making now. I have benefitted so much from this rule. I cannot thank you enough for this.

Today, I want to share a story with you about how Joe, the curious kid, used Bayes' Theorem to impress his boss.

Joe works part time at the MARKET.

His usual daily routine is to receive the bags of apples from Andy’s distribution company (A) and Betsy’s distribution company (B), check for quality and report to his boss.

One day, during this routine, he stepped out of the loading dock to check his Twitter feed.

When he returned, he noticed that there was a bad apple → a rotten bag of apples in 100 bags of apples.

Since the bags were identical, he could not say whether the bad apple was from Andy’s or Betsy’s. All he knew was that Andy’s was contracted to deliver 60 bags and Betsy’s was contracted for 40 bags.

Joe is a sharp kid. Although he did not see who delivered that bad apple, he knew he could assign a probability that it came from Andy’s or Betsy’s. He called me for some advice.
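Here is a rough sketch, in R, of the kind of calculation Joe could make. The delivery shares (60 and 40 bags out of 100) come from the story. The rot rates in `rot_rate` are purely hypothetical placeholders; Joe would need each company's track record to fill them in.

```r
# Bayes' Theorem sketch for Joe's bad-apple problem.
# Priors come from the contracted deliveries in the story.
prior <- c(A = 60 / 100, B = 40 / 100)

# HYPOTHETICAL chance that a bag from each company is rotten.
# These numbers are assumptions for illustration only.
rot_rate <- c(A = 0.01, B = 0.04)

# Law of total probability: overall chance that a bag is rotten.
p_rotten <- sum(rot_rate * prior)

# Bayes' Theorem:
# P(company | rotten) = P(rotten | company) * P(company) / P(rotten)
posterior <- rot_rate * prior / p_rotten
print(round(posterior, 3))
```

With different assumed rot rates the posterior shifts accordingly; the priors alone (60% and 40%) are what Joe knows before using any evidence about quality.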

Mr. Bayes, I hope my students are calculating the updated probability of getting a problem on Normal Distribution in the mid-term based on the review session. Your discovery saw the light after the invention of Monte Carlo approaches. Bayesian methods are widely applied now. You can rest assured in heaven. Your posterity has ensured the use of Bayes' Theorem for centuries to come, albeit making it a methodological fad sometimes.


A posterior Bayesian from Earth


If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 12 – Total recall

Recall October 29, 2012: High winds during Superstorm Sandy; record-breaking high tides, mandatory evacuations, significant power outages, $1 million in estimated property damage.

Recall February 24, 2016: Strong winds; a tractor blown over on the upper level of the George Washington Bridge, a sidewalk shed collapsed on Lenox Avenue in Harlem, $100k in estimated property damage.

Recall August 18, 2009: Thunderstorm winds; hundred trees down in Central Park, significant tree damage in western Central Park between 90th and 100th Street, $300k in estimated property damage.

Recall the conditional probability rule from lesson 9.

P(A|B) = P(A ∩ B) / P(B) 


P(A ∩ B) = P(A|B)*P(B)

Recall mutually exclusive and collectively exhaustive events from lesson 4.

The midterms are mutually exclusive, but the final exam is collectively exhaustive.

If you have n events E1, E2, … En that are mutually exclusive and collectively exhaustive, and another event A that may intersect these events, then the law of total probability says that the probability of A is the sum of the probabilities of its disjoint parts.
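In symbols, using the same events E1, E2, … En and A, the law is commonly written as follows (this notation is the standard textbook form, not something taken from the data):

```latex
P(A) = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(A \mid E_i)\, P(E_i)
```

The second equality applies the conditional probability rule, P(A ∩ B) = P(A|B)*P(B), to each disjoint part.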

Let us cut the jargon and try to understand this law using a simple example. We have the data on property damage during storms accessible from NOAA’s storm events database. Let us take a subset of this data — wind storms in New York City. You can get this subset here.

With some screening, you will see that there are 57 events → 16 high wind events, 15 strong wind events, and 26 thunderstorm wind events. Notice that there is property damage during some of these incidents. Let us visualize this setup using a Venn diagram.

Your immediate perception after seeing this picture would have been that the high winds, strong winds, and thunderstorm winds are mutually exclusive and collectively exhaustive. They don’t intersect, and together make up the entire wind storm sample space. Damages cut across these events.

Let us first focus on the high wind events. The 16 high wind events are shown as 16 points in the picture below. Notice that 4 of these points are within the damage zone. The probability of high wind events is P(high winds) = 16/57, the probability of high wind events and damage is P(damage ∩ high winds) = 4/57, and the probability of damage given high wind events is

P(damage|high winds) = P(damage ∩ high winds) / P(high winds) = 4/16

Now let us add all the other points (events) onto the picture. Some of these will be in the damage zone, and some of them will be outside it.

We can estimate the total probability of damage by adding its disjoint parts.

P(damage) = P(damage ∩ high winds) + P(damage ∩ strong winds) + P(damage ∩ thunderstorm winds)


P(damage) = P(damage|high winds)*P(high winds) + P(damage|strong winds)*P(strong winds) + P(damage|thunderstorm winds)*P(thunderstorm winds)

P(damage) = (4/16)*(16/57) + (8/15)*(15/57) + (9/26)*(26/57) = 21/57
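The arithmetic above can be checked with a few lines of R. This is only a sketch: the counts are typed in by hand from the numbers quoted in this lesson, and the variable names are my own.

```r
# Counts quoted in the lesson for the NYC wind storm subset.
events  <- c(high = 16, strong = 15, thunderstorm = 26)  # events of each type
damaged <- c(high = 4,  strong = 8,  thunderstorm = 9)   # events with damage

total <- sum(events)                      # 57 events in all
p_type <- events / total                  # P(high winds), P(strong winds), ...
p_damage_given_type <- damaged / events   # P(damage | type)

# Law of total probability: sum of P(damage | type) * P(type).
p_damage <- sum(p_damage_given_type * p_type)

print(p_damage)
print(21 / 57)   # the value computed by hand above, for comparison
```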

The best part is that we can use this law as a predictive equation. Suppose a storm is approaching and the weatherman tells you that there is a 10% chance that it has high winds, a 30% chance that it has strong winds, and a 60% chance that it has thunderstorm winds. You can immediately use this law to compute the probability of damage for NYC.

Can you tell me what that damage probability is?
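If you would rather let R do the work, here is a minimal sketch. The conditional probabilities are the ones estimated from the 57 events in this lesson; `damage_probability` is a hypothetical helper name I made up for illustration.

```r
# P(damage | wind type), estimated from the 57 historical events.
p_damage_given_type <- c(high = 4 / 16, strong = 8 / 15, thunderstorm = 9 / 26)

# Hypothetical helper: applies the law of total probability to a forecast.
# 'forecast' is a named vector of probabilities for the three wind types.
damage_probability <- function(forecast, conditional = p_damage_given_type) {
  # The wind types must be collectively exhaustive, so they sum to 1.
  stopifnot(abs(sum(forecast) - 1) < 1e-8)
  sum(conditional[names(forecast)] * forecast)
}

# The weatherman's forecast from the text.
forecast <- c(high = 0.10, strong = 0.30, thunderstorm = 0.60)
print(damage_probability(forecast))
```

Run it to check your own answer; changing the forecast vector gives the damage probability for any other storm outlook.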

Should I wait till after your Earth Day March?

Recall that you are totally contributing your share of CO2 to the earth during the March.


Lesson 11 – Fran, the functionin’ R-bot

Hello, my name is Fran. I am a function in R.


 Hello, my name is D. I am a functionin’ person.


I can perform any task you give me.


Interesting, can you add two numbers?


Yes, I can.


Can you tell me more about how you work?


Sure, I need the inputs and the instruction, i.e. what you want me to do with the inputs.

Okay. I am giving you two numbers, 10 and 15. Can you show me how you will create a function to add them?


This is easy. Let me first give you the structure.

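# structure of a function: name, inputs, instructions, returned output #
functionname = function(inputs) {
 return(output) # the instructions go before this line
}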

You can select these lines and hit the run button to load the function in R. Once it is loaded, you can call the function by its name with any inputs.

Let us say the two numbers are a and b. These numbers are provided as inputs. I will first assign a name to the function — “add”. Since you are asking me to add two numbers, the instruction will be y = a + b, and I will return the value of y.

Here is a short video showing how to create a function to add two numbers a and b. You can try it in your RStudio program.
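In case the video is not handy, the steps just described can be sketched as below. The names `add`, `a`, `b`, and `y` follow the description above; the layout is my own.

```r
# function to add two numbers #
add = function(a, b)
{
 # a and b are the two numbers given as inputs
 y = a + b # the instruction
 return(y) # the output
}

# test the function with the two numbers from the conversation #
add(10, 15) # returns 25
```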

 Neat. If I give you three numbers, m, x, and b, can you write a function for mx + b?


Yes. I believe you are asking me to write a function for a straight line: y = mx + b. I will assign “line_eq” as the name of the function, the inputs will be m, x, and b and the output will be y.

# function for line equation #
line_eq = function (m, x, b)
{
 # m is the slope of the line 
 # b is the intercept of the line
 # x is the point on the x-axis
 y = m*x + b # equation for the line 
 return(y)
}

# test the function #
line_eq(0.5, 5, 10)
> 12.5

Can you perform more than one task? For example, if I ask you for y = mx + b and x + y, can you return both values?


Yes, I can. I will have two instructions. In the end, I will combine both the outputs into one vector and return the values. Here is how I do it.

# function for line equation + (x + y) #
two_tasks = function (m, x, b)
{
 # m is the slope of the line 
 # b is the intercept of the line
 # x is the point on the x-axis
 y = m*x + b # equation for the line 
 z = x + y
 output = c(y, z) # combine both outputs into one vector
 return(output)
}

# test the function #
two_tasks(0.5, 5, 10)
> 12.5 17.5

Very impressive. What if some of the inputs are numbers and some of them are a set of numbers? For instance, if I give you many points on the x-axis, m and b, the slope and the intercept, can you give me the values for y?


No problemo. The same line_eq function will work. Let us say you give me some numbers x = [1, 2, 3, 4, 5], m = 0.5 and b = 10. I will use the same function line_eq(m, x, b).

# use on vectors #
x = c(1,2,3,4,5)
m = 0.5
b = 10
line_eq(m, x, b)
> 10.5 11.0 11.5 12.0 12.5

I am beginning to like you. But, maybe you are fooling me with simple tricks. I don’t need a robot for doing simple math.


Hey, my name is Fran 😡


Okay Fran. Prove to me that you can do more complicated things.


Bring it on.


 It is springtime, and I’d love to get a Citi bike and ride around the city. I want you to tell me how many people rented the bike at the most popular route, the Central Park Southern Loop and the average trip time.


aargh… your obsession with the city. Give me the data.


Here you go. You can use the March 2017 file. They have data for the trip duration in seconds, check out time and check in time, start station and end station.

Alright. I will name the function “bike_analysis.” The inputs will be the bike ridership data for a month and the name of the station. The function will identify how many people rented bikes at the Central Park S station and returned them to the same station, completing the loop. You asked me for total rides and the average trip time. I threw in the maximum and minimum ride times too. You can use this function with data from any month and at any station.

# function to analyze bike data # 
bike_analysis = function(bike_data,station_name)
{
 # rows where the trip starts and ends at the same station
 dum = which (bike_data$Start.Station.Name == station_name & bike_data$End.Station.Name == station_name)
 total_rides = length(dum)
 average_time = mean(bike_data$Trip.Duration[dum])/60 # in minutes 
 max_time = max(bike_data$Trip.Duration[dum])/60 # in minutes 
 min_time = min(bike_data$Trip.Duration[dum])/60 # in minutes
 output = c(total_rides,average_time,max_time,min_time)
 return(output)
}

# use the function to analyze Central Park South Loop #

# bike data # 
bike_data = read.csv("201703-citibike-tripdata.csv",header=T)

station_name = "Central Park S & 6 Ave"

bike_analysis(bike_data,station_name)
> 212.000000  42.711085 403.000000   1.066667

212 trips, with an average trip time of about 42.7 minutes. The maximum trip time is 403 minutes and the minimum trip time is ~ 1 minute. Change of mind?
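If you want to try the function without downloading the file, here is a small sketch on made-up data. The column names match the ones used above; the four trips and the station names "Station X" and "Station Y" are hypothetical.

```r
# bike_analysis, repeated here so that this block runs on its own.
bike_analysis = function(bike_data, station_name)
{
 # rows where the trip starts and ends at the same station
 dum = which(bike_data$Start.Station.Name == station_name &
             bike_data$End.Station.Name == station_name)
 total_rides = length(dum)
 average_time = mean(bike_data$Trip.Duration[dum])/60 # in minutes
 max_time = max(bike_data$Trip.Duration[dum])/60 # in minutes
 min_time = min(bike_data$Trip.Duration[dum])/60 # in minutes
 output = c(total_rides, average_time, max_time, min_time)
 return(output)
}

# HYPOTHETICAL toy data: four made-up trips, durations in seconds.
toy_data = data.frame(
 Trip.Duration = c(600, 1800, 900, 1200),
 Start.Station.Name = c("Station X", "Station X", "Station X", "Station Y"),
 End.Station.Name = c("Station X", "Station X", "Station Y", "Station Y")
)

# Two trips start and end at Station X (600 s and 1800 s),
# so the result is 2 rides, 20 min average, 30 min max, 10 min min.
bike_analysis(toy_data, "Station X")
```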

Wow. You are truly helpful. I would have spent a lot of time if I were to do this manually. I can use your brains and spend my holiday weekend riding the bike.


Have fun … and Happy Easter.


How did you know that?


Machine Learning man 😉



Lesson 10 – The fight for independence

I don’t have a “get out of jail free” card.

I don’t want to pay $50 to the bank because I am short on cash.

😉 I am trusting my magic dice to fight for my freedom. I know I will roll a double.

🙁 I am disappointed. As I wait for my turn, I realized that I did not kiss the dice before rolling. So I do it now and roll again.

😯 Maybe I should have kissed the dice two times since it is the second try. Oh, I did not pray before rolling. So I pray and roll the dice with optimism.

😡 I don’t believe this. My magic dice betrayed me. I will throw them away and get new ones.

Wait. The magic dice did not betray you. They are just following the probability rule for independent events. Unlike you, your magic dice do not have a memory. They do not know that the previous try was not a double. All they know is that the probability of getting a double on any try is 6/36, or about 16.67%.

Assume A is the event of seeing a double, and B is a previous event, say {6,1} – not a double.

The probability of getting a double given that the last try was not a double, P(A|B), is equal to the probability of getting a double in any try, P(A). P(A) does not depend on whether or not event B has happened. B does not influence A.

For independent events A and B, 
P(A|B) = P(A)

From lesson 9, conditional probability rule, we know that

P(A|B) = P(A ∩ B)/P(B)

We can combine these two and come up with a property for independent events.

P(A ∩ B) = P(A)*P(B)

For independent events, the probability of both happening (A and B) is the product of the individual probabilities.
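A quick way to convince yourself is to list every outcome of two consecutive rolls and count. The sketch below assumes fair dice and takes B to be "the first roll is not a double" (a slightly broader event than the single outcome {6,1} used above); the variable names are my own.

```r
# Two consecutive rolls of a pair of dice: 36 * 36 equally likely outcomes.
two_rolls <- expand.grid(first1 = 1:6, first2 = 1:6,
                         second1 = 1:6, second2 = 1:6)

B <- two_rolls$first1 != two_rolls$first2     # first roll is not a double
A <- two_rolls$second1 == two_rolls$second2   # second roll is a double

p_A <- mean(A)
p_B <- mean(B)
p_A_and_B <- mean(A & B)
p_A_given_B <- p_A_and_B / p_B   # conditional probability rule

print(c(p_A, p_A_given_B))       # P(A|B) matches P(A)
print(c(p_A_and_B, p_A * p_B))   # P(A and B) matches P(A)*P(B)
```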

Let us apply this property to our example. What is the probability of not seeing a double in three consecutive rolls (with prayer 🙂 or without prayer)? In other words, what are the odds of missing three rounds of the game and paying $50 to get my freedom finally?

The probability of not seeing a double in any try is 30/36: there are 30 non-double outcomes in 36 possibilities. Since the events are independent, the likelihood of seeing three non-doubles is (30/36)(30/36)(30/36) ≅ 58%.

I should have known that before praying.
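The same numbers can be reproduced in R by listing the 36 outcomes of a pair of dice. This is a sketch assuming fair dice, with variable names of my own choosing.

```r
# All 36 equally likely outcomes of rolling two dice.
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)

p_double <- mean(rolls$die1 == rolls$die2)   # 6/36
p_no_double <- 1 - p_double                  # 30/36

# Independent rolls: multiply the probabilities of the three misses.
p_three_misses <- p_no_double^3

print(round(c(p_double, p_three_misses), 4))   # 0.1667 0.5787
```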

If the events are independent, they do not influence each other. A coin toss cannot affect a die. Torrential rain in London may have nothing to do with the severe drought in California. Your actions may not influence my actions because we are independent.

We all like being independent … or the illusion of independence!



Lesson 9 – The necessary ‘condition’ for Vegas

Joe and Devine meet again.

J: After our last discussion about probability, I now think in odds.

D: Go on.

J: I believe my decision making has improved. I react based on probability.

D: Great. Assuming you are making decisions for your benefit, in the long run, you will be better off. Probability is the guiding force.

J: I am visiting Vegas next week. I want to use “the force” to outwit the house.

D: For that, the necessary condition is to know about conditional probability.

J: I know that probability is the long-run relative frequency. But what is conditional probability?

D: Since you are excited about Vegas, let us take the cards example.

Say, we have a deck of cards. If I shuffle and draw a card at random, what is the probability of getting a king?

J: Let me use my probability logic here. I will assume that the 52 cards will make up our sample space. Since there will be 4 kings in a deck of 52 cards, if your shuffling is fair, the likelihood of getting a king is 4/52.

D: Good. Let us call this event A → King.

What are the odds of getting a red card?

J: Since there will be 26 red cards in 52, the odds of getting a red card are 26/52.

D: Exactly. Let us call this event B → Red.

Now, if I draw a card at random, face down, and tell you that it is red, what is the probability that it will be a king?

J: So you are providing me some information about the card?

D: Yes, I am giving you a condition that the card is red.

J: Okay. Under the condition that the card is red, the odds of it being a king should be 2/26.

D: Can you elaborate?

J: Since you told me that the card is red, I only have to see how many kings there are among the red cards. The sample space is now 26. There are two red kings. So the probability will be 2/26.

D: Good. Mathematically, this is written as

P(A|B) = P(A ∩ B) / P(B) 


P(King | Red) = P(King and Red) / P(Red)

P(King and Red) is 2/52. P(Red) is 26/52. So we get 2/26.
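The same counting can be sketched in R by building a deck and applying the rule. This assumes a standard 52-card deck; the variable names are mine.

```r
# Build a 52-card deck: 13 ranks in each of 4 suits.
deck <- expand.grid(
  rank = c("A", 2:10, "J", "Q", "K"),
  suit = c("hearts", "diamonds", "clubs", "spades"),
  stringsAsFactors = FALSE
)

king <- deck$rank == "K"                         # event A
red  <- deck$suit %in% c("hearts", "diamonds")   # event B

p_king <- mean(king)                 # 4/52
p_red <- mean(red)                   # 26/52
p_king_and_red <- mean(king & red)   # 2/52

# Conditional probability rule: P(King | Red) = P(King and Red) / P(Red)
p_king_given_red <- p_king_and_red / p_red
print(p_king_given_red)
print(2 / 26)   # Joe's answer, for comparison
```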

J: I get it. My answer will depend on the condition. Can you provide one more example?

D: Sure. You told me that the probability of getting a king is 4/52. Suppose this first card is a king and is placed face up, and I draw another card face down. What is the probability that this second card is a king?

J: Since the first card is on the table and not replaced, the probability that the second card will be a king should be 3/51. Three kings left in the deck of 51.

D: Correct. The outcome of the second card is conditional on the outcome of the first card.
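For the curious, the 3/51 can also be checked by listing every ordered pair of two different cards. This sketch assumes the first card drawn is a king, as in Joe's answer, and uses names of my own choosing.

```r
# 52 cards, of which 4 are kings.
cards <- rep(c("king", "other"), times = c(4, 48))

# Every ordered pair of two different cards (drawing without replacement).
pairs <- expand.grid(first = 1:52, second = 1:52)
pairs <- pairs[pairs$first != pairs$second, ]

first_king <- cards[pairs$first] == "king"
second_king <- cards[pairs$second] == "king"

# P(second is a king | first is a king)
#   = P(both are kings) / P(first is a king)
p_second_given_first <- mean(first_king & second_king) / mean(first_king)
print(p_second_given_first)
print(3 / 51)   # Joe's answer, for comparison
```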

J: This makes perfect sense. If I can practice this counting and conditional probability, I can make some money on blackjack.

D: Yes, you can. Knowing conditional probability is the necessary condition.

J: Why do you keep saying “necessary condition”?

D: Probability is your guiding force everywhere else in life.

But in Vegas, Joe Pesci is the force.
