## Lesson 26 – The variety in consumption

This summer, I installed an air conditioner in my apartment to adapt to a changing climate. One month into the season, I was startled to see my electricity bill double. I checked my AC specs; sure enough, its energy efficiency ratio is 9.7 BTU/(h·W), not the modern Energy Star baby by any standard.

I began to wonder how many such low-efficiency units contribute to increased energy usage, especially in New York City, where window AC units are the norm in residential buildings. While getting the number of older units is challenging, Local Law 84 on Municipal Energy and Water Data Disclosure requires owners of large buildings to report their energy and water consumption annually. The 2015 disclosure data is available for download. I am showing a simple frequency plot of the weather-normalized electricity intensity, in kBtu per square foot of building area, here.

There are 13223 buildings in the data file; 254 of them have an electricity intensity greater than 200 kBtu/sqft. I am not showing them here, which leaves 12969 buildings in the plot. The NY Stock Exchange building, Bryant Park Hotel, and Rockefeller University are notable among the ones left out.

I want to know the variety in energy use. Long-time readers might guess correctly from the title that I am talking about the variability in energy usage data. We can assume that energy consumption is a random variable X, i.e. X represents the possible energy consumption values (infinite and continuous). The data we downloaded are sample observations x. We are interested in the variance of X → V[X].

In lesson 24 and lesson 25, we learned that the expected value (E[X]) is a descriptive quantity of the average (center) of a random variable X with a probability distribution function f(x). In the same way, a measure of the variability, i.e. deviation from the center, of the random variable is the variance V[X].

It is defined as the expected value of the squared deviation from the average.

$$V[X] = E[(X - \mu)^2]$$

μ is the expected value of the random variable → E[X]. (X − μ) measures the deviation from this value. We square these deviations and get the expected value of the squared deviations. If you remember lesson 17, this is exactly the equation for the variance of the data sample. Here, we generalize it for a random variable X.

With some derivation, we can get a useful alternative for computing the variance.

$$V[X] = E[X^2] - (E[X])^2$$
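For readers who want the intermediate steps, here is one way to sketch that derivation. It writes μ for E[X] (a symbol assumed here) and uses only two facts: the expected value of a sum is the sum of the expected values, and constants factor out of an expected value.

$$
\begin{aligned}
V[X] &= E[(X - \mu)^2] \\
     &= E[X^2 - 2\mu X + \mu^2] \\
     &= E[X^2] - 2\mu E[X] + \mu^2 \\
     &= E[X^2] - 2\mu^2 + \mu^2 \\
     &= E[X^2] - \mu^2
\end{aligned}
$$

In words: the variance is the expected value of the square minus the square of the expected value.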

If we know the probability distribution function f(x) of the random variable X, we can also write the variance as

$$V[X] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

where μ is the expected value E[X].

f(x) for this data might look like the thick black line on the frequency plot.

The expected value of the consumption for 2015 of the 12969 buildings is 83 kBtu/sqft; the variance is 1062 (kBtu/sqft)². A better way to understand this is through the standard deviation, the square root of the variance: about 32 kBtu/sqft. You can see from the frequency plot that the data has high variance; there are buildings with very low electricity intensity and buildings with very high electricity intensity.
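To see the two routes to the variance side by side, here is a minimal Python sketch. The seven intensity values are made up for illustration; they are not the Local Law 84 data. The sketch also assumes every observation is equally likely, so it divides by n.

```python
import math


def mean(values):
    """Average of a list of numbers (each observation weighted equally)."""
    return sum(values) / len(values)


def variance_by_definition(values):
    """Expected value of the squared deviation from the mean."""
    mu = mean(values)
    return mean([(x - mu) ** 2 for x in values])


def variance_by_shortcut(values):
    """Expected value of the square minus the square of the expected value."""
    return mean([x * x for x in values]) - mean(values) ** 2


# Hypothetical electricity intensities in kBtu/sqft (illustrative only).
sample = [40.0, 55.0, 70.0, 85.0, 100.0, 115.0, 130.0]

var_definition = variance_by_definition(sample)
var_shortcut = variance_by_shortcut(sample)
std_dev = math.sqrt(var_definition)

print(mean(sample))      # 85.0
print(var_definition)    # 900.0
print(var_shortcut)      # 900.0
print(std_dev)           # 30.0
```

Both routes give the same variance, and the standard deviation comes back in the original units, which is why it is easier to read than the variance itself.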

What is your building’s energy consumption this year? Does your city have this cool feature?

It came at a price though. With another local law, we can ban all the low EER AC units to solve the energy problem.

If you find this useful, please like, share and subscribe.

## Lesson 25 – More expectation

My old high school friend is now a successful young businessman. Last week, he shared thoughts on one of his unusual stock investment schemes. Every few months, he randomly selects four stocks from four different sectors and bets equally on them. I asked him for a rationale. He said he expects two profit-making stocks on average, assuming that the probability of profit or loss for a stock is 0.5. I think since he picks them at random, he also assigns a 50-50 chance of winning or losing.

###### My first thought

This made a nice expected value problem of the sum of random variables.

The expected number of profit-making stocks in his case is 2. We can assign X1, X2, X3, and X4 as the random variables for the individual stocks, with outcome 1 if a stock makes a profit and 0 otherwise. We can assign Y as the total number of profit-making stocks, ranging from 0 to 4. With two possibilities for each of the four stocks, there are 16 equally likely outcomes, from (1, 1, 1, 1) to (0, 0, 0, 0).

Listing them in order, the total numbers of profit-making stocks in these scenarios are 4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0. The average of these numbers is 2, the expected number of profit-making stocks.

Another way of getting at the same number is to use the expected value formula we learned in lesson 24.

`E[Y] = 4(1/16) + 3(4/16) + 2(6/16) + 1(4/16) + 0(1/16) = 2`
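The 16 scenarios and the probabilities in this formula can be reproduced by brute force. The sketch below enumerates every combination of profit (1) and loss (0) for four stocks, under the same 50-50 assumption, and builds the distribution of Y from the counts. The variable names are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each stock is 1 (profit) or 0 (loss); four stocks give 2**4 = 16 outcomes.
outcomes = list(product([1, 0], repeat=4))
totals = [sum(outcome) for outcome in outcomes]

# Under the 50-50 assumption every outcome has probability 1/16,
# so P(Y = y) is the number of outcomes with total y divided by 16.
counts = Counter(totals)
distribution = {y: Fraction(n, len(outcomes)) for y, n in sorted(counts.items())}

# Expected value: sum of y * P(Y = y).
expected_y = sum(y * p for y, p in distribution.items())

print(totals)        # [4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0]
print(dict(counts))  # {4: 1, 3: 4, 2: 6, 1: 4, 0: 1}
print(expected_y)    # 2
```

The counts 1, 4, 6, 4, 1 are the numerators in the formula above.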

An important property of the expected value is that the mean of a linear function of random variables is the same linear function of their means.

Y = X1 + X2 + X3 + X4

Y is another random variable that comes from a combination of individual random variables. For a sum of random variables,

`E[Y] = E[X1] + E[X2] + E[X3] + E[X4]`

Detail-oriented folks can go over the derivation.
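One way to sketch it for two discrete random variables is below; the same argument extends to four. The notation is assumed for this sketch: f(x1, x2) is the joint probability of the pair, and f1 and f2 are the individual (marginal) distributions obtained by summing the joint probability over the other variable.

$$
\begin{aligned}
E[X_1 + X_2] &= \sum_{x_1} \sum_{x_2} (x_1 + x_2)\, f(x_1, x_2) \\
             &= \sum_{x_1} x_1 \sum_{x_2} f(x_1, x_2) + \sum_{x_2} x_2 \sum_{x_1} f(x_1, x_2) \\
             &= \sum_{x_1} x_1 f_1(x_1) + \sum_{x_2} x_2 f_2(x_2) \\
             &= E[X_1] + E[X_2]
\end{aligned}
$$

Notice that nothing in these steps requires the stocks to be independent of each other.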

Now, E[X1] = 0.5(1) + 0.5(0) = 0.5, since the outcomes are 1 and 0 and the probabilities are 0.5 each. Adding all of them, we get 2.

So you see, the additive property makes it easy to estimate the expected value of the sum of the random variables instead of writing down the outcomes and computing the probability distribution of Y.

Other simple rules apply when constants a and b are involved:

$$E[a] = a, \qquad E[aX] = aE[X], \qquad E[aX + b] = aE[X] + b$$

Try to derive them as we did above.
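If you want to check your derivation numerically, here is a small sketch for the standard rule E[aX + b] = aE[X] + b, using the single-stock variable X1 from above. The constants a and b and the helper `expected_value` are hypothetical choices for illustration.

```python
from fractions import Fraction

# X1 is 1 with probability 1/2 (profit) and 0 with probability 1/2 (loss).
pmf_x1 = {1: Fraction(1, 2), 0: Fraction(1, 2)}


def expected_value(pmf, g=lambda x: x):
    """Return E[g(X)] for a discrete X given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


a, b = 100, -20  # hypothetical constants, e.g. a payoff scale and a fixed fee

lhs = expected_value(pmf_x1, lambda x: a * x + b)  # E[aX + b]
rhs = a * expected_value(pmf_x1) + b               # a E[X] + b

print(lhs, rhs)  # 30 30
```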

###### My second thought

The stock market might be more complicated than a coin flip experiment. But what do I know; he clearly has more net worth than me. I guess since this is only one of his investment schemes, he is just playing around with his leftovers.

###### My final thought

I am only good for teaching probability; not using it like him. But again, most ivory tower professors only preach; don’t practice. Hey, they are protected. Why should they do real things?


## Lesson 24 – What else did you expect?

On a hot summer evening, a low-energy Leo walked into his apartment building without greeting the old man smoking outside. Leo had just burnt his wallet at a roulette table in Atlantic City, and his mind was occupied with how it all happened. He went in with \$500: ten chips of \$50. His first gamble was to play safe and bet one chip at a time on red. He lost the first two times, won the third time, and lost the next two times. After five consecutive bets, he was left with \$350. In roulette, the payout for red or black is 1 to 1. He started getting worried. Since the payout for any single number is 35 to 1, in a hasty move, he went all in on number 20, only to realize that he was all out.

Could it be that luck was not favoring Leo on the day? Could it be that his betting strategy was wrong? Are the odds stacked against him? If he had enough chips and placed the same bet long enough, what can he expect?

###### Based on his first bet

Imagine Leo bets a dollar at a time on red. He will win or lose \$1 each time. In American roulette, there are 18 red, 18 black, and two green (0 and 00) pockets, a total of 38 numbers. Each number is equally likely, and each spin is independent of the previous ones. The probability of getting a red is 18/38 (18 reds in 38 numbers). In the same way, the probability of getting a black is 18/38, and the probability of getting a green is 2/38.

If the ivory ball ends in a red, he will win \$1; if it ends in any other color, he will lose \$1, i.e. he gets -\$1. In the long run, if he keeps playing this game a dollar at a time, his expected win for a dollar will be

$$E[X] = (1)\frac{18}{38} + (-1)\frac{20}{38} = -\frac{2}{38} \approx -0.05$$

On average, for every \$1 he bets on red, he will lose about \$0.05 (5 cents).

###### Based on his second bet

Now let us imagine Leo bets on a single number where the payout is 35 to 1. He will win \$35 if the ball ends up in his number, or lose the dollar. The probability of getting any number is 1/38 (one number in 38 outcomes). Again, in the long run, if he keeps playing this game, winning \$35 or losing \$1 each time, his expected win for a dollar will be

$$E[X] = (35)\frac{1}{38} + (-1)\frac{37}{38} = -\frac{2}{38} \approx -0.05$$

Although the payout is high, on average, for every \$1 he bets on a single number, he will still lose about \$0.05 (5 cents).

This estimation we just did is called the Expected Value of a random variable. Just like how “mean” is a description of the central tendency for a sample data, the expected value (E[X]) is a descriptive quantity of the central tendency (average behavior) of a random variable X with a probability distribution function (f(x)).

$$E[X] = \sum_{x} x\, f(x)$$

In Leo’s case, X is the random variable describing his payout, x is the actual payout from the house (\$1 or \$35 if he wins, or -\$1 if he loses), and f(x) is the probability distribution or frequency of the outcomes (18/38 for red and 20/38 otherwise, or 1/38 for a single number and 37/38 otherwise).

You will notice that this equation is exactly like the equation for the average of a sample. Imagine there is a very large sample data with repetitions; we are adding and averaging over the groups.
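Leo's two bets can be checked with a few lines of Python. This is a sketch, not a casino model: the exact part applies the sum of payout times probability to each bet, and the simulated part plays the red bet many times with a fixed random seed to show the long-run average drifting toward the expected value. The function names, the seed, and the number of spins are arbitrary choices of mine.

```python
import random
from fractions import Fraction


def expected_value(pmf):
    """Sum of payout times probability for a bet given as {payout: probability}."""
    return sum(x * p for x, p in pmf.items())


# Payouts per $1 bet on an American wheel with 38 pockets.
red_bet = {1: Fraction(18, 38), -1: Fraction(20, 38)}
single_number_bet = {35: Fraction(1, 38), -1: Fraction(37, 38)}

e_red = expected_value(red_bet)
e_single = expected_value(single_number_bet)


def simulate_red_bets(n_spins, seed=0):
    """Average win per $1 bet on red over n_spins simulated spins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_spins):
        pocket = rng.randrange(38)  # pockets 0..37; let the first 18 stand for red
        total += 1 if pocket < 18 else -1
    return total / n_spins


average_win = simulate_red_bets(200_000)

print(e_red, float(e_red))  # -1/19, roughly -0.0526
print(e_single)             # -1/19
print(average_win)          # close to -0.05 for a long run of bets
```

Both bets have exactly the same expected value, -2/38 of a dollar, even though one pays 1 to 1 and the other 35 to 1.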

Poor Leo expected this to happen but didn’t realize that the table is tilted and the game is rigged.


## Lesson 23 – Let’s distribute the probability

Hey Joe, what are you up to these days?

Apart from visiting DC recently, life has been mellow over this summer. I am reading your lessons every week. I noticed there are several ways to visualize data and summarize it. Those were a nice set of data summary lessons.

Yes. Preliminaries in data analysis — visualize and summarize. I recently came across visuals with cute faces 🙂 I will present them at an appropriate time.

That is cool. On the way back from DC, we played the Chicago dice game. I remembered our conversation about probability while playing.

Interesting. How is the game played?

There will be eleven rounds numbered 2 – 12. In each round, we throw the pair of dice to score the number of the round. For example, if on the first try, I get a 1 and 1, I win a point because my first round score is 2. If I throw any number other than 2, I don’t win anything. The player with the highest total after 11 rounds wins the game.

I see. So there are 11 outcomes (2 – 12), and in each round you are trying to get the outcome that matches the round number. Do you know the probability distribution of these outcomes?

I believe you just used the question to present a new idea, “probability distribution”. Fine, let me do the Socratic thing here and ask, “What is a probability distribution?”

It is the distribution of the probability of the outcomes. In your Chicago dice example, you have a random outcome between 2 and 12; 2 if you roll a 1 and 1; 12 if you roll a 6 and 6. Each of these random outcomes has a probability of occurring. If you compute these probabilities and plot them, i.e. distribute the probabilities on a number line, you can see the probability distribution of this random variable.

Let me jump in here. There are 11 possible outcomes. I will tabulate the possibilities.

There are limited ways of achieving an outcome. With two dice, there are 6 × 6 = 36 equally likely rolls, so the likelihood of each outcome is the ratio of the number of ways we can get that number to 36. An outcome of 2 can only be achieved if we get a (1,1). Hence the probability of getting 2 in this game is 1/36.

Excellent, now try to plot these probabilities on a scale from 2 to 12.

Looking at the table, I can see the probability will increase as we go up from 2 to 7 and decrease from there till 12.

I like the way you named your axes. X and P(X = x). Your plot shows that there is a spike (which is the probability) for each possible outcome. The probability is 0 for all other outcomes. The spikes should add up to 1. This probability graph is called the probability distribution function f(x) for a discrete random variable.
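The spikes in that plot can be computed directly by counting. Here is a small sketch that enumerates all 36 rolls of two fair dice and tallies the ways to reach each total; the dictionary `pmf` (my name) plays the role of f(x).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

# Number of ways to reach each total from 2 to 12.
ways = Counter(a + b for a, b in rolls)

# P(X = x) = ways to get x divided by 36.
pmf = {x: Fraction(ways[x], len(rolls)) for x in range(2, 13)}

for x in pmf:
    print(f"{x:>2}: {ways[x]} way(s), P(X = {x}) = {ways[x]}/36")

print(sum(pmf.values()))  # 1
```

The counts rise from 1 way for a total of 2 up to 6 ways for a total of 7, then fall back to 1 way for a total of 12.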

The function can be summed (the discrete counterpart of integration) to obtain the cumulative distribution function.

$$F(x) = P(X \le x) = \sum_{u \le x} f(u)$$

Say you want to know the probability of getting an outcome less than 4. You can use the cumulative function, which adds up the probabilities of the outcomes 2 and 3: F(3) = 1/36 + 2/36 = 3/36. Just be watchful of the notations. Probability distribution function has a lowercase f, and cumulative distribution function has an uppercase F.
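For the dice example, the cumulative function is just a running total of the spikes. The sketch below rebuilds f(x) for the sum of two fair dice and accumulates it; the names `f` and `F` mirror the lowercase and uppercase notation.

```python
from fractions import Fraction
from itertools import accumulate, product

outcomes = list(range(2, 13))

# f(x): number of (die 1, die 2) pairs that sum to x, divided by 36.
f = {
    x: Fraction(sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == x), 36)
    for x in outcomes
}

# F(x): running total of f over all outcomes up to and including x.
F = dict(zip(outcomes, accumulate(f[x] for x in outcomes)))

print(F[3])   # 1/12, the probability of an outcome less than 4
print(F[7])   # 7/12
print(F[12])  # 1
```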

So if we know the function f(x), we can find out the probability of any possible event from it. These outcomes are discrete (2 to 12), and the function is also discrete for every outcome. What if the outcomes are continuous? How does the probability distribution function look if the random variable is continuous where the possibilities are infinite?

Okay, let us do a thought experiment. Imagine there are ten similar apples in a basket. What is the probability of taking any apple at random?

Since there are ten apples, the probability of taking one is 1/10.

What if there are n apples?

Then the probability of taking any one is 1/n. Why do you ask?

What happens to the probability if n is a very large number, i.e. if there are infinite possibilities?

Ah, I see. As n approaches infinity, the probability of seeing any one number approaches 0. So unlike discrete random variables which have a defined probability for each outcome, for continuous random variables P(X = x) = 0. How then, can we come up with a probability distribution function?

Recall how we did frequency plots. We partitioned the space into intervals or groups and recorded the number of observations that fall into each group. For continuous random variables, the proportion of observations in the group approaches the probability of being in the group. For a large n, we can imagine a large number of small intervals like this.

We can approximate this to a smooth curve and define the probability of a continuous random variable falling in an interval between a and b.

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

The extension from the frequency plot to the probability distribution function is clear. Since the function is continuous, if we want the cumulative function, we integrate it like this.

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$$
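The many-small-intervals picture translates almost literally into code. The sketch below assumes a made-up density, f(x) = e^(-x) for x ≥ 0, chosen only because its integral is known exactly. It approximates the probability of an interval by adding up the height of the curve times the width over many thin slices.

```python
import math


def f(x):
    """Hypothetical density for illustration: e^(-x) for x >= 0, zero otherwise."""
    return math.exp(-x) if x >= 0 else 0.0


def probability(a, b, n_intervals=10_000):
    """Approximate P(a <= X <= b): sum of f at slice midpoints times slice width."""
    width = (b - a) / n_intervals
    return sum(f(a + (i + 0.5) * width) for i in range(n_intervals)) * width


p_interval = probability(1.0, 2.0)
exact = math.exp(-1.0) - math.exp(-2.0)  # closed-form integral for this density
p_point = probability(1.0, 1.0)          # an interval of zero width
total = probability(0.0, 50.0)           # nearly all of the area under the curve

print(p_interval, exact)
print(p_point)
print(total)
```

The zero-width interval gives a probability of 0, which is the P(X = x) = 0 result from the apples thought experiment, while the area under the whole curve comes out very close to 1.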

Great. You picked up many things today. Did you figure out the odds of getting a deal on your Chicago dice game — getting the same number as the round in your 11 tries?


## Lesson 22 – You are so random

Not just Pinkie Pie, outcomes of events that involve some level of uncertainty are also random. A random variable describes these outcomes as numbers. Random variables can take on different values; just like variables in math taking different values.

If the possible outcomes are distinct numbers (e.g. counts), then these are called discrete random variables. If the possible outcomes can take on any value on the real number line, then these are called continuous random variables.

There are six possible outcomes (1, 2, 3, 4, 5 and 6) when you roll a die. Each number is distinct. We can assume X as a random variable that can take any whole number from 1 to 6; hence it is finite and discrete. For any single roll, we can assume x to be the outcome. Notice that we are using uppercase X for the random variable and lowercase x for the value it takes for a given outcome.

X is the set of possible values and x is an observation from that set.

In lesson 20, we explored the rainfall data for New York City and Berkeley. Here, we can assume rain to be a continuous random variable X on the number line. In other words, the rainfall in any year can be a random value on the line with 0 as the lower limit. Can you guess the upper limit for rainfall? The actual data we have is an outcome (x): an observation, a value on this random variable scale. Again, X is the possible values rainfall can take (infinite and continuous), and x is what we observed in the sample data.

In lesson 19, we looked at SAT reading scores for schools in New York City. Since the SAT reading score ranges from 200 (the participation trophy) to 800 in increments of 10, we can assume that it is a finite and discrete random variable X. Any particular score we observe, for instance 670 for a student, is an observed outcome x.

If you are playing Monopoly, the outcome of your roll will be a random variable between 2 and 12, discrete and finite: 2 if you get 1 and 1, 12 if you get 6 and 6, and all combinations in between.

In lesson 14, we plotted the box office revenue for STAR WARS films. We can assume this data as observations of a continuous random variable.

Do you think this random variable showing revenue can be negative? What if they lose money? Maybe not STAR WARS, but there are loads of terrible films that are negative random variables.

Can you think of other random variables that can be negative?