Lesson 91 – The One-Sample Hypothesis Tests in R

The last time we did something in R was in Lesson 83. Since then, we have been learning the hypothesis testing framework. In Lesson 84, we first made friends with hypothesis tests — the concept of needing evidence beyond a reasonable doubt. In Lesson 85, we learned the framework — the elements of hypothesis testing that provide us with a systematic way of setting up the test and using data to verify the hypothesis.

In Lessons 86, 87, and 88, we learned the one-sample hypothesis tests on the proportion, the mean, and the variance, respectively. Then, in Lesson 89, we learned the one-sample sign test as a complement to the one-sample hypothesis test on the mean — one that can also account for outliers. Last week (Lesson 90), we dived into the bootstrap-based method, where we relaxed the assumptions on the null distribution. The bootstrap-based process provides us with the flexibility to run the test on various statistics of interest.

Today, let’s pause and look at how to perform these hypothesis tests in R. Brace yourself for an R drive.

To help us with this lesson today, we will make use of the eight data points on 2003 Ford Focus vehicle mileage that Joe used in Lesson 89.

The Basic Steps

Step 1: Get the data

You can get the data from here.

Step 2: Create a new folder on your computer

Let us call this folder “lesson91”. 

Step 3: Create a new code in R

Create a new code for this lesson: “File >> New >> R script”. Save the code in the same folder “lesson91” using the “save” button or by using “Ctrl+S”. Use .R as the extension — “lesson91_code.R”.

Step 4: Choose your working directory

“lesson91” is the folder where we stored the code. Use setwd("path") to set the path to this folder. Execute the line by clicking the “Run” button on the top right.

setwd("path to your folder")

Step 5: Read the data into the R workspace

Since the sample data is small enough, instead of saving the data in a text file and reading it into the R workspace, we can directly input the data into the workspace as follows:

## Data Entry ##
# FORD MPG - reported # 
  x = c(21.7, 29, 28.1, 31.5, 24, 21.5, 28.7, 29)

The Baseline for Null Hypotheses

Let’s run the hypothesis tests on the mean, standard deviation, and the proportion. For this, we have to make some modest assumptions.

For the mean, we will use the EPA-specified rating of 26.2 MPG, rounded to a baseline of \mu = 26 MPG.

For comparing the standard deviation, we are unsure of the baseline standard deviation from the EPA rating. However, the EPA provides a range — between 22 MPG and 32 MPG. If we assume that this range is covered by six standard deviations, i.e., 32 - 22 = 6\sigma, we can get an estimate for the baseline \sigma. In this case, \sigma = 1.67 MPG.

Further, let’s also assume that one in eight cars will usually have a mileage less than the minimum rating of 22 MPG. p = \frac{1}{8}

# Compare to Baseline
  mu = 26                 # baseline mean (EPA rating, rounded)

  rangex = c(22,32)       # EPA mileage range
  sigma = diff(rangex)/6  # range covered by six standard deviations

  n = length(x)
  p = 1/8                 # one in eight cars below the minimum rating

Hypothesis Test on the Mean

For the hypothesis test on the mean, we learned three methods: the t-test, the sign-test, and the bootstrap-based test.

Let’s start with the t-test. For the t-test, the null and the alternate hypothesis are:

H_{0}: \mu \ge 26 MPG

H_{A}: \mu < 26 MPG

The null distribution is a T-distribution with (n-1) degrees of freedom. The test statistic is t_{0}=\frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}. We have to verify how likely it is to see a value as small as t_{0} in this null distribution.

Let’s compute the test statistic.

#T-Test #
   xbar = mean(x)

   s = sd(x)

   to = (xbar-mu)/(s/sqrt(n))
> to
   [1] 0.5173082

The test statistic is 0.517.

For a selected rejection rate of \alpha, we have to compute the p-value corresponding to this test statistic, or the critical value on the T-distribution.

   alpha = 0.05
   
   pvalue_ttest = pt(to,df=(n-1))   
   
   tcritical = qt(alpha,df=(n-1))
   

When you execute these lines, you will see that the p-value is 0.69 and the critical value from the t-distribution is -1.89. The “pt” function computes P(T \le t_{0}) from a T-distribution with user-specified degrees of freedom; for our case, df = 7. The “qt” function computes the quantile corresponding to a probability of 5%, i.e., \alpha = 0.05, from a T-distribution with user-specified degrees of freedom.

> pvalue_ttest
   [1] 0.6895594

> tcritical
   [1] -1.894579 

Since the p-value is greater than 0.05, or since t_{0}>t_{critical}, we cannot reject the null hypothesis.

In R, there is a function to run this test — “t.test”.

We could have used the function directly on the sample data x.

# using t-Test function in R #
   t.test(x,alternative = "less", mu = mu, conf.level = 0.95)
One Sample t-test
 data:  x
 t = 0.51731, df = 7, p-value = 0.6896
 alternative hypothesis: true mean is less than 26
 95 percent confidence interval:
      -Inf 29.20539
 sample estimates:
 mean of x 
   26.6875 

It provides t_{0} and the p-value as the outputs from which we can decide on the null hypothesis.


Next, let’s look at how to conduct the sign-test in R. The null and the alternative hypothesis for the sign-test are

H_{0}: P(X > 26) = 0.5

H_{A}: P(X > 26) < 0.5

If H_{0} is true, about half of the values of sample X will be greater than 26 MPG, and about half will be below it, i.e., the differences from the baseline will be negative. If H_{A} is true, more than half of the values of sample X will be less than 26 MPG, i.e., the sample shows low mileages — significantly less than 26 MPG.

To run this test, we should compute s^{+}, the test statistic that counts the number of values exceeding 26 MPG.

#Sign-Test
 splus = length(which(x>mu))
 > splus
   [1] 5 

s^{+} is 5.

Under the null hypothesis, s^{+} follows a binomial distribution with a probability of 0.5. Using this assumption, we compute the p-value using the “binom.test” function in R.

# Using Binom Test function in R 
   binom.test(splus,n,p=0.5,alternative = "less")
Exact binomial test
 data:  splus and n
 number of successes = 5, number of trials = 8, p-value = 0.8555
 alternative hypothesis: true probability of success is less than 0.5
 95 percent confidence interval:
  0.0000000 0.8888873
 sample estimates:
 probability of success 
                  0.625 

The test provides the p-value, i.e., P(S^{+} \le 5) from a binomial distribution with n = 8 and p = 0.5.

P(S^{+} \le 5)=\sum_{k=0}^{k=5}{8 \choose k} p^{k}(1-p)^{8-k}=0.8555
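If you want to double-check this sum without running binom.test, the cumulative Binomial probability gives the same number — a quick sanity check using the base R pbinom function:

# Sanity check: P(S+ <= 5) from the Binomial CDF
   pbinom(splus, n, 0.5)
   [1] 0.8554688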

Since the p-value is greater than 0.05, the rejection rate, we cannot reject the null hypothesis.


Lastly, we will check out the bootstrap-based test on the mean. For the bootstrap-based one-sided test, the null hypothesis and the alternate hypothesis are

H_{0}: P(\bar{x} > 26) = 0.5

H_{A}: P(\bar{x} > 26) < 0.5

Use the following lines to set up and run the test.

#Bootstrap Test
 N = 10000

 xmean_null = matrix(NA,nrow=N,ncol=1)
  for (i in 1:N)
   {
     xboot = sample(x,n,replace=T)
     xmean_null[i,1] = mean(xboot)
   }
 
hist(xmean_null,col="pink",xlab="Mean of the Distribution",font=2,font.lab=2,main="Null Distribution Assuming Ho is True")

points(mu,10,pch=15,cex=2,col="blue")

abline(v=c(quantile(xmean_null,0.95)),col="black",lwd=2,lty=2)

pvalue_bootstrap = length(which(xmean_null>mu))/N
> pvalue_bootstrap
   [1] 0.7159 

In the loop, we execute the “sample” function to draw a bootstrap-replicate from the original data. Then, from this bootstrap-replicate, we compute the mean. Repeating this sampling and computation of the bootstrap-replicated mean statistic forms the null distribution. From this null distribution, we calculate the p-value, i.e., the proportion of the null distribution that exceeds the baseline of 26 MPG.

Since the p-value (0.716) is greater than a 5% rejection rate, we cannot reject the null hypothesis.

We can also observe that the code provides a way to visualize the null distribution and the basis value of 26 MPG. If the basis value \mu = 26 were so far out on the null distribution of \bar{x} that less than 5% of the bootstrap-replicated test statistics were greater than \mu, we would have rejected the null hypothesis.

Hypothesis Test on the Variance

For the hypothesis test on the variance, we learned two methods: the Chi-square test and the bootstrap-based test.

Let’s first look at the Chi-square test. The null and the alternate hypothesis are as follows:

H_{0}: \sigma^{2} = 1.67^{2}

H_{A}: \sigma^{2} \neq 1.67^{2}

The alternate hypothesis is two-sided. Deviation in either direction (less than or greater than) will reject the null hypothesis.

If you can recall from Lesson 88, \frac{(n-1)s^{2}}{\sigma^{2}} is our test statistic, \chi^{2}_{0}, which we will verify against a Chi-square distribution with (n-1) degrees of freedom.

# Chi-square Test
chi0 = ((n-1)*s^2)/sigma^2

pvalue_chi0 = pchisq(chi0,df=(n-1))

chilimit_right = qchisq(0.975,df=(n-1))

chilimit_left = qchisq(0.025,df=(n-1))

The “pchisq” function computes P(\chi^{2} \le \chi^{2}_{0}) from a Chi-square distribution with user-specified degrees of freedom, df = 7. The “qchisq” function computes the quantile corresponding to a specified rate of rejection. Since we are conducting a two-sided test, we calculate the limiting value on the left tail and the right tail.

 > pvalue_chi0
   [1] 0.9999914 

The p-value is 0.999. Since it is greater than 0.975, we reject the null hypothesis based on the two-sided test.

> chi0
   [1] 35.60715

> chilimit_left
   [1] 1.689869

> chilimit_right
   [1] 16.01276 

The lower and the upper bounds from the null distribution are 1.69 and 16.01, respectively, whereas the test statistic \chi^{2}_{0} is 35.61, well beyond the acceptable upper bound. Hence, we reject the null hypothesis.
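If we prefer a single number to compare directly against \alpha, we can also fold the two tails into one two-sided p-value. This is a small sketch using only base R; doubling the smaller tail probability is one common convention for two-sided tests.

# Two-sided p-value: double the smaller tail probability
   pvalue_twosided = 2*min(pchisq(chi0,df=(n-1)), 1 - pchisq(chi0,df=(n-1)))
# ~1.7e-05 here, far below alpha = 0.05 -- consistent with rejecting H0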


Now, let’s look at how to run a bootstrap-based test for the standard deviation. For the bootstrap-based two-sided test, the null hypothesis and the alternate hypothesis are

H_{0}: P(\sigma > 1.67) = 0.5

H_{A}: P(\sigma > 1.67) \neq 0.5

Use the following lines to set up and run the test.

# Bootstrap Test for Standard Deviation
 N = 10000
 
xsd_null = matrix(NA,nrow=N,ncol=1)
 for (i in 1:N)
   {
     xboot = sample(x,n,replace=T)
     xsd_null[i,1] = sd(xboot)
   }

hist(xsd_null,col="pink",xlab="Standard Deviation of the Distribution",font=2,font.lab=2,main="Null Distribution Assuming Ho is True")

points(sigma,10,pch=15,cex=2,col="blue")
   abline(v=c(quantile(xsd_null,0.025),quantile(xsd_null,0.975)),col="black",lwd=2,lty=2)

pvalue_bootstrap = length(which(xsd_null>sigma))/N
> pvalue_bootstrap
   [1] 0.9747 

As before, in the loop, we execute the “sample” function to draw a bootstrap-replicate from the original data. Then, from this bootstrap-replicate, we compute the standard deviation. Repeating this sampling and computation of the bootstrap-replicated standard deviation statistic forms the null distribution for \sigma. From this null distribution, we calculate the p-value, i.e., the proportion of the null distribution that exceeds the baseline of 1.67 MPG.

Since the p-value (0.975) reaches the upper threshold of 1 - \frac{\alpha}{2} = 0.975, we reject the null hypothesis.

The code provides a way to visualize the null distribution and the basis value of 1.67 MPG. The basis value is far to the left on the null distribution. Almost 97.5% of the bootstrap-replicated test statistics are greater than \sigma = 1.67 — we reject the null hypothesis.

Hypothesis Test on the Proportion

For the hypothesis test on the proportion, we can employ the binomial distribution (parametric) as the null distribution, or use the bootstrap-based method (non-parametric) to generate the null distribution. For the parametric approach, the null and the alternative hypothesis are

H_{0}: p \le \frac{1}{8}

H_{A}: p > \frac{1}{8}

We are considering a one-sided alternative since the departure in one direction (greater than) is sufficient to reject the null hypothesis.

From a sample of eight cars, the null distribution is the probability distribution of observing any number of cars having a mileage of less than 22 MPG. In other words, out of the eight cars, we could have 0, 1, 2, 3, …, 8 cars with a mileage less than 22 MPG, the lower bound specified by the EPA.

The null distribution assigns probabilities to these outcomes: in a Binomial null distribution with n = 8 and p = 1/8, what is the probability of observing 0, 1, 2, …, 8 such cars? If we see more than an acceptable number (as judged against this null distribution), we reject the null hypothesis.

We can generate and plot that null distribution in R using the following lines.

## Hypothesis Test on the Proportion
 
ncars = length(which(x<22))

plot(0:n,dbinom(0:n,n,prob=p),type="o",xlab="Number of cars with MPG less than the range",ylab="P(X=x)",font=2,font.lab=2)

x_at_alpha = qbinom(0.95,n,p) # approx quantile for 5% rejection 

cord.x <- c(x_at_alpha,seq(x_at_alpha,n),x_at_alpha) 

cord.y <- c(0, dbinom(x_at_alpha:n,n,prob=p), 0) 

polygon(cord.x,cord.y,col="pink")

points(ncars,0.002,pch=15,col="blue",cex=2)

pval = sum(dbinom(ncars:n,n,prob=p))

In the first line, we are computing the number of cars that have a mileage of less than 22 MPG, which is the test statistic. We observe this to be two cars.

> ncars
   [1] 2 

In the next few lines, we are plotting the null distribution as computed from a binomial distribution with n = 8 and p = 1/8, and visually showing the region of rejection using the “polygon” function.

Then, we compute the p-value as the probability of observing two or more cars having a mileage less than 22 MPG.

P(Y \ge 2) = \sum_{k=2}^{k=8} {8 \choose k} p^{k}(1-p)^{8-k}=0.2636

> pval
   [1] 0.2636952 
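Equivalently, the upper tail of the Binomial CDF gives the same p-value in one line — a quick check with the base R pbinom function:

# Equivalent: P(Y >= 2) as an upper-tail Binomial probability
   pbinom(ncars - 1, n, p, lower.tail = FALSE)
   [1] 0.2636952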

Since the p-value is greater than 0.05, we cannot reject the null hypothesis (based on the limited sample) that one in eight cars (2003 Ford Focus) will have a mileage of less than 22 MPG.


Let’s wrap up the day by running the bootstrap test on the proportion. For the bootstrap-based hypothesis test on the proportion, the null and the alternate hypothesis are

H_{0}: P(p > \frac{1}{8}) = 0.5

H_{A}: P(p > \frac{1}{8}) > 0.5

If we get a large proportion of the null distribution to be greater than 1/8, we will reject the null hypothesis.

Use the following lines to execute the test.

# Bootstrap Test on Proportion 
 N = 10000

xprop_null = matrix(NA,nrow=N,ncol=1)
 for (i in 1:N)
   {
     xboot = sample(x,n,replace=T)
     xprop_null[i,1] = length(which(xboot<22))/n
   }

hist(xprop_null,col="pink",xlab="Proportion of the Distribution",font=2,font.lab=2,main="Null Distribution Assuming Ho is True")

points(p,10,pch=15,cex=2,col="blue")

abline(v=c(quantile(xprop_null,0.95)),col="black",lwd=2,lty=2)

pval=length(which(xprop_null>p))/N
> pval
   [1] 0.631 

We are executing the “sample” function N = 10,000 times to draw bootstrap-replicates from the original data. Then, for each bootstrap-replicate, we compute the proportion of values less than 22 MPG. The 10,000 bootstrap-replicated proportions form the null distribution for p. From this null distribution, we calculate the p-value, i.e., the proportion of the null distribution that exceeds the baseline of \frac{1}{8}.

Since the p-value (0.631) is less than 0.95 (the 1 - \alpha rejection threshold for this right-sided test), we cannot reject the null hypothesis. 63% of the bootstrap-replicated proportions are greater than the baseline. If more than 95% of the bootstrap-replicated proportions were greater than the baseline, we would have rejected the null hypothesis. That scenario would have unfolded if the original sample had several cars with a mileage of less than 22 MPG. The bootstrap replicates would reflect them, in which case the proportion would be much greater than 1/8 in many replicates, and overall, the p-value would exceed 0.95.

Take the next two weeks to digest all the one-sample hypothesis tests, including how to execute the tests in R. In two weeks, we will move on to the two-sample hypothesis tests.

Here is the full code for today’s lesson.

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

Lesson 90 – The One-Sample Hypothesis Tests Using the Bootstrap

Hypothesis Tests – Part V

H_{0}: P(\theta > \theta^{*}) = 0.5

H_{A}: P(\theta > \theta^{*}) > 0.5

H_{A}: P(\theta > \theta^{*}) < 0.5

H_{A}: P(\theta > \theta^{*}) \neq 0.5

Jenny and Joe meet after 18 lessons.

I heard you are neck-deep into the hypothesis testing concepts.

Yes. And I am having fun learning about how to test various hypotheses, be it on the mean, on the standard deviation, or on the proportion. It is also enlightening to learn how to approximate the null distribution using the limiting distribution concepts.

True. You have seen in lesson 86 — hypothesis tests on the proportion, that the null distribution is a Binomial distribution with n, the sample size, and p, the proportion being tested.

You have seen in lesson 87 — hypothesis tests on the mean, that the null distribution is a T-distribution because \frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}} \sim t_{df=n-1}

You have seen in lesson 88 — hypothesis tests on the variance, that the null distribution is a Chi-square distribution because \frac{(n-1)s^{2}}{\sigma^{2}} \sim \chi^{2}_{df=n-1}

Did you ever wonder what if the test statistic is more complicated mathematically than the mean or the variance or the proportion and if its limiting distribution or the null distribution is hard to derive?

Or, did you ever ask what if the assumptions that go into deriving the null distribution are not met or not fully satisfied?

Can you give an example?

Suppose you want to test the hypothesis on the median or the interquartile range or the skewness of a distribution?

Or, if you are unsure about the distributional nature of the sample data? For instance, the assumption that \frac{(n-1)s^{2}}{\sigma^{2}} \sim \chi^{2}_{df=n-1} is based on the premise that the sample data is normally distributed.

😕 There are non-parametric or distribution-free approaches? I remember Devine mentioning a bootstrap approach.

😎 In lesson 79, we learned about the concept of the bootstrap. Using the idea of the bootstrap, we can generate replicates of the original sample to approximate the probability distribution function of the population. Assuming that each data value in the sample is equally likely (with a probability of 1/n), we can randomly draw n values with replacement. By putting a probability of 1/n on each data point, we use the discrete empirical distribution \hat{f} as an approximation of the population distribution f.

Hmm. Since each bootstrap replicate is a possible representation of the population, we can compute the test statistics from this bootstrap sample. And, by repeating this, we can have many simulated values of the test statistics to create the null distribution against which we can test the hypothesis.

Exactly. No need to make any assumption on the distributional nature of the data at hand, or the kind of the limiting distribution for the test statistic. We can compute any test statistic from the bootstrap replicates and test the basis value using this simulated null distribution. You want to test the hypothesis on the median, go for it. On the skewness or geometric mean, no problem.

This sounds like an exciting approach that frees us from those limitations. Why don’t we go through it step by step and elaborate on the details? Our readers will appreciate it.

Absolutely. Do you want to use your professor’s hypothesis that the standard deviation of his class’s performance is 16.5 points, as a case in point?

Sure. In a recent conversation, he also revealed that the mean and median scores are 54 and 55 points, respectively, and that 20% of his class usually gets a score of less than 40.

Aha. We can test all four hypotheses then. Let’s take the sample data.

60, 41, 70, 61, 69, 95, 33, 34, 82, 82

Yes, this is a sample of ten exam scores from our most recent test with him.

Let’s first review the concept of the bootstrap. We have the following data.

Assuming that each data value is equally likely, i.e., the probability of occurrence of any of these ten data points is 1/10, we can randomly draw ten numbers from these ten values — with replacement.

Yes, I can recall from lesson 79 that this is like playing the game of Bingo where the chips are these ten numbers. Each time we get a number, we put it back and roll it again until we draw ten numbers.

Yes. For real computations, we use a computer program that has this algorithm coded in. We draw a random number from a uniform distribution (f(u)) where u is between 0 and 1. These randomly drawn u's are mapped onto the ranked data to draw a specific value from the set. For example, in a set of ten values, for a randomly drawn u of 0.1, we can draw the first value in order — 33.
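Here is a minimal sketch of that mapping in R, assuming the ten scores are stored in x; in practice, the built-in sample function does the same thing in one line.

# The ten exam scores
 x = c(60, 41, 70, 61, 69, 95, 33, 34, 82, 82)
 n = length(x)

# Map uniform draws onto the ranked data: u = 0.1 picks the first ordered value, 33
 u = runif(n)
 xboot = sort(x)[ceiling(u*n)]

# Equivalent, using the built-in function
 xboot = sample(x, n, replace = TRUE)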

Since each value is equally likely, the bootstrap sample will consist of numbers from the original data (60, 41, 70, 61, 69, 95, 33, 34, 82, 82), some may appear more than one time, and some may not show up at all in a random sample.

Let me create a bootstrap replicate.

See, 70 appeared two times, 82 appeared three times, and 33 did not get selected at all.

Such bootstrap replicates are representations of the empirical distribution \hat{f}. The empirical distribution \hat{f} is the proportion of times each value in the data sample x_{1}, x_{2}, x_{3}, …, x_{n} occurs. If we assume that the data sample has been generated by randomly sampling from the true distribution, then the empirical distribution (i.e., the observed frequency) \hat{f} is a sufficient statistic for the true distribution f.

In other words, all the information contained in the true distribution can be generated by creating \hat{f}, the empirical distribution.

Yes. Since an unknown population distribution f has produced the observed data x_{1}, x_{2}, x_{3}, …, x_{n}, we can use the observed data to approximate f by its empirical distribution \hat{f} and then use \hat{f} to generate bootstrap replicates of the data.

How do we implement the hypothesis test then?

Using the same hypothesis testing framework. We first establish the null and the alternative hypothesis.

H_{0}: P(\theta > \theta^{*}) = 0.5

\theta is the test statistic computed from the bootstrap replicate and \theta^{*} is the basis value that we are testing. For example, a standard deviation of 16.5 is \theta^{*} and standard deviation computed from one bootstrap sample is \theta.

The alternate hypothesis is then,

H_{A}: P(\theta > \theta^{*}) > 0.5

or

H_{A}: P(\theta > \theta^{*}) < 0.5

or

H_{A}: P(\theta > \theta^{*}) \neq 0.5

Essentially, for each bootstrap replicate i, we check whether \theta_{i} > \theta^{*}. If yes, we register S_{i}=1. If not, we register S_{i}=0.

Now, we can repeat this process, i.e., creating a bootstrap replicate, computing the test statistic and verifying whether \theta_{i} > \theta^{*} or S_{i} \in (0,1) a large number of times, say N = 10,000. The proportion of times S_{i} = 1 in a set of N bootstrap-replicated test statistics is the p-value.

And we can apply the rule of rejection if the p-value < \alpha, the selected rate of rejection.

Correct. That is for a one-sided hypothesis test. If it is a two-sided hypothesis test, we use the rule \frac{\alpha}{2} \le p-value < 1- \frac{\alpha}{2} for non-rejection, i.e., we cannot reject the null hypothesis if the p-value is between \frac{\alpha}{2} and 1-\frac{\alpha}{2}.
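To make the procedure concrete, here is a small helper that wraps these steps — a sketch in base R, where the name boot_pvalue and its arguments are our own choices, not a standard R function. It computes the p-value P(\theta > \theta^{*}) from N bootstrap replicates for any statistic passed in as a function.

# Bootstrap hypothesis test: p-value = proportion of replicates
# whose statistic exceeds the basis value theta_star
 boot_pvalue = function(x, statistic, theta_star, N = 10000)
 {
   n = length(x)
   S = numeric(N)
   for (i in 1:N)
    {
      xboot = sample(x, n, replace = TRUE)              # one bootstrap replicate
      S[i] = as.numeric(statistic(xboot) > theta_star)  # S_i = 1 if theta_i > theta*
    }
   mean(S)   # average of the S_i indicators = p-value
 }

Because the statistic argument can be mean, median, sd, or any custom function, the same helper covers all the tests that follow.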

Great! For the first bootstrap sample, if we were to verify the four hypotheses, we register the following.

Since the bootstrap sample mean \bar{x}_{boot}=67.8 is greater than the basis of 54, we register S_{i}=1.

Since the bootstrap sample median \tilde{x}_{boot}=69.5 is greater than the basis of 55, we register S_{i}=1.

Since the bootstrap sample standard deviation \sigma_{boot}=14.46 is less than the basis of 16.5, we register S_{i}=0.

Finally, since the bootstrap sample proportion p_{boot}=0.1 is less than the basis of 0.2, we register S_{i}=0.

We do this for a large number of bootstrap samples. Here is an illustration of the test statistics for three bootstrap replicates.

Let me run the hypothesis test on the mean using N = 10,000. I am creating 10,000 bootstrap-replicated test statistics.

The distribution of the test statistics is the null distribution of the mean. Notice that it resembles a normal distribution. The basis value of 54 is shown using a blue square on the distribution. From the null distribution, the proportion of times S_{i}=1 is 0.91. 91% of the \bar{x} test statistics are greater than 54.

Our null hypothesis is
H_{0}: P(\bar{x} > 54) = 0.5

Our alternate hypothesis is one-sided. H_{A}: P(\bar{x} > 54) < 0.5

Since the p-value is greater than a 5% rejection rate, we cannot reject the null hypothesis.

If the basis value \mu is so far out on the null distribution of \bar{x} that less than 5% of the bootstrap-replicated test statistics are greater than \mu, we would have rejected the null hypothesis.

Shall we run the hypothesis test on the median?

H_{0}: P(\tilde{x} > 55) = 0.5

H_{A}: P(\tilde{x} > 55) < 0.5

Again, a one-sided test.

Sure. Here is the answer.

We can see the null distribution of the test statistic (median from the bootstrap samples) along with the basis value of 55.

86% of the test statistics are greater than this basis value. Hence, we cannot reject the null hypothesis.

The null distribution of the test statistic does not resemble any known distribution.

Yes. Since the bootstrap-based hypothesis test is distribution-free (non-parametric), not knowing the nature of the limiting distribution of the test statistic (median) does not restrain us.

Awesome. Let me also run the test for the standard deviation.

H_{0}: P(\sigma > 16.5) = 0.5

H_{A}: P(\sigma > 16.5) \ne 0.5

I am taking a two-sided test since a deviation in either direction, i.e., too small or too large a standard deviation, will disprove the hypothesis.

Here is the result.

The p-value is 0.85, i.e., 85% of the bootstrap-replicated test statistics are greater than 16.5. Since the p-value is greater than the acceptable rate of rejection, we cannot reject the null hypothesis.

If the p-value were less than 0.025 or greater than 0.975, then we would have rejected the null hypothesis.

For a p-value of 0.025, 97.5% of the bootstrap-replicated standard deviations will be less than 16.5 — strong evidence that the null distribution produces values much less than 16.5. For a p-value of 0.975, 97.5% of the bootstrap-replicated standard deviations will be greater than 16.5 — strong evidence that the null distribution produces values much greater than 16.5. In either of the sides, we reject the null hypothesis that the standard deviation is 16.5.

Let me complete the hypothesis test on the proportion.

H_{0}: P(p > 0.2) = 0.5

H_{A}: P(p > 0.2) \ne 0.5

Let’s take a two-sided test since deviation in either direction can disprove the null hypothesis. If we get a tiny proportion or a very high proportion compared to 0.2, we will reject the belief that the percentage of students obtaining a score of less than 40 is 0.2.

Here are the null distribution and the result from the test.

The p-value is 0.32. 3200 out of the 10000 bootstrap-replicated proportions are greater than 0.2. Since it is between 0.025 and 0.975, we cannot reject the null hypothesis.
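Putting it all together, with the boot_pvalue helper sketched earlier, the four tests reduce to one-liners. The p-values will vary slightly from run to run since the replicates are random; the comments show the values reported above.

# The four bootstrap tests in one place
 scores = c(60, 41, 70, 61, 69, 95, 33, 34, 82, 82)

 boot_pvalue(scores, mean, 54)                      # ~0.91
 boot_pvalue(scores, median, 55)                    # ~0.86
 boot_pvalue(scores, sd, 16.5)                      # ~0.85
 boot_pvalue(scores, function(s) mean(s < 40), 0.2) # ~0.32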

You can see how widely the bootstrap concept can be applied for hypothesis testing and what flexibility it provides.

To summarize:

Repeatedly sample with replacement from the original sample data. Each time, draw a sample of size n.
Compute the desired statistic from each bootstrap sample (mean, median, standard deviation, interquartile range, proportion, skewness, etc.).
The null hypothesis P(\theta > \theta^{*}) = 0.5 can now be tested as follows:
S_{i}=1 if \theta_{i} > \theta^{*}; else, S_{i}=0

p-value = \frac{1}{N}\sum_{i=1}^{i=N} S_{i}
(average over all N bootstrap-replicated test statistics)

If p-value < \frac{\alpha}{2} or p-value > 1-\frac{\alpha}{2}, reject the null hypothesis
(for a two-sided hypothesis test at a selected rejection rate of \alpha)

If p-value < \alpha, reject the null hypothesis
(for a left-sided hypothesis test at a rejection rate of \alpha)

If p-value > 1 - \alpha, reject the null hypothesis
(for a right-sided hypothesis test at a rejection rate of \alpha)

If you find this useful, please like, share and subscribe.
You can also follow me on Twitter @realDevineni for updates on new lessons.

