Normal approximation for the Binomial distribution

The probability mass function of the Binomial distribution is

$P(X=x) = f(x) = {n \choose x}p^{x}(1-p)^{n-x}; x = 0, 1, 2, ... n$

Let $q = 1 - p$

$f(x) = \frac{n!}{(n-x)!x!}p^{x}q^{n-x} \hspace{5} \cdots (1)$

Stirling’s approximation for n!:

$n! \approx n^{n}e^{-n}\sqrt{2 \pi n} \hspace{5} \cdots (2)$
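To get a feel for how good Stirling's formula is, here is a minimal Python sketch that compares it with the exact factorial. The function name `stirling` and the chosen values of n are only illustrative assumptions, not part of the derivation.

```python
import math

def stirling(n: int) -> float:
    """Stirling's formula: n^n * e^(-n) * sqrt(2 * pi * n)."""
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

# The ratio approaches 1 as n grows.
for n in (1, 5, 10, 50):
    ratio = stirling(n) / math.factorial(n)
    print(f"n = {n:2d}   Stirling / exact = {ratio:.6f}")
```

The ratio stays slightly below 1 and moves toward 1 as n increases, which is why the approximation is used for large n.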

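Substituting eq (2) for each of the three factorials in eq (1) gives the following, where the equalities hold approximately for large $n$, $x$, and $n-x$.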

$f(x) = \frac{n^{n}e^{-n}\sqrt{2 \pi n}}{(n-x)^{(n-x)}e^{-(n-x)}\sqrt{2 \pi (n-x)}x^{x}e^{-x}\sqrt{2 \pi x}}p^{x}q^{n-x} \hspace{5} \cdots (3)$

$f(x) = (\frac{p}{x})^{x} (\frac{q}{n-x})^{n-x} n^{n} \frac{1}{e^{n}} \sqrt{2 \pi} \sqrt{n} \frac{1}{\sqrt{2 \pi (n-x)}} \frac{e^{n}}{e^{x}} e^{x} \frac{1}{\sqrt{2 \pi x}} \hspace{5} \cdots (4)$

$f(x) =(\frac{p}{x})^{x} (\frac{q}{n-x})^{n-x} n^{n} \sqrt{\frac{n}{2 \pi x (n-x)}} \hspace{5} \cdots (5)$

$f(x) =(\frac{p}{x})^{x} (\frac{q}{n-x})^{n-x} n^{n} \frac{n^{x}}{n^{x}} \sqrt{\frac{n}{2 \pi x (n-x)}} \hspace{5} \cdots (6)$

$f(x) =(\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \sqrt{\frac{n}{2 \pi x (n-x)}} \hspace{5} \cdots (7)$

Equation (7) is a product of two terms.

Term 1: $(\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x}$

Term 2: $\sqrt{\frac{n}{2 \pi x (n-x)}}$

Let’s work with Term 1 and reduce it further.

$ln \Big ( (\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \Big ) = x ln (\frac{np}{x}) + (n-x) ln (\frac{nq}{n-x})$

We need simplifications and series expansions.

Let $x = np + c$. Recall that $np$ is the expected value of the Binomial distribution, so $c$ is simply the deviation of $x$ from the mean.

$ln(\frac{np}{x}) = ln(\frac{np}{np + c}) = ln(np) - ln(np + c) = - \Big ( ln(np + c) - ln(np) \Big ) = -ln(\frac{np + c}{np}) = - ln(1 + \frac{c}{np})$

Now, for a small quantity $y$, the series expansion of $ln(1 + y)$ truncated after the second term is $ln(1 + y) \approx y - \frac{y^{2}}{2}$

Using this with $\frac{c}{np}$ as the small quantity, we can approximate $ln(\frac{np}{x})$ as $-(\frac{c}{np} - \frac{c^{2}}{2n^{2}p^{2}})$
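The two-term truncation of the logarithm series is only accurate when the argument is small, which here means the deviation c must be small compared with np. A quick sketch of the error, assuming an illustrative helper `log1p_two_terms` (my own name) and using the standard library's `math.log1p` as the reference:

```python
import math

def log1p_two_terms(y: float) -> float:
    """Two-term truncation of the series for ln(1 + y): y - y^2 / 2."""
    return y - y**2 / 2

# y plays the role of c / (np) in the derivation.
for y in (0.5, 0.1, 0.01):
    exact = math.log1p(y)
    approx = log1p_two_terms(y)
    print(f"y = {y:<5}  exact = {exact:.8f}  approx = {approx:.8f}  error = {exact - approx:.2e}")
```

The error shrinks roughly like the cube of y, so the truncation improves quickly as c/(np) gets smaller.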

In a similar fashion, we can reduce the second log term $ln (\frac{nq}{n-x})$ as follows.

$ln(\frac{nq}{n-x}) = ln(nq) - ln(n-x)$

$= -(ln(n - x) - ln(nq))$

$= -(ln(n - (np+c)) - ln(nq))$

$= -(ln(n(1-p)-c) - ln(nq))$

$= -(ln(nq-c) - ln(nq))$

$= -(ln(\frac{nq-c}{nq}))$

$= -ln(1 + (\frac{-c}{nq}))$

Using the same two-term expansion of the logarithm, now with $\frac{-c}{nq}$ in place of $\frac{c}{np}$, we can write

$ln (\frac{nq}{n-x}) \approx -(\frac{-c}{nq} - \frac{c^{2}}{2n^{2}q^{2}})$

We can substitute these two approximations in Term 1.

$ln \Big ( (\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \Big ) = x ln (\frac{np}{x}) + (n-x) ln (\frac{nq}{n-x})$

$= (np+c)(-(\frac{c}{np} - \frac{c^{2}}{2n^{2}p^{2}})) + (n-(np+c))(-(\frac{-c}{nq} - \frac{c^{2}}{2n^{2}q^{2}}))$

$= (np+c)(\frac{c^{2}}{2n^{2}p^{2}} - \frac{c}{np}) + (nq-c)(\frac{c^{2}}{2n^{2}q^{2}} + \frac{c}{nq})$

$= \frac{npc^{2}}{2np np} - \frac{npc}{np} + \frac{c^{3}}{2n^{2}p^{2}} - \frac{c^{2}}{np} + \frac{nqc^{2}}{2nq nq} + \frac{nqc}{nq} - \frac{c^{3}}{2n^{2}q^{2}} - \frac{c^{2}}{nq}$

$= -\frac{c^{2}}{2np} - \frac{c^{2}}{2nq} -c + c + \frac{c^{3}}{2n^{2}} (\frac{1}{p^{2}} - \frac{1}{q^{2}})$

$= -\frac{c^{2}}{2n}(\frac{1}{p}+\frac{1}{q}) + \frac{c^{3}}{2n^{2}} (\frac{1}{p^{2}} - \frac{1}{q^{2}})$

$= -\frac{c^{2}(p+q)}{2npq} + \frac{c^{3}}{2n^{2}} (\frac{1}{p^{2}} - \frac{1}{q^{2}})$

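$\approx -\frac{c^{2}}{2npq}$ , since $p + q = 1$ and the second term vanishes as $n \rightarrow \infty$ (for deviations $c$ on the order of the standard deviation $\sqrt{npq}$, it shrinks like $\frac{1}{\sqrt{n}}$)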

So,

$ln \Big ( (\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \Big ) \approx -\frac{c^{2}}{2npq}$

and,

$(\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \approx e^{-\frac{c^{2}}{2npq}}$

Because $x = np + c$,

$(\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \approx e^{-\frac{(x-np)^{2}}{2npq}}$
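As a sanity check on the Term 1 result, the sketch below evaluates the logarithm of Term 1 directly and compares it with the quadratic approximation. The function names, the choice p = 0.3, and placing x about one standard deviation above the mean are all illustrative assumptions.

```python
import math

def log_term1(n: int, p: float, x: float) -> float:
    """Exact log of Term 1: x ln(np/x) + (n - x) ln(nq/(n - x))."""
    q = 1 - p
    return x * math.log(n * p / x) + (n - x) * math.log(n * q / (n - x))

def log_term1_approx(n: int, p: float, x: float) -> float:
    """Quadratic approximation: -(x - np)^2 / (2npq)."""
    q = 1 - p
    return -((x - n * p) ** 2) / (2 * n * p * q)

p = 0.3
for n in (100, 10_000, 1_000_000):
    # x sits roughly one standard deviation above the mean np.
    x = round(n * p + math.sqrt(n * p * (1 - p)))
    print(f"n = {n:>9,}  exact = {log_term1(n, p, x):.5f}  approx = {log_term1_approx(n, p, x):.5f}")
```

The two values should move closer together as n grows, in line with the dropped cubic term becoming negligible.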

Now, let’s work with Term 2.

Term 2: $\sqrt{\frac{n}{2 \pi x (n-x)}}$

$\sqrt{\frac{n}{2 \pi (np+c) (n-(np+c))}}$

$\sqrt{\frac{n}{2 \pi (np+c) (n(1-p)-c)}}$

$\sqrt{\frac{n}{2 \pi (np+c) (nq-c)}}$

$\sqrt{\frac{n}{2 \pi (n^{2}pq -ncp+ncq - c^{2})}}$

$\sqrt{\frac{1}{2 \pi npq - \frac{2 \pi}{n}( ncp - ncq + c^{2})}}$

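As $n \rightarrow \infty$ , the second term in the denominator becomes negligible relative to the first (for $c$ on the order of $\sqrt{npq}$ it grows at most like $\sqrt{n}$, while $2 \pi npq$ grows like $n$), leaving

$\sqrt{\frac{1}{2 \pi npq}}$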
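The same kind of numerical check works for Term 2. This sketch, again with illustrative function names and p = 0.3 as assumptions, prints the ratio of the simplified form to the original square-root term for x about one standard deviation above the mean.

```python
import math

def term2_exact(n: int, x: float) -> float:
    """Term 2 as it appears in equation (7): sqrt(n / (2 pi x (n - x)))."""
    return math.sqrt(n / (2 * math.pi * x * (n - x)))

def term2_approx(n: int, p: float) -> float:
    """Large-n simplification of Term 2: 1 / sqrt(2 pi n p q)."""
    q = 1 - p
    return 1 / math.sqrt(2 * math.pi * n * p * q)

p = 0.3
for n in (100, 10_000, 1_000_000):
    x = round(n * p + math.sqrt(n * p * (1 - p)))
    ratio = term2_exact(n, x) / term2_approx(n, p)
    print(f"n = {n:>9,}  exact / approx = {ratio:.5f}")
```

The ratio should drift toward 1 as n increases, which is the sense in which the second term in the denominator can be dropped.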

Substituting these two terms (approximation for Term 1 and approximation for Term 2) in equation (7), we get

$f(x) \approx (\frac{np}{x})^{x} (\frac{nq}{n-x})^{n-x} \sqrt{\frac{n}{2 \pi x (n-x)}} \approx \frac{1}{\sqrt{2 \pi npq}} e^{-\frac{(x-np)^{2}}{2npq}}\hspace{5} \cdots (8)$

For the binomial distribution, the expected value $\mu = np$ and the variance $\sigma^{2} = npq$ .

Using this notation with equation (8), we get an approximation for the Binomial distribution.

$f(x) = \frac{n!}{(n-x)!x!}p^{x}q^{n-x} \approx \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \hspace{5} \cdots (9)$

or,

$f(x) = \frac{n!}{(n-x)!x!}p^{x}q^{n-x} \approx \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{\frac{-1}{2}(\frac{x-\mu}{\sigma})^{2}} \hspace{5} \cdots (10)$

The right-hand side of equation (10) is the probability density function of the normal distribution (the bell shape) with mean $\mu = np$ and variance $\sigma^{2} = npq$.
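To see the final result in action, here is a small sketch that compares the exact Binomial probabilities with the normal density evaluated at the same points. The helper names `binom_pmf` and `normal_pdf` and the example values n = 100, p = 0.5 are assumptions chosen for illustration; `math.comb` needs Python 3.8 or later.

```python
import math

def binom_pmf(n: int, p: float, x: int) -> float:
    """Exact Binomial probability P(X = x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

n, p = 100, 0.5
mu, sigma2 = n * p, n * p * (1 - p)  # mean np and variance npq

for x in (40, 45, 50, 55, 60):
    exact = binom_pmf(n, p, x)
    approx = normal_pdf(x, mu, sigma2)
    print(f"x = {x}  binomial = {exact:.6f}  normal = {approx:.6f}")
```

The agreement is closest near the mean and for p near 0.5. For p far from 0.5 or for small n, the dropped higher-order terms matter more and the two columns drift apart.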
