Brainteasing Our Way to the Central Limit Theorem
This post is about trying to solve the following brainteaser:
You toss a fair coin 400 times. What’s the probability that you get at least 220 heads? Round your answer to the nearest percent.
Each coin toss is an independent Bernoulli random variable parametrised by $p = 1/2$, since the coin is fair. We’ll call a single sequence of $n = 400$ tosses a single trial. The number of heads $X$ in a trial is a Binomial random variable with the probability mass function (PMF)

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$
This means, for example, that the probability of scoring exactly 220 heads is

$$P(X = 220) = \binom{400}{220} \left(\frac{1}{2}\right)^{220} \left(\frac{1}{2}\right)^{180} = \frac{\binom{400}{220}}{2^{400}}.$$
The final form of this expression can be interpreted as the number of ways to choose 220 heads from 400 tosses in the numerator over the total number of possible configurations in the denominator.
Herein comes the trouble. We would like to know the probability of the number of heads being at least 220, not just exactly 220. In other words, we seek to compute the sum

$$P(X \ge 220) = \sum_{k=220}^{400} P(X = k).$$
There is a handful of approaches one can take to solve this problem.

Run $N$ trials of 400 tosses each. Count how many trials scored at least 220 heads. This is the most direct approach, but how does one justify the chosen number of trials $N$? It is also the most computationally expensive approach, but as we will see it has its value: it is very easy to implement, and it provides a good estimate for comparison with other methods.

The sum can be computed directly by iteration.

A classic pencil-and-paper approach could use a normal approximation based on the central limit theorem. Statistical tables will be required to avoid any computer or calculator use. [One could avoid using tables and compute everything manually, for example by using Taylor expansions of $e^{-x^2/2}$, but we won’t go there.]
Option 1: Simulation
The simulation does not require much understanding of the probabilistic model behind the problem. We just run $N$ trials, 400 tosses in each, and find out how many of the trials scored at least 220 heads.
The first simulation is implemented in R: sample 400 fair tosses per trial, then take the proportion of trials with at least 220 heads. I get a value of about 0.026, so the answer is 3%.
For no good reason at all, the same simulation crudely implemented in C++11:
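A crude sketch along those lines, reconstructed here with C++11’s `<random>` (the helper name `estimate_tail`, the seed, and the trial count below are mine, chosen for illustration, not the original listing):

```cpp
#include <random>

// Monte Carlo estimate of P(at least `threshold` heads in `tosses` fair tosses).
double estimate_tail(int trials, int tosses, int threshold, unsigned seed) {
    std::mt19937 gen(seed);                 // Mersenne Twister engine
    std::bernoulli_distribution coin(0.5);  // a fair coin
    int hits = 0;                           // trials scoring >= threshold heads
    for (int t = 0; t < trials; ++t) {
        int heads = 0;
        for (int i = 0; i < tosses; ++i)
            if (coin(gen)) ++heads;
        if (heads >= threshold) ++hits;
    }
    return static_cast<double>(hits) / trials;
}
```

With 100,000 trials, `estimate_tail(100000, 400, 220, 42)` should land near 0.026; the standard error of the estimate at that trial count is about 0.0005.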
It runs slower than the R version above, and a representative answer is about 0.026, that is, about 3%.
Option 2: Sum by Iteration
For a more formal approach, consider the sum

$$P(X \ge 220) = \sum_{k=220}^{400} \binom{400}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{400-k}.$$

Since the Bernoulli probability for each toss is fixed at $p = 1/2$, the sum of probabilities can be reduced to a sum of binomial coefficients:

$$P(X \ge 220) = \frac{1}{2^{400}} \sum_{k=220}^{400} \binom{400}{k}.$$
This sum is easy to do on a computer.
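One way to sketch the iteration (this is a reconstruction, not the original listing: it accumulates the PMF through the ratio $P(X = k+1)/P(X = k) = (n-k)/(k+1)$, so no huge binomial coefficients ever appear):

```cpp
#include <cmath>

// Sum the Binomial(n, 1/2) PMF over k = k_min .. n.
// The PMF is built incrementally from P(X = 0) = (1/2)^n using
//   P(X = k + 1) = P(X = k) * (n - k) / (k + 1),
// which sidesteps the enormous factorials entirely.
double binomial_upper_tail(int n, int k_min) {
    double pmf = std::pow(0.5, n);        // P(X = 0); ~3.9e-121 for n = 400,
                                          // still well within double range
    double sum = (k_min <= 0) ? pmf : 0.0;
    for (int k = 0; k < n; ++k) {
        pmf *= static_cast<double>(n - k) / (k + 1);
        if (k + 1 >= k_min) sum += pmf;
    }
    return sum;
}
```

Calling `binomial_upper_tail(400, 220)` comes out at roughly 0.0256.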
This code returns a value of about 0.0256, or 3%. This result was computed much, much faster than the simulation, but the important thing here is that it is the same answer obtained using a different method.
The C++ implementation would have to include a way to compute $\binom{n}{k}$, since the standard library does not provide one. Implementing it is a good exercise, since one cannot rely on the mathematical definition involving factorials: the result would overflow even for rather small values of $n$. (Knuth, 2011) has a lot more to say on this.
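One common trick for that exercise is the multiplicative formula with a division at every step; a sketch (valid only while the running product fits in 64 bits, which it certainly does not for $\binom{400}{220}$, so the tail sum itself is better done in probabilities or logarithms):

```cpp
#include <cstdint>

// C(n, k) via the multiplicative formula, dividing at each step to keep the
// running value small. The division is always exact, because the running
// value after step i equals C(n - k + i, i), an integer.
std::uint64_t binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;            // use symmetry to shorten the loop
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}
```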
Option 3: Normal Approximation Based on the Central Limit Theorem
Since each coin toss is an independent Bernoulli random variable, and all tosses in a trial are identically distributed with mean $p = 1/2$ and variance $p(1 - p) = 1/4$, we can try to pretend that the sum of 400 such variables is normally distributed.
Then we can compute the mean of this sum,

$$\mu = np = 400 \times \frac{1}{2} = 200,$$

and also its variance,

$$\sigma^2 = np(1 - p) = 400 \times \frac{1}{4} = 100.$$
Now we could use the approximation

$$P(220 \le X \le 400) \approx \Phi\left(\frac{400 - \mu}{\sigma}\right) - \Phi\left(\frac{220 - \mu}{\sigma}\right),$$

where $\Phi$, the standard normal CDF, is available from statistical tables.
If you think of it in terms of area under the graph, we would like to know the area of a slice between 220 and 400, or in other words the difference in the value of the CDF at these locations.
Two points must be noted. Firstly, $P(X \le 400) = 1$, since there are only 400 tosses in a trial, so the value of the CDF at 400 can be taken to be exactly 1. Secondly, we have to be careful when moving from the discrete to the continuous world: we would like to include 220 in our answer, so the last (discrete) value to exclude is 219, which makes 219.5 the continuity-corrected boundary.
The CDF tables are normalised, so let’s normalise the value of 219.5 as well:

$$z = \frac{219.5 - \mu}{\sigma} = \frac{219.5 - 200}{10} = 1.95.$$
My tables (Lindley & Scott, 2008) state that

$$\Phi(1.95) = 0.9744,$$

so the answer comes to

$$P(X \ge 220) \approx 1 - 0.9744 = 0.0256.$$
Rounding this value to the nearest percent, the final answer is 3%, as before.
The normal approximation works so well because, to quote (Bertsekas & Tsitsiklis, 2008, p. 279),
When $p$ is close to $1/2$, in which case the PMF of $X$ is symmetric, the […] formula yields a very good approximation for $n$ as low as 40 or 50.
This was a pencil-and-paper (and statistical tables) approach. We can do the same thing in software to verify the answer.
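A minimal sketch, using the identity $1 - \Phi(z) = \tfrac{1}{2}\operatorname{erfc}\!\left(z/\sqrt{2}\right)$ so that no table lookup is needed (`normal_upper_tail` is an illustrative name, not the original code):

```cpp
#include <cmath>

// Upper tail of a Normal(mean, sd) distribution via the complementary
// error function: P(X >= x) = 0.5 * erfc((x - mean) / (sd * sqrt(2))).
double normal_upper_tail(double x, double mean, double sd) {
    return 0.5 * std::erfc((x - mean) / (sd * std::sqrt(2.0)));
}
```

Evaluating `normal_upper_tail(219.5, 200.0, 10.0)` gives roughly 0.0256, matching the tables.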
This code returns a value of about 0.0256, or 3%, the same as before.
There we have it!
Conclusions and Meditations
Despite my education in mathematical modelling, I still get excited when a crude simulation turns out to be helpful for guiding more theoretical work. Such was the case this time as well: the simple simulation in the first option prevented me from making an order-of-magnitude error in my theoretical derivations later.
If you read this far, thanks, and any comments/suggestions are welcome.
Oliver
References
 Bertsekas, D.P. and Tsitsiklis, J.N. (2008) Introduction to Probability. 2nd ed. Athena Scientific.
 Lindley, D.V. and Scott, W.F. (2008) New Cambridge Statistical Tables. 2nd ed. Cambridge University Press.
 Knuth, D.E. (2011) The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd ed. Addison-Wesley.