# A bad strategy for rolling a die.

When I was learning to program, a lot of books offered `rand() % 6 + 1` as a model of a die roll, with `rand()` returning a uniformly distributed random integer from the closed interval `[0, RAND_MAX]`.

I was slightly surprised when, revising C++ some years later, I came across the Stanford CS106B Course Reader, which explained that `rand()` guarantees only that the value it produces is uniformly distributed over the range from 0 to `RAND_MAX`. There is, however, no guarantee that the remainders on division by six will be at all random.

My mathematical maturity at the time was not good enough to figure out why this is the case, so I just kept this nugget of wisdom in my head. Yesterday I found my old notes in a box and, with some experimentation, was finally able to understand why it must be so.

# A thought experiment

Omitting the `+1` part of the formula (without any loss of generality), suppose that we would like to model rolling a six-sided die whose sides are numbered 0 to 5.

Suppose, for now, that `RAND_MAX` is 5. Then the sample space `rand()` draws from is

0 | 1 | 2 | 3 | 4 | 5 |

We can take each value in this sample space and find its value modulo 6. The result is the same, since every value is already less than 6; for example, `5 mod 6 = 5`:

0 | 1 | 2 | 3 | 4 | 5 |

Therefore, every number from 0 to 5 is equally likely to be produced by `rand() % 6`.

The next step is to increase the size of the sample space so that the last element is now 11. The sample space becomes

0 | 1 | 2 | 3 | 4 | 5 |

6 | 7 | 8 | 9 | 10 | 11 |

And, just as before, we take each value modulo 6 and we get

0 | 1 | 2 | 3 | 4 | 5 |

0 | 1 | 2 | 3 | 4 | 5 |

Although the sample space has grown, the distribution of `rand() % 6` has not changed: there are exactly two instances of each value from 0 to 5, so the distribution is still uniform.
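The two small tables above can also be checked mechanically. The following is a minimal sketch, not part of the original notes; the helper name `remainder_counts` is made up for illustration. It walks the whole sample space `0..max_value` and tallies how often each remainder modulo `n` turns up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name chosen for this sketch): counts how often each
// remainder 0..n-1 occurs when every value in 0..max_value is taken modulo n.
std::vector<long> remainder_counts(long max_value, long n) {
    assert(max_value >= 0 && n > 0);
    std::vector<long> counts(static_cast<std::size_t>(n), 0L);
    for (long v = 0; v <= max_value; ++v) {
        // v % n is the column of the table that v falls into.
        ++counts[static_cast<std::size_t>(v % n)];
    }
    return counts;
}
```

Calling `remainder_counts(5, 6)` yields a count of 1 for every remainder, and `remainder_counts(11, 6)` yields 2 for every remainder, matching the one-row and two-row tables.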

You might have already guessed where this leads. Consider setting `RAND_MAX` to 32,767, which is the smallest value the C standard allows a conforming implementation to use. The sample space becomes

0 | 1 | 2 | 3 | 4 | 5 |

6 | 7 | 8 | 9 | 10 | 11 |

... many happily skipped rows ...

32,766 | 32,767 |

When this sample space is transformed modulo 6 we get

0 | 1 | 2 | 3 | 4 | 5 |

0 | 1 | 2 | 3 | 4 | 5 |

... many happily skipped rows ...

0 | 1 |

In this table, 0s and 1s each occur once more than the other values (5,462 times versus 5,461, since 32,768 = 6 × 5,461 + 2). If a random number is drawn from this table, 0 and 1 have a slightly higher chance of being picked. The “modulo 6” distribution is not uniform! Nor will any other “modulo n” distribution be, unless the table is “rectangular”, with no cells “hanging” on its bottom row, that is, unless n happens to satisfy the condition

`RAND_MAX mod n = n-1`
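The counting argument can be written in closed form without enumerating the table. This is a sketch under the same assumptions as the thought experiment; both function names are hypothetical, and `rand_max` is passed as a plain parameter instead of using the real `RAND_MAX`, so any value can be tried.

```cpp
#include <cassert>

// Hypothetical helper: how many values in 0..rand_max leave remainder r
// when divided by n.
long count_with_remainder(long rand_max, long n, long r) {
    assert(rand_max >= 0 && n > 0 && r >= 0 && r < n);
    long size = rand_max + 1;    // number of values in the sample space
    long full_rows = size / n;   // complete rows of the table
    long leftover = size % n;    // cells "hanging" on the bottom row
    return full_rows + (r < leftover ? 1 : 0);
}

// Hypothetical helper: true when the table is rectangular, i.e. when
// rand_max mod n == n - 1 and every remainder occurs equally often.
bool modulo_is_uniform(long rand_max, long n) {
    assert(rand_max >= 0 && n > 0);
    return rand_max % n == n - 1;
}
```

With `rand_max = 32767` and `n = 6`, remainders 0 and 1 are counted 5,462 times and remainders 2 to 5 are counted 5,461 times, so `modulo_is_uniform(32767, 6)` is false. For `n = 8` the condition holds, because 32,768 is a multiple of 8.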

Now, we do not know what `RAND_MAX` is set to unless we care to look at a particular compiler implementation, but the lesson is clear: the primitive `rand()` function may not be good enough to build a model that requires uniform random integers. The CS106B Course Reader mentioned before explains how to augment `rand()` to produce uniformly distributed random integers from a closed interval `[a,b]` in four easy steps: Normalisation, Scaling, Translation and Conversion. It is not difficult, but the opportunities for making mistakes are very real. It is worth noting that the C++11 standard provides a much more satisfying choice of random number generators, which may be reason enough for those interested in mathematical modelling to prefer C++ to C.
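To make the two options concrete, here is a rough sketch of each. The first function is my own reading of the four steps named above, not a copy of the Course Reader's code. The second uses `std::uniform_int_distribution` from the C++11 `<random>` header. Both function names are invented for this example, and the choice of `std::mt19937` as the engine is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <random>

// Sketch of the four-step approach built on rand(); returns a value in [low, high].
int random_integer_rand(int low, int high) {
    assert(low <= high);
    // Normalisation: map rand() onto the half-open interval [0, 1).
    double d = std::rand() / (static_cast<double>(RAND_MAX) + 1.0);
    // Scaling: stretch to [0, high - low + 1).
    double s = d * (static_cast<double>(high) - low + 1.0);
    // Translation: shift to [low, high + 1).
    double t = low + s;
    // Conversion: round down to an integer in [low, high].
    return static_cast<int>(std::floor(t));
}

// C++11 alternative: let the standard library do the mapping.
int random_integer_cpp11(std::mt19937& engine, int low, int high) {
    assert(low <= high);
    std::uniform_int_distribution<int> dist(low, high);  // closed interval
    return dist(engine);
}
```

A die roll is then `random_integer_cpp11(engine, 1, 6)`, where `engine` is a `std::mt19937` seeded once, for instance from `std::random_device`. The `rand()`-based version still maps `RAND_MAX + 1` equally likely inputs onto six outputs, so it cannot be exactly even when 6 does not divide that count, whereas `std::uniform_int_distribution` is specified to give each value in the interval equal probability.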