Wednesday, March 11, 2015

Bit twiddling in JDK to generate random number

Some time back a friend asked me an algorithm question:
Given an integer random number generator randomN() that generates random numbers in the range [0, N), how will you generate random numbers in the range [0, M) where M < N?
I used modulo arithmetic to generate the desired random numbers in the range [0, M). I then reasoned that if N = q*M + r, every number in the range [0, r) has an occurrence probability of (q+1)/N, but the numbers in the range [r, M) have an occurrence probability of only q/N.

It is easy to visualize this. See the diagram below:

You can see that we can divide a line of length N units (shown in green) into segments of length M units (shown in black). When N is not exactly divisible by M, the last segment alone has only r units.

So if we choose a random integer on the green line and then take it modulo M, the values in the range [0, r) will occur q+1 times, but the values in the range [r, M) (represented in red) will occur only q times.
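A small counting sketch makes the bias concrete. It takes N = 10 and M = 4 (so q = 2 and r = 2), reduces every value in [0, N) modulo M, and tallies the results. The class and method names are made up for this illustration.

```java
import java.util.Arrays;

public class ModuloBias {
    // Counts how often each residue in [0, m) appears when every
    // value in [0, n) is reduced modulo m.
    static int[] countResidues(int n, int m) {
        int[] counts = new int[m];
        for (int value = 0; value < n; value++) {
            counts[value % m]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // N = 10, M = 4, so q = 2 and r = 2.
        int[] counts = countResidues(10, 4);
        // Residues 0 and 1 appear 3 times, residues 2 and 3 only twice.
        System.out.println(Arrays.toString(counts)); // [3, 3, 2, 2]
    }
}
```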

So how do we generate uniform random numbers? The solution is simple: when the generated random number falls in the last, incomplete segment, discard it and generate another one. My solution looked like this:

maxAllowed = N - (N mod M)
do {
    randValue = randomN()
} while (randValue >= maxAllowed)
return randValue mod M
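The same idea can be sketched in Java. This is only an illustration: it assumes randomN() is available as an IntSupplier that returns uniform integers in [0, n), and the class and method names are invented here. The accepted value is reduced modulo M before it is returned.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntSupplier;

public class UniformFromRandomN {
    // randomN is assumed to return uniform integers in [0, n).
    // Requires 0 < m <= n.
    static int randomM(IntSupplier randomN, int n, int m) {
        int maxAllowed = n - (n % m); // largest multiple of m that is <= n
        int randValue;
        do {
            randValue = randomN.getAsInt();
        } while (randValue >= maxAllowed); // reject the incomplete last segment
        return randValue % m;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int n = 10, m = 4;
        int[] counts = new int[m];
        for (int i = 0; i < 100_000; i++) {
            counts[randomM(() -> rng.nextInt(n), n, m)]++;
        }
        // The four counts should come out roughly equal.
        System.out.println(Arrays.toString(counts));
    }
}
```

With N = 10 and M = 4, the values 8 and 9 are thrown away, so each result in [0, 4) is backed by exactly two accepted values.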

Recently I was reading the Java API documentation for Random.nextInt(M), and I stumbled upon a clever trick that does what I did, using some bit twiddling. The method next(numBits) generates a random number made of numBits random bits. Since a Java int is 32 bits wide and only 31 of them are available for non-negative values, nextInt(M) internally calls next(31). This returns values in the range [0, MAX_INT + 1), that is, [0, 2^31). Using this, the code generates a random number in the range [0, M). What is the clever trick here? Look at how integer overflow is used to detect the boundary condition (n in the snippet plays the role of M):
do { 
 bits = next(31); 
 val = bits % n; 
} while (bits - val + (n-1) < 0);

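This boundary condition is similar to the "randValue >= maxAllowed" condition in my code. Here bits - val is the start of the segment of length n that contains bits, and adding n-1 gives the end of that segment. If the end lies beyond MAX_INT, the segment is incomplete, the int addition overflows to a negative value, and the loop draws again.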
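To convince myself that the overflow test really matches a plain comparison, the two can be put side by side. In the sketch below, rejected() uses the same expression as the loop condition quoted above, and rejectedWide() does the "randValue >= maxAllowed" check in 64-bit arithmetic, where 2^31 fits without overflow. Both method names are made up for this illustration, and n is assumed to be positive.

```java
public class OverflowCheck {
    // Same expression as the loop condition above: true when bits must be rejected.
    static boolean rejected(int bits, int n) {
        int val = bits % n;
        return bits - val + (n - 1) < 0; // int overflow wraps to a negative value
    }

    // The same decision made with long arithmetic, so nothing overflows.
    static boolean rejectedWide(int bits, int n) {
        long range = 1L << 31;                 // next(31) yields values in [0, 2^31)
        long maxAllowed = range - (range % n); // largest multiple of n that is <= 2^31
        return bits >= maxAllowed;
    }

    public static void main(String[] args) {
        // 2^31 mod 3 = 2, so only the top two values are rejected for n = 3.
        System.out.println(rejected(Integer.MAX_VALUE, 3));     // true
        System.out.println(rejected(2147483645, 3));            // false
        System.out.println(rejectedWide(Integer.MAX_VALUE, 3)); // true
        System.out.println(rejectedWide(2147483645, 3));        // false
    }
}
```

The overflow version never needs the value 2^31 itself, which does not fit in an int.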

Personally I feel that my code is easier to understand and (I guess) performs better. My guess rests on the fact that my code performs the modulo operation exactly twice, while the JDK's code performs it on every iteration of the loop. Also, to check the condition the JDK's code combines four values arithmetically, while my boundary condition is a straightforward comparison.

Yet it was interesting to see this piece of code and spend some time to understand what it is actually doing.