Saturday, October 16, 2010

Performance of BigInteger.toString(radix)

Problem:
You are given a byte array, that represents a number in big endian format (the most significant byte first). You have to convert the byte array to its equivalent hex string. What is the most efficient way to do it?

Solution:
There are many ways to do it. But let us start with the easiest and correct one and optimize our solution. BigInteger class provides a constructor to convert the byte array to a BigInteger. We can convert that BigInteger to a hex string using the BigInteger.toString(radix) method. The solution is given below:
public static final String toHexStringUsingBigInteger(byte[] input) {
        BigInteger bi = new BigInteger(input);
        return bi.toString(16);
    }

We can also come up with a hand crafted solution that extract nibble by nibble and convert them into their equivalent hex character and finally forming a string. This solution is given below:
public static final String toHexStringUsingCharArray(byte input[]) {
        int i = 0;
        if (input == null || input.length <= 0)
            return null;

        char lookupArray[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
        char[] result = new char[input.length * 2];

        while (i < input.length) {
            result[2*i] = lookupArray[(input[i]>>4) & 0x0F];
            result[2*i+1] = lookupArray[(input[i] & 0x0F)];
            i++;
        }
        return String.valueOf(result);
    }

For comparison purposes, let us write a test program and see how much time each implementation takes. The test program is given below:
public static void main(String[] args) {
        byte[] input = new byte[]{0x7D, 0x7D, 0x7D, 0x7D, 0x7D, 0x7D, 0x7D, 0x7D};
        long start = System.nanoTime();
        for(int i = 0; i < 100000; i++) toHexStringUsingBigInteger(input);
        long end = System.nanoTime();
        
        long start2 = System.nanoTime();
        for(int i = 0; i < 100000; i++) toHexStringUsingCharArray(input);
        long end2 = System.nanoTime();
        
        System.out.println(String.format("Using BigInteger  : %15d", (end-start)));
        System.out.println(String.format("Using char array  : %15d", (end2-start2)));
    }

Here is the output of few runs of the program:
Using BigInteger  :       702013524
Using char array  :        42074621
Using BigInteger  :       711340129
Using char array  :        41369504
Using BigInteger  :       707484052
Using char array  :        41221440

WOW! You see the difference between the BigInteger version and the hand crafted version? Hand crafted version is almost 16 times faster. Why?

You can take a look at how BigInteger.toString(radix) is implemented in the OpenJDK here. The most important point is, it is making use of Conversion.bigInteger2String() method. You can view the source code of that here. Ultimately, bigInteger2String() method is written such a way that it can cater to different radices and different locales. Most of the complexity in that function is around this concern. That is the reason why BigInteger.toString(radix) is terrible in performance.

Moral of the story:
If you are planning to use BigInteger and convert that back and forth to hex string, you are better off writing your own version of toHexString as the one given above, instead of using what you get with the library. You can invoke the BigInteger.toByteArray() and pass the byte array to your toHexString method.

Most important caveats:
Beware of handling of negative numbers between the two approaches. In case of the second approach, you will not get the sign right. You will always get a hex string for the bytes given as argument, without any special interpretation of the bytes. Where as, the BigInteger approach treats the bytes as a signed integer value. Consider the following test program:
public static void main(String[] args) {
        byte[] input = new byte[]{(byte)0x8D};
        System.out.println("From BigInteger : " + toHexStringUsingBigInteger(input));
        System.out.println("From char array : " + toHexStringUsingCharArray(input));
    }

And the output is:
From BigInteger : -73
From char array : 8d

Depending on your application, you will have to choose one usage over the other.

The other important thing is that the code given above doesn't skip the leading zeroes. You can make a small modification and make it to skip leading zero bytes.

1 comment:

Anonymous said...

Thanks!

Holy crap do I wish Java had implemented unsigned. ARRRGH

- Jamie