Avik’s Ruminations

Musings on technology and life by Avik Sengupta


without comments

From the department of problems you did not think you had

As you probably know, real numbers are usually represented in computer languages as IEEE floating point numbers, and called floats or doubles depending on precision. Floating point numbers are stored as combination of  an  exponent (the 2’s power)  and a mantissa (or the significand). A double typically takes 64 bits of memory, and  consists of 1 bit of sign, 11 bits of exponent, and 52 bits of significand.

When you display a floating point number to screen, or store it in a log, or in any way convert it to a string, you need to convert the floating point value into a decimal representation. Since a floating point value cannot represent all possible decimal numbers exactly,  there may be rounding errors in the conversion. Further, the result may need to be rounded to display a sensible value — if you input 0.3, you dont want to display, for example, 25 digits of 0.2999999999……  All of this is hard to get right. There are a couple of properties that you want your conversion to follow.

First, the conversion has to be correct, ie, the decimal value, when read back into a floating point value, must match the original:

read ( show (d) ) == d

Second, you’d want the printed representation of the number to be the smallest  possible string that solves the correctness condition.

The seminal work in this area is a 1990 paper by Steel and White : “How to print floating point numbers accurately“. Prior to this paper, there was a wide variation in behaviour of this functionality in different languages and implementations. Steele and White proposed an algorithm called Dragon4, that quickly became the settled standard in this matter. Once printf was standardised to be accurate, and with the existance of the venerable dtoa.c, this remained the state of the art for 20 years.

The trouble with this algorithm however is that it uses arbitrary precision arithmetic (also known as bignums) to achieve accuracy. This made the calculations relatively slow. It also made the implementations complex. dtoa.c, for example, contains its own arbitrary precision maths implementation.

Finally in 2010, Florian Loitsch published a major advance in this area in paper titled “Printing floating-point numbers quickly and accurately with integers“. He proposed an algorithm called Grisu, that accurately printed floating point numbers using only 64 bit integers.  Without the use of bignums, this is about four times faster than previous implementations. The only catch is that the algorithm fails on about 0.5% of numbers, and needs to drop down to a slower algorithm for those numbers.

Grisu is now implemented as the standard floating-point-to-decimal conversion routine in the V8 javascript engine (and hence in the Chrome browser) as well as in Firefox. The implementation from V8 has been extracted by Florian into a standalone, BSD licensed project called double-conversion. Of course, Julia, with its emphasis on mathematical correctness, uses double-conversion to implement Grisu in order to display or print a float.

So there you have it, printing a floating point value is more hard work than it seems.

[Update] Since this was written, Julia no longer users double-conversion. Jacob Quinn wrote a  pure Julia implementation of Grisu.

Written by

July 24th, 2012 at 5:55 pm

Posted in Technology