Random Numbers in Calc: Small Enhancements That Can Make a Difference
RAND() is one of those barely noticeable functions that seem to be trivial and a given in modern spreadsheets. The truth behind RAND() and other functions that try to simulate the real world in some way is that they are extremely difficult, and maybe even impossible, to do correctly.
Historically OpenOffice Calc used a very naïve random number generation function based on the system's libc. The system libc version is usually very outdated and is only meant to produce very limited results, but perhaps more dangerously such a random number generator is system dependent so moving your work from one system to another will cause problems or at least changing behaviours.
As soon as I noticed that OpenOffice used such an obsolete function I thought it would be easy to replace this with some modern algorithm that could guarantee good results: I was right. Incredibly, despite being a closed commercial product, Microsoft described the algorithm they chose as the basis for their implementation in Office[1]. Microsoft has indeed taken note of the requirements of their users and updated the functions to generate random numbers in Excel 2003 with the well documented implementation from Wichmann and Hill (1982).
Since 1982 there have been several well documented algorithms, but there was a natural enchantment in using the same algorithm used by Microsoft; after all they are the leading Office Suite and users tend to know better the behaviour of their suite. After an initial implementation I found that Wichmann and Hill developed an update to their original algorithm in 2006 that completely overcomes some serious limitations in 1982: the original implementation had in mind 16-bit systems while 32 and even 64 bit platforms are ubiquitous nowadays.
The code is not complex; the implementation of the new code was done during two days (part time) following three steps:
- A basic implementation in C to generate some initial numbers for testing. The seeding was done with the traditional libc rand() function.
- Adaptation of the code to work inside OpenOffice: this involved selecting carefully the variable types and finding a method to keep the seeds under control. I was careful to avoid an embarrassing condition of generating negative values. I was also lucky to find in OpenOffice an alternative to generate the seeding values without using rand().
- Testing, testing, and more testing … first locally and now at a wider scale.
I have not discarded yet implementing the popular Mersenne Twister algorithm, which is known to be faster, has a longer period and is available in other spreadsheets (and even as an OO extension), however there is a value in variety: the current algorithm fits perfectly the requirements of a spreadsheet like Calc and uses native features to support good random seeding. Alternatives like Mersenne Twister are already available.
The really nice thing, of course, is that thanks to the Apache License the code is available to be enhanced and improved or even scrapped and redone better in the future by OpenOffice derivatives, non-technical coders like me, and even commercial producers like Microsoft.
Can you help us testing the new pseudorandom number generator? Even though our aim wasn't to obtain crypto-grade quality in the generated numbers, we take our numeric algorithms very seriously, and we're looking for the best. We encourage anyone who wants to help to put our code through rigorous tests and help us find any problems to write to our developers mailing list.
Pedro Giffuni.
[1] http://support.microsoft.com/kb/828795
[2] Wichman, B.A. and I.D. Hill, Algorithm AS 183: An Efficient and Portable Pseudo-Random Number Generator, Applied Statistics, 31, 188-190, 1982.
[3] B. A. Wichmann and Hill, Generating good pseudorandom numbers, Computational Statistics & Data Analysis, Volume 51 Issue 3, December, 2006, Pages 1614-1622.
[4] M. Matsumoto and T. Nishimura, "Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator", ACM Trans. on Modeling and Computer Simulation Vol. 8, No. 1, January pp.3-30 (1998)