Random Numbers
Volume Number: | | 8
|
Issue Number: | | 3
|
Column Tag: | | Coding Efficiently
|
Fast Random Numbers
A random number generator that is 10 times faster
By Jon Bell, Clinton South Carolina
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
So-called random numbers are an important ingredient in many applications. This article discusses some of the theory behind random number generators, and shows how to replace the Toolbox random number generators with one that produces the same results but is more than ten times faster on a Mac SE/30.
[NOTE: The material in the following two sections is summarized from the article by Park and Miller which is listed in the references at the end of this article.]
Random Numbers
All random number generators produce sequences of numbers by using a fixed set of rules to produce each number in the sequence from its predecessors. This means that the numbers are not really random at all, in a technical sense, because they are completely predictable, provided that you know the rules. The only way to get true random numbers would be to use some physical process which is inherently random, such as rolling dice or counting the number of decays per second from a radioactive source. Nevertheless, a computer can produce numbers which look random to a person who doesnt know the rule by which they were generated, and which satisfy many of the common statistical tests of randomness. Such numbers are often called pseudorandom numbers.
Depending on the application, there are many kinds of random number sequences. For example, we might want real (floating-point) numbers which lie between some given minimum and maximum value; or we might want integers, instead. We might want uniformly random numbers, in which the probability of getting any single number is the same as the probability of getting any other number; or we might want random numbers which are distributed according to some pattern, such as the familiar Gaussian (bell-shaped) curve. Any of these kinds of random numbers can be produced by applying a suitable transformation to a sequence of integers which is uniformly distributed between 1 and some maximum value, and so most random number generators have a random integer generator as their core.
Before actually looking at specific random number generators, its worth summarizing the desirable properties which they should have. First, we want to get as many different random numbers as possible, within the limits we specify. The problem is that any random number generator will begin to repeat itself eventually. If we calculate each number in the sequence based only on the previous number in the sequence, as soon as we generate a number weve generated before, the sequence will repeat itself from that point on. The length of the sequence before it begins to repeat is called its period. Clearly we would like the period to be as long as possible.
Second, the sequence of numbers, although completely deterministic, should look random enough for our purposes. At the very least, the numbers should be uniformly distributed, so that we are just as likely to get 1236 as we are to get 8490792, or 52804, or whatever. In addition, the numbers should not be correlated: if we take two (or three, or four...) numbers from the sequence according to some specified rule (say, two consecutive numbers), those numbers should not be related to each other in any obvious way. All random number generators can be made to fail this test, but some fail more obviously than others.
Finally, it should be possible to implement the generator correctly and (hopefully) efficiently on a wide variety of machines and in a wide variety of languages, without having to depend on tricks which are peculiar to a particular machine or language. No matter what the computing environment, we should be able to get repeatable results.
Many random number generators which have been implemented on various systems, or recommended in various programming textbooks, have failed to meet one or more of these criteria. Below I will describe a so-called minimal standard random number generator which meets all three to some extent. It is almost certainly not the best random number generator available, but it is good enough for most purposes, and can serve as a benchmark for evaluating other random number generators.
The Minimal Standard Generator
There are many different schemes for producing uniformly random integers on a computer. One of the simplest was devised by D.H. Lehmer in 1951. After choosing two parameters, the modulus (m) and the multiplier (a), and the first integer, z1, in the sequence, we generates successive numbers in the sequence as follows:
z2 = az1 mod m
z3 = az2 mod m
...
zn = azn-1 mod m
Here mod means take the remainder after dividing by, as in Pascal and other programming languages. By the nature of the mod operation, the result always lies between 0 and m-1, inclusive.
It turns out that if we let m = 231 - 1, there are 534,600,000 different multipliers, a, which give us a generator with the maximum possible period, namely 231 - 1. This is the longest period which we can attain using 32-bit integers.
It looks as if we have an embarassingly rich choice of multipliers. Unfortunately, most of them cannot be used correctly on a computer with 32-bit integers, because they would cause the intermediate product anz to overflow. In 1979, G. L. Schrage devised an implementation of the linear congruential generator which is less susceptible to integer overflow than the simple-minded version described above. Even with Schrages method, nevertheless, it turns out that only 23,093 of the 534,600,000 full-period multipliers can be used in a correct implementation. To decide which of these multipliers are best, we must study in detail the sequences they produce.
As far as randomness is concerned, linear congruential random number generators have a well-known flaw. If we take consecutive pairs of numbers, say (z1, z2), (z3, z4), (z5, z6), etc., from a linear congruential sequence, and plot them on a graph, we find that they invariably lie on a series of parallel diagonal lines. Similarly, consecutive triplets lie on a series of parallel planes in three-dimensional space; consecutive quadruplets lie on parallel hyperplanes in four-dimensional space; and so on. Depending on how we use the numbers, these correlations can occasionally produce strange non-random results. The spacing between the planes varies according to the choice of a and m, and should be made as small as possible.
In 1969, Lewis, Goodman and Miller designed a 32-bit random number generator for the IBM System/360. They decided to use a linear congruential generator with m = 231 - 1 = 2147483647 and a = 75 = 16807. Since then, this generator has been implemented on a variety of systems, and has been dubbed a minimal standard random number generator.
The minimal standard generator does not give the smallest hyperplane spacing, though. That honor appears to be shared by the two generators which use the multipliers a = 48271 and a = 69621. Nevertheless, I will use the minimal standard in my implementation. The program listings indicate which parameters you need to change if you want to experiment with the better multipliers.
Before looking at how to implement the minimal standard generator, though, lets look at what the Macintosh Toolbox already provides us.
The QuickDraw Random Number Generator
The Macintosh Toolbox actually contains two random number generators. The first one is part of QuickDraw. Its seed is a QuickDraw global variable named randSeed, a longint which is initialized to 1 by InitGraf at the start of a program. The function Random updates the seed and returns an integer in the range from -32767 to 32767. I know nothing about the properties of this generator, but clearly its period must fall far short of the period of a decent 32-bit generator, and so I will not consider it further.
The SANE Random Number Generator
The Toolboxs second random number generator is part of the SANE floating-point arithmetic package:
function RandomX (var x : extended) : extended;
The parameter x is a variable which keeps track of the sequence of random integers (which I called z in the discussion above). It is commonly called the seed for the random number generator. At the beginning of a program, we must initialize this variable to a suitable integer value in the range [1, 231 - 2], and make sure that we preserve its value between calls to RandomX. Whenever we call RandomX, it updates the seed to a new value, and, in addition, returns that value as its result. Note that the seed and result are always integer numbers, even though they are stored in extended-precision floating-point variables.
According to the Apple Numerics Manual, RandomX is in fact a minimal standard random number generator, as described above.
Having to preserve the value of the seed between calls to RandomX is a nuisance if you use RandomX in several different procedures whose scopes do not all overlap. One quick and easy solution is to make the seed a global variable, which, however, offends structured-programming purists like me. A better solution (in MPW or Think Pascal, anyway) is to set up a unit which declares the seed as a global variable in the implementation section. This effectively hides the seed from the rest of your program, and prevents you from trashing it accidentally in some other routine.
Listing 2 shows an MPW Pascal unit, SANERandomNumbers.p, which carries out this idea. The procedure InitRandomSeed initializes the seed to a specified value, and the parameterless function RandomSeed returns its value for inspection. These are the only two ways that another routine can gain access to the seed. The parameterless function RandomReal updates the seed and returns an extended-precision real number, uniformly distributed in the range (0,1). The function RandomInteger updates the seed and returns a longint, uniformly distributed in the range [1, max], where max is specified as a parameter.
Listing 1 shows a simple MPW tool which I used to test this generator, and the other ones to be described in this article. It initializes the seed to 1, displays the first ten random numbers in the sequence, displays the value of the seed after 10,000 iterations, and finally counts how many random real numbers are generated per second. (Actually, it counts for one minute, then divides by 60, to get a reliable average.) The 10,000th seed tests whether the generator has been implemented correctly; the minimal standard generator must produce the value 1043618065. If this value does not appear, either the generator is not the minimal standard one, or else it was not implemented correctly.
Likning this program with SANERandomNumbers gave the following results:
The first ten random numbers are:
0.000007826369259426
0.131537788143166242
0.755605322195033227
0.458650131923449287
0.532767237412169221
0.218959186328090348
0.047044616214486126
0.678864716868318951
0.679296405836612175
0.934692895940827623
After 10000 iterations,
the random seed is 1043618065.
In one second, 1381 random numbers were generated.
The Minimal Standard Generator in Pascal
Having to go through the Toolbox trap dispatcher slows down the SANE random number generator significantly, which is a drawback if we need to generate lots of random numbers quickly. Since we know the algorithm, why not simply implement it directly in Pascal and bypass the trap dispatcher completely? This would seem to be trivial: simply replace calls to RandomX with the statement
seed := (a * seed) mod m;
The problem is that the product a * seed can easily overflow the limits of a 32-bit integer variable, causing incorrect results from that point on. G. L. Schrage came to the rescue in 1979 with an algorithm which splits the seed into its high and low 16-bit halves. Schrages algorithm is implemented in the unit PasRandomNumbers.p, given in Listing 3. It is gratifyingly quicker than the SANE implementation, generating 3049 random reals per second versus 1381, a speedup of 2.2x. (These timings, and all those that follow, were measured on an unmodified Mac SE/30, using the program given in Listing 1.)
Schrages method has two hidden bottlenecks when implemented on a 68000 processor (Mac Plus and SE). The first one involves the operations seed div q and seed mod q, which require dividing a 32-bit integer by another 32-bit integer. On the 68000, the DIVS instruction can only divide a 32-bit quantity by a 16-bit quantity! Therefore a compiler must perform these operations in software. MPW Pascal inserts calls to the routines %I_DIV4 and %I_MOD4, which are located in the library Paslib.o. Similarly, the products a * lo and r * hi must be carried out as 32-bit multiplications, via the routine %I_MUL4 in Paslib.o.
If we tell the compiler to generate 68020/68030 code (in MPW Pascal, by using the -mc68020 option), these operations can each be performed with a single machine instruction. DIVS.L divides two 32-bit numbers to produce a 32-bit quotient, DIVSL.L divides two 32-bit numbers to produce both a 32-bit quotient and a 32-bit remainder, and MULS.L multiplies two 32-bit numbers to produce a 32-bit result. Recompiling PasRandomNumbers.p with the -mc68020 option allows us to produce 4491 random numbers per second, a speed increase of 1.5x over the preceding version.
In generating random real numbers (function RandomReal), there is another bottleneck in the final step, which is a floating-point division operation. If we do not instruct the compiler otherwise, it generates calls to the SANE package for all floating-point operations, which allows the code to run on all Macs, but is rather slow. If we are running on a Mac with a 68020 or 68030 CPU, we probably have a 68881 or 68882 floating-point coprocessor (FPU) available as well. (Exceptions include the Mac LC, and some accelerated SEs and Pluses.) Therefore, if were using the -mc68020 option, we may as well use -mc68881 as well, to tell the compiler to use the FPU for floating-point arithmetic. Recompiling PasRandomNumbers.p yet again, with both options enabled, allows us to produce 15067 random numbers per second, a speed of increase of 3.6x over the previous version. [Note: we must also recompile the main program with the -mc68881 option, because the FPU expects extended variables to be stored in 10 bytes, whereas SANE expects extended variables to be stored in 8 bytes.]
The Minimal Standard Generatorin
Assembly Language
Examining the compilers output for the version with 68020/30 and FPU code, we can (as usual) spot things which could be improved in hand-coded assembly language. In procedure UpdateSeed, there are two things in particular. First, the compiler uses separate operations for seed div q and seed mod q (DIVS.L and TDIVS.L respectively), even though TDIVS.L produces both the quotient and remainder, in separate registers. Second, there is quite a bit of unnecessary shuffling of data between registers.
The obvious solution here is to recode the entire unit in assembly language, which is not a very difficult task. The results are shown in Listing 4. To use it with the existing test program, we follow these steps:
Assemble RandomNumbers.a to produce an object file RandomNumbers.a.o;
Create an interface file, RandomNumbers.p, by removing the implementation section from PasRandomNumbers.p;
Compile Test.p, using the interface file RandomNumbers.p to define the interfaces to the random-number routines. (Actually, I simply recycled Test.p.o from the previous test, since the interface to the random number-routines hadnt changed);
Link Test.p.o with RandomNumbers.a.o.
Results: 18758 random numbers per second, a speed increase of 1.2x over the previous version, or a net increase of 13.6x over the original SANE implementation!
For Fortran Freaks Only
Since many scientific applications are still written in Fortran, I ought to point out that any of the units described above can be easily used in Language Systems Fortran under MPW. Listing 5 shows a Fortran version of the test program given in Listing 1. LS Fortran compiles this to a standalone application rather than a MPW tool, but otherwise the two programs work the same way.
The most important thing to note here is that standard Fortran passes all subroutine arguments by reference, like Pascals var parameters. However, two of our random-number routines (InitRandomSeed and RandomInteger) expect their arguments to be passed by value. LS Fortran has an extension which enables this, in the form of the %VAL function:
CALL INITRANDOMSEED (%VAL(1))
This passes the value 1 directly to the subroutine, rather than creating a temporary variable, storing 1 in it, and passing the address of this temporary variable.
References
Apple Computer, Inc. Inside Macintosh, Volume I. Addison-Wesley, 1985.
Apple Computer, Inc. Apple Numerics Manual, 2nd ed. Addison-Wesley, 1988.
Stephen K. Park and Keith W. Miller. Random Number Generators: Good Ones Are Hard to Find, Communications of the ACM, vol. 31, p. 1192 (October 1988).
Steve Williams. 68030 Assembly Language Reference. Addison-Wesley, 1989.
Listing 1: Test.p
program TestRandomNumbers (input, output);
{ This program tests the random number generator imple-
mented by the unit RandomNumbers. If the generator is
the "minimal standard generator" (multiplicative linear
congruential algorithm with modulus 2147483647 and mul-
tiplier 16807), and the initial seed is 1, the value of
the seed after 10000 iterations should be 1043618065.
If this value is not obtained, then the generator either
is not a "minimal standard" generator, or has failed due
to integer overflow at some point.
Under the Macintosh Programmer's Workshop, this program
can be compiled and run as an MPW Tool. Under other
development systems, it may need some modification.
Jon Bell
Dept. of Physics & Comp. Sci.
Presbyterian College
Clinton SC 29325
CIS #70441,353 }
USES RandomNumbers, Events;
var
k : longint;
x : extended;
stopTime : longint;
begin
{ First verify that the generator is working correctly. }
InitRandomSeed (1);
writeln;
writeln ('The first ten random numbers are:');
writeln;
for k := 1 to 10 do
writeln (RandomReal:20:18);
for k := 11 to 10000 do
x := RandomReal;
writeln;
writeln ('After 10000 iterations,');
writeln ('the random seed is ', RandomSeed:1, '.');
writeln;
{ Now let's find out how fast the generator is. }
k := 0;
stopTime := TickCount + 3600;
while TickCount <= stopTime do
begin
x := RandomReal;
k := k + 1
end;
writeln ('In one second, ', trunc(k/60):1,
' random numbers were generated.');
writeln
end.
Listing 2: SANERandomNumbers.p
UNIT RandomNumbers;
{ This unit provides a convenient interface to the SANE
random number generator. }
INTERFACE
USES Sane;
procedure InitRandomSeed (newSeed : longint);
{ Initializes the random number seed to "newSeed". You
must call this procedure once, at the beginning of your
program, before you use any of the following functions.
As far as randomness is concerned, it makes no difference
what value you use for "newSeed", so long as it isn't
zero. Using different seeds merely gives you different
sequences of "random" numbers. Using the same seed each
time you run the program gives you the same sequence of
"random" numbers each time, which may be useful for
debugging. }
function RandomSeed : longint;
{ Returns the current value of the random number seed. }
function RandomReal : extended;
{ Returns a real number, "randomly" selected from the
interval [0,1]. }
function RandomInteger (max : longint) : longint;
{ Returns an integer, "randomly" selected from the
interval [1,max]. }
IMPLEMENTATION
const
m = 2147483647;
var
seed : extended;
procedure InitRandomSeed (newSeed : longint);
begin
seed := newSeed
end;
function RandomSeed : longint;
begin
RandomSeed := round(seed)
end;
function RandomReal : extended;
begin
RandomReal := RandomX (seed) / m;
end;
function RandomInteger (max : longint) : longint;
var
ignore : extended;
begin
ignore := RandomX (seed);
RandomInteger := (round(seed) mod max) + 1
end;
END.
Listing 3: PasRandomNumbers.p
UNIT RandomNumbers;
{ This unit implements a "minimal standard" random number
generator. }
INTERFACE
procedure InitRandomSeed (newSeed : longint);
function RandomSeed : longint;
function RandomReal : extended;
function RandomInteger (max : longint) : longint;
IMPLEMENTATION
const
a = 16807;
m = 2147483647;
q = 127773; { m div a }
r = 2836; { m mod a }
{ NOTE: Other possible values for these constants, which
may give "better" results, are:
a = 48271, q = 44488, r = 3399 or
a = 69621, q = 30845, r = 23902,
keeping m = 2147483647. }
var
seed : longint;
procedure InitRandomSeed (newSeed : longint);
begin
seed := newSeed
end;
function RandomSeed : longint;
begin
RandomSeed := seed
end;
{private} procedure UpdateSeed;
var
lo, hi, test : longint;
begin
hi := seed div q;
lo := seed mod q;
test := a * lo - r * hi;
if test > 0 then
seed := test
else
seed := test + m
end;
function RandomReal : extended;
begin
UpdateSeed;
RandomReal := seed / m
end;
function RandomInteger (max : longint) : longint;
begin
UpdateSeed;
RandomInteger := (seed mod max) + 1
end;
END.
Listing 4: RandomNumbers.a
MACHINE MC68020
MC68881
;-----------------------------------------------------------
; WARNING: These routines require a 68020 or 68030 CPU,
; and a 68881 or 68882 floating-point coprocessor!!!
;-----------------------------------------------------------
; Numerical constants used in the seed-updating algorithm.
a EQU 16807 ; multiplier
m EQU 2147483647 ; modulus
q EQU 127773 ; m div a
r EQU 2836 ; m mod a
;-----------------------------------------------------------
; The random number seed (a 32-bit integer) is a global
; variable whose value is preserved between calls to
; the random number functions.
Seed DS.L 1
InitRandomSeed PROC EXPORT
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Initializes the random number seed to a specified value.
;
; Pascal interface:
; procedure InitRandomSeed (newSeed : longint);
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
newSeed EQU 8
ParamSize EQU 4
LocalSize EQU 0
LINK A6, #LocalSize
MOVE.L newSeed(A6), seed(A5)
UNLK A6
RTD #ParamSize
DC.B 'INITRANDOMSEED'
RandomSeed FUNC EXPORT
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Returns the current value of the random number seed.
;
; Pascal interface:
; function RandomSeed : longint;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
result EQU 8
LocalSize EQU 0
LINK A6, #LocalSize
MOVE.L Seed(A5), result(A6)
UNLK A6
RTS
DC.B 'RANDOMSEED'
UpdateSeed PROC
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Applies the Lehmer / Lewis-Goodman-Miller / Schrage
; algorithm to update the random number seed. This
; procedure is not to be used outside this unit, therefore
; it is not EXPORTed.
;
; It stores the new seed in the global variable "Seed",
; and leaves a copy in register D1 for use by the calling
; routine.
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
hi EQU D0
lo EQU D1
MOVE.L Seed(A5), hi
TDIVS.L #q, lo:hi
MULS.L #a, lo
MULS.L #r, hi
SUB.L hi, lo
BGT.S StoreNewSeed
ADD.L #m, lo ; if lo <= 0
StoreNewSeed
MOVE.L lo, Seed(A5)
RTS
DC.B 'UPDATESEED'
RandomReal FUNC EXPORT
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Updates the random number seed and returns a real number
; in the range (0,1).
;
; Pascal interface:
; function RandomReal : extended;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
quotient EQU FP0
newSeed EQU D1
result EQU 8
LocalSize EQU 0
LINK A6, #LocalSize
JSR UpdateSeed
FMOVE.L newSeed, quotient
FDIV.L #m, quotient
FMOVE.X quotient, ([result,A6])
UNLK A6
RTS
DC.B 'RANDOMREAL'
RandomInteger FUNC EXPORT
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Updates the random number seed and returns a longint
; in the range [1,max].
;
; Pascal interface:
; function RandomInteger (max : longint) : longint;
;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
dividend EQU D1
divisor EQU D0
result EQU 8
max EQU 12
LocalSize EQU 0
ParamSize EQU 4
LINK A6, #LocalSize
JSR UpdateSeed
; The new seed is now in "dividend."
TDIVS.L max(A6), divisor:dividend
; "Divisor" now contains the remainder.
ADDQ.L #1, divisor
MOVE.L divisor, result(A6)
UNLK A6
RTD #ParamSize
DC.B 'RANDOMINTEGER'
END
Listing 5: TestF77.f
C-----------------------------------------------------------
C This program demonstrates the use of a hand-coded
C random-number generator in a Language Systems Fortran
C program. It can be linked with any of the following
C object files:
C
C SANERandomNumbers.p.o (SANE's random number generator)
C PasRandomNumbers.p.o (hand-coded in Pascal)
C RandomNumbers.a.o (hand-coded in assembly language)
C-----------------------------------------------------------
!!M Inlines.f
IMPLICIT NONE
INTEGER K, STOPTIME, RANDOMSEED
EXTENDED X, RANDOMREAL
EXTERNAL RANDOMREAL, RANDOMSEED
CALL INITRANDOMSEED (%VAL(1))
PRINT *
PRINT *, 'The first ten random numbers are:'
PRINT *
DO K = 1, 10
PRINT '(1X, F20.18)', RANDOMREAL()
END DO
DO K = 11, 10000
X = RANDOMREAL()
END DO
PRINT *
PRINT *, 'After 10000 iterations,'
PRINT *, 'the random seed is ', RANDOMSEED(), '.'
PRINT *
K = 0
STOPTIME = TICKCOUNT() + 3600
DO WHILE (TICKCOUNT .LE. STOPTIME)
X = RANDOMREAL()
K = K + 1
END DO
PRINT *, 'In one second, ', K/60,
& ' random numbers were generated.'
PRINT *
END