Linear and Inversive Pseudorandom Numbers for Parallel and Distributed Simulation

K. Entacher
Department of Mathematics
Salzburg University
A-5020 Salzburg, Austria

A. Uhl
RIST++
Salzburg University
A-5020 Salzburg, Austria

S. Wegenkittl
Department of Mathematics
Salzburg University
A-5020 Salzburg, Austria

Abstract

In this work we discuss the use and possible abuse of linear and inversive pseudorandom numbers (PRNs) in parallel and distributed environments. After an investigation of the properties of PRNs which determine how they may be applied in such environments, we introduce a software package which provides a unified and easy-to-use approach to generating and handling parallel streams of such PRNs. Experiments are conducted which illustrate the features of the software package and compare the performance of two selected types of pseudorandom number generators.

1 Introduction

Parallel and distributed simulation of discrete event systems has received significant attention since the proliferation of massively parallel and distributed computing platforms [17]. Besides event processing, state update, statistics collection, and many more tasks, random number generation is an important element of every simulation experiment. Whereas the main objective in parallel and distributed simulation research is the investigation of synchronization protocols for decentralized event lists (e.g. conservative vs. optimistic approaches), our work focuses on dangerous effects arising from the straightforward use of parallelized PRNs and on how to avoid the resulting pitfalls. It should be noted that highly correlated and statistically dependent PRNs originating from bad parallelization or distribution strategies may destroy or dramatically forge simulation results, thereby rendering the advantages of certain synchronization protocols over others completely obsolete. In accordance with the different levels of parallelism in parallel and distributed simulation (application level, subroutine level, component level, and event level), the parallel and distributed generation of PRNs may be achieved using different strategies as well (Section 2).

Therefore attention has to be paid to how parallel streams of PRNs are obtained. Since the mid-eighties a large amount of work has been done on this topic (for an overview see e.g. [4, 5, 10, 26, 28] and Section 2). We discuss the use of linear congruential generators (LCGs) and explicit inversive congruential generators (EICGs) in parallel and distributed environments. Splitting LCGs (which are still the most commonly used generators) randomly and without prior investigations may lead to highly correlated parallel streams of PRNs ("long term correlations" [6, 9] or "bad subsequences" [15, 16]). On the other hand, EICGs have recently been proposed for use in parallel environments [14] due to their highly uncorrelated and equidistributed parallel streams when splitting techniques are applied [11, 31] (for more details see Section 3). Moreover, sequences of inversive PRNs exhibit remarkably good theoretical [13] and empirical [24] properties. A drawback of inversive methods is their poor efficiency due to the high cost of modular inversion. We introduce an efficient software package (Section 4) which provides a unified and easy-to-use approach to generating and handling parallel streams of LCGs and EICGs. Our package is built on top of the particularly fast and accurate pseudorandom number generator (PRNG) library available at [20]. The concept of "task lists" allows this package to be used on any kind of parallel architecture in a very flexible way. We show the application of this package to a parallel Monte Carlo integration problem (Section 5). The choice of this specific application has three reasons: first, we have shown dramatic defects even for this simple application when employing bad strategies for parallel PRN generation in a recent work [16]. Second, the overall computing time is dominated by the time needed for generating the PRNs (this is important, since we want to compare the efficiency of different PRNG types). Third (and most important), numerical integration itself is a statistical test for the independence and uniformity of PRNs, obtained by applying a T-test to a sample of the integration errors. Therefore the (qualitative) results achieved with this application carry over to all types of simulation problems and are not restricted to this specific application. The use of the software package in an arbitrary simulation protocol is obvious and is of course not restricted to numerical integration. The efficient use of the task list concept is demonstrated for static and dynamic load balancing on a network of workstations (NOW) and for implementations on a multicomputer (distributed memory MIMD, Meiko CS-2). Concerning the efficiency of the different types of pseudorandom number generators we compare execution times of the Monte Carlo integration for LCGs and EICGs.

2 Parallel Pseudorandom Number Generation

Each task in a parallel simulation or Monte Carlo computation requires its own source of PRNs, so one has to pay attention that no subset of PRNs is used more than once. This may be accomplished in one of the following ways: one generator is used for all tasks, or each task has an independent generator of its own, or each task uses separately initialized and disjoint portions of the output stream of a single generator. Whereas the first approach obviously causes prohibitive communication overhead, the second suffers from intrinsically bad scalability (note that this approach requires thousands of different high-quality PRNGs on massively parallel architectures like the ASCI Option Red or ASCI Option Blue systems), and from hidden correlations among the single generators [26]. As a simple example consider the following three LCGs: Let a_1 = 25784034667741, a_2 = 60968406756573, and a_3 = 55151000561141, and denote the multiplicative LCG with modulus m = 2^48 and multiplier a_i by LCG_i. LCG_3 is one of the top five LCGs proposed by Fishman [18] (see also Section 3.1, where this LCG is denoted by FISH). The multipliers a_1 and a_3 are also proposed in [28] for use in parallel environments. All three LCGs exhibit almost the same quality in the spectral test, see Section 3.1. Now consider the two- and three-dimensional vectors x_n^(2) := (x_n^(1), x_n^(2)) and x_n^(3) := (x_n^(1), x_n^(2), x_n^(3)), such that the i-th component of each vector x_n^(2) and x_n^(3) equals the n-th pseudorandom number x_n^(i) of LCG_i, i = 1, 2, 3. The strong correlations between parallel streams from these LCGs can be visualized very easily using scatter plots of these vectors. In Figure 1 we show 2^12 vectors x_n^(2) and x_n^(3). Due to the correlations all these points lie on extremely few lines and hyperplanes, respectively.
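This behaviour can be reproduced with a few lines of C. The following minimal sketch iterates the three multiplicative LCGs modulo 2^48 and prints the normalized tuples for plotting; the seeds y_0 = 1 are an assumption, since the text does not specify them.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
  const uint64_t a[3] = { 25784034667741ULL, 60968406756573ULL, 55151000561141ULL };
  const uint64_t mask = (1ULL << 48) - 1;        /* reduction modulo m = 2^48 */
  uint64_t y[3] = { 1, 1, 1 };                   /* assumed seeds */

  for (int n = 0; n < 4096; n++) {               /* 2^12 vectors as in Figure 1 */
    double x[3];
    for (int i = 0; i < 3; i++) {
      y[i] = (a[i] * y[i]) & mask;               /* y_{n+1} = a_i * y_n (mod 2^48);
                                                    exact, since 2^48 divides 2^64 */
      x[i] = (double)y[i] / 281474976710656.0;   /* x_n^(i) = y_n^(i) / 2^48 */
    }
    printf("%.10f %.10f %.10f\n", x[0], x[1], x[2]);  /* plot (x^(1),x^(2)) and (x^(1),x^(2),x^(3)) */
  }
  return 0;
}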


Figure 1: Correlations between parallel streams.

Therefore most of the research work in this area is concentrated on the third approach, i.e. using separately initialized and disjoint portions of the output stream of a single generator. Four important classes of PRNGs have been investigated so far for their applicability in distributed environments:

1. Linear congruential generators, e.g. [6, 23, 28].
2. (Generalized) feedback shift-register generators, e.g. [3, 25, 28], and lagged-Fibonacci generators, e.g. [1, 2, 27, 28].
3. Inversive congruential generators, e.g. [14, 16, 31].
4. Parallel combined generators, e.g. [8, 22, 29].

Two basic methods for splitting a given stream of PRNs into suitable parallel streams use either consecutive blocks or subsequences (often called "the leap-frog technique") and will be considered in detail in the next section for LCGs and EICGs.

3 Obtaining Parallel Streams

We denote the linear congruential generator with recursion y_{n+1} ≡ a·y_n + b (mod m) and seed y_0 by LCG(m, a, b, y_0), and the explicit inversive congruential generator y_n ≡ \overline{a·n + b} (mod m) with modulus m, parameters a and b, and seed n_0 by EICG(m, a, b, n_0) (the horizontal bar denotes inversion modulo m). Implementations of these generators can be found at [20]. Normalized PRNs in [0, 1) are obtained by putting x_n := y_n / m for both generators. Combining different streams (x_n^(j))_{n≥0}, 1 ≤ j ≤ r, of pseudorandom number generators into a new stream (x_n)_{n≥0}, x_n ≡ x_n^(1) + … + x_n^(r) (mod 1), yields an easy way to achieve large period lengths and improved statistical performance, while keeping the computational costs of generating the numbers low by choosing small and relatively prime moduli for the underlying generators. We therefore shall consider combined EICGs, abbreviated cEICGs, as well.
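As a minimal illustration (independent of the package), the addition modulo 1 of r component outputs can be coded as follows; u[j] is assumed to hold the current normalized output x_n^(j) of the j-th component generator.

#include <math.h>

/* Combine r normalized component outputs into one PRN by addition mod 1. */
double combine_mod1(const double u[], int r)
{
  double s = 0.0;
  for (int j = 0; j < r; j++)
    s += u[j];
  return s - floor(s);   /* x_n = x_n^(1) + ... + x_n^(r) (mod 1) */
}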

3.1 Splitting LCGs

By varying the initial seed y_0 one can split the sequence produced by an LCG into consecutive blocks of a given length L. An extensive discussion of this method for the linear congruential generator with respect to parallelization is given in [6, 7, 9]. It turns out that only small fractions of the sequences produced by an LCG can be used safely because of the well-known long-range correlations which may cause unwanted correlations among the parallel streams. Another way to partition a sequence (x_n)_{n=0}^{ρ-1} of an LCG with period ρ is to generate subsequences of the form

(x_{kn+i})_{n≥0},   k ≥ 2,   0 ≤ i ≤ k-1.        (1)

Consider a full-period LCG(m, a, b, y_0). For this generator, the subsequence (1) is, in the case a ≠ 1, also produced by LCG(m, a^k (mod m), b^(i), y_0^(i)) with suitable parameters b^(i), y_0^(i) (see [15]). Note that the period ρ_k of the subsequence equals ρ_k = ρ / gcd(k, ρ). This way of generating parallel streams of PRNs is often denoted the "leap-frog technique" [5, 10, 19]. Even if one chooses subsequences with maximal period ρ_k = ρ, these sequences can be of inferior quality. This can be shown e.g. by the spectral test, which measures the coarseness of the lattice structure of the overlapping s-tuples (x_n, …, x_{n+s-1}), 0 ≤ n < ρ, produced by maximal-period LCGs. Consider one of the top performing generators in the exhaustive analysis of multiplicative congruential random number generators conducted by Fishman [18]: FISH := LCG(2^48, 55151000561141, 0, 1). The maximal-period subsequence of FISH with k = 23 and i = 0 is produced by the generator FISH23 := LCG(2^48, 123662693302269, 0, 1). The normalized spectral test values (values near 1 imply a "good", i.e. fine, lattice structure) for both generators in dimensions s = 2, …, 8 are given in the table below (including the values for the LCGs used in the example above). In almost any dimension, the performance of FISH23 in the spectral test is significantly worse than RANDU's (RANDU is a well-known poor LCG [18]).

s \ LCG   RANDU   FISH23   FISH   LCG1   LCG2
  2       0.9     0.3      0.9    0.9    0.7
  3       0.01    0.06     0.8    0.8    0.8
  4       0.06    0.01     0.9    0.8    0.8
  5       0.2     0.05     0.8    0.8    0.8
  6       0.3     0.1      0.8    0.8    0.5
  7       0.4     0.2      0.6    0.7    0.6
  8       0.6     0.2      0.4    0.7    0.5
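The parameter derivation behind FISH23 can be checked with a short sketch (not part of the package): since the modulus is the power of two 2^48, the low 48 bits of a 64-bit product are exact modulo 2^48, so a^k (mod m) is obtained by k modular multiplications.

#include <stdio.h>
#include <stdint.h>

/* Multiplication modulo m = 2^48: the low 48 bits of the 64-bit product
   are exact, because 2^48 divides 2^64. */
static uint64_t mulmod48(uint64_t x, uint64_t y)
{
  return (x * y) & ((1ULL << 48) - 1);
}

int main(void)
{
  uint64_t a  = 55151000561141ULL;   /* multiplier of FISH */
  uint64_t ak = 1;
  for (int j = 0; j < 23; j++)       /* a^23 (mod 2^48), multiplier of the k = 23 subsequence */
    ak = mulmod48(ak, a);
  printf("%llu\n", (unsigned long long)ak);  /* should reproduce the FISH23 multiplier quoted above */
  return 0;
}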

In a recent work [16] we have demonstrated that the use of such bad-quality subsequences in Monte Carlo integration leads to disastrous results. Figure 2 presents two sample results arising from the integration of a simple polynomial test function in dimension s = 4 by Monte Carlo integration (MCI) as described in Section 5.1. The node sets for the MCI have been constructed from non-overlapping s-tuples of FISH and FISH23. Each integration has been performed 64 times and the absolute errors have been analyzed by applying a T-test with 63 degrees of freedom. The x-axis denotes the dual logarithm of the sample size n. We varied n between 2^20 and 2^27. The shaded area in the plots represents the corresponding 99% confidence interval for the true value of the integral, which equals 0 (horizontal line). If this true value lies outside the shaded area, the generator is rejected at the 1% level of significance. This happens for FISH23 but not for FISH itself, in clear accordance with the spectral test results in dimension 4. In [16] we also used a more moderate type of parallelization and gave further examples for the coherence of the spectral test and empirical tests built on the notion of MCI. For further examples of well-known LCGs (including the Cray system generator) which produce low-quality subsequences see [15].
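A sketch of the test decision, assuming a standard one-sample t-based confidence interval for the mean of the 64 observed errors; the quantile 2.656 for 63 degrees of freedom is an approximate textbook value, not taken from the text above.

#include <math.h>

/* 99% confidence interval for the mean of 64 integration errors (63 d.o.f.). */
void confidence_interval_99(const double err[64], double *lo, double *hi)
{
  double mean = 0.0, s2 = 0.0;
  for (int i = 0; i < 64; i++) mean += err[i];
  mean /= 64.0;
  for (int i = 0; i < 64; i++) s2 += (err[i] - mean) * (err[i] - mean);
  s2 /= 63.0;                                /* sample variance */
  double half = 2.656 * sqrt(s2 / 64.0);     /* t quantile times standard error of the mean */
  *lo = mean - half;                         /* the generator is rejected at the 1% level */
  *hi = mean + half;                         /* if 0 lies outside [lo, hi]                */
}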

3.2 Splitting EICGs and cEICGs

Theoretical correlation analysis for EICGs predicts good stability with respect to splitting, which is due to the following properties of parallel streams of explicit inversive pseudorandom numbers, cited from [31]. Let Z_p be the set {0, 1, …, p-1}. Fix a large prime p and choose a_1, …, a_s ∈ Z_p* := Z_p \ {0} and b_1, …, b_s ∈ Z_p, where s is the desired number of parallel streams. Put

y_n^(i) ≡ \overline{a_i·n + b_i} (mod p),  n ≥ 0,  and  x_n^(i) = y_n^(i) / p ∈ [0, 1).        (2)

This defines s explicit inversive congruential generators EICG(p, a_i, b_i, 0), 1 ≤ i ≤ s. If \overline{a_1}·b_1, …, \overline{a_s}·b_s are mutually distinct elements of Z_p, these parallel streams offer highly uncorrelated and equidistributed PRNs. This property provides an easy way to assign an EICG to each processor.

Consecutive blocks of length L of an EICG are obtained by using the seeds {n_0, n_0 + L, n_0 + 2L, n_0 + 3L, …}. Note that the same partition can be obtained with a fixed seed n_0 and different parameters b in {b, aL + b, 2aL + b, 3aL + b, …}. Subsequences (1) of EICG(m, a, b, n_0) are produced by the generators EICG(m, ka (mod m), a(n_0 + i) + b (mod m), 0). These EICGs fulfill the above conditions. According to [14, 31] the subsequences are expected to show excellent equidistribution properties. These splitting properties of the EICG with respect to subsequences carry over to the combination of the generators EICG(p_j, a_j, b_j, 0) with distinct primes p_j and a_j, b_j ∈ Z_{p_j}, 1 ≤ j ≤ r. In order to split the combined generator into k ≥ 2 subsequences, simply split each component: the i-th subsequence (0 ≤ i ≤ k-1) is obtained by combining the generators EICG(p_j, k·a_j (mod p_j), i·a_j + b_j (mod p_j), 0), where j ranges from 1 to r. Since every subsequence is actually a cEICG, its quality is theoretically assessed by discrepancy estimates concerning parts of the period, see [12, Cor. 4]. We have demonstrated the stability of EICGs in a sample parallel Monte Carlo study in [16]. In sharp contrast to LCGs, EICGs and cEICGs require no tuning of the splitting parameters to the parameters of the generator itself. They thus provide a safe solution in parallel environments.
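For illustration, the modular inversion underlying (2) and a single EICG output can be computed as follows; this is a sketch independent of the library, using the common convention that the inverse of 0 is 0.

#include <stdint.h>

/* Inverse of c modulo a prime p by the extended Euclidean algorithm;
   by convention the inverse of 0 is 0. */
static uint64_t inv_mod(uint64_t c, uint64_t p)
{
  int64_t t = 0, newt = 1;
  int64_t r = (int64_t)p, newr = (int64_t)(c % p);
  while (newr != 0) {
    int64_t q = r / newr, tmp;
    tmp = t - q * newt;  t = newt;  newt = tmp;
    tmp = r - q * newr;  r = newr;  newr = tmp;
  }
  return (uint64_t)(t < 0 ? t + (int64_t)p : t);
}

/* n-th normalized output of EICG(p, a, b, 0) following (2), for primes p < 2^31. */
static double eicg_at(uint64_t p, uint64_t a, uint64_t b, uint64_t n)
{
  uint64_t c = ((a % p) * (n % p) + b % p) % p;   /* a*n + b (mod p) */
  return (double)inv_mod(c, p) / (double)p;
}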

Figure 2: Monte Carlo integration results for FISH (left, s = 4) and its full-period subsequence with step size k = 23, FISH23 (right, s = 4).

4 A Software Package for Generating PRN for Parallel Architectures

Based on the pseudorandom number generator library of the pLab group [20], we developed a software tool for the application of split generators in parallel environments. The software consists of two main parts, which are described in Sections 4.1 and 4.2 below.

4.1 The PRNG Library



We integrated the two splitting methods "sub" and "con" into the aforementioned library of PRNGs. This library provides very fast and portable C implementations of various PRNGs and features simple string-based PRNG initialization. In order to allocate the explicit inversive generator EICG(2^48 + 21, 1, 0, 0), the following lines of code have to be included in an application:

Example

#include "prng.h"

void main()
{
  struct prng *g;

  g = prng_new("eicg(281474976710677,1,0,0)");
}

Any generator gen of the package may be split into k parallel streams by using the method sub(gen, k, i), where i ∈ {0, 1, …, k-1} selects the i-th out of k substreams. The j-th consecutive block of length L is initialized by con(gen, L, j), j ∈ {0, 1, …}. In the following example, s1 and s2 implement two substreams of g, whereas c yields the third consecutive block of length 8096 of g:

Example

#include "prng.h"

void main()
{
  struct prng *s1, *s2, *c;

  s1 = prng_new("sub(eicg(281474976710677,1,0,0),5,1)");
  s2 = prng_new("sub(eicg(281474976710677,1,0,0),5,2)");
  c  = prng_new("con(eicg(281474976710677,1,0,0),8096,2)");
}

At initialization, LCGs and EICGs determine the fast implementation of subsequences or consecutive blocks by simply changing the parameters as described in the previous section. In the case of combined generators, the splitting is propagated to each component.

4.2 Task List Mechanism

In the following, we assume a setup where the application decides to make at most n calls to a certain PRNG gen during a simulation or integration procedure. We call the tuple (n, gen) a "task". We use so-called "task lists" in order to implement splitting in a flexible and reproducible way. At startup, the application initializes a task list which contains only one task specifying the "mother" generator and the maximum number of calls to this generator.

Example

#include "task.h"

#define maxcall 1048576  /* 2^20 */

void main()
{
  handle list1;

  task_list_init("eicg(...)", maxcall, &list1);
}

Task list list1 thus consists of a single task which denotes an EICG which will be called at most maxcall times. Any existing task list may be split by applying one of the two splitting methods "sub" and "con". For method "sub", a parameter k is required which denotes the number of substreams that should be generated. Every task T in the existing task list is then split into k substreams in the following way: denote by T.gen the string representation of the generator used in task T and by T.maxcalls the maximum number of calls to this generator. Applying "sub" yields tasks T_i, 0 ≤ i ≤ k-1, where T_i.gen is set to "sub(T.gen, k, i)" and T_i.maxcalls becomes ⌊T.maxcalls/k⌋. For method "con" we proceed as follows: fix a number of blocks q. For each task T in the given task list generate q tasks T_i such that T_i.maxcalls becomes ⌊T.maxcalls/q⌋ and T_i.gen is set to "con(T.gen, T_i.maxcalls, i)", i.e. the length of the single consecutive blocks is set to the maximum number of calls to that block. If T.maxcalls is not divisible by q, then the union of the blocks will not cover the whole range of the original sequence; it is thus advisable to use e.g. powers of two for both T.maxcalls and the splitting parameter. This concept can easily be used for both static and dynamic load distribution in a parallel system. In the first case, the procedure is to split the "mother" task into a fixed set of subtasks, distribute them to the processing elements (PEs), and then wait for each to report its results back. For more sophisticated applications, dynamic load balancing can be implemented by recursively splitting the remaining tasks when computing resources become available. The concept of the task list and keeping count of the maxcalls parameter guarantees a deterministic and reproducible use of the pseudorandom numbers.

Example

In the following sample code, task list list1 is first split into 5 substreams, yielding list2 with 5 entries. In the last line, we split list2 into 10 consecutive blocks; the resulting list3 thus has 50 entries. Each entry has a maximum number of calls equal to 20971.

#include "task.h"

#define maxcall 1048576  /* 2^20 */

void main()
{
  handle list1, list2, list3;

  task_list_init("eicg(...)", maxcall, &list1);
  task_list_init_sub(5, list1, &list2);
  task_list_init_con(10, list2, &list3);
}

The mechanism is easy to use, since the single tasks in such a list will never use overlapping fractions of the period of the mother generator, provided that the initial maximum number of calls is set to at most the period length of the generator.
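The bookkeeping behind "sub" and "con" can be pictured with the following sketch; struct task and the two helper functions are illustrative only and are not the task.h interface.

#include <stdio.h>

struct task { char gen[256]; long maxcalls; };   /* illustrative, not the task.h type */

/* i-th of k subtasks created by the "sub" method. */
void split_sub(const struct task *t, int k, int i, struct task *ti)
{
  snprintf(ti->gen, sizeof ti->gen, "sub(%s,%d,%d)", t->gen, k, i);
  ti->maxcalls = t->maxcalls / k;        /* floor(T.maxcalls / k) */
}

/* i-th of q consecutive blocks created by the "con" method. */
void split_con(const struct task *t, int q, int i, struct task *ti)
{
  ti->maxcalls = t->maxcalls / q;        /* block length = floor(T.maxcalls / q) */
  snprintf(ti->gen, sizeof ti->gen, "con(%s,%ld,%d)", t->gen, ti->maxcalls, i);
}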

5 Experimental Results

5.1 Parallel Monte Carlo Integration

Based on the generator library and the task list mechanism we have built a sample application which implements a simple Monte Carlo integration. One reason (in addition to those given in Section 1) for choosing this application is the fact that the overall computing time is dominated by the time needed for generating the PRNs. This clearly exhibits the differences in execution speed of the different generator types. For other applications these differences may not even be noticeable. We consider the problem of numerical integration in dimension s ≥ 1. Denote by n ∈ N a sample size and choose a test function f : I^s → R, where I = [0, 1] is the unit interval. Put

ε(f, ω, n) := ∫_{I^s} f(ξ) dξ  -  (1/n) Σ_{i=1}^{n} f(ξ_i),

the integration error arising from Monte Carlo integration with the sequence ω = (ξ_i), i = 1, 2, …, n, ξ_i ∈ I^s. The selection of the test function f is a very crucial point in any attempt of rating pseudorandom number generators, as it is in the case of rating low-discrepancy sets with Quasi-Monte Carlo integration [30]. Among others, we consider the following test function (see [32] for a discussion of the relevance of this function within Monte Carlo integration). Put

g(x, a) := (|4x - 2| + a) / (1 + a),

and set f(x_1, x_2, …, x_s) = Π_{i=1}^{s} g(x_i, i^2) - 1. For this numerically stable function, ∫_{I^s} f = 0, which permits calculation of the value (1/n) Σ_{i=1}^{n} f(ξ_i) up to sample sizes n ≤ 2^20 and dimensions s ≤ 300. In the following section we will not consider the integration error itself since we restrict our attention to performance issues. See the examples in Section 3.1 and [16] for details on the quality of approximation in terms of ε(f, ω, n).
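A sketch of the test function and the resulting error estimate; the call to rand() is only a placeholder for one of the PRN streams discussed above.

#include <math.h>
#include <stdlib.h>

/* g(x,a) = (|4x-2| + a) / (1 + a); f(x_1,...,x_s) = prod_i g(x_i, i^2) - 1. */
static double g(double x, double a)
{
  return (fabs(4.0 * x - 2.0) + a) / (1.0 + a);
}

/* Monte Carlo error for f over I^s with n nodes; the exact integral is 0. */
static double mc_error(int s, long n)
{
  double sum = 0.0;
  for (long i = 0; i < n; i++) {
    double prod = 1.0;
    for (int j = 1; j <= s; j++)
      prod *= g((double)rand() / ((double)RAND_MAX + 1.0), (double)j * (double)j);
    sum += prod - 1.0;
  }
  return 0.0 - sum / (double)n;   /* epsilon(f, omega, n) = integral minus sample mean */
}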

5.2 Experimental Settings

We use the parallel programming library PVM (Parallel Virtual Machine [33]) for message passing in order to implement the host/node programming paradigm. A host process is responsible for the generation and distribution of the task list and the corresponding computations among several node processes. The node processes do their assigned calculations and send the results back to the host process. As hardware platforms we use a Meiko CS-2 multicomputer and a NOW (an FDDI-interconnected cluster consisting of 8 DEC AXP 3000/400 workstations). In the case of static and uniform load distribution the host process simply creates a task list consisting of equal-sized tasks (one for each PE) by applying either the "sub" or the "con" technique. Since the complexity of each task is completely deterministic, there is no need for further load balancing if the PEs are used exclusively. A SIMD implementation would use the same technique (of course without a host process). The second example uses a simple dynamic load balancing technique in order to be able to react to different load situations on the machines or to changes of the load on the machines caused by other users. The host process creates a task list consisting of (in this concrete case) 200 tasks which are dynamically distributed according to the asynchronous single task pool method [21, p. 198]. For showing the strength of the dynamic load balancing technique compared to a static load distribution we produce an artificial load environment on the NOW with one machine running load 3.0, three machines running load 1.0, and four idle machines. The theoretically highest achievable speedup is 2 (as compared to an idle machine) when using static and uniform load distribution (which is determined by the machine running initially with load 3.0 and processing 12.5% of the tasks) and is 5.75 when using dynamic load balancing (if we neglect communication cost and small load imbalances caused by the discrete nature of the task pool). The comparison of the execution speed of different PRN generators is performed using the dynamic load balancing technique described above in a uniform load environment (no load on any node) on a heterogeneous NOW (a DEC AXP 3000/700 and a DEC AXP Station 600 5/333 have been added to the cluster). The Monte Carlo integration is performed in dimension s = 256 using 2^11 up to 2^20 integration nodes, and we compare the execution times using the following PRNGs:

MINSTD := LCG(2147483647, 16807, 0, 1)
EICG1  := EICG(2147483647, 165662684, 0, 0)
EICG2  := EICG(2147483647, 1, 0, 0)
cEICG  defines a combination of EICG(65413, 1, 0, 0) and EICG(65419, 1, 0, 0).

5.3 Results

Figure 3 shows a linear speedup for static load distribution on the Meiko even though the problem size was chosen to be rather small, e.g. 9 seconds execution time using 50 PEs. Figure 4 shows the results of a comparison between the splitting techniques "sub" and "con" on the NOW (no load, static load distribution, EICG1). There "con" shows a slightly better performance, although "sub" scales linearly as well. This small difference is due to the different parameter values of the "new" EICGs produced by the splitting techniques, which show slightly different execution times (see also Figure 5).

Figure 3: Speedup on the Meiko CS-2.

Concerning the efficiency of the suggested load balancing strategy in the artificial load environment, we achieve a speedup of 1.98 with static load distribution versus 5.62 with the dynamic load balancing scheme. This is an excellent value (if we consider the theoretical optimum of 5.75) which is achieved in spite of the communication overhead caused by 200 tasks. This shows that for Monte Carlo integration more sophisticated methods are not necessary (although they are possible with our approach).

Figure 4: Speedup on the homogeneous NOW with splitting techniques "sub" (light) and "con" (dark).

Figure 5 compares execution times of the Monte Carlo integration using different PRNGs. EICGs are clearly slower than LCGs (this is due to the inversion operation needed in the former: if we consider the cost of multiplication modulo m to be the cost unit, then inversion modulo m is O(log_2 m) [26]). EICGs with a small parameter a are slightly faster (EICG2 versus EICG1), which is the reason for the performance difference between the splitting techniques "sub" and "con" exhibited in Figure 4. The cEICG (which has about the same period as EICG1 and EICG2) is only a bit slower. cEICGs offer higher speed in comparison to simple EICGs with the same period if the required overall period length exceeds the word size of the CPU used, however.

Figure 5: Timings of different PRN generators, x-axis shows log2(samplesize).

6 Conclusion

In this paper we have presented a software package which provides a flexible, unified, and easy-to-use approach to generating and handling parallel streams of linear congruential and (combined) explicit inversive congruential PRNs. Experimental results from a Monte Carlo integration confirm the efficiency of the approach. Although theoretically well suited for applications in parallel and distributed environments, the higher computational cost of EICGs is still prohibitive. Nevertheless, the advantage of stability with respect to splitting techniques suggests the use of these generators in applications which are not dominated by the time needed for PRN generation and in applications where there is no possibility to select suitable LCGs with proper parameters in a reasonable time.

Acknowledgements

We thank Otmar Lendl, who is the principal author of the PRNG library. Moreover, we thank the Vienna Center for Parallel Computing for providing access (and support) to its Meiko CS-2. This work was partially supported by the Austrian Science Fund FWF, project no. P11143-MAT led by P. Hellekalek.

References

[1] S. Aluru. Parallel additive lagged Fibonacci random number generators. In Proceedings of the International Conference on Supercomputing 1996, pages 102-108, 1996.
[2] S. Aluru. Lagged Fibonacci random number generators for distributed memory parallel computers. Journal of Parallel and Distributed Computing, 45:1-12, 1997.
[3] S. Aluru, G.M. Prabhu, and J. Gustafson. A random number generator for parallel computers. Parallel Computing, 18:839-847, 1992.
[4] S.L. Anderson. Random number generators on vector supercomputers and other advanced architectures. SIAM Rev., 32:221-251, 1990.
[5] P. Coddington. Random Number Generators for Parallel Computers. NHSE Review, Second Issue, Northeast Parallel Architectures Center, 1996. Available at: http://nhse.cs.rice.edu/NHSEreview/RNG/.
[6] A. De Matteis and S. Pagnutti. Parallelization of random number generators and long-range correlations. Numer. Math., 53:595-608, 1988.

[7] A. De Matteis and S. Pagnutti. Long-range correlations in linear and non-linear random number generators. Parallel Computing, 14:207-210, 1990.
[8] A. De Matteis and S. Pagnutti. Long-range correlation analysis of the Wichmann-Hill random number generator. Statistics and Computing, 3:67-70, 1993.
[9] A. De Matteis and S. Pagnutti. Controlling correlations in parallel Monte Carlo. Parallel Computing, 21:73-84, 1995.
[10] W.F. Eddy. Random number generators for parallel processors. J. Comp. Appl. Math., 31:63-71, 1990.
[11] J. Eichenauer-Herrmann. Statistical independence of a new class of inversive congruential pseudorandom numbers. Math. Comp., 60:375-384, 1993.
[12] J. Eichenauer-Herrmann. A unified approach to the analysis of compound pseudorandom numbers. Finite Fields and their Appl., 1:102-114, 1995.
[13] J. Eichenauer-Herrmann, E. Herrmann, and S. Wegenkittl. A survey of quadratic and inversive congruential pseudorandom numbers. In P. Hellekalek, G. Larcher, H. Niederreiter, and P. Zinterhof, editors, Proceedings of the MC and QMC, Salzburg 1996, Lecture Notes in Statistics, pages 66-97, New York, 1997. Springer.
[14] J. Eichenauer-Herrmann and H. Niederreiter. Parallel streams of nonlinear congruential pseudorandom numbers. Finite Fields and their Applications, 3:219-233, 1997.
[15] K. Entacher. Bad subsequences of well-known linear congruential pseudorandom number generators. ACM TOMACS, 8(1), 1998. To appear.
[16] K. Entacher, O. Lendl, A. Uhl, and S. Wegenkittl. Analyzing streams of pseudorandom numbers for parallel Monte Carlo integration. In R. Wyrzykowski, H. Piech, B. Mochnacki, M. Vajtersic, and P. Zinterhof, editors, Proceedings of the International Workshop on Parallel Numerics (Parnum'97), pages 59-71, Zakopane, Poland, September 1997.
[17] A. Ferscha. Parallel and distributed simulation of discrete event systems. In A.Y.H. Zomaya, editor, Parallel and Distributed Computing Handbook, pages 1003-1041. McGraw-Hill, 1996.
[18] G.S. Fishman. Monte Carlo: Concepts, Algorithms, and Applications, volume 1 of Springer Series in Operations Research. Springer, New York, 1996.
[19] G. Fox et al. Solving problems on concurrent processors, vol. 1. Prentice-Hall, 1988.
[20] P. Hellekalek, T. Auer, K. Entacher, H. Leeb, O. Lendl, and S. Wegenkittl. The pLab WWW server. http://random.mat.sbg.ac.at. Also accessible via ftp.
[21] A.R. Krommer and C.W. Überhuber. Numerical Integration on Advanced Computer Systems, volume 848 of Lecture Notes in Computer Science. Springer Verlag, 1994.
[22] P. L'Ecuyer and T.H. Andres. A random number generator based on the combination of four LCGs. Mathematics and Computers in Simulation, 44:99-107, 1997.
[23] P. L'Ecuyer and S. Côté. Implementing a Random Number Package with Splitting Facilities. ACM Transactions on Mathematical Software, 17(1):98-111, 1991.
[24] H. Leeb and S. Wegenkittl. Inversive and linear congruential pseudorandom number generators in selected empirical tests. ACM TOMACS, 7(2):272-286, 1997.
[25] J. Makino and O. Miyamura. Parallelized feedback shift register generators of pseudorandom numbers. Parallel Computing, 21:1015-1028, 1995.
[26] M. Mascagni. Some methods of parallel pseudorandom number generation. In R. Schreiber, M. Heath, and A. Ranade, editors, Proceedings of the IMA Workshop on Algorithms for Parallel Processing. Springer-Verlag, 1997. To appear.
[27] M. Mascagni, S.A. Cuccaro, D.V. Pryor, and M.L. Robinson. A fast, high quality, and reproducible parallel lagged-Fibonacci pseudorandom number generator. Journal of Computational Physics, 119:211-219, 1995.
[28] N. Masuda and F. Zimmerman. PRNGlib: A Parallel Random Number Generator Library. Technical report, Swiss Center for Scientific Computing, 1996. Available at http://www.cscs.ch/Official/Publications.html.
[29] N.M. McLaren. The Generation of Multiple Independent Sequences of Pseudorandom Numbers. Appl. Statist., 38:351-359, 1989.
[30] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia, USA, 1992.
[31] H. Niederreiter. New developments in uniform pseudorandom number and vector generation. In H. Niederreiter and P.J.-S. Shiue, editors, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, volume 106 of Lecture Notes in Statistics, pages 87-120. Springer, 1995.
[32] I. Radovic, I.M. Sobol, and R.F. Tichy. Quasi-Monte Carlo Methods for Numerical Integration: Comparison of Different Low Discrepancy Sequences. Monte Carlo Methods and Appl., 2(1):1-14, 1996.
[33] V.S. Sunderam, G.A. Geist, J. Dongarra, and R. Manchek. The PVM concurrent computing system: evolution, experiences, and trends. Parallel Computing, 20:531-545, 1994.

[21] A.R. Krommer and C.W. U berhuber. Numerical Integration on Advanced Computer Systems, volume 848 of Lecture Notes in Computer Science. Springer Verlag, 1994. [22] P. L'Ecuyer and T.H. Andres. A random number generator based on the combination of four LCGs. Mathematics and Computers in Simulation, 44:99{ 107, 1997. [23] P. L'Ecuyer and S. C^ote. Implementing a Random Number Package with Splitting Facilities. ACM Transactions on Mathematical Software, 17(1):98{ 111, 1991. [24] H. Leeb and S. Wegenkittl. Inversive and linear congruential pseudorandom number generators in selected empirical tests. ACM TOMACS, 7(2):272{286, 1997. [25] J. Makino and Miyamura O. Parallelized feedback shift register generators of pseudorandom numbers. Parallel Computing, 21:1015{1028, 1995. [26] M. Mascagni. Some methods of parallel pseudorandom number generation. In R. Schreiber, M. Heath, and A. Ranade, editors, Proceedings of the IMA Workshop on Algorithms for Parallel Processing. Springer-Verlag, 1997. To appear. [27] M. Mascagni, S. A. Cuccaro, D. V. Pryor, and M. L. Robinson. A fast, high quality, and reproducible parallel lagged-Fibonacci pseudorandom number generator. Journal of Computational Physics, 119:211{219, 1995. [28] N. Masuda and F. Zimmerman. PRNGlib: A Parallel Random Number Generator Library. Technical report, Swiss Center for Scienti c Computing, 1996. Available at http://www.cscs.ch/Official/ Publications.html. [29] N.M. McLaren. The Generation of Multiple Independent Sequences of Pseudorandom Numbers. Appl. Statist., 38:351{359, 1989. [30] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia, USA, 1992. [31] H. Niederreiter. New developments in uniform pseudorandom number and vector generation. In H. Niederreiter and P.J.-S. Shiue, editors, Monte Carlo and Quasi Monte Carlo Methods in Scienti c Computing, volume 106 of Lecture Notes in Statistics, pages 87{120. Springer, 1995. [32] I. Radovic, I.M. Sobol, and R.F. Tichy. Quasi-Monte Carlo Methods for Numerical Integration: Comparison of Di erent Low Discrepancy Sequences. Monte Carlo Methods and Appl., 2(1):1{14, 1996. [33] V.S. Sunderam, G.A. Geist, J. Dongarra, and R. Manchek. The PVM concurrent computing system: evolution, experiences, and trends. Parallel Computing, 20:531{545, 1994.