Lazy Codes For Scientific Computations

Jerzy Karczmarczuk
University of Caen, France, [email protected]



Abstract

The lazy evaluation protocol defers the evaluation of a function's arguments until the moment when the function uses them. If such a function constructs composite data structures, it is possible to manipulate "infinite", recursively open objects. We use these techniques to manipulate power series, to construct Padé approximants, to implement the differentiation of programs, etc., in a compact and readable way. Laziness permits us to transform some intricate mathematical identities and equations into effective algorithms. We suggest that lazy languages may be used as powerful algorithmization tools, often avoiding the blind usage of symbolic manipulation packages.

1 Introduction

We discuss a coding methodology based on the deferred evaluation protocol in functional programming, which deserves to be better known in the community which implements intricate algorithms in physics, engineering, etc. Lazy programs may be less efficient than well optimized numerical code, but in several circumstances the presented techniques economize human time, and they can be used to treat some complex mathematical objects directly, thus avoiding the usage of Computer Algebra packages when those are used only for the generation of numerical programs.

1.1 Functional lazy style

Functional programs rely strongly on such data structures as lists or trees, which are specified recursively. Lists, e.g., [a, b, c, …], may represent coefficients of power series, convergents of an iterative process, etc. Functional languages are often considered too slow, but their semantic power becomes visible when the user exploits such possibilities as laziness and higher-order functions, e.g., the function map, which takes as arguments a list and any function f of one argument, and which constructs the transformed list [f(a), f(b), f(c), …]. Such manipulations replace loops. The examples presented here are coded in Haskell [1]. It is a strongly-typed language with a simple and compact syntax. Users can define their own data structures and new operators, and overload all functions, which permits defining arithmetic operations on any data types. The colon denotes a right-associative infix operator which adds an element in front of a list, i.e., a list [a, b, c] may be constructed as a : b : c : []. The function map is defined as

    map f (x:xq) = f x : map f xq
    map f []     = []

where we note the following particularities of Haskell: no redundant parentheses, f(x, y, z) is written as f x y z; and automatic destructuring of pattern arguments: the argument (x:xq) is a list whose head is x, and whose tail is xq. The lazy, or non-strict, semantics consists in delaying the evaluation of all expressions in a program until the moment when the executing module needs their values. If a user-defined procedure f(x) in a given context does not use the value of its parameter x, the call f(1/0) will not generate any error. The argument expression is compiled into a thunk – an anonymous procedure whose call computes the final value when this argument is requested.


If a procedure is a constructor which builds an object, it may implement some "infinite" data structures. A stream of natural numbers [0, 1, 2, 3, …] may be coded as

    integs = intsFrom 0 where
      intsFrom n = n : intsFrom (n+1)

The keyword where prefixes local definitions, such as intsFrom, which is an "open" function. We call such objects co-recursive. The user should not print them; they must be consumed incrementally. The function map works correctly on co-recursive lists, and its second clause is never used. An even simpler construct is a cyclic list, e.g., ones = 1 : ones. The list integs may be defined through another higher-order functional, (zipWith binop list1 list2), which constructs a list composed of the elements of two lists combined with binop:

    zipWith op (x:xq) (y:yq) = (op x y) : zipWith op xq yq
    integs = 0 : zipWith (+) ones integs

This is an example of co-recursively defined data, and not a recursive function. The semantics of the definition above is equivalent to what we could have written in a loop: integs[n] = 1 + integs[n-1]. The lazy protocol thus permits us to write fixed-point equations p = f(p), which in the domain of lazy streams become effective algorithms. This is the principal message of this talk. The characteristic property of lazy programs is the separation between the creation of data and their consumption. The generator produces an infinite stream, and the code is short, because the stopping clauses are absent: no truncations of the expansion order, etc. The terms which are not needed are not instantiated. This does not come for free: the deferred thunks occupy memory, and if the program is too complex, the lazy protocol may result in memory overflow. If needed, we may always copy and use a finite prefix of the stream.

One typical usage of streams is the functional approach to iterative equation solving. We begin with x = x₀, and we construct x₁ = f(x₀), …, xₙ₊₁ = f(xₙ), …, but instead of making a loop, we construct a stream (iterate f x0), where iterate f x = x : iterate f (f x). This stream is a piece of data which may be transported, and in some other place of the program we check, e.g., that xₙ and xₙ₊₁ obey some convergence condition p(xₙ, xₙ₊₁), whereupon the last computed value is returned. We need to code

    verify (x : y : xq) | p x y     = y
                        | otherwise = verify (y : xq)

The syntactic construct "|" is an alternative to the cascading if-then-else conditionals. The presentation of the other examples will be a little simplified: we omit the numeric conversions which sometimes are not automatic, and also some other syntactic details of Haskell needed for the construction of user-defined operators and their overloading. Instead of plain, "computer-like" listings we adopt a more mathematical notation for the defined procedures. They are presented like equations with special symbols, etc., but this is a decoration; our definitions are fully implementable.
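Returning to the iterate/verify pattern, here is a minimal, self-contained sketch which uses it for Newton's method; unlike the listing above, the predicate is passed explicitly as a parameter, and the name newtonSqrt and the tolerance 1e-12 are our illustrative assumptions, not from the original text.

    verify :: (a -> a -> Bool) -> [a] -> a
    verify p (x : y : xq)
      | p x y     = y                    -- two consecutive iterates agree
      | otherwise = verify p (y : xq)

    newtonSqrt :: Double -> Double
    newtonSqrt a = verify close (iterate step a)
      where
        step x    = 0.5 * (x + a / x)    -- one Newton step for the square root
        close x y = abs (x - y) < 1e-12  -- assumed convergence criterion

    -- newtonSqrt 2.0 ==> 1.4142135623730951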

2 Lazy Power Series

A lazy list [u₀, u₁, u₂, …] may represent a univariate power series u(x) = u₀ + u₁x + u₂x² + …, where the variable x is implicit, not used by the algorithms presented below. These algorithms are known, see e.g. Knuth [2], but our lazy implementation is much shorter. The argument pattern used henceforth is (u₀ : ū), where ū represents the series tail: ū = u₁ + u₂x + ⋯. A slightly different presentation of the lazy manipulation of series may be found in the paper [3].

2.1 Power series algebra

Haskell permits the overloading of operators for any user-defined data. We can thus define the addition of series u and v through zipWith (+) u v, the multiplication by a scalar c through map (c *) u, etc. The expression (c *) denotes a partial application, an anonymous functional object which takes one argument and multiplies it by c. Unfolding the maps and zips yields c · (u₀ : ū) = (c u₀ : c · ū), and (u₀ : ū) + (v₀ : v̄) = (u₀ + v₀ : ū + v̄). Multiplication and other operations can also be defined by one-line procedures:

(1)    (u₀ : ū) × v@(v₀ : v̄) = (u₀v₀ : u₀ · v̄ + ū × v)
(2)    (u₀ : ū) / v@(v₀ : v̄) = (w₀ : (ū − w₀ · v̄)/v)  where w₀ = u₀/v₀
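A minimal sketch of (1)-(2) over plain lazy lists; the type alias Series and the names scale, sadd, smul, sdiv are ours, and all series are assumed infinite (pad finite ones with zeros).

    type Series = [Double]

    scale :: Double -> Series -> Series
    scale c = map (c *)

    sadd, smul, sdiv :: Series -> Series -> Series
    sadd = zipWith (+)
    -- (1): (u0 : ū) × v = (u0 v0 : u0·v̄ + ū × v)
    smul (u0 : uq) v@(v0 : vq) = u0 * v0 : sadd (scale u0 vq) (smul uq v)
    -- (2): (u0 : ū) / v = (w0 : (ū − w0·v̄)/v), with w0 = u0/v0
    sdiv (u0 : uq) v@(v0 : vq) = w0 : sdiv (sadd uq (scale (negate w0) vq)) v
      where w0 = u0 / v0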

Series can be easily (formally) differentiated or integrated term by term over the implicit variable x:

(3)    df (_ : ū) = zipWith (×) ū [1 ..];    integ c u = c : zipWith (/) u [1 ..]


where [1 ..] denotes in Haskell the list [1, 2, 3, …], and c is an integration constant. Such functions as the power w = u^α or the exponential g = e^u fulfil the identities w′ = α · w · u′/u and g′ = g × u′. We get

(4)    (u@(u₀ : _))^α = w  where w = integ (u₀^α) (α · w × df u/u)
       exp (u@(u₀ : _)) = w  where w = integ (exp u₀) (w × df u)
       log (u@(u₀ : _)) = integ (log u₀) (df u/u)
       sincos u@(u₀ : _) = (s, c)  where p = df u;  s = integ (sin u₀) (p × c);  c = integ (cos u₀) (−p × s)

The form a@p denotes an argument whose name is a, and whose structure is specified by the pattern p. The trigonometric functions are defined together for the sake of efficiency. In Haskell the form (a, b) defines an anonymous record (a pair) containing two fields of any types. The integration permits a "co-recursive bootstrap" of such functions as the exponential: w is defined by itself.
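Continuing the sketch above (reusing Series and smul), a hedged rendering of (3) and of the exponential bootstrap of (4); sexp is our name for the lifted exponential.

    df :: Series -> Series               -- term-wise d/dx, as in (3)
    df (_ : uq) = zipWith (*) uq [1 ..]

    integ :: Double -> Series -> Series  -- term-wise integration, constant c
    integ c u = c : zipWith (/) u [1 ..]

    sexp :: Series -> Series             -- the bootstrap of (4): w' = w · u'
    sexp u@(u0 : _) = w
      where w = integ (exp u0) (smul w (df u))

    -- take 5 (sexp (0 : 1 : repeat 0)) ==> 1, 1, 1/2, 1/6, 1/24: the series of e^x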

2.2 Composition and reversion

If the series v(x) is free from the term v₀, the composition u(v(x)) = u₀ + x(v₁ + v₂x + ⋯)(u₁ + u₂v(x) + ⋯) may be computed by a co-recurrence equivalent to the infinite Horner scheme. The function scomp u v may be defined as

(5)    scomp u (0 : v̄) = aux u  where aux (u₀ : ū) = u₀ : v̄ × (aux ū)

The auxiliary local function is a typical example of a closure which "traps" the external argument v̄ and uses it as a global. The reversion of the power series u(x) = x + u₂x² + … is the series x(u) = u + … resulting from the inversion of u. This operation is useful for calculating the virial coefficients in statistical physics, and in many other domains where quantities are expressed through implicit equations solvable in terms of power series. Many reversion algorithms exist, but the following one yields the shortest code known to the author. The auxiliary series m defined through x = u·m reduces the identity x = u − u₂x² − u₃x³ − … to a self-referring composition algorithm:

(6)    revs (0 : 1 : ū) = x  where x = 0 : m;  m = 1 : (−m²) × scomp ū x

The generalization to the case u₁ ≠ 1 is trivial: we just need to construct from u(x) the rescaled series ũ(x) = u(κx).
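A sketch of (5)-(6) in the same list representation, reusing smul from above. As a check, the reversion of u(x) = x + x² yields the signed Catalan numbers.

    scomp :: Series -> Series -> Series  -- composition (5); v0 must be 0
    scomp u (0 : vq) = aux u
      where aux (u0 : uq) = u0 : smul vq (aux uq)   -- infinite Horner scheme

    revs :: Series -> Series             -- reversion (6), for u = x + u2 x² + ...
    revs (0 : 1 : uq) = x
      where
        x = 0 : m
        m = 1 : smul (map negate (smul m m)) (scomp uq x)  -- m = 1 : (−m²)×(ū∘x)

    -- take 5 (revs (0 : 1 : 1 : repeat 0)) ==> [0.0, 1.0, -1.0, 2.0, -5.0]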

2.3 Schröder function

The analysis of several iterative processes xₙ₊₁ = f(xₙ) becomes easier if we can say something general about the properties of the iterated compositions f^(n)(x) = f(f^(n−1)(x)) = f(f(… f(x) …)), where f^(0)(x) = x. The reversion of the series z = u(x) is just x = u^(−1)(z), so it would be interesting to define f^(κ) also for negative, or even fractional κ. For a given u(x) we begin with the construction of the Schröder function v_u(x), which obeys the equation

(7)    v_u(u(x)) = a · v_u(x)

where a is a constant. We shall analyze the neighbourhood of zero, and because of the scaling freedom all Schröder functions can be taken of the form v(x) = x + v₂x² + …, which constrains the constant factor to be a = u₁. The value of u₀ should be zero, and a should differ from 1, to prevent ambiguities. It is easy to see that v_u(u^(2)(x)) = a·v_u(u(x)) = a²·v_u(x). This can be generalized to any composition power n, and inverted, which gives v_u(u^(κ)(x)) = a^κ·v_u(x) for any κ. If we compute v_u^(−1)(x), we can reconstruct any u^(κ); the solution is u^(κ)(x) = v_u^(−1)(a^κ·v_u(x)). In order to obtain a co-recursive algorithm for the Schröder function, we look for a more general series v fulfilling the equation v(u(x)) = w(x)·v(x) + s(x). This constrains v₀ to be equal to s₀/(1 − w₀); if s₀ = 0 and w₀ = 1, which is the interesting case for us, we define it as 1. The expansion v(u(x)) = v₀ + u × v̄ after the elimination of v₀ and division by u gives:

(8)    gschr u w@(w₀ : w̄) s@(s₀ : s̄) = (if s₀ == 0 && w₀ == 1 then 1 else s₀/(1 − w₀))
                                        : gschr u (w/u) ((w₀ · w̄ + s̄)/u)

2.4 Example: zero-dimensional φ³ theory

From the perturbational point of view the Dyson-Schwinger equation in Quantum Field Theory may be treated as an open, co-recursive formula, whose solution is the generating functional for the propagators. For pedagogical reasons it is useful to consider the zero-dimensional variant of the theory, see e.g. the textbook of Cvitanović [4].


In this case there is no integration, and the problem becomes purely combinatoric, but not trivial. The field functional φ, which depends on the external current J and the coupling constant γ, obeys the equation

(9)    φ = J + (γ/2) (∂φ/∂J + φ²).

The derivatives ∂ⁿφ/∂Jⁿ for J = 0 are the Green functions of the theory, and they depend on γ. We want to obtain d₂ = (∂φ/∂J)|_{J=0} as a power series in γ. We will represent φ as a series in γ, whose coefficients are series in J. The first coefficient J is thus the unit series 0 : 1 : Zeros, and the coefficient 1/2 must be lifted to the domain of series. The derivative of φ should be computed term-wise, since the object itself is a series in γ. The result is:

(10)    phi = j : h · (phi × phi + map df phi)  where one = 1 : [];  j = 0 : one;  h = (1/2) · one

It is an infinite row-wise matrix, and we need its second column. This is obtained by d₂ = map (head . tail) φ, where the functions head and tail select the appropriate fields u₀ and ū from a pair (u₀ : ū), and the period is the composition operator: (f . g) x = f (g x). The final result is: d₂ = 1 + γ² + (25/8)γ⁴ + 15γ⁶ + (12155/128)γ⁸ + (11865/16)γ¹⁰ + O(γ¹²).
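A runnable sketch of (10) with nested lazy lists: the outer list expands in γ, each coefficient being a series in J; we use exact Rational coefficients, and all the names (SeriesJ, addJ, mulJ, dJ, mulG, d2) are our own, not from the paper.

    type SeriesJ = [Rational]            -- a series in J, with exact coefficients

    zerosJ, oneJ, j, h :: SeriesJ
    zerosJ = 0 : zerosJ
    oneJ   = 1 : zerosJ
    j      = 0 : oneJ                    -- the series J
    h      = map (/ 2) oneJ              -- the lifted constant 1/2

    addJ, mulJ :: SeriesJ -> SeriesJ -> SeriesJ
    addJ = zipWith (+)
    mulJ (u0 : uq) v@(v0 : vq) = u0 * v0 : addJ (map (u0 *) vq) (mulJ uq v)

    dJ :: SeriesJ -> SeriesJ             -- d/dJ, term-wise
    dJ (_ : uq) = zipWith (*) uq (map fromInteger [1 ..])

    mulG :: [SeriesJ] -> [SeriesJ] -> [SeriesJ]   -- multiplication in gamma
    mulG (u0 : uq) v@(v0 : vq) =
      mulJ u0 v0 : zipWith addJ (map (mulJ u0) vq) (mulG uq v)

    phi :: [SeriesJ]                     -- eq. (10)
    phi = j : map (mulJ h) (zipWith addJ (mulG phi phi) (map dJ phi))

    d2 :: [Rational]                     -- the second column
    d2 = map (head . tail) phi
    -- take 5 d2 ==> [1 % 1, 0 % 1, 1 % 1, 0 % 1, 25 % 8]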

3 Lazy Computational Differentiation

In this section we show how to differentiate functions specified by procedures. We shall not perform any manipulation of symbolic data. The process should be easily integrable with any other program, inexpensive, and exact up to machine precision. The relevant approach, known as Computational Differentiation (CD), is widely known, see e.g. [5, 6]. It is based on the observation that all expressions in a program are composed of primitive values combined with operations whose differential properties are known, and the chain rule for differentiation does the rest. If the conditionals, loops, etc. are streamlined, i.e., if the differentiation process follows the control thread of the "main" computation, it is possible to compute fully automatically all the derivatives, exactly as if they were computed by a manual preprocessing of the program source. One of the standard CD methods avoids this preprocessing, and is based on the overloading of mathematical operations over composite data which contain the expression value and the value of its derivative. The user has only to specify which entities in his program are constants (with vanishing derivatives), and which should be considered as "variables".

3.1 Local differential algebra

For simplicity we present the case with one independent variable. Laziness offers a new facility, absent in known CD systems: we can compute the derivatives of any order, without effort. We show how to augment the standard set of arithmetic functions by the derivation operator ∇, which should be linear, and which should obey the Leibniz rule. It may be re-applied; the resulting algebra of expressions should be closed. Since in general the expressions e and ∇e are algebraically independent, the differentiation generates a "new entity" within the program. We propose to "lift" all numerical expressions (the values thereof) into the domain of infinite lazy sequences: e → [e, e′, e″, e⁽³⁾, …]. If we want to construct series whose elements belong to our differential algebra, it is preferable to define a different data constructor. In this paper we denote it by ".", and the empty list equivalent to the constant 0 will be represented by the symbol Z. This is not an "indeterminate", but just a constant tag with a symbolic name, which is equivalent to (0 . 0 . 0 . …). The operator ∇ is just the tail of the sequence. All constants c will be lifted to c . Z. The independent variable, e.g., the parameter x of the differentiated function, is embedded into the form x . 1 . Z. The multiplication of an expression by a number and the addition of two expressions reduce – as in the case of power series – to the application of the functionals map and zipWith (+). The multiplication, division etc. are less trivial, but standard. The generic form of a lifted expression e will be e = (e₀ . e′). We omit the definitions involving Z.

(11)    e@(e₀ . e′) × f@(f₀ . f′) = (e₀f₀ . (e′ × f + e × f′))
(12)    e@(e₀ . e′) / f@(f₀ . f′) = ((e₀/f₀) . (e′/f − e × f′/(f × f)))
(13)    exp (e₀ . e′) = r  where r = (exp e₀ . e′ × r)
(14)    log e@(e₀ . e′) = (log e₀ . e′/e)
(15)    sqrt (e₀ . e′) = r  where r = (sqrt e₀ . e′/(2 · r))
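A self-contained sketch of this tower algebra in legal Haskell; since "." is taken by function composition, we use an infix constructor (:>) for the paper's dot, and only the operations needed below are implemented (abs and signum are deliberately left undefined). All names here are ours.

    data D = Double :> D                 -- the tower [e, e', e'', ...]
    infixr 5 :>

    zTower :: D                          -- the paper's Z: the constant zero
    zTower = 0 :> zTower

    cst, var :: Double -> D
    cst c = c :> zTower                  -- a constant: c . Z
    var x = x :> cst 1                   -- the variable: x . 1 . Z

    nabla :: D -> D                      -- the derivation: just the tail
    nabla (_ :> e') = e'

    instance Num D where
      (e0 :> e') + (f0 :> f') = (e0 + f0) :> (e' + f')
      e@(e0 :> e') * f@(f0 :> f') = (e0 * f0) :> (e' * f + e * f')   -- (11)
      negate (e0 :> e') = negate e0 :> negate e'
      fromInteger n = cst (fromInteger n)
      abs = undefined; signum = undefined

    instance Fractional D where
      e@(e0 :> e') / f@(f0 :> f') =
        (e0 / f0) :> (e' / f - e * f' / (f * f))                     -- (12)
      fromRational r = cst (fromRational r)

    dexp, dsqrt :: D -> D
    dexp  (e0 :> e') = r where r = exp e0 :> e' * r                  -- (13)
    dsqrt (e0 :> e') = r where r = sqrt e0 :> e' / (2 * r)           -- (15)

    takeD :: Int -> D -> [Double]        -- the first n derivatives
    takeD 0 _ = []
    takeD n (e0 :> e') = e0 : takeD (n - 1) e'

    -- let x = var 3.0 in takeD 4 (x * x)  ==>  [9.0, 6.0, 2.0, 0.0]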

3.2 Example: Taylor expansion of the Lambert function

The idea behind this implementation of CD is not just to calculate derivatives, but to use the derivation operator as any other, which permits us to construct and solve differential recurrences as easily as the algebraic ones. They are sometimes very useful for functions defined by implicit equations. We find the expansion around zero of the Lambert function W(z), see the paper [7], given by the equation W(z)e^{W(z)} = z. This function is useful in many branches of computational physics and in combinatorics. The differentiation of this equation gives

(16)    dz/dW = e^W (1 + W) = (z/W)(1 + W)   for z ≠ 0,

which yields

        dW/dz = e^{−W}/(1 + W) = (W/z) · 1/(1 + W),

and gives a one-line code for the Maclaurin sequence of W, knowing that W(0) = 0:

    w = 0.0 . (exp (−w)/(1.0 + w))
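With the (:>) sketch above, this one-liner and the conversion from the derivative tower to Taylor coefficients might look as follows; wTower, toListD and maclaurin are our names, and the printed values agree with W(z) = z − z² + (3/2)z³ − ….

    wTower :: D                          -- W and all its derivatives at z = 0
    wTower = 0.0 :> (dexp (negate wTower) / (1.0 + wTower))

    toListD :: D -> [Double]
    toListD (e0 :> e') = e0 : toListD e'

    maclaurin :: D -> [Double]           -- the n-th derivative divided by n!
    maclaurin d = zipWith (/) (toListD d) (scanl (*) 1 [1 ..])

    -- take 5 (maclaurin wTower) ==> [0.0, 1.0, -1.0, 1.5, -2.6666666666666665]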

3.3 WKB expansion

We derive here the higher-order terms of the Wentzel-Kramers-Brillouin expansion, as presented in the book [8], and useful for some quasi-classical approximations in Quantum Mechanics. We begin with an equation for y(x), containing a small parameter ε: ε²y″ = Q(x)y. A regular development of y(x) in ε is impossible because of the essential singularity: the equation changes its character when ε = 0. Within the standard WKB formalism y is represented as

(17)    y ≈ exp( (1/ε) Σ_{n=0} εⁿ Sₙ(x) ),

which generates a chain of coupled recurrent equalities satisfied by the Sₙ. The lowest approximation is S₀′ = ±√Q (which needs an explicit integration irrelevant for our discussion), and exp(S₁) = 1/√(S₀′). But the recurrences become quickly unreadable. Our expansion separates the odd and the even powers of ε:

(18)    y ≈ exp( (1/ε) S₀ + U(x, ε²) + ε V(x, ε²) ).

Injecting this formula into the original equation produces the following differential identities:

(19)    U′ = −(1/2) · (S₀″ + ε²V″)/(S₀′ + ε²V′),   or   e^U = 1/√(S₀′ + ε²V′),
        and   V′ = −(1/(2S₀′)) · (U′² + U″ + ε²V′²).

These cross-referencing definitions constitute an effective lazy algorithm for U′(x) and V′(x). It is not obvious that the problem is well defined: in order to get the value of V′(x) we need U″, which needs V″, and V⁽³⁾, etc. But the functions U and V should be treated as series in ε², and the higher derivatives of U and V appear only in higher-order terms, which makes the co-recursive formulae effective. We specify the value of the (differentiation) variable, e.g. x. Then we define s₀′, and the equation (19) may be coded as

(20)    u′ = −0.5 · (∇s₀′ +: ∇v′)/(s₀′ +: v′);    v′ = (−0.5/s₀′) · (u′² + ∇u′ +: v′ × v′)

where (a +: b), which represents a + ε²b, is defined as (a₀ : ā) +: b = a₀ : (ā + b). We need only the main values of the series u′ and v′, so we map the head selector through them to obtain the numerical results.

4 Optimization of numerical sequences

Infinite streams may represent other objects, e.g., discretized trajectories. The generation of such sequences as t = [t₀, t₀ + h, t₀ + 2h, …, t₀ + nh], and applying (map f) to them, is trivial. But there are better ways; for the sine applied to an arithmetic sequence we would rather use the identity

(21)    sin(t + (i + 2)h) = 2 sin(t + (i + 1)h) cos(h) − sin(t + ih).

Since arithmetic and geometric progressions are so common, we might design alternative data structures which take their properties into account, e.g., we may store only the first element and the difference (or quotient). We specify the following tags/operators: (C x) is a constant sequence, equivalent to the generic case [x, x, …]; (x +> d) denotes the differential sequence [x, x + d₀, x + d₀ + d₁, …]; and (x ×> q) is the generalized geometric progression [x, q₀x, q₀q₁x, …].


The standard arithmetic progression is just (x +> (C h)). We see that the factorial sequence [0!, 1!, 2!, …] is represented by (1 ×> (1 +> (C 1))). The reader may easily find that c · (x ×> q) = (cx ×> q), that exp(x +> r) = (exp(x) ×> exp(r)), etc. For arithmetic sequences the trigonometric functions take the form

(22)    sin (x +> (C h)) = p  where p = (sin x) : q;  q = (sin(x + h)) : r;  r = (2 cos h) · q − p

but the most important simplification is the multiplication formula for the differential sequences

(23)    p@(x +> x̄) × q@(y +> ȳ) = (xy +> (p × ȳ + x̄ × q + x̄ × ȳ))

If the system cannot find any specific simplification, e.g. for the reciprocal of an arithmetic sequence, it is converted into its generic form, and the functional map is applied. Our compact representation may be considered slower than standard sequences, but we should not forget that their evaluation is lazy, and once a term is computed, it remains memoized. While the tail of (x : q) is just q, its definition for a differential sequence is more involved:

(24)    tail (x +> p) = (x + head p) +> (tail p)

The differential representation facilitates the manipulations which use the difference operator ∆uₙ = uₙ₊₁ − uₙ. We see that ∆ (_ +> d) = d, while ∆ p@(_ : q) = q − p. The partial sum operator Σu = [u₀, u₀ + u₁, u₀ + u₁ + u₂, …] transforms a differential sequence into a generic one: atoseq (x +> r) = Σ (x : r). The sums are defined by

(25)    Σ p@(C x) = x +> p;    Σ (x : q) = w  where w = x : (w + q)
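A sketch of these tagged sequences as a Haskell data type; the constructor names (:+>) and (:*>) replace the paper's +> and ×> (to obey Haskell's operator syntax), and only head, tail, the generic conversion and the factorial example are shown.

    data Seq = C Double                  -- constant sequence [x, x, ...]
             | Double :+> Seq            -- x +> d : [x, x+d0, x+d0+d1, ...]
             | Double :*> Seq            -- x ×> q : [x, q0 x, q0 q1 x, ...]

    headS :: Seq -> Double
    headS (C x)     = x
    headS (x :+> _) = x
    headS (x :*> _) = x

    tailS :: Seq -> Seq                  -- the rule (24), and its analogues
    tailS s@(C _)   = s
    tailS (x :+> p) = (x + headS p) :+> tailS p
    tailS (x :*> q) = (x * headS q) :*> tailS q

    toListS :: Seq -> [Double]           -- the generic form, for consumption
    toListS s = headS s : toListS (tailS s)

    facts :: Seq                         -- [0!, 1!, 2!, ...] = 1 ×> (1 +> C 1)
    facts = 1 :*> (1 :+> C 1)

    -- take 6 (toListS facts) ==> [1.0, 1.0, 2.0, 6.0, 24.0, 120.0]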

4.1 Acceleration of convergence

There exists a rational transformation known as the Wynn process (see the article [9]), defined below, which can be used not only to accelerate the convergence, but also for the rationalization (Padéization) of power series. The procedure transforms x_n into w_n = ε_0^{(2n)}, using the following recurrences:

(26)    ε_n^{(−1)} = 0;    ε_n^{(0)} = x_n;    ε_n^{(k+1)} = ε_{n+1}^{(k−1)} + 1/(ε_{n+1}^{(k)} − ε_n^{(k)}),    k = 0, 1, 2, …

where the sequences ε^{(2k−1)} are auxiliary only. We have to deal with two sets of indices, so we construct a stream of streams, which is then reduced by mapping the head operator through all the rows. The procedure becomes

(27)    wynn x = map head ev  where  ∆̃ p = map ∆ p;
            ev = x +> (∆̃ ev + recip (∆̃ (tail od)));  od = (C 0) +> (∆̃ (recip (∆̃ ev)))

4.2 Continued fraction evaluation

One known method of evaluating the continued fraction (CF) expansion f = b₀ + a₁/(b₁ + a₂/(b₂ + …)) is the Lentz algorithm, see [10]. The lazy code below generates an infinite sequence of convergents, and the stopping criteria are left to the consumer module. The sequence b begins with b₁; the term b₀ is separated.

(28)    cfeval b₀ b a = let d = 0 : recip (b + a × d);  c = b₀ : (b + a/c)  in  b₀ ×> tail (c × d)

We know that e^z satisfies e^z = 1 + z/(b₁ + z/(b₂ + …)) with b = [1, −2, −3, 2, 5, …], which may be generated by b = p + (0 : t), where t = −2 : 0 : (−t); p = 1 : 0 : (t − p). Applying cfeval to it gives the approximation to e.
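A list-based sketch of (28), checked on this example; since we have no (×>) sequences here, the running product is made explicit with scanl, and the arithmetic on sequences is lifted by zipWith. The names are ours.

    -- Lentz recurrences: d_j = 1/(b_j + a_j d_{j-1}), c_j = b_j + a_j/c_{j-1};
    -- the convergents are f_j = b0 * (c_1 d_1) * ... * (c_j d_j).
    cfeval :: Double -> [Double] -> [Double] -> [Double]
    cfeval b0 b a = map (b0 *) (scanl (*) 1 (zipWith (*) (tail c) (tail d)))
      where
        d = 0  : map recip (zipWith (+) b (zipWith (*) a d))
        c = b0 : zipWith (+) b (zipWith (/) a c)

    bs :: [Double]                       -- b = [1,-2,-3,2,5,-2,-7,...]
    bs = zipWith (+) p (0 : t)
      where t = (-2) : 0 : map negate t
            p = 1 : 0 : zipWith (-) t p

    -- take 6 (cfeval 1 bs (repeat 1))   -- convergents approaching e = 2.71828...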

4.3 Simple rationalization of series

The continued fraction expansion can be used for the rationalization of power series. We present a simplified algorithm, which is not protected against zero coefficients; the corrections would not add anything to the lazy generation technique. A regular power series can be almost immediately transformed into a particular continued fraction:

(29)    u₀ + u₁x + u₂x² + … = g₀ + g₁x/(1 + g₂x/(1 + ⋯)).


We shall use the standard list constructor (:) for the sequence gₙ; only its structure is needed, not its algebraic properties. The conversion function u → g is defined as follows:

(30)    sertocf u@(u₀ : ū) = u₀ : scf ū  where scf v@(v₁ : v̄) = v₁ : scf (−v̄/v)

The truncations of the right-hand side of equation (29) form the sequence

(31)    g₀/1,  (g₀ + g₁x)/1,  (g₀ + (g₀g₂ + g₁)x)/(1 + g₂x),  …,  Pₙ(x)/Qₙ(x),  …

where P₀(x) = g₀, Q₀(x) = Q₁(x) = 1, P₁(x) = g₀ + g₁x, and the ratio of polynomials P/Q fulfils the recurrence

(32)    Pₙ₊₁(x)/Qₙ₊₁(x) = (gₙ₊₁ x Pₙ₋₁(x) + Pₙ(x)) / (gₙ₊₁ x Qₙ₋₁(x) + Qₙ(x)).

The 2n-th term of the sequence gives the [n/n] rational approximant. The reconstruction code is:

(33)    ratser (g₀ : g₁ : ḡ) = rts ([g₀], [1]) ((g₀ : [g₁]), [1]) ḡ
            where rts r@(a, b) s@(c, d) (p : p̄) = r : rts s ((0 : p · a) + c, (0 : p · b) + d) p̄
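A sketch of (30) and (33) over plain lists, reusing Series and sdiv from the sketch in Section 2; here P and Q are finite coefficient lists, and polyAdd, a naive addition of polynomials of unequal lengths, is our helper.

    sertocf :: Series -> Series          -- eq. (30)
    sertocf (u0 : uq) = u0 : scf uq
      where scf v@(v1 : vq) = v1 : scf (map negate (sdiv vq v))

    polyAdd :: [Double] -> [Double] -> [Double]
    polyAdd (a : as) (b : bs) = a + b : polyAdd as bs
    polyAdd as []             = as
    polyAdd [] bs             = bs

    ratser :: Series -> [([Double], [Double])]   -- the stream of (P_n, Q_n) pairs
    ratser (g0 : g1 : g) = rts ([g0], [1]) ([g0, g1], [1]) g
      where
        rts r@(a, b) (c, d) (p : pq) =
          r : rts (c, d) (polyAdd (0 : map (p *) a) c,     -- recurrence (32)
                          polyAdd (0 : map (p *) b) d) pq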

5 Final Remarks

The manipulation of complicated mathematical objects does not always require symbolic algebra packages. We have shown that a more direct approach, easier to integrate with number-crunching codes, is possible, provided the underlying programming language has some facilities to process user-defined data. But the co-recursive approach to algorithm construction does not depend on the underlying mathematical domain, and it can be applied to derive particularly compact procedures operating upon symbolic data as well.

References

[1] S. Peyton Jones, Haskell 98: A Non-strict, Purely Functional Language. Report, software and documentation available from http://www.haskell.org.
[2] D. Knuth, The Art of Computer Programming, vol. 2, Addison-Wesley, Reading, (1981).
[3] J. Karczmarczuk, Generating Power of Lazy Semantics, Theor. Comp. Sci. 187, (1997), pp. 203–219.
[4] P. Cvitanović, Field Theory, Nordita Lecture Notes, (1983).
[5] G. Corliss, Automatic differentiation bibliography, in: SIAM Proceedings of Automatic Differentiation of Algorithms: Theory, Implementation and Application, ed. by G. Corliss and A. Griewank, (1991), and many times updated since then. Available from the netlib archives ([email protected]).
[6] J. Karczmarczuk, Functional Differentiation of Computer Programs, Proceedings, III ACM SIGPLAN International Conference on Functional Programming, Baltimore, (1998), pp. 195–203.
[7] R. Corless, G. Gonnet, D. E. G. Hare, D. J. Jeffrey, D. Knuth, On the Lambert W Function, Advances in Comp. Math. 5, (1996), pp. 329–359.
[8] C. Bender, S. Orszag, Advanced Mathematical Methods for Scientists and Engineers, McGraw-Hill, (1978).
[9] P. Wynn, On a Device for Computing the e_m(S_n) Transformation, MTAC 10, (1956), pp. 91–96.
[10] W. J. Lentz, Generating Bessel Functions in Mie Scattering Calculations Using Continued Fractions, Applied Optics 15, (1976), pp. 668–671.