A Normal Form Projection Algorithm for Associative Memory

Bill Baird
Dept. Mathematics and Dept. Molecular and Cell Biology, 129 LSA, U.C. Berkeley, Berkeley, CA 94720

Frank Eeckman
Lawrence Livermore National Laboratory, P.O. Box 808 (L-426), Livermore, CA 94550

1 INTRODUCTION
The learning rules described here are specifically designed to permit the construction of biological models and the exploration of engineering or cognitive networks - with analytically guaranteed associative memory function - that employ the recurrent architectures and the type of dynamics found in the brain. Patterns of 40 to 80 Hz oscillation have been observed in the large scale activity (local field potentials) of olfactory cortex [19] and visual neocortex [24], and shown to predict the olfactory [21] and visual [18] pattern recognition responses of a trained animal. Similar observations of 40 Hz oscillation in auditory and motor cortex (in primates), and in the retina and EMG, have been reported. It thus appears that cortical computation in general may occur by dynamical interaction of resonant modes, as has been thought to be the case in the olfactory system. The oscillation can serve as a macroscopic clocking function and entrain or "bind" the relevant microscopic activity of disparate cortical regions into a well defined phase coherent collective state or "gestalt". This can override irrelevant microscopic activity and produce coordinated motor output. There is further evidence that though the collective activity is roughly periodic, it is actually chaotic (nonperiodic) when examined in detail [20]. If this view is correct, then oscillatory and possibly chaotic network modules form the actual cortical substrate of the diverse sensory, motor, and cognitive operations now studied in static networks. It must then be shown how those functions can be done with oscillatory and chaotic dynamics, and what advantages may be gained thereby.

Our challenge is to accomplish real tasks that brains can do, using ordinary differential equations, in networks that are as faithful as possible to the known dynamics and anatomical structure of cortex. It is our expectation that nature makes good use of this dynamical complexity, and our intent is to search here for novel design principles that may underlie the superior performance of biological systems in pattern recognition, robotics, and intelligent behavior. These may then be applied in artificial systems to engineering problems to advance the art of computation.

To this end, we discuss at the close of the chapter a parallel distributed processing architecture that is inspired by the structure and dynamics of cerebral cortex. It is constructed of recurrently interconnected associative memory modules whose theoretical analysis forms the bulk of this chapter. Each module is a network model of a "patch" of cortex that can store oscillatory and chaotic attractors by a Hebb rule. The modules can learn connection weights between themselves which will cause the system to evolve under a clocked "machine cycle" by a sequence of transitions of attractors within the modules, much as a digital computer evolves by transitions of its binary flip-flop states. Thus the architecture employs the principle of "computing with attractors" used by macroscopic systems for reliable computation in the presence of noise. Clocking is done by rhythmic variation of certain bifurcation parameters which hold sensory modules clamped at their attractors while motor states change, and then clamp motor states while sensory states are released to take new states based on input from external motor output and internal feedback.
The mathematical foundation for the network modules described in this paper is contained in the projection theorem, which details the associative memory capabilities of networks utilizing the normal form projection algorithm for storage of periodic attractors. The algorithm was originally designed, using dynamical systems theory, to allow learning and pattern recognition with oscillatory attractors in models of olfactory cortex. Here we concentrate on mathematical analysis and engineering oriented applications of the algorithm, and briefly discuss biological models at the end. We focus attention on the storage of periodic attractors, since that is the best understood unusual capability of this system. The storage of static and chaotic attractors is discussed as variations on this theme. We hope to give intuitive discussion and geometric perspectives to complement and clarify the formal analysis. Other approaches to oscillatory memory may be found in [26, 17, 45, 33, 37].

The normal form projection algorithm provides one solution to the problem of storing analog attractors in a recurrent neural network. Associative memory storage of analog patterns and continuous periodic sequences in the same network is analytically guaranteed. For a network with N nodes, the capacity is N static attractors. Periodic attractors which are simple cycles can be stored at a capacity of N/2. There are no spurious attractors, and there is a Liapunov function in a special coordinate system which governs the approach of transient states to stored periodic trajectories. For storage of different types of attractors in the same network, there are N units of total capacity in an N node network. It costs one unit of capacity per static attractor, two per Fourier component of each sequence, and at least three per chaotic attractor.

A key feature of a net constructed by the projection algorithm is that the underlying dynamics is explicitly isomorphic to any of a class of standard, well understood nonlinear dynamical systems - a normal form [27]. This system is chosen in advance, independent of both the patterns to be stored and the learning algorithm to be used. This confers control over the network dynamics, since it permits the design of important aspects of the dynamics independent of the particular patterns to be stored. Stability, basin geometry, and rates of convergence to attractors can be programmed in the standard dynamical system.

The storage of oscillation amplitude patterns by the projection algorithm begins with the specification of oscillations by coupled ordinary differential equations in a special polar coordinate system, where the equations for the amplitudes of the oscillations are independent of those for the phase rotations. These equations are the normal form for the Hopf bifurcation. Amplitude coupling coefficients are chosen to give stable fixed points on the coordinate axes of the vector field of amplitudes (see figure 2), where the Liapunov function is defined; frequencies are chosen for the phase equations; and the polar system is then transformed to Cartesian complex conjugate coordinates. The axes of this system of nonlinear ordinary differential equations are then linearly transformed into desired spatial or spatio-temporal patterns by projecting the system into network coordinates (the standard basis), using the desired vectors as columns of the transformation matrix. As we discuss below, this method of network synthesis is roughly the inverse of the usual procedure in bifurcation theory for analysis of a given physical system. These operations may all be compactly expressed in network coordinates as a formula or "learning rule" for the coupling weights, given the desired patterns and frequencies. Although it is never apparent in network coordinates, since no restrictions are imposed on the patterns that can be stored, these networks are always implicitly the product of a space of phases where frequencies are determined, and an independent space of amplitudes where attractors are programmed.
In the general case, the network resulting from the projection algorithm has third order synaptic weights or "sigma-pi" units [38], which appear as a four dimensional matrix $T_{ijkl}$ or tensor of coupling in the network equations. In the projection algorithm, the projection of the additional weights $T_{ijkl}$ for the competitive cubic terms $-x_j x_k x_l$ allows us to guarantee exact storage of $N$ static or $N/2$ oscillatory memories - without "spurious attractors". This compares with roughly $0.15N$ static attractor capacity for Hopfield nets. For engineering purposes, there are optical systems which can implement these higher order networks directly [40], and fast network simulation architectures that may allow application of these systems to real world engineering problems in pattern recognition, process control, and robotics.

The projection network is an alternative network for implementation of the normal form projection algorithm. The autoassociative case of this network is formally equivalent to the general higher order network realization which is used later as a biological model. A matrix inversion determines network weights, given prototype patterns to be stored. It has $3N^2$ weights instead of $N^2 + N^4$, and is more useful for engineering applications. All the mathematical results proved for the projection algorithm in the network above carry over to this new architecture, but more general versions can be trained and applied in novel ways.

For biological modeling, where the oscillation amplitude patterns to be stored can possibly be assumed to be sparse and nearly orthogonal, the learning operation for periodic patterns reduces to a kind of periodic or phase dependent outer product rule that permits local, incremental learning. The system can be truly self-organizing, because a synapse of the net can modify itself according to its own activity during external driving by an input pattern to be learned. Standing or traveling waves may be stored to mimic the different patterns of neurophysiological activity observed in the olfactory system. Between units of equal phase, or for static patterns, learning reduces further to the usual Hebb rule. It is argued that some dimensionality of the higher order synapses in the mathematical model may be realized in a biological system by synaptic interactions in the dense axo-dendritic interconnection plexus (the neuropil) of local neural populations, given the information immediately available on the primary long range connections $W_{ij}$. Theoretical work shows further that only $N^2$ of the $N^4$ possible higher order weights are required in principle to approximate the performance of the projection algorithm.

The method of network synthesis contained in the projection algorithm is roughly the inverse of the usual procedure in bifurcation theory for analysis of a given physical system. A bifurcation is an important qualitative change in the dynamical behavior of a system as a parameter is varied - the creation or destruction of attractors, or a change of stability of an attractor to become a repellor. A phase transition that a physicist sees at a critical point in a physical system may be viewed mathematically as a bifurcation in the stochastic dynamical equations describing that system. Where dynamics describes the continuous temporal evolution of a system, bifurcations describe discontinuous changes in the dynamical possibilities. We have found that mathematical analysis is simplified, and the power of our learning algorithm increased, when this additional dimension of organization about a bifurcation point (critical point) is considered in the design of the network.

This approach describes how associative memory can be programmed and made to work precisely with continuous dynamics and smooth sigmoids that are weakly nonlinear, when the network is operated in the vicinity of such a critical point. This may be thought of as operation in the low gain limit near the origin, as opposed to the high gain limit on a hypercube, where analytic results are usually obtained [32, 41, 42, 35, 29, 30]. The effective decision making nonlinearity in this system is in the higher order synapses, not the axons. It is nonlinear coupling, as opposed to nonlinear activation. Although developed from a bifurcation theory approach, where the higher order terms are usually obtained from a truncated Taylor expansion, the results on the networks discussed in this paper are global and exact, since the only nonlinearity used is the cubic terms. The results are not restricted to the neighborhood of any bifurcation point, and for simplicity will be discussed without reference to bifurcation theory. There is, however, a multiple bifurcation at critical values of different bifurcation parameters that can be used to generate the attractors from a single equilibrium at the origin. This kind of parameter is essential to the control of attractor transitions in the subnetworks of the hierarchical system discussed at the end.

2 PROJECTION ALGORITHM

2.1 Cycles

The oscillatory patterns that can be stored by the projection algorithm may be idealized mathematically as a spatio-temporal pattern or cycle, $r\vec{x}e^{i\vec{\theta}}e^{i\omega t}$. Such a cycle is a periodic attractor if it is stable. We will first supply some concrete imagery for these cycles. A cycle is shown in figure 1, where the components of the amplitude vector $\vec{x}$ are shown laid out in space as compartments of a "spatial pattern" in one dimension. The notions of "relative" (vector: $\vec{x}, \vec{\theta}$) and "global" (scalar: $r, \omega, \theta$) amplitude and phase are illustrated in the figure. The global amplitude $r$ is just a scaling factor for the pattern $\vec{x}$, and the global phase $\omega$ in $e^{i\omega t}$ is a periodic scaling that scales $\vec{x}$ by a factor between $\pm 1$ at frequency $\omega$ as $t$ varies.

The same vector $\vec{x}^s$ or "spatial pattern" of relative amplitudes of a set of components can appear as a standing wave, like that seen in the olfactory bulb, if the relative phase $\theta^s_i$ of each component is the same, $\theta^s_{i+1} = \theta^s_i$, or as a traveling wave, if the relative phase components of $\vec{\theta}^s$ form a gradient in space, $\theta^s_{i+1} = (1/a)\,\theta^s_i$. The traveling wave will "sweep out" the amplitude pattern $\vec{x}^s$ in time, but the root-mean-square amplitude measured in an experiment [19] will be the same $\vec{x}^s$, regardless of the phase pattern. At least one network component of the attractor pattern, however, must be out of phase with the rest, since all components cannot pass through zero at the same time, because zero is a fixed point of these equations. For a randomly chosen phase vector, these simple single frequency cycles can make very complicated looking spatio-temporal patterns. From the mathematical point of view, the relative phase pattern $\vec{\theta}$ is a degree of freedom in the kind of patterns that can be stored. Patterns of uniform amplitude $\vec{x}$ which differed only in the phase pattern $\vec{\theta}$ could be stored as well.

Although slight concentric phase gradients, that vary in location with each inspiration, have been found in the olfactory bulb, we model its activity as a standing wave. To store the kind of patterns found in the bulb, the amplitude vector $\vec{x}$ is assumed to be parsed into equal numbers of excitatory and inhibitory components, with an excitatory/inhibitory pair forming a local oscillator at each point in space. The components of each class have identical phase, but there is a phase difference of 60 - 90 degrees between the classes. There is also a slight but regular traveling wave of activity in the prepyriform cortex from rostral to caudal, which can easily be modeled here by introducing a slight phase gradient into both the excitatory and inhibitory classes.

[Figure 1 here: two panels, "Standing wave" and "Traveling wave", each showing the instantaneous pattern $\vec{x}(t)$ over components $x_1\theta_1, \ldots, x_n\theta_n$, the relative phase pattern, the global phase $\omega(t)$, the global amplitude $r^s$, the maximum amplitude envelope, and, for the traveling wave, the direction of travel.]

Figure 1: Standing and traveling wave examples of spatio-temporal patterns or cycles, $r\vec{x}e^{i\vec{\theta}}e^{i\omega t}$, that can be stored as periodic attractors by the projection algorithm. Shown are outlines of the histogram of the components of the vector $\vec{x}$ at an instant of time, $\vec{x}(t)$. The maximum amplitude envelope of positive and negative excursions (the "spatial pattern") is outlined in each case as well.

2.2 Projection Theorem

The central mathematical result of this work is given in the following theorem:

Theorem 2.1  Any set $S$, $s = 1, 2, \ldots, N/2$, of cycles $r^s\vec{x}^s e^{i\vec{\theta}^s} e^{i\omega_s t}$ of linearly independent vectors of relative component amplitudes $\vec{x}^s \in R^N$ and phases $\vec{\theta}^s \in T^N$ (N-torus), with frequencies $\omega_s \in R$ and global amplitudes $r^s \in R$, may be established in the vector field of the analog third order network:

$$\dot{x}_i = -\tau x_i + \sum_{j=1}^{N} T_{ij}x_j - \sum_{jkl=1}^{N} T_{ijkl}\,x_j x_k x_l + b_i(t), \qquad (1)$$

by the projection operation (learning rule):

$$T_{ij} = \sum_{mn=1}^{N} P_{im}J_{mn}P^{-1}_{nj}, \qquad T_{ijkl} = \sum_{mn=1}^{N} P_{im}A_{mn}P^{-1}_{mj}P^{-1}_{nk}P^{-1}_{nl}. \qquad (2)$$

Here the $N \times N$ matrix $P$ contains the real and imaginary components $[\vec{x}^s\cos\vec{\theta}^s,\ \vec{x}^s\sin\vec{\theta}^s]$ of the complex eigenvectors $\vec{x}^s e^{i\vec{\theta}^s}$ as columns. $J$ is an $N \times N$ matrix of complex conjugate eigenvalues in diagonal blocks, and $A_{mn}$ is an $N \times N$ matrix of $2\times 2$ blocks of repeated coefficients $a_{ij}$ of the normal form equations (3,4) below.

$$J = \begin{bmatrix} \alpha_1 & -\omega_1 & & & \\ \omega_1 & \alpha_1 & & & \\ & & \alpha_2 & -\omega_2 & \\ & & \omega_2 & \alpha_2 & \\ & & & & \ddots \end{bmatrix}, \qquad A = \begin{bmatrix} a_{11} & a_{11} & a_{12} & a_{12} & \\ a_{11} & a_{11} & a_{12} & a_{12} & \\ a_{21} & a_{21} & a_{22} & a_{22} & \\ a_{21} & a_{21} & a_{22} & a_{22} & \\ & & & & \ddots \end{bmatrix}.$$

The vector field of the dynamics of the global amplitudes $r_s$ and phases $\theta_s$ is then given by the normal form equations for a multiple Hopf bifurcation:

$$\dot{r}_s = u_s r_s - r_s\sum_{j=1}^{N/2} a_{sj}r_j^2 \qquad (3)$$

$$\dot{\theta}_s = \omega_s + \sum_{j=1}^{N/2} b_{sj}r_j^2 \qquad (4)$$

In particular, if $u_s = \alpha_s - \tau > 0$, $a_{sk} > 0$, and $a_{ss}/a_{ks} < 1$ for all $s$ and $k$, the cycles $s = 1, 2, \ldots, N/2$ are stable, and have amplitudes $r_s = (u_s/a_{ss})^{1/2}$. Input $b_i(t)$ is a delta function of time that establishes an initial condition.

2.3 Proof:

The normal form amplitude equations (3) describe the approach of the network to the desired oscillatory patterns because the network equations (1) are equivalent to the normal form equations (3,4). They can be constructed from them by two simple coordinate transformations. We proceed by showing first that there are always fixed points on the axes of the normal form amplitude equations (3), whose stability is given by the condition stated in the theorem for the coefficients $a_{ij}$ of the nonlinear terms. The network is then constructed by a transformation of the normal form equations, first from polar to Cartesian coordinates, and then by a linear transformation or projection from these mode coordinates into the standard basis $e_1, e_2, \ldots, e_N$, or network coordinates. This second transformation constitutes the "learning rule" (2) shown in the theorem, because it transforms the simple fixed points of the amplitude equations (3) into the specific spatio-temporal memory patterns desired for the network.

2.3.1 Amplitude fixed points

Because the amplitude equations are independent of the rotation $\vec{\theta}$, the fixed points of the normal form amplitude equations characterize the asymptotic states of the underlying oscillations (see figure 2). The stability of these cycles is therefore given by the stability of the fixed points of the amplitude equations. On each axis $r_s$, the other components $r_j$ are zero by definition, hence

$$\dot{r}_j = r_j\Big(u_j - \sum_{k=1}^{N/2} a_{jk}r_k^2\Big) = 0 \quad \text{for } r_j = 0,$$

which leaves

$$\dot{r}_s = r_s(u_s - a_{ss}r_s^2),$$

and $\dot{r}_s = 0$ when $r_s^2 = u_s/a_{ss}$. Thus we have an equilibrium on each axis $s$, at $r_s = (u_s/a_{ss})^{1/2}$, as claimed. Now the Jacobian $U$ of the amplitude equations at some fixed point $\hat{r}$ has elements

$$U_{ij} = -2a_{ij}\hat{r}_i\hat{r}_j, \qquad U_{ii} = u_i - 3a_{ii}\hat{r}_i^2 - \sum_{j\neq i}^{N/2} a_{ij}\hat{r}_j^2. \qquad (5)$$

For a fixed point $\hat{r}_s$ on axis $s$, the off-diagonal $U_{ij} = 0$, since $\hat{r}_i$ or $\hat{r}_j = 0$, making $U$ a diagonal matrix whose entries are therefore its eigenvalues. Now $U_{ii} = u_i - a_{is}\hat{r}_s^2$, for $i \neq s$, and $U_{ss} = u_s - 3a_{ss}\hat{r}_s^2$. Since $\hat{r}_s^2 = u_s/a_{ss}$, $U_{ss} = -2u_s$, and $U_{ii} = u_i - a_{is}(u_s/a_{ss})$. This gives $a_{is}/a_{ss} > u_i/u_s$ as the condition for all negative eigenvalues that assures the stability of $\hat{r}_s$. Choice of $a_{ji}/a_{ii} > u_j/u_i$, for all $i, j$, therefore guarantees stability of all axis fixed points.

[Figure 2 here: the two dimensional amplitude vector field, with axes $r_1$ and $r_2$, attractors $A$, $-A$, $B$, $-B$, saddle point $C$, and the nullcline ellipsoids.]

Figure 2: Two dimensional normal form amplitude vector field with symmetry. There are four static attractors ($A, -A, B, -B$), or two oscillation amplitude attractors ($A, B$), which must be positive. $C$ is a saddle point on the dotted unstable manifold that forms the separatrix which partitions the vector field into the "wedge" shaped basins of attraction. The equilibria occur at intersections of the axes and the ellipsoids, which are nullclines where $\dot{r}_i = 0$ for each component. By neglecting $\dot{\theta}_2$, the plane can be viewed roughly as a Poincaré section, for $\theta_1 = 0$, that is pierced periodically by the orbit of mode $r_1$.
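As a quick numerical illustration of this stability condition (a sketch of ours, not from the chapter), one can evaluate the Jacobian $U$ at an axis fixed point for the coefficient values used later in the numerical example of section 2.4 ($a_{ss} = 1$, $a_{ks} = 10$, $u_s = 1$), written here in Python with NumPy:

import numpy as np

# Amplitude-equation coefficients matching the numerical example of section 2.4:
# a_ss = 1 on the diagonal, a_ks = 10 off the diagonal, u_s = 1.
a = np.array([[1.0, 10.0],
              [10.0, 1.0]])
u = np.array([1.0, 1.0])

def amplitude_jacobian(r, u, a):
    """Jacobian U of r_i' = u_i r_i - r_i * sum_j a_ij r_j^2, evaluated at the point r."""
    N = len(r)
    U = -2.0 * a * np.outer(r, r)          # off-diagonal entries -2 a_ij r_i r_j
    for i in range(N):
        U[i, i] = u[i] - 3.0 * a[i, i] * r[i]**2 \
                  - sum(a[i, j] * r[j]**2 for j in range(N) if j != i)
    return U

# Fixed point on axis 1: r_1 = sqrt(u_1 / a_11), r_2 = 0.
r_hat = np.array([np.sqrt(u[0] / a[0, 0]), 0.0])
print(np.linalg.eigvals(amplitude_jacobian(r_hat, u, a)))
# both eigenvalues negative (here -2 and -9): the axis fixed point is stable

These are exactly the convergence rates $e^{-2t}$ and $e^{-9t}$ quoted for that example.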

2.3.2 Coordinate transformations

We now construct the neural network from these well behaved equations by two transformations. The first is polar to Cartesian, $(r_s, \theta_s)$ to $(v_{2s-1}, v_{2s})$. Using $v_{2s-1} = r_s\cos\theta_s$ and $v_{2s} = r_s\sin\theta_s$, and differentiating these, gives

$$\dot{v}_{2s-1} = \dot{r}_s\cos\theta_s - r_s\sin\theta_s\,\dot{\theta}_s, \qquad \dot{v}_{2s} = \dot{r}_s\sin\theta_s + r_s\cos\theta_s\,\dot{\theta}_s,$$

by the chain rule. Now substituting $r_s\cos\theta_s = v_{2s-1}$ and $r_s\sin\theta_s = v_{2s}$ gives:

$$\dot{v}_{2s-1} = (v_{2s-1}/r_s)\dot{r}_s - v_{2s}\dot{\theta}_s, \qquad \dot{v}_{2s} = (v_{2s}/r_s)\dot{r}_s + v_{2s-1}\dot{\theta}_s.$$

Entering the expressions of the normal form for $\dot{r}_s$ and $\dot{\theta}_s$ gives:

$$\dot{v}_{2s-1} = (v_{2s-1}/r_s)\Big(u_s r_s - r_s\sum_{j=1}^{N/2} a_{sj}r_j^2\Big) - v_{2s}\Big(\omega_s + \sum_{j=1}^{N/2} b_{sj}r_j^2\Big),$$

and since $r_s^2 = v_{2s-1}^2 + v_{2s}^2$,

$$\dot{v}_{2s-1} = u_s v_{2s-1} - \omega_s v_{2s} - \sum_{j=1}^{N/2}\big[v_{2s-1}a_{sj} + v_{2s}b_{sj}\big]\big(v_{2j-1}^2 + v_{2j}^2\big).$$

Similarly,

$$\dot{v}_{2s} = u_s v_{2s} + \omega_s v_{2s-1} - \sum_{j=1}^{N/2}\big[v_{2s}a_{sj} - v_{2s-1}b_{sj}\big]\big(v_{2j-1}^2 + v_{2j}^2\big).$$

Setting the $b_{sj} = 0$ for simplicity, choosing $u_s = \alpha_s - \tau$ to get a standard network form, and reindexing $i, j = 1, 2, \ldots, N$, we get the Cartesian equivalent of the polar normal form equations,

$$\dot{v}_i = -\tau v_i + \sum_{j=1}^{N} J_{ij}v_j - v_i\sum_{j=1}^{N} A_{ij}v_j^2. \qquad (6)$$

Here $J$ is a matrix containing $2\times 2$ blocks along the diagonal of the local couplings of the linear terms of each pair of the previous equations $v_{2s-1}, v_{2s}$, with $-\tau$ separated out of the diagonal terms, so that $\alpha_s = u_s + \tau$. The matrix $A$ has $2\times 2$ blocks of identical coefficients $a_{sj}$ of the nonlinear terms from each pair.

$$J = \begin{bmatrix} \alpha_1 & -\omega_1 & & & \\ \omega_1 & \alpha_1 & & & \\ & & \alpha_2 & -\omega_2 & \\ & & \omega_2 & \alpha_2 & \\ & & & & \ddots \end{bmatrix}, \qquad A = \begin{bmatrix} a_{11} & a_{11} & a_{12} & a_{12} & \\ a_{11} & a_{11} & a_{12} & a_{12} & \\ a_{21} & a_{21} & a_{22} & a_{22} & \\ a_{21} & a_{21} & a_{22} & a_{22} & \\ & & & & \ddots \end{bmatrix}.$$

2.3.3 Learning transformation - linear term

The second transformation is linear. $J$ can be viewed as the canonical (diagonalized) form of a real matrix with complex conjugate eigenvalues, where the conjugate pairs appear in blocks along the diagonal as shown ($J - \tau I$ is the Jacobian of this system). The Cartesian normal form equations describe the interaction of these linearly uncoupled complex modes due to the coupling of the nonlinear terms. We can interpret the normal form equations as network equations in eigenvector (= mode) or "memory" coordinates. In other words, we can think of them as having been produced from some network by a diagonalizing coordinate transformation $P$, so that each axis pair $v_{2s-1}, v_{2s}$ is a component of the complex eigenvector basis, and $J$ displays the complex conjugate eigenvalues of some coupling matrix $T$ in network coordinates. In this case, for the linear term, we know that

$$J = P^{-1}TP.$$

Then it is clear that

$$T = PJP^{-1}.$$

The matrix $P$ is usually constructed of columns which are the eigenvectors calculated from the matrix $T$ to be diagonalized. For the present purpose, we work backwards [36, 41], choosing eigenvectors for $P$ and eigenvalues for $J$, then constructing $T$ by the formula above. We choose as columns the real and imaginary vectors $[\vec{x}^s\cos\vec{\theta}^s,\ \vec{x}^s\sin\vec{\theta}^s]$ of the cycles $\vec{x}^s e^{i\vec{\theta}^s}$ of the set $S$ to be learned. Any linearly independent set of complex eigenvectors in $P$ can be chosen to transform the system and become the patterns "stored" in the vector field of network dynamics. This learning process might also be thought of as giving a vector representation to modes in idealized coordinates by projecting out of that space onto the vectors of the standard basis - hence the name "projection algorithm". If we write the matrix expression for $T$ above in component form, we recover the expression given in the theorem for $T_{ij}$,

$$T_{ij} = \sum_{mn}^{N} P_{im}J_{mn}P^{-1}_{nj}.$$

2.3.4 Nonlinear term projection

The nonlinear terms are transformed as well, but the expression cannot be easily written in matrix form. Using the component form of the transformation,

$$x_i = \sum_{j}^{N} P_{ij}v_j, \qquad \dot{x}_i = \sum_{j}^{N} P_{ij}\dot{v}_j, \qquad v_j = \sum_{k}^{N} P^{-1}_{jk}x_k,$$

and substituting in the Cartesian normal form,

$$\dot{x}_i = \sum_{j}^{N} P_{ij}\dot{v}_j = \sum_{j}^{N} P_{ij}\Big[-\tau v_j + \sum_{k=1}^{N} J_{jk}v_k - v_j\sum_{k=1}^{N} A_{jk}v_k^2\Big].$$

Now, using $v_j = \sum_{k}^{N} P^{-1}_{jk}x_k$, we get

$$\dot{x}_i = -\tau\sum_{j}^{N} P_{ij}\Big(\sum_{k}^{N} P^{-1}_{jk}x_k\Big) + \sum_{j}^{N} P_{ij}\sum_{k}^{N} J_{jk}\Big(\sum_{l}^{N} P^{-1}_{kl}x_l\Big) - \sum_{j}^{N} P_{ij}\Big(\sum_{k}^{N} P^{-1}_{jk}x_k\Big)\sum_{l}^{N} A_{jl}\Big(\sum_{m}^{N} P^{-1}_{lm}x_m\Big)\Big(\sum_{n}^{N} P^{-1}_{ln}x_n\Big).$$

Rearranging the orders of summation gives

$$\dot{x}_i = -\tau\sum_{k}^{N}\Big(\sum_{j}^{N} P_{ij}P^{-1}_{jk}\Big)x_k + \sum_{l}^{N}\Big(\sum_{jk}^{N} P_{ij}J_{jk}P^{-1}_{kl}\Big)x_l - \sum_{kmn}^{N}\Big(\sum_{jl}^{N} P_{ij}P^{-1}_{jk}A_{jl}P^{-1}_{lm}P^{-1}_{ln}\Big)x_k x_m x_n.$$

Finally, performing the bracketed summations and relabeling indices gives us the network of the theorem,

$$\dot{x}_i = -\tau x_i + \sum_{j}^{N} T_{ij}x_j - \sum_{jkl}^{N} T_{ijkl}\,x_j x_k x_l,$$

with the expression for the tensor of the nonlinear term,

$$T_{ijkl} = \sum_{mn}^{N} P_{im}A_{mn}P^{-1}_{mj}P^{-1}_{nk}P^{-1}_{nl}.$$

Q.E.D.

2.4 Numerical Example

Everything mathematically required to implement this network is given in the statement of the theorem. One can pick desired amplitude patterns, phase patterns, frequencies, $(\vec{x}^s, \vec{\theta}^s, \omega_s, \alpha_s)$, and normal form coefficients $0 < a_{ss}/a_{ks} < 1$. Then form matrices $P$, $J$, and $A$. Numerically calculate $P^{-1}$ and execute the specified projection operation (2) to find the weights $T_{ij}$ and $T_{ijkl}$ that store the desired patterns as periodic attractors. Inputs $b_i(t)$ which establish nearby initial conditions will "retrieve" these patterns as the cycle of activity into which the network settles. The only restriction on the patterns to be stored is that the columns of $P$ must be linearly independent so that $P$ may be inverted. This means, for example, that the relative phases $\theta_i$ of any pattern cannot all be identical. This procedure may be done incrementally by replacing previously unused (randomly chosen) $P$ matrix columns with new desired patterns, setting the corresponding $J$ and $A$ matrix entries (previously zero) to desired values, and re-executing the projection rule (2). Any preexisting set of stable patterns or sequences may also be modified by simply choosing new $P$ matrix columns corresponding to those sequences to be modified and projecting again for the new weights.

We give here a simple numerical example of the algorithm. This is a four dimensional system - four units or neural populations ($N = 4$) - that stores two oscillatory attractors of the same frequency $\omega_1 = \omega_2 = 250\ \mathrm{rad/sec} \approx 40\ \mathrm{Hz}$.

$$P = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 0 & 2 & 0 \\ 2 & 0 & -1 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 2 & 0 & -1 \end{bmatrix}, \quad J = \begin{bmatrix} 2 & -250 & 0 & 0 \\ 250 & 2 & 0 & 0 \\ 0 & 0 & 2 & -250 \\ 0 & 0 & 250 & 2 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 1 & 10 & 10 \\ 1 & 1 & 10 & 10 \\ 10 & 10 & 1 & 1 \\ 10 & 10 & 1 & 1 \end{bmatrix}.$$

The patterns in the matrix $P$ to be stored as attractors are chosen for numerical simplicity and to satisfy the constraints of the Hebb rule to be introduced in the next section and the biological model introduced much later. With these constraints, the projection rule coincides with the Hebb rule, and will generate a network of the biological form. This illustrates how precisely networks may be programmed by the projection learning rule. The patterns are chosen to be orthonormal, and each is divided into two sets of components (excitatory and inhibitory) of constant phase within the set. The first two columns contain the real and imaginary parts $[\vec{x}^s\cos\vec{\theta}^s,\ \vec{x}^s\sin\vec{\theta}^s]$ of the complex eigenvectors $|\vec{x}^s|e^{i\vec{\theta}^s}$ which determine the first pattern. We have chosen $\theta_1 = \theta_2 = 0$ degrees for the first two components, which represent excitatory neurons in the biological model. Since the cosine is one and the sine is zero, we see just the chosen amplitudes, $x_1 = 1$ and $x_2 = 2$, in the first column, with the second column zero. Now we choose $\theta_3 = \theta_4 = 90$ degrees phase difference from $\theta_1, \theta_2$, but with the same amplitudes, to represent the activity of the inhibitory neurons in the biological model. Here we get the nonzero column switched, because the cosine is now zero and the sine is one. For the second attractor pattern, to be specified by the next two columns, we use the same phase structure, but choose a different amplitude pattern where the larger amplitude value occurs now in the first component, $x_1 = 2$ and $x_2 = -1$. The minus sign makes these two patterns orthogonal, because the sum of the products of the corresponding components (inner product) is zero. The absolute value (magnitude) of these numbers determines the amplitude of the oscillation, so the sign does not affect the amplitude specification, but reverses the phase of this component by 180 degrees.

The norm of these column vectors is the same, $\sqrt{1^2 + 2^2} = \sqrt{5}$, so we can divide the whole matrix by this value to get normalized vectors. The chosen frequencies, $\omega_1 = \omega_2 = 40\ \mathrm{Hz} = 40(2\pi) \approx 250$ radians/sec, are placed in the $J$ matrix, along with the real parts $\alpha_1 = \alpha_2 = 2$ of the complex conjugate eigenvalues. We will choose $\tau = 1$ in the network equations, so that $u = \alpha - \tau = 2 - 1 = 1$ in the normal form amplitude equations. Since we have chosen $a_{ss} = 1$, we will get oscillations with identical global amplitude multipliers $r_s = \sqrt{u_s/a_{ss}} = \sqrt{1/1} = 1$, as shown earlier. We will show later that our convergence rates near these periodic attractors will be $e^{-2u_s t} = e^{-2t}$ along the $s$ axis direction, and $e^{(u_i - a_{is}(u_s/a_{ss}))t} = e^{(1 - 10(1/1))t} = e^{-9t}$ in the direction of the other axes. Since the $P$ matrix is orthonormal, $P^{-1} = P^T$, and need not be calculated numerically. Now applying the matrix multiplications of the projection rule for attractors meeting these particular specifications to find the weights $T_{ij}$ will yield the matrix $T$ shown below,

$$P^{-1} = P^T = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 2 & -1 & 0 & 0 \\ 0 & 0 & 2 & -1 \end{bmatrix}, \qquad T = \begin{bmatrix} 2 & 0 & -250 & 0 \\ 0 & 2 & 0 & -250 \\ 250 & 0 & 2 & 0 \\ 0 & 250 & 0 & 2 \end{bmatrix}.$$

This is nearly the coupling structure (19) of the biological model discussed in section 6, which consists of identical oscillators of excitatory and inhibitory neural populations coupled only by excitatory connections. The excitatory coupling is diagonal in this case because the eigenvalues $\alpha_s$ are equal, but full cross-coupling of oscillators is still given by the higher order terms. Here also there are diagonal entries in the lower right $2\times 2$ block because, for numerical simplicity (orthogonality), we used exactly 90 degree excitatory/inhibitory phase difference in the patterns to be stored. As explained in [5], we must use nonorthogonal real and imaginary columns $[\vec{x}^s\cos\vec{\theta}^s,\ \vec{x}^s\sin\vec{\theta}^s]$ differing slightly from this (for example, try $\theta_3 = \theta_4 = 85$ degrees, $\theta_1 = \theta_2 = 0$ degrees), and calculate $P^{-1}$ numerically, to get exactly the biological model. The biological model is thus generated as a special case of the projection learning rule for the particular kinds of attractors found in cortex. In the general case, the amplitude and phase patterns and frequencies of attractors can be chosen arbitrarily, up to the requirement of linear independence of the columns of the $P$ matrix.

With these values chosen and calculated, we can execute the matrix multiplications specifying the projection learning rule (2) to find the weights $T_{ijkl}$. Simulate the network equations (1) with initial conditions close to each stored pattern on different runs to observe convergence to the two attractors with their different relative component amplitudes. Observe that oscillation amplitudes shrink and the attractors disappear in a multiple Hopf bifurcation as $\tau$ exceeds 2. Setting frequencies to zero in $J$ and projecting again will give a network with two static attractors with the same basins of attraction as the oscillatory attractors.
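To make the example concrete, here is a minimal simulation sketch in Python with NumPy (our own illustration, not code from the chapter): it builds $T_{ij}$ and $T_{ijkl}$ from the $P$, $J$, $A$ above by the projection rule (2), integrates the network equations (1) with a simple Runge-Kutta step, and checks that a state started near the first stored pattern settles to global amplitudes $r_1 \approx 1$, $r_2 \approx 0$:

import numpy as np

# Matrices from the example above: two ~40 Hz patterns stored in a four-unit network.
P = np.array([[1, 0, 2, 0],
              [2, 0, -1, 0],
              [0, 1, 0, 2],
              [0, 2, 0, -1]], dtype=float) / np.sqrt(5)
J = np.array([[2, -250, 0, 0],
              [250, 2, 0, 0],
              [0, 0, 2, -250],
              [0, 0, 250, 2]], dtype=float)
A = np.array([[1, 1, 10, 10],
              [1, 1, 10, 10],
              [10, 10, 1, 1],
              [10, 10, 1, 1]], dtype=float)
tau = 1.0

# Projection rule (2): network weights from P, J, A.
Pinv = np.linalg.inv(P)                      # equals P.T here, since P is orthonormal
T = P @ J @ Pinv
T4 = np.einsum('im,mn,mj,nk,nl->ijkl', P, A, Pinv, Pinv, Pinv)

# Network equations (1), integrated with a fourth order Runge-Kutta step.
def xdot(x):
    return -tau * x + T @ x - np.einsum('ijkl,j,k,l->i', T4, x, x, x)

def rk4(x, dt):
    k1 = xdot(x); k2 = xdot(x + 0.5 * dt * k1)
    k3 = xdot(x + 0.5 * dt * k2); k4 = xdot(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
x = P[:, 0] + 0.05 * rng.standard_normal(4)  # start near the first stored pattern
for _ in range(20000):                       # 20 seconds of simulated time at dt = 1 ms
    x = rk4(x, 1e-3)

v = Pinv @ x                                 # state in memory coordinates
print(np.hypot(v[0], v[1]), np.hypot(v[2], v[3]))   # global amplitudes: ~1.0 and ~0.0

Starting instead near the third column of $P$ retrieves the other attractor, and raising $\tau$ above 2 makes both amplitudes decay to zero, as described above.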

2.4.1 Discussion

In addition to the linear terms $T_{ij}$, the network has a specific cubic nonlinearity programmed by the $T_{ijkl}$ weights, instead of the usual sigmoid nonlinearity $T_{ij}g(x_j)$ intimately tied to the $T_{ij}$ terms. Similar terms would appear if the symmetric sigmoid functions of such a network were Taylor expanded up to cubic terms (quadratic terms are killed by the symmetry). These competitive (negative) cubic terms constitute a highly programmable nonlinearity that is independent of the linear terms. Normal form theory shows that these cubics are the essential nonlinear terms required to store oscillations, because of the (odd) phase shift symmetry required in the vector field. They serve to create multiple periodic attractors by causing the oscillatory modes of the linear term to compete, much as the sigmoidal nonlinearity does for static modes in a network with static attractors like the analog Hopfield network. The nonlinearity here is in the coupling instead of the activation functions of the network. Intuitively, these cubic terms may be thought of as sculpting the maxima of a "saturation" (energy) landscape into which the linear modes with positive eigenvalues expand, and positioning them to lie in the directions specified by the eigenvectors to make them stable. A Liapunov function for this landscape will be constructed later for the amplitude equations (3) of the polar coordinate form of this network.

The normal form equations for $\dot{r}_s$ and $\dot{\theta}_s$ determine how $r_s$ and $\theta_s$ for pattern $s$ evolve in time in interaction with all the other patterns of the set $S$. This could be thought of as the process of phase locking of the pattern that finally emerges. The unusual power of this algorithm lies in the ability to precisely specify these nonlinear interactions. The terms $b_{sj}$ in the phase equations (4) of the general Hopf normal form represent a nonlinear dependence of the frequency of a mode on its amplitude $r^2$ and on those of the other modes of nonzero amplitude during transient relaxation to a single mode. They can be used to model oscillations with "hard" or "soft" elasticity (springs), where the frequency increases or decreases, respectively, with greater amplitude of oscillation. For the sake of simplicity we set them to zero.

There is a multiple Hopf bifurcation of codimension $N/2$ at $\tau = \alpha = 2$, where the real parts of all the complex eigenvalues of the Jacobian are zero. All of the stored attractors will arise from a single equilibrium at the origin as $\tau$ decreases further. Since there are no approximations here, however, the theorem is not restricted to the neighborhood of this bifurcation, and can be discussed without further reference to bifurcation theory.

The general difficulty of ensuring stability of limit sets which are trajectories is due to the difficulty of finding a simple "measure" to characterize them. When we start with this special polar coordinate description of periodic orbits, with amplitude equations (3) independent of the phase equations (4), then the amplitude equations give us a "handle" on these trajectories. Stable fixed points of the amplitude equations correspond to stable limit cycles in the network vector field. The usual tools of analysis and synthesis of nonlinear systems with fixed points, such as Liapunov functions, can be applied to the amplitude equations to control the underlying oscillations.
Although it will never be apparent in network coordinates, because no restriction is imposed on the patterns that can be stored, these systems may always be decomposed in normal form coordinates into the Cartesian product of a space of amplitude variables independent of a space of phase variables. Later we prove that the attractors on the amplitude axes are the only stable fixed points in this system - there are no "spurious" attractors. We show that further restrictions on the $A$ matrix produce a very well behaved vector field with identical scale and rotation invariant basins that are bounded by hyperplanes.

When $a_{ii} = c$ and $a_{ij} = d$ for all $i \neq j$, the amplitude component equations are identical, and the normal form vector field is symmetric under permutation of the axes (the components). For the two component vector field in the plane shown earlier in figure 2, it is obvious that this symmetry forces the diagonal separatrices through the interior saddle points to be straight lines, since only a line evenly dividing the quadrant is consistent with this symmetry under reflection of the axes. These basins are identical two dimensional "wedges" of infinite extent, and are thus naturally scale invariant. They define "categories" or classes of directions of vectors. The hyperplane boundaries are linearly transformed by the learning algorithm, and so remain hyperplanes in the network vector field. Whenever there are symmetries chosen for the normal form equations, the network equations inherit these symmetries as well, and the network attractors are interrelated by a group of transformations. These are given by the learning process even though there are no restrictions on the patterns being learned.

2.5 Learning Rule Extensions

This is the core of the mathematical story, and there are many ways in which it may be extended. For biological modeling, where the patterns to be stored may be assumed to be orthogonal, as discussed previously, the linear projection operation described above becomes a generalized periodic outer product rule that reduces to the usual Hebb-like rule for the storage of fixed points. There are mechanisms in neural layers that can normalize inputs [19], and we assume this here. When the columns of $P$ are orthonormal, it follows that $P^{-1} = P^T$, and the formula above for the linear network coupling becomes $T = PJP^T$.

To see how an outer product rule arises, suppose first that we wanted to store static attractors. We fill the columns of the $P$ matrix with real orthonormal eigenvectors $\vec{x}^s$ to be learned, and for each one choose a real eigenvalue $\lambda_s$ to be placed in the corresponding column on the diagonal of the matrix $J$. ($T$ is the identity, of course, if the eigenvalues are all identical, but full cross-couplings arise for the case of complex eigenvalues.) Noting that $J_{mn} = \lambda_n\delta_{mn}$ and $P^T_{nj} = P_{jn}$,

$$T_{ij} = \sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}J_{mn}P^T_{nj} = \sum_{n=1}^{N}\sum_{m=1}^{N}\lambda_n\delta_{mn}P_{im}P_{jn} = \sum_{n=1}^{N}\lambda_n P_{in}P_{jn} = \sum_{s=1}^{N}\lambda_s x^s_i x^s_j.$$

Here $\delta_{ij}$ is the Kronecker delta: $\delta_{ij} = 1$ for $i = j$ and zero otherwise. Writing the matrix multiplication in component form, we get $T_{ij} = \sum_{n=1}^{N}\lambda_n P_{in}P_{jn}$. This is a summation over the column entries of $P$, which we know by our construction to be the patterns we want to learn. Thus the $T$ matrix can be written in the familiar form of the Hebb rule, $T_{ij} = \sum_s T^s_{ij} = \sum_s \lambda_s x^s_i x^s_j$, where the eigenvalue $\lambda_s$ is revealed to be the "learning rate" or relative frequency of pattern $s$ in the summation of patterns.

In a similar fashion to the derivation above, the learning rule for the higher order nonlinear terms for static attractors becomes a multiple outer product rule when the matrix $\tilde{A}$ of the normal form amplitude equations is chosen to have a simple form that will be shown later to provide a vector field whose only attractors are the desired ones. Here we distinguish $\tilde{A}$ from the previously displayed matrix $A$ for the Cartesian normal form (6), which was shown in the proof of the projection theorem to be formed by doubling the entries of the matrix $\tilde{A}$ of coupling coefficients $a_{ij}$ in the normal form amplitude equations (3). For static attractors, the amplitude equations alone are the normal form, since there are no phase equations. The $\tilde{A}$ matrix is chosen to have uniform entries $\tilde{A}_{ij} = c$ for all its off-diagonal entries, and uniform entries $\tilde{A}_{ii} = c - d$ for the diagonal entries. Thus $\tilde{A} = cR - dI$, where $R$ is a matrix of uniform entries $R_{ij} = 1$ for all $i, j$, and $I$ is the identity matrix. Choice of $c > d > 0$ then satisfies the condition for stable axis fixed points derived earlier. Now the expression for the higher order terms, when $P^{-1} = P^T$, becomes

$$T_{ijkl} = \sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}\tilde{A}_{mn}P^T_{mj}P^T_{nk}P^T_{nl} = \sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}\big[cR_{mn} - d\delta_{mn}\big]P^T_{mj}P^T_{nk}P^T_{nl}$$

$$= c\sum_{m=1}^{N} P_{im}P^T_{mj}\Big[\sum_{n=1}^{N} P^T_{nk}P^T_{nl}\Big] - d\sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}\delta_{mn}P^T_{mj}P^T_{nk}P^T_{nl},$$

and (using $P^T_{nj} = P_{jn}$ and substituting patterns)

$$T_{ijkl} = c\,\delta_{ij}\delta_{kl} - d\sum_{n=1}^{N} P_{in}P_{jn}P_{kn}P_{ln} = c\,\delta_{ij}\delta_{kl} - d\sum_{s=1}^{N} x^s_i x^s_j x^s_k x^s_l.$$

For the case of complex eigenvectors and eigenvalues to be learned, to store oscillating patterns, the derivation is basically the same, but is notationally complicated by the fact that the patterns to be learned come in pairs - each eigenvector is a real and imaginary pair of columns in the $P$ matrix, and each corresponding eigenvalue in the matrix $J$ is a complex conjugate pair in a diagonal block, as shown previously.

$$T_{ij} = \sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}J_{mn}P^T_{nj} = \sum_{n=1}^{N}\Big[\sum_{m=1}^{N} P_{im}J_{mn}\Big]P^T_{nj} = \sum_{k=1}^{N}\tilde{P}_{ik}P^T_{kj} = \sum_{s=1}^{N/2}\big[\tilde{P}_{i,2s-1}P^T_{2s-1,j} + \tilde{P}_{i,2s}P^T_{2s,j}\big].$$

Now the summation over patterns $s$ is over pairs of columns and rows. The entries from the matrix $\tilde{P}$ are

$$\tilde{P}_{i,2s-1} = \alpha_s x^s_i\cos\theta^s_i + \omega_s x^s_i\sin\theta^s_i, \qquad \tilde{P}_{i,2s} = -\omega_s x^s_i\cos\theta^s_i + \alpha_s x^s_i\sin\theta^s_i,$$

after the matrix multiplication by the complex eigenvalue block. Those from $P^T$ are $P^T_{2s-1,j} = x^s_j\cos\theta^s_j$ and $P^T_{2s,j} = x^s_j\sin\theta^s_j$, the $j$th real and imaginary components of pattern $s$. Thus

$$T_{ij} = \sum_{s=1}^{N/2} T^s_{ij} = \sum_{s=1}^{N/2}\big(\alpha_s x^s_i\cos\theta^s_i + \omega_s x^s_i\sin\theta^s_i\big)x^s_j\cos\theta^s_j + \big(-\omega_s x^s_i\cos\theta^s_i + \alpha_s x^s_i\sin\theta^s_i\big)x^s_j\sin\theta^s_j$$

$$= \sum_{s=1}^{N/2} x^s_i x^s_j\big[(\alpha_s\cos\theta^s_i + \omega_s\sin\theta^s_i)\cos\theta^s_j + (-\omega_s\cos\theta^s_i + \alpha_s\sin\theta^s_i)\sin\theta^s_j\big].$$

Then

$$T_{ij} = \sum_{s=1}^{N/2} x^s_i x^s_j\big[\alpha_s\cos(\theta^s_i - \theta^s_j) + \omega_s\sin(\theta^s_i - \theta^s_j)\big]. \qquad (7)$$

Thus for orthogonal cycles or fixed points, the projection operation can be expressed as a local, incremental learning procedure [3]. The network can be truly self-organizing, because a synapse of the net can modify itself according to the pre and post synaptic activities it experiences during external driving by an input pattern to be learned. The learning rule above may be thought of as a "periodic" outer product rule, or a "phase dependent" Hebb rule, and there is evidence for it in the hippocampus. Between units of equal phase, or when $\theta^s_i = \theta^s_j = 0$ for a nonoscillating pattern, this reduces to the usual Hebb rule. For $\theta^s_i - \theta^s_j$ from 0 to 90 degrees, the rule implies a positive weight change, as is found in long term potentiation (LTP) in the hippocampus [43], with the frequency dependent term $\omega_s\sin(\theta^s_i - \theta^s_j)$ growing to a maximum at 90 degrees, while the cosine declines to zero. For $(\theta^s_i - \theta^s_j) > 90$ degrees, however, the synaptic change can be negative, depending on $\omega_s$, since the cosine term is negative. It must be maximally negative for phase differences of 180 to 270 degrees between pre and post synaptic depolarizations, since both sinusoidal terms are negative there. This kind of long term depression, or anti-Hebbian learning, was found in hippocampus when the phase differences between applied presynaptic stimulation and postsynaptic depolarizations were in this range - i.e. when stimulation is paired with postsynaptic hyperpolarization [43].

Again, the oscillation learning rule for the higher order nonlinear terms becomes a multiple outer product rule when the matrix $A$, shown in the projection theorem, is chosen to have a simple form. Recall again that the $A$ matrix for the Cartesian normal form (6) was shown in the proof to be formed by doubling the entries $a_{ij}$ of the matrix $\tilde{A}$ of coupling coefficients in the normal form amplitude equations (3). It is chosen here to have uniform entries $A_{ij} = c$ for all its off-diagonal $2\times 2$ blocks, and uniform entries $A_{ij} = c - d$ for the diagonal blocks. Thus $A = cR - dS$, where $R$ is a matrix of uniform entries $R_{ij} = 1$ for all $i, j$, and $S$ is a matrix of nonzero $2\times 2$ diagonal blocks of uniform entries $S_{ij} = 1$:

$$R = \begin{bmatrix} 1 & 1 & 1 & \cdots \\ 1 & 1 & 1 & \\ 1 & 1 & 1 & \\ \vdots & & & \ddots \end{bmatrix}, \qquad S = \begin{bmatrix} 1 & 1 & & & \\ 1 & 1 & & & \\ & & 1 & 1 & \\ & & 1 & 1 & \\ & & & & \ddots \end{bmatrix}.$$

Choice of $c > d > 0$ then satisfies the condition for stable axis fixed points derived earlier. Now the expression for the higher order terms, when $P^{-1} = P^T$, becomes (where $\delta_{ij}$ is the Kronecker delta)

$$T_{ijkl} = \sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}A_{mn}P^T_{mj}P^T_{nk}P^T_{nl} = \sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}\big[cR_{mn} - dS_{mn}\big]P^T_{mj}P^T_{nk}P^T_{nl}$$

$$= c\sum_{m=1}^{N} P_{im}P^T_{mj}\Big[\sum_{n=1}^{N} P^T_{nk}P^T_{nl}\Big] - d\sum_{m=1}^{N}\sum_{n=1}^{N} P_{im}S_{mn}P^T_{mj}P^T_{nk}P^T_{nl}$$

(using $P^T_{nk} = P_{kn}$, so that the first bracket gives $\delta_{ij}\delta_{kl}$)

$$= c\,\delta_{ij}\delta_{kl} - d\sum_{s=1}^{N/2}\big[P_{i,2s-1}S_{2s-1,2s-1}P^T_{2s-1,j}P^T_{2s-1,k}P^T_{2s-1,l} + P_{i,2s}S_{2s,2s-1}P^T_{2s,j}P^T_{2s-1,k}P^T_{2s-1,l} + P_{i,2s-1}S_{2s-1,2s}P^T_{2s-1,j}P^T_{2s,k}P^T_{2s,l} + P_{i,2s}S_{2s,2s}P^T_{2s,j}P^T_{2s,k}P^T_{2s,l}\big]$$

(using $S_{ij} = 1$ within the blocks)

$$= c\,\delta_{ij}\delta_{kl} - d\sum_{s=1}^{N/2}\big[P_{i,2s-1}P^T_{2s-1,j}P^T_{2s-1,k}P^T_{2s-1,l} + P_{i,2s}P^T_{2s,j}P^T_{2s-1,k}P^T_{2s-1,l} + P_{i,2s-1}P^T_{2s-1,j}P^T_{2s,k}P^T_{2s,l} + P_{i,2s}P^T_{2s,j}P^T_{2s,k}P^T_{2s,l}\big]$$

(using $P^T_{ij} = P_{ji}$)

$$= c\,\delta_{ij}\delta_{kl} - d\sum_{s=1}^{N/2}\big[P_{i,2s-1}P_{j,2s-1}P_{k,2s-1}P_{l,2s-1} + P_{i,2s}P_{j,2s}P_{k,2s-1}P_{l,2s-1} + P_{i,2s-1}P_{j,2s-1}P_{k,2s}P_{l,2s} + P_{i,2s}P_{j,2s}P_{k,2s}P_{l,2s}\big],$$

and (substituting patterns)

$$T_{ijkl} = c\,\delta_{ij}\delta_{kl} - d\sum_{s=1}^{N/2} x^s_i x^s_j x^s_k x^s_l\big[\cos\theta^s_i\cos\theta^s_j\cos\theta^s_k\cos\theta^s_l + \sin\theta^s_i\sin\theta^s_j\cos\theta^s_k\cos\theta^s_l + \cos\theta^s_i\cos\theta^s_j\sin\theta^s_k\sin\theta^s_l + \sin\theta^s_i\sin\theta^s_j\sin\theta^s_k\sin\theta^s_l\big].$$

This reduces to the multiple outer product derived above,

$$T_{ijkl} = c\,\delta_{ij}\delta_{kl} - d\sum_{s=1}^{N/2} x^s_i x^s_j x^s_k x^s_l, \qquad (8)$$

for a non-oscillating system, when the phase $\theta$ is zero for all patterns. As we have seen, $c > d$ guarantees stability of all stored patterns [6]. Because the numerical example given earlier used orthonormal attractor patterns, the Hebbian rules for those patterns can be used here to find weights that store them directly in this network, without any matrix inversions or multiplications. To implement that example with this rule, $c = 10$, $d = 9$, $\tau = 1$, $\alpha = 2$, and the pattern phases are 0 and 90 degrees, with amplitudes and frequencies as described.
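As a concrete check of these rules (a sketch of ours, not code from the chapter), the following Python/NumPy fragment builds the linear weights by the periodic Hebb rule (7) and the higher order weights by the periodic outer product rule above, for the two patterns of the numerical example, and confirms that they agree with the projection rule (2):

import numpy as np

# Stored patterns of the numerical example: signed amplitudes and phases (radians).
x = np.array([[1, 2, 1, 2],
              [2, -1, 2, -1]], dtype=float) / np.sqrt(5)      # x[s, i]
th = np.radians(np.array([[0, 0, 90, 90],
                          [0, 0, 90, 90]], dtype=float))      # theta[s, i]
alpha, w, c, d, N = 2.0, 250.0, 10.0, 9.0, 4

# Periodic ("phase dependent") Hebb rule (7) for the linear weights.
T_hebb = sum(np.outer(x[s], x[s]) *
             (alpha * np.cos(th[s][:, None] - th[s][None, :]) +
              w * np.sin(th[s][:, None] - th[s][None, :]))
             for s in range(2))

# Periodic outer product rule for the higher order weights (the expression above eq. (8)).
delta = np.eye(N)
T4_hebb = c * np.einsum('ij,kl->ijkl', delta, delta)
for s in range(2):
    cos, sin = np.cos(th[s]), np.sin(th[s])
    phase = (np.einsum('i,j,k,l->ijkl', cos, cos, cos, cos) +
             np.einsum('i,j,k,l->ijkl', sin, sin, cos, cos) +
             np.einsum('i,j,k,l->ijkl', cos, cos, sin, sin) +
             np.einsum('i,j,k,l->ijkl', sin, sin, sin, sin))
    T4_hebb -= d * np.einsum('i,j,k,l->ijkl', x[s], x[s], x[s], x[s]) * phase

# Compare with the projection rule (2) built from P, J, A of section 2.4.
P = np.column_stack([x[0] * np.cos(th[0]), x[0] * np.sin(th[0]),
                     x[1] * np.cos(th[1]), x[1] * np.sin(th[1])])
J = np.array([[alpha, -w, 0, 0], [w, alpha, 0, 0], [0, 0, alpha, -w], [0, 0, w, alpha]])
A = c * np.ones((N, N)) - d * np.kron(np.eye(2), np.ones((2, 2)))   # A = cR - dS
Pinv = P.T                                                          # orthonormal patterns
print(np.allclose(T_hebb, P @ J @ Pinv),
      np.allclose(T4_hebb, np.einsum('im,mn,mj,nk,nl->ijkl', P, A, Pinv, Pinv, Pinv)))

Both comparisons print True, since the example's patterns are orthonormal and its $A$ matrix is exactly of the form $cR - dS$ with $c = 10$, $d = 9$.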

2.6 Other Normal Forms

Given the projection algorithm for constructing networks with vector fields that are isomorphic to (given by a linear transformation of) the vector field of a normal form - or of any system of ordinary differential equations with uncoupled linear terms and polynomial nonlinearities - a great many possibilities arise. One application of the projection algorithm is to construct networks for the exploration of the computational capabilities of more exotic vector fields involving, for example, quasiperiodic or chaotic flows. A neural network can probably be synthesized to realize any nonlinear behavior that is known in the catalogue of well analyzed normal forms. Later we give an example of a network that stores multiple Lorenz attractors, and we have constructed other networks for Roessler attractors and the "double scroll" attractor of Chua [12].

The competitive Lotka-Volterra equations are an example of a different normal form that could be used in the projection algorithm for the storage of static patterns. These equations correspond to the normal form equations for a multiple transcritical bifurcation, and are of the form of our amplitude equations, but with quadratic instead of cubic terms:

$$\dot{r}_s = u_s r_s - r_s\sum_{j=1}^{N} a_{sj}r_j.$$

Projection to store desired patterns then gives a network architecture with only third order correlations, which does not have the additional attractors ($-x$ is an attractor if $x$ is) given by the odd symmetry of the Hopf normal form when it is used to store static attractors:

$$\dot{x}_i = -\tau x_i + \sum_{j=1}^{N} T_{ij}x_j - \sum_{jk=1}^{N} T_{ijk}\,x_j x_k, \qquad \text{where} \quad T_{ijk} = \sum_{mn=1}^{N} P_{im}A_{mn}P^{-1}_{mj}P^{-1}_{nk}.$$

2.7 Networks with Sigmoids

The normal form theorem [27] asserts that there is a large class of ordinary, partial, and integro-differential equations which give local vector fields that are topologically equivalent to that of any normal form under consideration. This implies that there are many other possible network architectures to be discovered that can realize the performance of those produced by the projection algorithm. In fact, all of the above results hold as well for networks with sigmoids, provided their coupling is such that they have a Taylor expansion which is equal to the above networks up to third order [3]. For example, the network

$$\dot{x}_i = -\tau x_i + \sum_{j=1}^{N} T_{ij}\,g(x_j) - \sum_{jkl=1}^{N} T_{ijkl}\,g(x_j)g(x_k)g(x_l) \qquad (9)$$

is a valid projection of the Hopf normal form, if

$$T_{ijkl} = \sum_{mn=1}^{N} P_{im}A'_{mn}P^{-1}_{mj}P^{-1}_{nk}P^{-1}_{nl}, \qquad \text{where} \quad A'_{mn} = A_{mn} - \frac{g'''(0)\,J_{mn}}{6\,g'(0)^3},$$

and $g(x)$ has odd symmetry, $g(-x) = -g(x)$. The odd symmetry assures that there are no even order terms in the expansion about the origin (i.e. the inflection point, where $g''(0) = g''''(0) = 0$), and the correction $A'_{mn}$ to $A_{mn}$ removes, in normal form coordinates, the cubic terms that arise from the expansion of the sigmoids of the $\sum_{j=1}^{N} T_{ij}\,g(x_j)$ term in the network equations. The results now hold for bifurcation parameter values in the neighborhood of the bifurcation point for which the truncated expansion is accurate. The expected performance of this system has been verified in simulations, and in practice the valid region is large.
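For instance, with $g = \tanh$ (so $g'(0) = 1$ and $g'''(0) = -2$), the correction reduces to $A'_{mn} = A_{mn} + J_{mn}/3$. A minimal sketch of this computation (our own illustration; the values of $\alpha$, $\omega$, and $a_{11}$ below are arbitrary assumptions):

import numpy as np

# For g(x) = tanh(x): g'(0) = 1 and g'''(0) = -2 (tanh x = x - x^3/3 + ...),
# so A'_mn = A_mn - g'''(0) * J_mn / (6 * g'(0)^3) = A_mn + J_mn / 3.
def corrected_A(A, J, g1=1.0, g3=-2.0):
    """A'_mn = A_mn - g'''(0) * J_mn / (6 * g'(0)^3)."""
    return np.asarray(A) - g3 * np.asarray(J) / (6.0 * g1 ** 3)

# Example: one complex-conjugate pair with alpha = 2, omega = 5, and a_11 = 1.
J = np.array([[2.0, -5.0], [5.0, 2.0]])
A = np.array([[1.0, 1.0], [1.0, 1.0]])
print(corrected_A(A, J))    # approximately [[1.67, -0.67], [2.67, 1.67]]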

3 PROJECTION NETWORK

The projection network is an alternative network for implementation of the normal form projection algorithm [10]. The autoassociative case of this network is formally equivalent to the higher order network realization (2) shown above, which is used later as a biological model. It has $3N^2$ weights instead of $N^2 + N^4$, and is more useful for engineering applications and for simulations of the biological model. All the mathematical results proved for the projection algorithm in the network above carry over to this new architecture, but more general versions can be trained and applied in novel ways.

In the projection net for autoassociation, the algebraic projection operation into and out of memory coordinates is done explicitly by a set of weights in two feedforward linear networks characterized by weight matrices $P^{-1}$ and $P$. These map inputs into and out of the nodes of the recurrent dynamical network in memory coordinates sandwiched between them. This kind of network, with explicit input and output projection maps that are inverses, may be considered an "unfolded" version of the purely recurrent networks described above. This network is shown in figure 3.

Input pattern vectors $\vec{x}^0$ are applied as pulses which project onto each vector of weights (row of the $P^{-1}$ matrix) on the input to each unit $i$ of the dynamic network, to establish an activation level $v_i$ which determines the initial condition for the relaxation dynamics of this network. The recurrent weight matrix $A$ of the dynamic network can be chosen so that the unit or predefined subspace of units which receives the largest projection of the input will converge to some state of activity, static or dynamic, while all other units are suppressed to zero activity. The evolution of the activity in these memory coordinates appears in the original network coordinates at the output terminals as a spatio-temporal pattern which may be fully distributed across all nodes. Here the state vector of the dynamic network has been transformed by the $P$ matrix back into the coordinates in which the input was first applied. At the attractor $\vec{v}^*$ in memory coordinates, only a linear combination of the columns of the $P$ weight matrix multiplied by the winning nonzero modes of the dynamic net constitutes the network representation of the output of the system. Thus the attractor retrieved in memory coordinates reconstructs its learned distributed representation $\vec{x}^*$ through the corresponding columns of the output matrix $P$, e.g.

$$P^{-1}\vec{x}^0 = \vec{v}, \qquad \vec{v} \rightarrow \vec{v}^*, \qquad P\vec{v}^* = \vec{x}^*.$$

For the special case of content addressable memory or autoassociation, which we have been describing here, the actual patterns to be learned form the columns of the output weight matrix $P$, and the input matrix is its inverse $P^{-1}$. These are the networks that can be "folded" into higher order recurrent networks, as shown above. For orthonormal patterns, the inverse is the transpose of this output matrix of memories, $P^{-1} = P^T$, and no computation of $P^{-1}$ is required to store or change memories - just plug the desired patterns into appropriate rows and columns of $P$ and $P^T$.

In the autoassociative network, the input space, output space, and normal form state space are each of dimension $N$. The input and output linear maps require $N^2$ weights each, while the normal form coefficients determine another $N^2$ weights. Thus the net needs only $3N^2$ weights, instead of the $N^2 + N^4$ weights required by the folded recurrent network (2). The $2N^2$ input and output weights could be stored off-chip in a conventional memory, and the dynamic normal form network, with its fixed weights, could conceivably be implemented in analog VLSI [?] for fast relaxation.

[Figure 3 here: block diagram of the projection network. The input $x^0_1, \ldots, x^0_N$ in network coordinates enters through the $P^{-1}$ matrix (weights $p^{-1}_{ij}$) to give memory coordinates $\vec{v} = P^{-1}\vec{x}^0$; the dynamic winner-take-all network with the $A$ matrix (weights $a_{ij}$) relaxes under the normal form $\dot{v}_i = u_i v_i - v_i\sum_j a_{ij}v_j^2$; the output passes through the $P$ matrix (weights $p_{ij}$) to give $\vec{x} = P\vec{v}$ in network coordinates.]

Figure 3: Projection Network - $3N^2$ weights. The $A$ matrix determines a k-winner-take-all net - it programs attractors, basins of attraction, and rates of convergence. The columns of $P$ contain the output patterns associated to these attractors. The rows of $P^{-1}$ determine category centroids.
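As an illustration of this unfolded architecture (our own sketch, not code from the chapter; the four stored patterns below are hypothetical), a minimal autoassociative projection network for static patterns can be written in Python with NumPy as:

import numpy as np

# Unfolded projection network: input map P^-1, winner-take-all normal form
# dynamics in memory coordinates, output map P.
P = 0.5 * np.array([[1,  1,  1,  1],
                    [1, -1,  1, -1],
                    [1,  1, -1, -1],
                    [1, -1, -1,  1]], dtype=float)   # four orthonormal static patterns
Pinv = P.T                                  # orthonormal columns, so P^-1 = P^T
u = 1.0                                     # growth rate of each normal form mode
A = 2.0 - np.eye(4)                         # a_ij = 2 off-diagonal, 1 on the diagonal (cR - dI form)

def retrieve(x0, steps=10000, dt=1e-3):
    v = Pinv @ x0                           # project the input into memory coordinates
    for _ in range(steps):
        v = v + dt * (u * v - v * (A @ v**2))   # k-winner-take-all relaxation
    return P @ v                            # project the surviving mode back out

rng = np.random.default_rng(0)
noisy = P[:, 2] + 0.2 * rng.standard_normal(4)
print(np.round(retrieve(noisy), 3))         # approximately the stored pattern P[:, 2]

Whichever memory node receives the largest projection of the input survives the competition, and its column of $P$ is reconstructed at the output (up to sign, because of the odd symmetry of the normal form).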

3.1 Learning Extensions for the Projection Network

More generally, for a heteroassociative net (i.e., a net designed to perform a map from an input space to a possibly different output space) the linear input and output maps need not be inverses, and may be noninvertible. They may be found by any linear map learning technique, such as Widrow-Hoff, or by finding pseudoinverses (see Chapter 1 of this book for a detailed discussion of optimal linear associative mappings).

Learning of all desired memories may be instantaneous, when they are known in advance, or may evolve by many possible incremental methods, supervised or unsupervised. The standard competitive learning algorithm, where the input weight vector attached to the winning memory node is moved toward the input pattern, can be employed. We can also decrease the tendency to choose the most frequently selected node, by adjusting parameters in the normal form equations, to realize the more effective frequency selective competitive learning algorithm [1]. Supervised algorithms like bootstrap Widrow-Hoff may be implemented as well, where a desired output category is known. The weight vector of the winning normal form node is updated by the competitive rule if it is the right category for that input, but moved away from the input vector if it is not the desired category, and the weight vector of the desired node is moved toward the input. Thus the input map can be optimized for clustering and classification by these algorithms, as the weight vectors (row vectors of the input matrix) approach the centroids of the clusters in the input environment. The output weight matrix may then be constructed with any desired output pattern vectors in appropriate columns to place the attractors corresponding to these categories anywhere in the state space in network coordinates that is required to achieve a desired heteroassociation.

If either the input or the output matrix is learned, and the other chosen to be its inverse, then these competitive nets can be folded into oscillating biological versions, to see what the competitive learning algorithms correspond to there. Now either the rows of the input matrix may be optimized for recognition, or the columns of the output matrix may be chosen to place attractors, but not both. We are searching for a learning rule in the biological network which we can prove will accomplish competitive learning, using the unfolded form of the network. Thus the work on engineering applications feeds back on the understanding of the biological system.

3.2 Programming the Normal Form Network

The power of the projection algorithm to program these systems lies in the freedom to choose a well understood normal form for the dynamics, independent of the patterns to be learned. Here we give an intuitive discussion of vector field programming of the normal form, to preview the formal proof that follows in section 4.5. The Hopf normal form, shown here again in Cartesian coordinates,

$$\dot{v}_i = -\tau v_i + \sum_{j=1}^{N} J_{ij}v_j - v_i\sum_{j=1}^{N} A_{ij}v_j^2,$$

is especially easy to work with for programming periodic attractors, but handles fixed points as well, when the frequencies $\omega$ in $J$ are set to zero. $J$ is a matrix with real eigenvalues for determining static attractors, or complex conjugate eigenvalue pairs in blocks along the diagonal for periodic attractors. Since the $A_{ij}$ and the $v_j^2$ are always positive, and the summation $\sum_{j=1}^{N} A_{ij}v_j^2$ is multiplied by $-v_i$, the cubic terms are always "competitive" - that is, of a sign that is opposite to the present sign of $v_i$. The real parts of the eigenvalues of the Jacobian $J - \tau I$ are set to be positive, and cause initial states to move away from the origin, until the competitive cubic terms dominate at some distance and cause the flow to be inward from all points beyond. The off-diagonal cubic terms cause competition between directions of flow within a spherical middle region and thus create multiple attractors and basins. The larger the eigenvalues in $J$ and the off-diagonal weights in $A$, the faster the convergence to attractors in this region.

It is easy to choose blocks of coupling along the diagonal of the $A$ matrix to produce different kinds of attractors - static, periodic, or chaotic - in different coordinate subspaces of the network. The sizes of the subspaces can be programmed by the sizes of the blocks. The basin of attraction of an attractor determined within a subspace is guaranteed to contain the subspace [6]. Thus basins can be programmed, and "spurious" attractors can be ruled out when all subspaces have been included in a programmed block. This can be accomplished simply by choosing the $A$ matrix entries outside the blocks on the diagonal (which determine coupling of variables within a subspace) to be greater (causing more competition) than those within the blocks. The principle is that this makes the subspaces defined by the blocks compete exhaustively, since intersubspace competition is greater than subspace self-damping. Within the middle region, the flow is forced to converge laterally to enter the subspaces programmed by the blocks.


A simple example is a matrix of the form

    A = \begin{bmatrix}
        d &   &   &   &   &     &   &   &        \\
          & d &   &   &   & (g) &   &   &        \\
          &   & d & c &   &     &   &   &        \\
          &   & c & d &   &     &   &   &        \\
          &   &   &   & d & d   & c & c &        \\
          & (g) &  &   & d & d   & c & c &        \\
          &   &   &   & c & c   & d & d &        \\
          &   &   &   & c & c   & d & d &        \\
          &   &   &   &   &     &   &   & \ddots
        \end{bmatrix},

where every entry outside the diagonal blocks (the regions marked (g)) equals g, and 0 < c < d < g. There is a static attractor on each axis (in each one dimensional subspace) corresponding to the first two entries on the diagonal, by the argument above. In the first two dimensional subspace block there is a single fixed point in the interior of the subspace on the main diagonal, because the off-diagonal entries within the block are symmetric and less negative than those on the diagonal. The components do not compete, but rather combine. Nevertheless, the flow from outside is into the subspace, because the entries outside the subspace are more negative than those within it. The last subspace contains entries appropriate to guarantee the stability of a periodic attractor with two frequencies (Fourier components) chosen in the J matrix. The doubling of the entries is because these components come in complex conjugate pairs (in the J matrix blocks) which get identical A matrix coupling. Again, these pairs are combined by the lesser off-diagonal coupling within the block to form a single limit cycle attractor. A large subspace can store a complicated continuous periodic spatio-temporal sequence with many component frequencies.
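The following sketch (Python; the particular numerical values of c, d, and g, the block sizes, and the integration settings are illustrative assumptions) builds a small A matrix of this block form and integrates the corresponding amplitude equations of section 4, to check that the flow converges into one of the programmed subspaces.

    import numpy as np
    from scipy.integrate import solve_ivp

    c, d, g = 1.0, 2.0, 4.0                # illustrative values with 0 < c < d < g
    blocks = [np.array([[d]]),
              np.array([[d]]),
              np.array([[d, c], [c, d]]),
              np.array([[d, d, c, c],
                        [d, d, c, c],
                        [c, c, d, d],
                        [c, c, d, d]])]
    n = sum(b.shape[0] for b in blocks)
    A = np.full((n, n), g)                 # strong competition between subspaces
    i = 0
    for b in blocks:                       # weaker coupling inside each diagonal block
        k = b.shape[0]
        A[i:i+k, i:i+k] = b
        i += k

    u = np.ones(n)                         # uniform linear growth rates (assumed)

    def rdot(t, r):                        # amplitude (normal form) equations
        return u * r - r * (A @ r**2)

    r0 = 0.1 + 0.05 * np.random.rand(n)
    sol = solve_ivp(rdot, (0.0, 200.0), r0, rtol=1e-8)
    print(np.round(sol.y[:, -1], 3))       # nonzero amplitude only in one subspace

Running this from small random initial conditions, the final state has appreciable amplitude only on the axis or block selected by the competition, which is the behavior the argument above leads one to expect.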

4 VECTOR FIELD PROGRAMMING

The key to the power of the projection algorithm is the ability to specify the global vector field by projecting the nonlinear terms of the normal form. The resulting network is a superposition of a purely linear part and a highly specified cubic nonlinearity. The network vector field may thus be programmed independently of the patterns assigned to equilibria by the linear term. The linear terms of the networks constructed above program the basis eigenvectors of the normal form, as we have seen above, and thereby determine the location of all limit sets (equilibria, cycles, sequences) in the network state space. The a_ij of the nonlinear terms then program the ambient flow about these limit sets: stability, basin geometry, rates of convergence. In this section we will see analytically how restrictions on the matrix A give first a strict Liapunov function for the amplitude equations, and then a very well behaved vector field with identical scale and rotation invariant basins that are bounded by hyperplanes.

4.1 Liapunov Function

Looking again at the Jacobian matrix of the amplitude equations at any fixed point \hat{r}, with entries

    U_{ij} = -2 a_{ij} \hat{r}_i \hat{r}_j , \qquad
    U_{ii} = u_i - 3 a_{ii} \hat{r}_i^2 - \sum_{j \neq i}^{N/2} a_{ij} \hat{r}_j^2 ,

we see that it is clearly symmetric, U_{ij} = U_{ji}, for symmetric coupling, a_{ij} = a_{ji}. From this symmetry condition, it follows directly that the vector field is "irrotational" (curl = 0), and it is therefore the gradient of some potential function V(\vec{r}) [31]. If we write the amplitude equations (including an input term s_i),

    \dot{r}_i = u_i r_i - r_i \sum_{j=1}^{N/2} a_{ij} r_j^2 + s_i ,

then taking anti-derivatives of the negative of the right hand terms yields the ith term of a scalar potential V(\vec{r}),

    V_i(\vec{r}) = -\frac{1}{2} u_i r_i^2 + \frac{1}{2} r_i^2 \sum_{j \neq i}^{N/2} a_{ij} r_j^2 + \frac{1}{4} a_{ii} r_i^4 - r_i s_i ,

so that, counting each off-diagonal pair of quartic terms only once,

    V(\vec{r}) = -\frac{1}{2} \sum_{i=1}^{N/2} u_i r_i^2 + \frac{1}{4} \sum_{i=1}^{N/2} \sum_{j=1}^{N/2} a_{ij} r_i^2 r_j^2 - \sum_{i=1}^{N/2} r_i s_i .

The reverse of this calculation clearly shows that \dot{\vec{r}} = -\nabla V(\vec{r}). Now \dot{V}(\vec{r}) = DV(\vec{r}) \dot{\vec{r}} = -\langle \dot{\vec{r}}, \dot{\vec{r}} \rangle \le 0, by the chain rule, showing that V(\vec{r}) is a strict Liapunov function for the amplitude equations. However, \dot{\vec{r}} = -\nabla V(\vec{r}) is a much stronger condition than that given by a strict Liapunov function, and such a vector field is called a gradient vector field. The potential V(\vec{r}) is mathematically dual to the vector field and provides an alternative description containing the same information. All trajectories are orthogonal to the level surfaces of the potential function V(\vec{r}), and must converge to fixed points. The Jacobian matrix at every isolated equilibrium is symmetric with real eigenvalues, and spirals, limit cycles or chaos in the flow are strictly ruled out [31]. This guaranteed convergence to fixed points of the amplitude equations, of course, translates to convergence to periodic attractors for the network equations. If static attractors are being stored, then there are no phase equations, and the amplitude equations alone are the normal form for the network equations. Static attractors can be mixed in with periodic ones simply by setting ω_s = 0 for some s. The Cartesian normal form then contains two static modes in place of the usual complex mode. They are made stable by choosing the self coupling within the diagonal A matrix block less than the coupling between the modes. Real eigenvalues may be chosen for these in the J matrix, and two corresponding desired static patterns placed in the projection matrix P.
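As a numerical check of this gradient property, the sketch below (Python; the random symmetric A, the uniform u_i, and the zero input s_i are illustrative assumptions) compares \dot{\vec{r}} with -\nabla V at a random state and then prints V along a few Euler steps of the flow.

    import numpy as np

    N2 = 4
    rng = np.random.default_rng(1)
    A = rng.uniform(1.0, 3.0, (N2, N2)); A = 0.5 * (A + A.T)   # symmetric a_ij = a_ji
    u = np.ones(N2)
    s = np.zeros(N2)

    def rdot(r):                         # amplitude equations
        return u * r - r * (A @ r**2) + s

    def V(r):                            # candidate potential
        return (-0.5 * np.dot(u, r**2)
                + 0.25 * np.sum(A * np.outer(r**2, r**2))
                - np.dot(s, r))

    def gradV(r, eps=1e-6):              # central-difference gradient for the check
        g = np.zeros(N2)
        for k in range(N2):
            e = np.zeros(N2); e[k] = eps
            g[k] = (V(r + e) - V(r - e)) / (2 * eps)
        return g

    r = rng.uniform(0.1, 1.0, N2)
    print(np.max(np.abs(rdot(r) + gradV(r))))   # close to zero: rdot = -grad V

    for _ in range(5):                   # V decreases along a trajectory (Euler steps)
        print(round(V(r), 6))
        r = r + 0.01 * rdot(r)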

4.2 Rates of Convergence

The parameters u_i in the amplitude equations are growth rates for the ith axis or mode r_i, since, in the absence of nonlinear terms, r_i(t) = r_i(0) e^{u_i t}. Choice of large growth rates and strong nonlinear competition a_ij >> a_ii in the amplitude equations has been shown in simulations to produce fast convergence of an initial input state to the attractor in that basin. This is consistent with an intuitive picture of nonlinear mode competition. The input state is in the basin of a mode or axis attractor, and moves toward it under the "force" of the linear growth term and the nonlinear damping terms. The eigenvalues of the axis attractors, which we have previously calculated to get conditions for their stability, give a quantitative estimate of this convergence rate. For mode r_s there is one eigenvalue -2u_s, and N/2 - 1 eigenvalues u_i - a_{is}(u_s / a_{ss}), where the subscript i denotes the other modes r_i. We therefore have exponential convergence near an axis attractor s at the rates e^{-2 u_s t} along the s axis, and e^{(u_i - a_{is}(u_s / a_{ss})) t} along the other axes. Thus large u_s and a_{is}/a_{ss} >> u_i/u_s program fast convergence.
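The short sketch below (Python; the u and A values are illustrative assumptions) evaluates these rate formulas at an axis attractor.

    import numpy as np

    def axis_attractor_rates(u, A, s):
        # eigenvalues of the amplitude-equation Jacobian at the axis attractor on
        # mode s: -2*u_s along the s axis, u_i - a_is*(u_s/a_ss) along the other modes
        rates = u - A[:, s] * (u[s] / A[s, s])
        rates[s] = -2.0 * u[s]
        return rates

    u = np.array([1.0, 1.0, 1.0])
    A = np.array([[1.0, 3.0, 3.0],       # a_ii = 1, strong competition a_ij = 3
                  [3.0, 1.0, 3.0],
                  [3.0, 3.0, 1.0]])
    print(axis_attractor_rates(u, A, s=0))   # all negative: a stable axis attractor

With a_is/a_ss = 3 much larger than u_i/u_s = 1, all of the printed rates are strongly negative, which is the fast convergence described above.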

4.3 Interior Fixed Points

With further restrictions on the coefficients of the amplitude equations, a_{ij} = c, a_{ii} = d, for all i, j, where c > d, to maintain the stability of axis fixed points, and with a uniform eigenvalue spectrum, u_i = u, the entire vector field becomes symmetric under permutations of the axes, since the component equations are now identical. Looking again at the condition for an equilibrium in the amplitude equations,

    \dot{r}_j = r_j \Big( u_j - \sum_{k=1}^{N/2} a_{jk} r_k^2 \Big) = 0 ,

it is clear that it is satisfied in every subspace that is defined by some of the components r_j being zero, wherever the remaining nonzero r_k satisfy u_j - \sum_{k=1}^{N/2} a_{jk} r_k^2 = 0, for all j, k of the subspace. In vector form, u = A r^2, where r^2 is a vector of elements r_i^2, for all i in the subspace. Thus, all subspace fixed points are at r^2 = A^{-1} u, hence r_i = ( \sum_{j=1}^{N/2} A^{-1}_{ij} u_j )^{1/2}, exhibiting the Z_2 symmetry of the system under flips of the axes. Geometrically, each fixed point can be viewed as an intersection of nullclines. The nullclines (locus of points for which \dot{r}_s = 0), for each variable r_s, where r_s \neq 0, are given by the states in r space where u_s = \sum_{j=1}^{N/2} a_{sj} r_j^2. These are ellipsoids symmetric about the origin under flips of each axis (see Figure 2). Thus equilibria not on an axis are given by the points of common intersection of all subsets of nullclines in the subspace of nonzero variables corresponding to that subset. In general, depending on the choice of a_{ij}, there may be few of these fixed points, or there may be at most one in each orthant (because of the Z_2 symmetry) of each subspace, which in general are given by a transversal intersection of the nullclines and are therefore structurally stable. The stability of the interior fixed points is again specified by the a_{ij}, which determine the eigenvalues of the Jacobian, and therefore control the direction of the flow in the state space. It has been shown for the competitive Lotka-Volterra equations that stability of axis fixed points is sufficient to guarantee that all interior fixed points are saddles (M. Zeeman, personal communication). This kind of theorem probably holds for our amplitude equations as well and will be investigated in the future. For the present, for the A matrix above, we prove later that all the interior fixed points are saddle points, hence the only attractors in the state space are the fixed points on the axes.
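The sketch below (Python; the two-mode u and A values are illustrative assumptions, with a_ii = d = 1 less than a_ij = c = 2 so that the axis points are stable) solves r^2 = A^{-1} u for the interior fixed point and checks, from the Jacobian entries quoted in section 4.1, that it is a saddle.

    import numpy as np

    u = np.array([1.0, 1.0])
    A = np.array([[1.0, 2.0],            # a_ii = d = 1, a_ij = c = 2
                  [2.0, 1.0]])

    r = np.sqrt(np.linalg.solve(A, u))   # interior fixed point: r^2 = A^{-1} u
    print(np.round(r, 3))

    def jacobian(u, A, r):
        # U_ij = -2 a_ij r_i r_j (i != j); U_ii = u_i - 3 a_ii r_i^2 - sum_{j!=i} a_ij r_j^2
        U = -2.0 * A * np.outer(r, r)
        for i in range(len(r)):
            U[i, i] = u[i] - 3.0 * A[i, i] * r[i]**2 - sum(
                A[i, j] * r[j]**2 for j in range(len(r)) if j != i)
        return U

    print(np.linalg.eigvals(jacobian(u, A, r)))   # one positive eigenvalue: a saddle

For these values the interior point sits at r = (1/sqrt(3), 1/sqrt(3)) and the Jacobian has eigenvalues of opposite sign, so the only attractors are the axis fixed points, consistent with the statement above.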

4.4 Fourier Synthesis of Sequences

The stability of fixed points on the axes of the normal form amplitude equations (3,4) can be transferred to an "interior" (non axis) fixed point in a subspace of any dimension m < N/2 by violating the axis stability condition stated in the projection theorem, and increasing the size of the diagonal A matrix entries a_{ii} to something greater than the off-diagonal entries a_{ij} in that subspace. The temporal pattern in the network corresponding to this "mixed mode" is then a linear combination of the m axis patterns at their different frequencies,

    \vec{x}(t) = \sum_{s=1}^{m} r_s \vec{x}^s e^{i \vec{\theta}^s} e^{i \omega_s t} , \qquad \text{or} \qquad
    x_j(t) = \sum_{s=1}^{m} r_s x_j^s e^{i \theta_j^s} e^{i \omega_s t} ,

for spatial component j of this spatio-temporal pattern. This is a Fourier series description of a periodic function (discrete spectrum), where r_s x_j^s e^{i \theta_j^s} gives the magnitude (r_s x_j^s) and phase (e^{i \theta_j^s}) of the complex Fourier coefficient F_j^s for frequency ω_s. Given some continuous temporal pattern x_j(t) of period T to be stored for component j, we can find the real and imaginary parts of this coefficient F_j^s by the usual Fourier projection.
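For example, the sketch below (Python; the sampled waveform, its period, and the component frequencies are illustrative assumptions, and the one-sided normalization convention may differ from the one used in this chapter) extracts the complex coefficients of a two-frequency pattern by projection onto e^{i ω_s t}.

    import numpy as np

    T = 1.0                                    # period of the stored pattern
    t = np.linspace(0.0, T, 512, endpoint=False)
    freqs = 2 * np.pi * np.array([1.0, 3.0]) / T   # component frequencies w_s

    # an assumed two-frequency waveform for spatial component j
    x_j = 0.8 * np.cos(freqs[0] * t + 0.3) + 0.4 * np.cos(freqs[1] * t - 1.1)

    # complex coefficient at each frequency, by projection onto exp(i*w_s*t)
    F_js = np.array([np.mean(x_j * np.exp(-1j * w * t)) for w in freqs])
    # note: with this one-sided convention |F| is half the real cosine amplitude
    print(np.abs(F_js), np.angle(F_js))        # magnitudes and phases of F_j^s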

mm) by themselves. These connections, and even the debated interconnections between them, are therefore left out of a minimal coupling model. The resulting network is a fair cartoon of the well studied circuitry of olfactory (pyriform) cortex. Since almost all of cortex has this type of structure in the brains of amphibia and reptiles, our super-network of these submodules has the potential to become a reasonable caricature of the full cortical architecture in these animals. Although the neocortex of mammals is more complicated, we expect the model to provide useful suggestions about the principles of oscillatory computation there as well. For an N dimensional system, this minimal coupling structure is described mathematically by the matrix

    T = \begin{bmatrix} W & -hI \\ gI & 0 \end{bmatrix} .   (19)

W is the N/2 x N/2 matrix of excitatory interconnections, and gI and hI are N/2 x N/2 identity matrices multiplied by the positive scalars g and h. These give the strength of coupling around local inhibitory feedback loops. A state vector is composed of local average cell voltages for N/2 excitatory neuron populations \vec{x} and N/2 inhibitory neuron populations \vec{y}. Intuitively, since the inhibitory units receive no direct input and give no direct output, they act as hidden units that create oscillation for the amplitude patterns stored in the excitatory cross-connections W. This may perhaps be viewed as a specific structural addition to a recurrent analog higher order network architecture to convert its static attractors to periodic attractors. Here the symmetric sigmoid functions of such a network are Taylor expanded up to cubic terms with third order weights (quadratic terms are killed by the symmetry). Network equations with the first order coupling (19) shown above in Figure 4, plus these higher order excitatory synapses, are shown below in component form:

    \dot{x}_i = -x_i - h y_i + \sum_{j=1}^{N/2} W_{ij} x_j - \sum_{jkl=1}^{N/2} W_{ijkl} x_j x_k x_l + b_i ,   (20)

    \dot{y}_i = -y_i + g x_i ,   (21)

[3, 6]. We use this network directly as our biological model. From a physiological point of view, (20) may be considered a model of a biological network which is operating in the linear region of the known axonal sigmoid nonlinearities, and contains instead these higher order synaptic nonlinearities. Adding the higher order weights corresponds, in connectionist language, to increasing the complexity of the neural population nodes to become "higher order" or "sigma-pi" units. Clusters of synapses within

a population unit can locally compute products of the activities on incoming primary connections W_ij, during higher order Hebbian learning, to establish a weight W_ijkl (see Figure 5). These secondary higher order synapses are then used in addition to the synapses W_ij, during operation of the overall network, to weight the effect of triple products of inputs in the output summation of the population. Using only the long range excitatory connections W_ij available, some number of the higher order synaptic weights W_ijkl could be realized locally within a neural population in the axo-dendritic interconnection plexus known as "neuropil" [5]. Only (N/2)^2 of these (N/2)^4 possible higher order weights are required in principle to approximate the performance of the projection algorithm [8]. The size of our cortical patches is limited by this number, and is itself motivation for modularity.
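The sketch below (Python; the network size, the small random placeholder weights, and the integration settings are our own assumptions, standing in for learned weights) simply integrates the component equations (20)-(21) to show the shape of the computation; the Hebbian rule that supplies W and W_ijkl follows in section 6.1.

    import numpy as np
    from scipy.integrate import solve_ivp

    N2 = 4                                    # N/2 excitatory and N/2 inhibitory units
    g, h = 250.0, 250.0
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((N2, N2))   # placeholder first order weights
    W4 = 0.01 * rng.standard_normal((N2,) * 4)  # placeholder higher order weights
    b = np.zeros(N2)

    def f(t, z):
        x, y = z[:N2], z[N2:]
        cubic = np.einsum('ijkl,j,k,l->i', W4, x, x, x)
        dx = -x - h * y + W @ x - cubic + b   # equation (20)
        dy = -y + g * x                       # equation (21)
        return np.concatenate([dx, dy])

    z0 = 0.01 * rng.standard_normal(2 * N2)
    sol = solve_ivp(f, (0.0, 0.5), z0, max_step=1e-3)
    print(sol.y[:N2, -1])   # excitatory state; with these placeholder weights it
                            # oscillates at roughly sqrt(hg) and decays toward rest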

6.1 Hebbian Learning

Only the higher order weights W_ijkl between excitatory populations shown in the minimal model (20) are required for pattern storage that closely approximates that of the full projection (2). The minimal network coupling for T results from the projection operation (2) when a specific biological form is chosen, in the columns P^s of P, for the patterns to be stored. This complex form for P^s and the corresponding asymptotic solutions X^s(t) established are shown below in (23). The numerical example for the projection learning rule shown earlier gives an example of these patterns, and was chosen specifically to demonstrate this result. The oscillatory and chaotic attractors discussed previously may easily be stored when the A matrix has the form required for them, and sigmoids are used in the network equations. This is because, as we have seen, the Taylor expansion of the system, transformed into normal form coordinates, can be equivalent to that for the normal form equations with sigmoids (13). As we have shown earlier, for orthonormal static patterns \vec{x}^s, the projection operation for the W matrix reduces to an outer product, or "Hebb" rule, and the projection for the higher order weights becomes a multiple outer product rule:

    W_{ij} = \sum_{s=1}^{N/2} \alpha_s x_i^s x_j^s , \qquad
    W_{ijkl} = c \, \delta_{ij} \delta_{kl} - d \sum_{s=1}^{N/2} x_i^s x_j^s x_k^s x_l^s .   (22)

The first rule, with α_s > 0, is analytically guaranteed to establish the desired patterns \vec{x}^s as eigenvectors of the W matrix with corresponding eigenvalues α_s [5]. Here again, δ_ij is the Kronecker delta, δ_ij = 1 for i = j and zero otherwise. In the minimal net, these real eigenvalues and eigenvectors (modes) learned for W are converted by the network structure into complex conjugate eigenvalues and eigenvectors (modes) for the complete coupling matrix T which includes the local inhibitory feedback. These complex modes are standing wave oscillations whose amplitudes are the magnitudes of the eigenvectors. The frequency ω_s and the strength λ^s of each stored oscillation s are specified functions of the eigenvalue α_s corresponding to its amplitude pattern \vec{x}^s and of the fixed feedback connections g and h within the local oscillators,

    \lambda^s_{1,2} = \tfrac{1}{2} \big( \alpha_s \pm \sqrt{\alpha_s^2 - 4hg} \big) , \qquad
    \omega_s = \tfrac{1}{2} \sqrt{4hg - \alpha_s^2}

[5]. The second rule, with c > d, gives higher order weights W_ijkl for the cubic terms in (20) which ensure, by the projection theorem, that these complex modes of the coupling matrix T become the only attractors in the vector field of the full nonlinear system. The form of the eigenvectors of T and the corresponding asymptotic solutions X^s(t) established by this learning rule are shown below,

    P^s = \begin{bmatrix} |\vec{x}^s| \, e^{i \vec{\theta}_x^s} \\ \sqrt{g/h} \, |\vec{x}^s| \, e^{i \vec{\theta}_y^s} \end{bmatrix} , \qquad
    X^s(t) = \begin{bmatrix} |\vec{x}^s| \, e^{i (\vec{\theta}_x^s + \omega_s t)} \\ \sqrt{g/h} \, |\vec{x}^s| \, e^{i (\vec{\theta}_y^s + \omega_s t)} \end{bmatrix} .   (23)

Because of the restricted coupling, the only oscillations possible in this network have zero phase lag across space, i.e. they are standing waves. The phase θ_x, θ_y is constant over the components of each kind of neural population x and y, and differs only between them. This is basically what is observed in the olfactory bulb (primary olfactory cortex) and prepyriform cortex [5]. The phase of the inhibitory components y always lags the phase of the excitatory components x by approximately 90 degrees. In [5] we show that this is a natural feature of the minimal network structure, since the phase lag for attractor s approaches -90 degrees asymptotically for all values of hg >> α_s (or frequencies ω_s >> α_s). This condition is satisfied in normal operation of the network, since attractor frequencies are typically orders of magnitude larger numbers than the positive eigenvalues of the W matrix, as is demonstrated in the numerical example below.

Figure 5: Neural population subnetwork acting as a sigma-pi unit. It uses secondary higher order synaptic weights \tilde{W}_{ijkl} on products of the activities of incoming primary connections W_ij, and receives a global inhibitory bias.

When the Hebbian learning rule (22) is used, the higher order weights W_ijkl of the network model (20) can be decomposed so that (20) becomes

    \dot{x}_i = -x_i - h y_i + \sum_{j=1}^{N/2} W_{ij} x_j + d \sum_{jkl=1}^{N/2} \tilde{W}_{ijkl} x_j x_k x_l - c \, x_i \sum_{j=1}^{N/2} x_j^2 + b_i ,   (24)

    \dot{y}_i = -y_i + g x_i ,   (25)

where \tilde{W}_{ijkl} = \sum_{s=1}^{N/2} x_i^s x_j^s x_k^s x_l^s comes from the multiple outer product in (22), and -c x_i \sum_{j=1}^{N/2} x_j^2 comes from the c δ_ij δ_kl term in (22). As we discussed earlier, for the Hebb rule derived from the general projection algorithm, this single weight negative term -c x_i \sum_j x_j^2 constitutes a shunting inhibitory bias which depends on the global excitatory activity of the network. "Shunting" here means multiplied by the current average cell voltage x_i of population i. This is an input which is identical for all excitatory neural populations and could be calculated by a single node of the network which receives input from all excitatory populations, as shown in Figure 5. Such a node might correspond to one of the nuclei which lie below the prepyriform cortex. These send and receive the diffuse projections required to and from prepyriform cortex. The constants c and d of equation (24) give the magnitude of the inhibitory bias c and the average higher order weight d. These constants are derived from entries in the normal form matrix A, and, as we have seen, c > d guarantees stability of all stored patterns. The greater the bias c relative to d, the greater the "competition" between stored patterns, the more robust is the stability of the present attractor, and vice versa. This is the mechanism employed below in the biological sensory-motor architecture for central control of attractor transitions within modules.

6.1.1 Numerical Example

As we have noted before, the attractors of the numerical example given earlier were chosen so that they could also be stored by this network using these Hebbian rules. Here, the biological network structure we begin with determines the frequencies and phase patterns of all attractors. We start now with g = h = 250 for the T matrix, which we found by projection before. We know this will give frequencies of

    \omega_s = \tfrac{1}{2} \sqrt{4hg - \alpha_s^2} = \tfrac{1}{2} \sqrt{4hg - 4} = \sqrt{hg - 1} \approx \sqrt{hg} = h = 250 \text{ rad/sec} , \quad \text{i.e. } 250/2\pi \approx 40 \text{ Hz} .

This Hebb rule (unlike the previous more general one) is applied directly to the amplitudes of the excitatory components only, to get the matrix W and the higher order weights \tilde{W}_{ijkl} above. As for the previous Hebb rule, α = 2, c = 10, and d = 9. The orthonormal excitatory amplitude patterns from the example are \vec{x}^1 = (1/\sqrt{5})(1, 2)^T and \vec{x}^2 = (1/\sqrt{5})(2, -1)^T. Calculating W by hand,

    W_{11} = 2(1/\sqrt{5})(1/\sqrt{5}) + 2(2/\sqrt{5})(2/\sqrt{5}) = 2/5 + 8/5 = 2 ,
    W_{22} = 2(2/\sqrt{5})(2/\sqrt{5}) + 2(-1/\sqrt{5})(-1/\sqrt{5}) = 8/5 + 2/5 = 2 ,
    W_{12} = W_{21} = 2(1/\sqrt{5})(2/\sqrt{5}) + 2(2/\sqrt{5})(-1/\sqrt{5}) = 4/5 - 4/5 = 0 .

Thus

    W = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} , \qquad \text{hence} \qquad
    T = \begin{bmatrix} 2 & 0 & -250 & 0 \\ 0 & 2 & 0 & -250 \\ 250 & 0 & 0 & 0 \\ 0 & 250 & 0 & 0 \end{bmatrix} ,

as was found before by projection. The W matrix is diagonal in this case only because the eigenvalues α_s are equal, but full cross-coupling of oscillators is still given by the higher order terms. Similarly we can calculate \tilde{W}_{ijkl} for these amplitude vectors,

    \tilde{W}_{1111} = (1/\sqrt{5})(1/\sqrt{5})(1/\sqrt{5})(1/\sqrt{5}) + (2/\sqrt{5})(2/\sqrt{5})(2/\sqrt{5})(2/\sqrt{5}) = 1/25 + 16/25 = 17/25 = 0.68 ,

and so forth, to give the tensor components, which can be displayed as four 2 x 2 matrices:

    \tilde{W}_{11kl} = \begin{bmatrix} 0.68 & -0.24 \\ -0.24 & 0.32 \end{bmatrix} , \quad
    \tilde{W}_{12kl} = \begin{bmatrix} -0.24 & 0.32 \\ 0.32 & 0.24 \end{bmatrix} , \quad
    \tilde{W}_{21kl} = \begin{bmatrix} -0.24 & 0.32 \\ 0.32 & 0.24 \end{bmatrix} , \quad
    \tilde{W}_{22kl} = \begin{bmatrix} 0.32 & 0.24 \\ 0.24 & 0.68 \end{bmatrix} .

We get some negative weights here, and a phase reversed component in the second attractor, because we used a negative pattern value x_2^2 = -1 to get orthogonal patterns in this small example. In a larger, more biologically realistic system, where synapses respond to the mean amplitude of 3 to 5 cycles of 40 Hz activity over the 100 msec period of sensitivity for LTP, sparse patterns with all positive amplitudes would be used to get a biologically correct \tilde{W}_{ijkl} with all positive weights. Now simulate the equations (24) above (with the time constant set to 1) to see the same relative amplitude attractors as in the previous example, even though only half of the higher order terms are being used here. Equation (22) for W_{ijkl} must be used to compare to the numerical values of W_{ijkl} of the previous example.
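The hand calculation above can be checked with the short sketch below (Python; only the array bookkeeping is ours, the numbers are those of the example). It builds W, \tilde{W}_{ijkl}, and T from the two stored patterns with the rules (22) and (19), and prints the oscillation frequency of the resulting T matrix.

    import numpy as np

    alpha, c, d = 2.0, 10.0, 9.0
    g = h = 250.0
    X = np.array([[1.0, 2.0],
                  [2.0, -1.0]]) / np.sqrt(5.0)       # rows are the patterns x^1, x^2

    # first rule of (22): W_ij = sum_s alpha_s x_i^s x_j^s
    W = alpha * X.T @ X
    print(np.round(W, 6))                            # [[2, 0], [0, 2]]

    # multiple outer product: Wtilde_ijkl = sum_s x_i^s x_j^s x_k^s x_l^s
    Wt = np.einsum('si,sj,sk,sl->ijkl', X, X, X, X)
    print(np.round(Wt[0, 0], 2))                     # [[0.68, -0.24], [-0.24, 0.32]]

    # second rule of (22): W_ijkl = c*delta_ij*delta_kl - d*Wtilde_ijkl
    I2 = np.eye(2)
    W4 = c * np.einsum('ij,kl->ijkl', I2, I2) - d * Wt

    # coupling matrix (19) with the learned W, and its oscillation frequency
    T = np.block([[W, -h * I2], [g * I2, np.zeros((2, 2))]])
    freq_hz = np.abs(np.linalg.eigvals(T).imag).max() / (2 * np.pi)
    print(np.round(freq_hz, 1))                      # approximately 40 Hz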

6.2 Architecture of Subnetwork Modules

Given the sensitivity of neurons to the location and arrival times of dendritic input [34], the successive volleys of pulses (the "wave packet" of Freeman) that are generated by the collective oscillation of a neural net may be ideal for reliable long range transmission of the collective activity of one cortical area to another. On the hypothesis that oscillatory network modules like those described above form the actual substrate of the diverse sensory, motor, and cognitive operations found in various cortical areas, we are investigating a parallel distributed processing architecture that is designed to model, however simplistically, the architecture of cerebral cortex [11]. Because we can work with this class of mathematically well-understood associative memory networks, we can take a constructive approach to building a cortical architecture, using the networks as modules, in the same way that digital computers may be designed from well behaved continuous analog flip-flop circuits. The operation of the architecture can be precisely specified, even though the outcome of a particular computation set up by the programmed connections may be unknown. Even though it is constructed from a system of continuous nonlinear ordinary differential equations, the system can operate as a discrete-time symbol processing architecture, like a cellular automaton, but with analog input and oscillatory or chaotic subsymbolic representations. The architecture is such that the larger system is itself a special case of the type of network of the submodules, and can be analysed with the same tools used to design the subnetwork modules. The modules can learn connection weights between themselves which will cause the system to evolve under a clocked "machine cycle" by a sequence of transitions of attractors within the modules, much as a digital computer evolves by transitions of its binary flip-flop states. Thus the architecture employs the principle of "computing with attractors" used by macroscopic systems for reliable computation in the presence of noise. Clocking is done by rhythmic variation of certain bifurcation parameters which hold sensory modules clamped at their attractors while motor states change, and then clamp motor states while sensory states are released to take new states based on input from external motor output and internal feedback. Supervised learning by recurrent back propagation or unsupervised reinforcement learning can be used to train the connections between modules. When the inputs from one module to the next are given as impulses that establish initial conditions, the behavior of a module is exactly predicted by the projection theorem. Each module is described in normal form or mode coordinates as the k-winner-take-all network discussed above, where the winning set of units may have static, periodic or chaotic dynamics. By choosing modules to have only two attractors, networks can be built which are similar to networks using binary units. There can be fully recurrent connections between modules. The entire super-network of connected modules, however, is itself a polynomial network that can be projected into standard network coordinates like those for the biological model. The attractors within the modules may then be distributed patterns like those described for the biological model and observed experimentally in the olfactory system [21]. The system is still equivalent to the architecture of modules in normal form, however, and it may easily be designed, simulated, and theoretically evaluated in these coordinates. The identity of the individual patches is preserved by the transformation between network and normal form coordinates. The connections within patches in network coordinates determine the particular distributed patterns that correspond to nodes in the normal form coordinates. The between-patch connections in network coordinates are transformed from those learned to effect transitions between winning nodes in normal form coordinates; they now effect transitions between the corresponding distributed patterns in the patches. Thus a network can be designed or analysed as a "spreading activation" style network, where "concepts" correspond to single nodes, but be transformed to operate with the more fault tolerant and biologically plausible distributed representations. The "machine cycle" of the system is implemented by varying a "competition" bifurcation parameter in the sensory and motor patches, to alternately clamp one set of attractors while the other is allowed to make any attractor transitions prescribed by the present sensory states. Motor states feed back to sensory states to cause transitions, so that internal activity can be self-sustaining, as in any recurrent network. After training, this internal activity can also be guided by a sequence of sensory inputs to produce a learned sequence of motor outputs. It can behave, in other words, like a finite state automaton, with oscillatory or chaotic states. It can therefore be applied to problems like system identification and control, and grammatical inference and language recognition problems. We are exploring a version of the architecture that is equivalent to the "simple" recurrent "Elman" networks that are currently being applied to these problems [14, 39, 22]. The capabilities of this system are being investigated by application to the problem of word recognition from character sequences. At low values of the clocked bifurcation parameter, there is no internal "competition" between attractors within a module. The learned internal dynamics is effectively turned off, and the state vector is determined entirely by a clamped input vector.
There is then only a single attractor within a module, which is a scalar multiple of whatever input vector is applied. The previous attracting state of a module is completely erased at this point. As the competition increases again, this attractor is forced to a new position in state space determined by the nearest learned attractor. The other learned attractors reappear by saddle-node bifurcations at a distance from this attractor and create other basins of attraction. The greater the clock parameter, the greater the competition between attractors and the faster the convergence to a chosen attractor. At high values of competition the module is "clamped" at its current attractor, and the effect of the changing input it receives by feedback from the modules undergoing transition is negligible. The feedback between sensory and motor modules is therefore effectively cut. The super-system can thus be viewed as operating in discrete time by transitions between a finite set of states. This kind of clocking and "buffering" of some states while other states relax is essential to the reliable operation of digital architectures. In our simulations, if we clock all modules to transition at once, the programmed sequences lose stability, and we get transitions to unprogrammed fixed points and simple limit cycles for the whole system. This is a primary justification for the use of the "machine cycle".
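The sketch below (Python) is a highly schematic illustration of this clocking: two amplitude-equation modules drive each other, and a sinusoidal clock raises the competition parameter in one module while lowering it in the other. The module sizes, the inter-module coupling, the clock period, and the parameter ranges are all our own illustrative assumptions; it shows only the clamp-and-release mechanism, not a trained transition sequence.

    import numpy as np
    from scipy.integrate import solve_ivp

    n = 2                                        # two attractors (modes) per module
    u = 1.0
    B = np.array([[0.0, 0.4],                    # assumed inter-module input map
                  [0.4, 0.0]])

    def A_of(comp):
        # the competition parameter scales the between-attractor coupling
        return np.array([[1.0, comp],
                         [comp, 1.0]])

    def f(t, z):
        r1, r2 = z[:n], z[n:]
        clock = 0.5 * (1.0 + np.sin(2 * np.pi * t / 20.0))   # slow sinusoidal clock
        c1, c2 = 1.0 + 4.0 * clock, 5.0 - 4.0 * clock        # anti-phase clamping
        s1, s2 = B @ np.abs(r2), B @ np.abs(r1)              # feedback between modules
        dr1 = u * r1 - r1 * (A_of(c1) @ r1**2) + s1
        dr2 = u * r2 - r2 * (A_of(c2) @ r2**2) + s2
        return np.concatenate([dr1, dr2])

    z0 = np.array([0.9, 0.1, 0.1, 0.9])
    sol = solve_ivp(f, (0.0, 80.0), z0, max_step=0.05)
    print(np.round(sol.y[:, -1], 2))             # amplitudes of both modules at the end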

6.3 Simulations

Simulations show robust storage of oscillatory and chaotic attractor transition sequences in a system with a sinusoidal clock and continuous intermodule driving. We have shown that the dynamic attractors, oscillatory or chaotic, within the different modules of this architecture must rapidly synchronize or "bind" to effectively communicate information and produce reliable transitions [11]. In these simulations, we synchronized Lorenz attractors for successful operation in the architecture using techniques of coupling developed by Chua [44, 15] for secure communication by a modulated chaotic carrier wave. Because communication between modules in the architecture is by continuous time-varying analog vectors, the process is more one of signal detection and pattern recognition by the modules of their inputs than it is "message passing" in the manner of a more conventional parallel architecture like the Connection Machine. This is why the demonstrated performance of the modules in handwritten character recognition is significant, and why we expect there are important possibilities in the architecture for the kinds of chaotic signal processing studied by Chua [15]. An important element of intra-cortical communication in the brain, and between modules in this architecture, is the ability of a module to detect and respond to the proper input signal from a particular module, when inputs from other modules which are irrelevant to the present computation are contributing cross-talk and noise. We believe that synchronization is one important aspect of how the brain solves this coding problem. Attractors in these modules may be frequency coded during learning if biologically plausible excitatory to inhibitory connections are added in the lower left quadrant of the T matrix. Such attractors will synchronize only with the appropriate active attractors in other modules that have a similar resonant frequency. The same hardware (or "wetware") connection matrix can thus subserve many different networks of interaction between modules at the same time without cross-talk problems. Chaotic attractors may be "chaos coded", since only chaotic attractors with similar parameters will synchronize. We have shown in these modules a superior stability of oscillatory attractors over otherwise identical modules with static attractors (frequencies set to zero) in the presence of additive Gaussian noise perturbations with the 1/f spectral character of the noise found experimentally in the brain [21]. This may help explain why the brain uses dynamic attractors. The oscillatory attractor acts like a bandpass filter and is effectively immune to the many slower macroscopic bias perturbations in the theta-beta range (3-25 Hz) below its 40-80 Hz passband, and to the more microscopic perturbations of single neuron spikes in the 100-1000 Hz range. In an environment with this spectrum of perturbation, modules with static attractors cannot operate reliably. This suggests that the hierarchical organization of periodic activity in the brain is a form of large scale frequency coding. Exciting recent work of Brown, Chua, and Popp (unpublished) demonstrates that the vector fields of chaotic systems are robust to noise perturbation in a way that allows them to become detectors of signals buried in noise up to 45 dB, far below what can be detected by any present conventional signal detection technique. The Brown and Chua result indicates that chaotic attractors may be even more robust to noise perturbation than periodic attractors and lend special signal processing capabilities to the mechanisms of intercortical communication. Many variations of the basic system configuration and operation, and various training algorithms, such as reinforcement learning with adaptive critics, are being explored. The ability to operate as a finite automaton with oscillatory/chaotic "states" is an important benchmark for this architecture, but only a subset of its capabilities. At low competition, the supra-system reverts to one large continuous dynamical system.
We expect that this kind of variation of the operational regime, especially with chaotic attractors inside the modules, though unreliable for habitual behaviors, may nonetheless be very useful in other areas, such as the search process of reinforcement learning.

7 SUMMARY

The normal form projection algorithm, with various extensions and architectures for associative memory storage of analog patterns, continuous sequences, and chaotic attractors in the same network, has been described and mathematically analysed. A matrix inversion and a learning rule determine the network weights, given prototype patterns to be stored. There are N units of capacity in an N node projection network with 3N^2 weights. It costs one unit per static attractor, two per Fourier component of each sequence, and at least three per chaotic attractor. There are no spurious attractors, and there is a Liapunov function in a special coordinate system which governs the approach of transient states to stored trajectories. Unsupervised or supervised incremental learning algorithms for pattern classification, such as competitive learning or bootstrap Widrow-Hoff, can easily be implemented in this network. Network behavior was illustrated by a discussion of its application to the problem of real time handwritten digit recognition. The architecture can be "folded" into a recurrent network with higher order weights that can be used as a model of cortex that stores oscillatory and chaotic attractors by a Hebb rule. Hierarchical sensory-motor networks may further be constructed of interconnected "cortical patches" of these network modules. Computation then occurs as a sequence of transitions of attractors in these associative memory subnetworks, much as a digital computer evolves by transitions of state in its binary flip-flop elements. The architecture thus employs the principle of "computing with attractors" used by macroscopic systems for reliable computation in the presence of noise.

Acknowledgements

Supported by AFOSR-87-0317 and a grant from LLNL. It is a pleasure to acknowledge the invaluable assistance of Morris Hirsch, Walter Freeman, Todd Troyer, and Jim Vacarro.

References

[1] C. Ahalt, A. Krishnamurthy, P. Chen, and D. Melton. Competitive learning algorithms for vector quantization. Neural Networks, 3:277-290, 1990.
[2] B. Baird. Nonlinear dynamics of pattern formation and pattern recognition in the rabbit olfactory bulb. Physica D, 22:150-176, 1986.
[3] B. Baird. A bifurcation theory approach to vector field programming for periodic attractors. In Proc. Int. Joint Conf. on Neural Networks, Washington, D.C., pages 1:381-388, June 1989.
[4] B. Baird. Associative memory in a simple model of oscillating cortex. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 68-75. Morgan Kaufmann, 1990.
[5] B. Baird. Bifurcation and learning in network models of oscillating cortex. Physica D, 42:365-384, 1990.
[6] B. Baird. A learning rule for CAM storage of continuous periodic sequences. In Proc. Int. Joint Conf. on Neural Networks, San Diego, pages 3:493-498, June 1990.
[7] B. Baird. Learning with synaptic nonlinearities in a coupled oscillator model of olfactory cortex. In F. Eeckman, editor, Analysis and Modeling of Neural Systems, pages 319-327, 1992.
[8] B. Baird. Bifurcation Theory Approach to the Analysis and Synthesis of Neural Networks for Engineering and Biological Modeling. Research Notes in Neural Computing. Springer, 1998. To appear.
[9] B. Baird, W. Freeman, F. Eeckman, and Y. Yao. Applications of chaotic neurodynamics in pattern recognition. In SPIE Proceedings Vol. 1469, pages 12-23, 1991.
[10] Bill Baird and Frank H. Eeckman. CAM storage of analog patterns and continuous sequences with 3N^2 weights. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 3, pages 192-198. Morgan Kaufmann, 1991.
[11] Bill Baird and Frank H. Eeckman. A hierarchical sensory-motor architecture of oscillating cortical area subnetworks. In Frank H. Eeckman, editor, Analysis and Modeling of Neural Systems II, pages 96-104, Norwell, MA, 1993. Kluwer.
[12] Leon O. Chua, Motomasa Komuro, and Takashi Matsumoto. The double scroll family. IEEE Transactions on Circuits and Systems, CAS-33(11):1073-1118, 1986.
[13] R. Eckhorn, R. Bauer, W. Jordan, M. Brosch, W. Kruse, M. Munk, and H. Reitboeck. Coherent oscillations: A mechanism of feature linking in the visual cortex? Biological Cybernetics, 60:121, 1988.
[14] J. L. Elman. Distributed representations, simple recurrent networks and grammatical structure. Machine Learning, 7(2/3):91, 1991.
[15] Tetsuro Endo and Leon O. Chua. Synchronization and chaos in phase-locked loops. IEEE Transactions on Circuits and Systems, 38(12):620-626, 1991.
[16] A. K. Engel, P. Konig, C. Gray, and W. Singer. Synchronization of oscillatory responses: A mechanism for stimulus-dependent assembly formation in cat visual cortex. In R. Eckmiller, editor, Parallel Processing in Neural Systems and Computers, pages 105-108. Elsevier, 1990.
[17] W. J. Freeman, Y. Yao, and B. Burke. Central pattern generating and recognizing in olfactory bulb: A correlation learning rule. Neural Networks, 1(4):277, 1988.
[18] W. J. Freeman and B. W. van Dijk. Spatial patterns of visual cortical EEG during conditioned reflex in a rhesus monkey. Brain Research, 422:267, 1987.
[19] W. J. Freeman. Mass Action in the Nervous System. Academic Press, New York, 1975.

[20] W. J. Freeman. Simulation of chaotic EEG patterns with a dynamic model of the olfactory system. Biological Cybernetics, 56:139, 1987.
[21] W. J. Freeman and B. Baird. Relation of olfactory EEG to behavior: Spatial analysis. Behavioral Neuroscience, 101:393-408, 1987.
[22] C. L. Giles, C. B. Miller, D. Chen, H. H. Chen, G. Z. Sun, and Y. C. Lee. Learning and extracting finite state automata with second order recurrent neural networks. Neural Computation, pages 393-405, 1992.
[23] M. Golubitsky and D. G. Schaeffer. Singularities and Groups in Bifurcation Theory, Volume I. Springer, New York, 1985.
[24] C. M. Gray and W. Singer. Stimulus dependent neuronal oscillations in the cat visual cortex area 17. Neuroscience [Suppl], 22:1301P, 1987.
[25] C. M. Gray, P. Konig, A. K. Engel, and W. Singer. Oscillatory responses in cat visual cortex exhibit intercolumnar synchronization which reflects global stimulus properties. Nature (London), 338:334-337, 1989.
[26] S. Grossberg and S. Elias. Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 20:69, 1975.
[27] J. Guckenheimer and P. Holmes. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer, New York, 1983.
[28] Lewis B. Haberly and James M. Bower. Olfactory cortex: model circuit for study of associative memory? Trends in Neuroscience, 12(7):258, 1989.
[29] J. van Hemmen. Nonlinear neural networks near saturation. Phys. Rev. A, 36:1959, 1987.
[30] A. Herz, B. Sulzer, R. Kuhn, and J. L. van Hemmen. Hebbian learning reconsidered: Representation of static and dynamic objects in associative neural nets. Biological Cybernetics, 60:457, 1989.
[31] M. W. Hirsch and S. Smale. Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, New York, 1974.
[32] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two state neurons. Proc. Natl. Acad. Sci. USA, 81:3088, 1984.
[33] D. Horn and M. Usher. Parallel activation of memories in an oscillatory neural network. Neural Computation, 3(1):31-43, 1991.
[34] J. J. B. Jack, D. Noble, and R. W. Tsien. Electric Current Flow in Excitable Cells. Clarendon Press, Oxford, 1983.
[35] D. Kleinfeld. Sequential state generation by model neural networks. PNAS, 1986.
[36] S. R. Lenihan. Spontaneous oscillations in neural networks. Technical report, U.S. Army Yuma Proving Ground, 1987.
[37] Z. Li and J. J. Hopfield. Modeling the olfactory bulb and its neural oscillatory processings. Biological Cybernetics, 61:379, 1989.
[38] T. Maxwell, C. L. Giles, Y. C. Lee, and H. H. Chen. Nonlinear dynamics of artificial neural systems. In Neural Networks for Computing, AIP Conf. Proc. 151, page 299, New York, 1986. AIP.
[39] J. B. Pollack. The induction of dynamical recognizers. Tech Report 90-JP-Automata, Dept. of Computer and Information Science, Ohio State Univ., 1990.
[40] D. Psaltis, C. H. Park, and J. Hong. Higher order associative memories. Neural Networks, 1(2):149, 1988.
[41] D. Psaltis and S. S. Venkatesh. Neural associative memory. In Y. C. Lee, editor, Evolution, Learning, and Cognition. World Scientific, 1989.

[42] H. Sompolinsky and I. Kanter. Temporal association in asymmetric neural networks. Phys. Rev. Lett., 57:2861, 1986.
[43] P. K. Stanton and T. Sejnowski. Storing covariance by the associative long-term potentiation and depression of synaptic strengths in the hippocampus. In D. Touretzky, editor, Advances in Neural Information Processing Systems 1, page 402. Morgan Kaufmann, 1989.
[44] Y. S. Tang, A. I. Mees, and L. O. Chua. Synchronization and chaos. IEEE Transactions on Circuits and Systems, CAS-30(9):620-626, 1983.
[45] DeLiang Wang, Joachim Buhmann, and Christoph von der Malsburg. Pattern segmentation in associative memory. Neural Computation, 2(1):94-106, 1990.
