A Fast Mutual Information Calculation Algorithm

Hans-Peter BERNHARD¹   Gernot KUBIN²

Institut für Nachrichtentechnik und Hochfrequenztechnik, Vienna University of Technology, Gusshausstr. 25/389, A-1040 Vienna, Austria, Tel/Fax: +43 1 58801 3524 / +43 1 5870583, ¹Email: [email protected] ²Email: [email protected]

Abstract. In modern signal analysis, chaos-theoretic interpretation is becoming more and more popular. One of the most important parameters of chaotic systems is their information gain or entropy rate. Its measurement provides performance limits for nonlinear signal models in source coding and prediction applications. We have developed a fast algorithm for the estimation of the entropy rate for attractors reconstructed from finite-length signals with the method of delays. With conventional algorithms, the computational load increases almost exponentially with the reconstruction dimension. We exploit dynamical memory organization principles to arrive at a linear increase, i.e., whenever we increment the reconstruction dimension by one, the calculation time of the new algorithm increases only by 1/10 of the calculation time needed in two dimensions. Furthermore, the calculation time is proportional to N log2 N, where N is the signal length.

1. Introduction


If we want to apply chaos-theoretic tools to the analysis of signals, it is necessary to find out whether the given signal can be interpreted as the output of a nonlinear dynamical system. Generally, we may have only incomplete (if any) knowledge about the nonlinear equations of the underlying signal generation system. The signal is our primary knowledge source and, therefore, it should be used to estimate several parameters relevant for the description of the underlying system. Most of these (such as dimension, Lyapunov exponents, and entropy rate) can be determined by viewing stationary signal segments as lying on attractors (i.e., steady-state solutions) of the system. For a scalar real-valued signal x(t), an attractor can be reconstructed in the following way: we assemble several delayed samples x(t − mT) in an n-dimensional `state' vector

\tilde{x}_n^T(t) = [x(t), x(t - T), \ldots, x(t - (n-1)T)]

where T is the delay time parameter and n is the dimension parameter of the reconstruction. If we follow \tilde{x}_n^T(t) as a function of time t, we obtain a trajectory in an n-dimensional `state space'. If the observed signal segment is stationary, the trajectory corresponds to an attractor of the underlying system. This attractor can be associated with a stationary probability measure defined on the ensembles of signal samples X(t) and state vectors \tilde{X}_n^T(t), where capital letters serve to distinguish statistical ensembles from instantaneous values. To calculate the entropy rate of the underlying system, we measure the mutual information I(X(t+T); \tilde{X}_n^T(t)) between the ensemble of future signal samples X(t+T) and the ensemble of state vectors \tilde{X}_n^T(t). If n is chosen high enough (such that the reconstruction is an embedding of the true attractor), the mutual information saturates at a value which corresponds to the amount of information predictable over a time delay T. This procedure is then applied, e.g., for two different reconstructions of the same attractor with the delay time parameters T_1 and T_2, respectively. From this, the entropy rate is obtained as

h = \frac{I(X(t+T_1); \tilde{X}_n^{T_1}(t)) - I(X(t+T_2); \tilde{X}_n^{T_2}(t))}{T_2 - T_1}   (1)

where

I(X(t+T); \tilde{X}_n^T(t)) = R_n^T - R_{n-1}^T   (2)

with the redundancy R_n^T defined later on in equation (3).
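To make these two steps concrete, the following C sketch assembles a delay vector and combines two mutual information estimates according to equation (1); the function names and the sample-index convention are our own illustration, not part of the published FMI code:

    /* Fill state[0..n-1] with the delay vector
     * [x(t), x(t-T), ..., x(t-(n-1)T)] from the sampled
     * signal x[]; t and T are measured in samples.
     * Returns -1 if the vector would reach before x[0].
     */
    static int delay_vector(const double *x, long t, long T, int n,
                            double *state)
    {
        int m;

        if (t - (long)(n - 1) * T < 0)
            return -1;
        for (m = 0; m < n; m++)
            state[m] = x[t - (long)m * T];
        return 0;
    }

    /* Entropy rate according to equation (1): I1 and I2 are the
     * mutual information estimates for the delay times T1 and T2.
     */
    static double entropy_rate(double I1, double I2, double T1, double T2)
    {
        return (I1 - I2) / (T2 - T1);
    }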

R. Shaw describes in [SR81] the entropy rate properties of nonlinear dynamical systems. If we know the system's entropy rate, we are able to decide whether it operates in a chaotic or in a non-chaotic regime: in the former case the entropy rate is greater than zero, in the latter it is equal to zero. Beyond the sign of the entropy rate, its magnitude is also an important quantitative measure. A positive entropy rate can be interpreted as information gain: observing the evolution of the system allows us to continuously refine the information about its (unknown) initial condition. In source coding applications, the entropy rate of a nonlinear deterministic signal model defines the number of `new bits' produced per time unit and provides a performance limit complementary to the prediction gain of linear stochastic signal models.

2. The Fast Algorithm

In the following, we present a fast mutual information (FMI) calculation algorithm as needed for the estimation of the entropy rate of finite-length signals. We use existing algorithms, such as the one published by Fraser in [Fra89], as a baseline for our performance comparisons.¹ Computation of the mutual information is based on the (implicit) estimation of an n-dimensional probability density function. Therefore, the computational load usually increases drastically with increasing reconstruction dimension n. This can be seen in figure 1, which shows the calculation time plotted versus the reconstruction dimension for Fraser's algorithm implemented on a 486 PC using the GNU C-Compiler under 386BSD Unix. Both Fraser's and our algorithm organize their computations recursively according to the structure of the given attractor. The algorithms partition the entire state space recursively into smaller and smaller hypercubes, which ultimately serve as histogram classes for the estimation of the multidimensional probability density function.

¹ The C-code of Fraser's algorithm is available by anonymous ftp from the internet server lyapunov.ucsd.edu. The FMI algorithm can be obtained by contacting the first author at the above email address.

This work has been supported by the Austrian Science Foundation (FWF) through grant P08779-TEC.

To appear in Proc. EUSIPCO-94, © 1994.

Figure 1. Total computation time versus reconstruction dimension. [Plot: calculation time in seconds (0 to 500) versus reconstruction dimension (2 to 15) for Fraser's algorithm and the FMI algorithm.]

Figure 2. Global structure: recursive tree of linked cubes. [Schematic: nodes (hypercubes) connected by pointers, with the recursion depth 0 to 3 marked in rectangular boxes.]

In our considerations, every hypercube is uniquely identified by an m-tuple K_m, where m denotes the recursion depth of the state space dividing process. To address the subcubes one dividing level deeper than m, we simply extend the m-tuple by one number to an (m+1)-tuple K_m, j, where j identifies one subcube contained in the cube determined by K_m. In this context, a class is defined as a hypercube in state space which contains no further measurable details about the attractor structure. In other words, if the n-dimensional state vectors are uniformly distributed over a hypercube (no further substructure can be found), this cube is not partitioned further but retained as a class. The number of vectors in one class is given by N(K_m), so we can evaluate the mutual information with the resulting n-dimensional histogram. This calculation is done by considering the redundancy in different state space reconstructions (2). The redundancy is given by

R_n^T = \sum_{K_m} P_{x_1 \cdots x_n}(K_m) \, \mathrm{ld} \, \frac{P_{x_1 \cdots x_n}(K_m)}{P_{x_1}(K_m) \cdots P_{x_n}(K_m)}   (3)

where the sum runs over all hypercubes of the n-dimensional state space that are considered as classes, and ld denotes the base-2 logarithm. In equation (3), x_n denotes the n-th component of the reconstructed state space, and P_{x_1 \cdots x_n}(K_m) is the probability of state vectors to fall into class K_m, given by

P_{x_1 \cdots x_n}(K_m) = \frac{N(K_m)}{N}.   (4)

For a more detailed description we refer to [Fra89], [FS86], and [BK91]. With these definitions we can evaluate the recursive procedure given in [FS86], where

R_n^T = \frac{1}{N} F(K_0) - \mathrm{ld}\, N   (5)

denotes the starting equation. We proceed with the recursion

F(K_m) = n\, N(K_m) + \sum_{j=1}^{2^n} F(K_m, j)   (6)

and terminate, if no substructure exists, with the terminating equation applied in every single class:

F(K_m) = N(K_m)\, \mathrm{ld}\, N(K_m).   (7)
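For comparison, equation (3) can also be evaluated directly as a plain sum over the classes once the class probabilities are known. The sketch below assumes that the joint class probabilities and the products of the per-dimension marginal probabilities have already been tabulated; both arrays and the function are hypothetical helpers, not part of the FMI code:

    #include <math.h>

    /* Direct evaluation of the redundancy in equation (3):
     * p[k]  = P_{x1...xn}(K_m), the joint probability of class k,
     * pm[k] = P_{x1}(K_m) * ... * P_{xn}(K_m), the product of the
     *         marginal probabilities of class k.
     * ld(x) = log2(x) is computed here as log(x)/log(2).
     */
    static double redundancy(const double *p, const double *pm,
                             long nclasses)
    {
        double r = 0.0;
        long   k;

        for (k = 0; k < nclasses; k++)
            if (p[k] > 0.0 && pm[k] > 0.0)
                r += p[k] * (log(p[k] / pm[k]) / log(2.0));
        return r;
    }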

The computation is based on a recursive tree of histogram classes. To deal with it, we have to use an efficient description, because it is not possible to index every element of the partition by a simple n-dimensional array: such an array would clearly exceed all memory limits when the partitioning becomes finer. Therefore, we store the tree as a system of linked vectors. For explanation we refer to figure 2, where nodes connected by pointers can be seen. The nodes denote hypercubes in phase space, and the pointers represent the dependency between the nodes or cubes. Within the rectangular boxes of figure 2, the recursion depth is numbered. At recursion level 0, a single cube is defined that contains all vectors of the complete state space. This initialization is necessary to start the recursive algorithm with equation (5). The recursive algorithm operates in the following way, independent of the recursion depth.

First, the distribution of the state vectors contained in the current cube is tested, and if we find a uniform distribution, the entire cube is considered a homogeneous class. The contribution of this cube to the total redundancy result is determined from equation (7), where N(K_m) is the number of vectors within the considered cube. If, however, the cube contains substructure, we divide it further into subcubes. The distribution is tested with the chi-square test, where the null hypothesis is a uniform distribution of the state vectors. If the null hypothesis fails, we conclude that the considered cube is not a class, and it has to be divided into subcubes. The subcubes are obtained from binary splits in each dimension; therefore, the number of possible subcubes is 2^n. In figure 2, the analysis of one particular attractor structure is schematically plotted: a node with pointers connected to other nodes symbolizes the partition of a cube into subcubes. Additionally, we require that a cube contain at least one state vector. If it is empty, the cube does not contribute to the redundancy result at all, and we do not process it in any way. Hence, the number of pointers originating from a node can be less than 2^n. This tree pruning is one of the reasons why our algorithm is very fast. Particularly in high dimensions, we observe that the calculation time of our FMI algorithm increases linearly with the number of dimensions instead of the exponential increase found in Fraser's implementation, see also figure 1. Moreover, to avoid a huge number of calculations, every single state vector is processed only once at each dividing level. To this end, the set of state vectors in one cube is divided into subsets by the following operation. Initially, no subcube structure is given, and we process every vector in the cube sequentially. For each state vector, its subcube location is determined with n threshold comparisons (one per dimension) and assigned as a subcube number. If a subcube number appears for the first time, this subcube is created. In this context, creation means that, in the schematic plot of figure 2, one new node is inserted and connected to the current node by a pointer from cube to subcube. The new node initially consists of only one state vector. Otherwise, if a vector belongs to an already existing subcube, the vector is attached to this subcube.
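The substructure test itself is not printed here; a minimal chi-square test of the null hypothesis of a uniform distribution over the 2^n subcubes could look as follows, where the interface and the externally supplied critical value are our assumptions:

    /* Chi-square test for uniformity: counts[j] holds the number of
     * state vectors in subcube j, nsub = 2^n is the number of
     * subcubes, and crit is the critical value for nsub-1 degrees of
     * freedom at the chosen significance level (from a table).
     * Returns 1 if uniformity is rejected, i.e. the cube contains
     * substructure and must be split further.
     */
    static int substructure(const long *counts, long nsub, double crit)
    {
        double expected, chi2 = 0.0, d;
        long   j, total = 0;

        for (j = 0; j < nsub; j++)
            total += counts[j];
        expected = (double)total / (double)nsub;
        for (j = 0; j < nsub; j++) {
            d = (double)counts[j] - expected;
            chi2 += d * d / expected;
        }
        return chi2 > crit;
    }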

The attaching of vectors requires an internal organization of the vectors within one cube, which can be seen in figure 3. Every box symbolizes one state vector. For every vector, a pointer indicates the next vector belonging to the same cube. This linked list of vectors is cyclically closed, with the last vector pointing to the first vector of the subcube. So, starting from any given state vector in a cube, we can access all state vectors belonging to the same cube. Consequently, it is also fast and easy to add a new vector to a subcube: the pointer of the last state vector (initially pointing to the first one) is linked to the newly added vector, and the pointer of the new vector links to the first vector. The added vector is now the last element of the subcube.
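In code, appending a vector to such a cyclic list is a constant-time operation. The following sketch uses integer indices in place of pointers; the array names follow the paper, while the function name and the indexing of Lnk_cub_ptr by subcube number are our assumptions:

    /* Append state vector v to the cyclic list of subcube s:
     * first is the index of the first vector of s, LNK[i] is the
     * successor of vector i within the same subcube, and
     * Lnk_cub_ptr[s] holds the index of the last vector of s.
     */
    static void append_vector(long *LNK, long *Lnk_cub_ptr,
                              long first, long s, long v)
    {
        long last = Lnk_cub_ptr[s];

        LNK[last] = v;        /* old last element now points to v */
        LNK[v] = first;       /* v closes the cycle back to first */
        Lnk_cub_ptr[s] = v;   /* v becomes the last element       */
    }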

Figure 3. Cube structure: cyclic list of linked vectors. [Schematic: twelve boxes representing state vectors; pointer arrows link the vectors of subcube 1 and subcube 2 into two separate cyclic lists.]

Likewise, just as the state vectors, the subcubes are linked among themselves: the first (and possibly only) state vector of a subcube has an additional pointer which points to the next subcube, and the last subcube points back to the first. For the specification of one entire cube, only one pointer to the first element of the first subcube is used. To evaluate equation (6) at every recursion step, the recursive procedure has to be called for every subcube. Basically, only one argument, the pointer to the subcube, is passed to the recursive procedure. Subsequently, the procedure starts with the distribution test again. Reviewing the algorithm, we begin at level 0 with the entire attractor lying in a single cube. To describe the attractor structure, we divide it up into subcubes of equal size. Proceeding recursively, the subcubes are analyzed in the same way. The dividing procedure stops if no significant substructure is found. In parallel with the dividing operations, the redundancy is calculated.

    double F_MIND(LONG nl, LONG this_cube)   /* nl = N(Km)              */
    {
        double f;
        long   next_n, n;

        DEFINE_NEW_BORDERS(xg);      /* thresholds for this cube,       */
                                     /* restored before returning       */
        NN_MIND(this_cube);          /* divide the cube into subcubes   */
        if (SUBSTRUCTUR(this_cube)) {
            f = (DOUBLE)(nl*DIM);    /* n*N(Km), first term of eq. (6)  */
            n = this_cube;
            do {                     /* recurse over non-empty subcubes */
                next_n = LNK_CUBE[n];
                if (N[n] > 1)
                    f = f + F_MIND(N[n], n);
                n = next_n;
            } while (n != this_cube);
        } else
            f = (double)nl*ld((double)nl);   /* terminating eq. (7)     */
        RESET_OLD_BORDERS(xg);
        return(f);
    }

Figure 4. C-code for the recursive procedure.

3. Implementation

The practical implementation in the C language consists of three main parts.

1. Initialization of the state vectors, arrays, and variables according to the following list:
- DIM: reconstruction dimension;
- VAL_MI: n-dimensional array with the reconstructed state vectors (to save memory it can be 1-dimensional, with the vectors reconstructed during the calculation);
- N: array of integers containing the number of vectors in the subcube;
- NN_CALLS: actual number of calls to the dividing function;
- LNK: array of pointers to link state vectors (see figure 3);
- LNK_CUBE: array of pointers with the same structure as LNK, describing the cubes for one dividing step from cube to subcube;
- xg: n-dimensional array containing the n thresholds of the current subcube division;
- Lnk_cub_call: indicates whether a cube contains at least one state vector;
- Lnk_cub_ptr: pointer to the last state vector of a cube.

2. The recursion F_MIND is shown in figure 4. Whenever this function is called, new cube borders xg are calculated locally for this cube; when the function is left, the borders are restored to their original values at calling time. The argument this_cube represents the current input cube and indicates the cube to be divided into subcubes by the function NN_MIND. The distribution of all state vectors in the cube is then tested by the function SUBSTRUCTURE. If there is no substructure, the recursive function returns the result of equation (7). Otherwise, equation (6) is used to proceed with further recursion steps, where the return variable f is initialized with the term nN(K_m) of equation (6), and the return values of the recursive function calls are added to f.

3. The dividing procedure NN_MIND expects one pointer as input, indicating the cube to be divided. All vectors connected by LNK are processed, and for each of them the subcube number elem is determined. If a cube contains at least one state vector, the current value of NN_CALLS is stored in Lnk_cub_call. Hence, if NN_CALLS and Lnk_cub_call[elem] are not equal, the current NN_CALLS is inserted, and the algorithm creates a new cube as explained in the last section. If the subcube already exists, the current vector is attached to the set of subcube vectors. We also count the number of vectors in the array N: the element of N with the same index as the first vector in the subcube is incremented. For effective and fast indexing, we use Lnk_cub_ptr to identify the last element of a subcube. To determine the redundancy, we start with equation (5), where the recursive procedure F_MIND is called for the first time.

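Taken together, the structures of this list can be declared along the following lines; the sizes, the exact types, and the indexing of the two Lnk_cub arrays are illustrative assumptions, not the original configuration:

    #define DIM    3              /* reconstruction dimension (example) */
    #define NVEC   65536L         /* number of state vectors (example)  */
    #define NSUB   (1 << DIM)     /* subcubes per division, 2^DIM       */

    double *VAL_MI[DIM];      /* per-dimension access to the vectors    */
    double  xg[DIM];          /* thresholds of the current division     */
    long    N[NVEC];          /* vectors per subcube, indexed by the    */
                              /* subcube's first vector                 */
    long    LNK[NVEC];        /* cyclic links between vectors of a cube */
    long    LNK_CUBE[NVEC];   /* links between the subcubes of one      */
                              /* dividing step                          */
    long    Lnk_cub_call[NSUB]; /* NN_CALLS value at subcube creation   */
    long    Lnk_cub_ptr[NSUB];  /* last state vector of each subcube    */
    long    NN_CALLS = 0;     /* call counter of the dividing function  */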

4. Results and Conclusion

When progressively partitioning a hypercube into 2^n subcubes, many of the subcubes may be empty, i.e., they do not contain a single vector. Still, direct processing of these empty subcubes results in the exponential increase of the calculation time, as can be seen in Fraser's algorithm. Therefore, it is essential for a fast algorithm that empty subcubes are excluded from all processing, which is exactly what the tree pruning of section 2 achieves.

    void NN_MIND(LONG this_cube)
    {
        long n, elem, last_cube, act_T, act_cub, next_act_T;

        NN_CALLS++;                   /* mark a new dividing step       */
        last_cube = this_cube;
        act_T = this_cube;            /* first vector of the cube       */
        next_act_T = LNK[act_T];
        do {                          /* process every vector once      */
            elem = 0;
            for (n = 0; n < DIM; n++) /* subcube number, one bit per    */
                                      /* dimension                      */
                elem += ((*(VAL_MI[n]+act_T) >= *(xg+n))
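The listing breaks off inside the computation of the subcube number. Its idea, one threshold comparison per dimension contributing one bit of elem, can be sketched in isolation as follows (a reconstruction of the idea, not the original code):

    /* Subcube number of the state vector with index act_T: one binary
     * split per dimension, so bit n of elem indicates whether the
     * vector lies above the threshold xg[n] in dimension n.
     * The result lies in the range 0 .. 2^DIM - 1.
     */
    static long subcube_number(long act_T)
    {
        long n, elem = 0;

        for (n = 0; n < DIM; n++)
            if (*(VAL_MI[n] + act_T) >= *(xg + n))
                elem += (1L << n);
        return elem;
    }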