Tracking time-varying causality and directionality of information flow ...

PHYSICAL REVIEW E 86, 051919 (2012)

Tracking time-varying causality and directionality of information flow using an error reduction ratio test with applications to electroencephalography data Yifan Zhao, Steve A. Billings,* and Hualiang Wei Department of Automatic Control and System Engineering, University of Sheffield, Sheffield, United Kingdom

Ptolemaios G. Sarrigiannis Department of Clinical Neurophysiology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom (Received 21 May 2012; published 29 November 2012) This paper introduces an error reduction ratio–causality (ERR-causality) test that can be used to detect and track causal relationships between two signals. In comparison to the traditional Granger method, one significant advantage of the new ERR-causality test is that it can effectively detect the time-varying direction of linear or nonlinear causality between two signals without fitting a complete model. Another important advantage is that the ERR-causality test can detect both the direction of interactions and estimate the relative time shift between the two signals. Numerical examples are provided to illustrate the effectiveness of the new method together with the determination of the causality between electroencephalograph signals from different cortical sites for patients during an epileptic seizure. DOI: 10.1103/PhysRevE.86.051919

PACS number(s): 87.10.−e, 05.45.Tp, 05.10.−a, 87.19.le

I. INTRODUCTION

The detection of hidden interdependencies between the components of complex dynamic systems is an important problem that arises in many research fields. There are several ways to tackle this problem based on using either an explicit generative model that embraces the known nonlinear causal architecture [1] or by simply establishing statistical dependencies between two signals using coherence, phase synchronization, or the Granger causality test. The latter approaches are usually more viable and many methods based on these ideas have been developed recently and applied to the analysis of electrophysiological signals, such as directed coherence and partial directed coherence [2–4]. However, cross-correlation-based methods have two possible drawbacks: one is the requirement for reasonably long data sets, and the other is that correlation can fail to detect nonlinear interactions. The minimal required data window size to achieve correct results is important because too wide a window will decrease the temporal resolution of the analysis, which can be fatal if the causal relationship changes rapidly over time, and too narrow a window reduces the statistical reliability. Signals sampled from the real world are rarely stationary and well behaved, and causal interactions and couplings can typically appear, disappear, and reappear and may become weaker or grow stronger over time. Moreover, most complex systems exhibit nonlinear dynamic behaviors, which may lead to a possible failure of the cross-correlation method. Another established way to solve the causality detection problem is by mutually predicting selected observable measurements based on multivariate modeling. Many methods based on this idea have been proposed recently and one of the best established methods is based on the Granger causality test. The key idea of this method is that if a signal

*

[email protected]

1539-3755/2012/86(5)/051919(11)

X causes a signal Y , the knowledge of the past of both X and Y should improve the prediction of the present of Y in comparison with the knowledge of the past of Y alone. Many new methods have been developed which extend this idea [5–7]. However, all these methods are model-based and require that the system model is fully known or that an unbiased model can be fitted to the data sets before the Granger test can be applied. This is far from straightforward when the underlying system relationships are nonlinear and dynamic and the measured observations are noisy because, unless a complete and full model which accounts for any potentially nonlinear noise effects is estimated, the Granger test results will be compromised. Recently some nonparametric methods have been developed and one of the well-established methods is based on entropy methods [8,9] that allow nonlinear and model-free estimation of Granger causality based on mutual information. The results in [9] are impressive and show good detection of causality on simulated examples and real data sets. There are both advantages and disadvantages to current parametric and nonparametric methods, which typically means that the specific properties of the system under investigation have to be assessed to decide which method is more appropriate in any given application. However, in general, nonparametric methods tend to require longer data sets or averaging over many realizations to mitigate the effects of noise. The noise on the signals will usually be unknown prior to analysis but simple averaging methods will not work well if the noise is highly correlated and nonlinear, which may be expected if the causal relationships are also nonlinear. The advantage of the nonparametric methods is that no assumptions regarding the form of a model are needed. Parametric methods typically require the specification of a model form and estimation of this model. Even correlated nonlinear noise can be accommodated but only if this is specifically modeled. This can lead to complex modeling procedures but tests like the Granger causality test will only be reliable if an unbiased model is estimated and, while the parametric methods will typically

051919-1

©2012 American Physical Society

ZHAO, BILLINGS, WEI, AND SARRIGIANNIS


work well for much shorter data lengths that nonparametric methods, the former can involve complicated model estimation procedures. In the present study we introduce a causality test that overcomes most of the disadvantages of existing methods. Our causality detection method will be referred to as the error reduction ratio–causality (ERR-causality) test. The key advantage of this test is that it can be applied to nonlinear dynamic systems and, unlike the Granger-based tests, our method does not depend on the full knowledge or estimation of a complete and unbiased system model. By exploiting an important property of the ERR test that is part of the orthogonal least squares algorithm it is shown that the causal flow can be detected even when the model is incomplete. This is a significant advantage when the underlying system is nonlinear and dynamic and the measurements may be noisy, because a complete and full model including a nonlinear noise model, which would normally be required to yield unbiased model estimates, is not required and indeed not even the full parameter estimates are used in the test. These advantages mean that the test is relatively easy to apply and can be used to track fast transitions between causal effects to detect the direction of linear or nonlinear causal interactions, determine the strength of these interactions, and provide an estimate of the time lag shift between two directional signals. Numerical examples are used to illustrate the test to show the performance of the method together with a real application to electroencephalograph (EEG) data. II. METHODS

Let X = {x(t)} and Y = {y(t)} be two signals, with t = 1, . . . ,M, where M is the data length. The aim of this paper is to measure the causal interaction over time between these two signals. The results can be, for example, at a specific time, the signal X causes Y , Y causes X, and no interaction or bidirectional interaction occurs between them. Both X and Y may have been caused by an input u but here u is unknown and unmeasurable. For a complex system, the causality is often time-varying and the interaction is often dynamic and nonlinear, which makes the problem more challenging. This section begins with a brief introduction of the Granger causality tests, and then presents our ERR-causality test.

and vice versa for the causality from Y to X. The advantage of this method is that, if the model structure is chosen appropriately, it can tackle both linear and nonlinear systems. The test is also able to detect a bidirectional causality because the causalities from Y to X and from X to Y are calculated separately. The required window size of sampled data depends on the dynamical properties of the original signals and the complexity of the chosen model structure. One possible problem for this method is that if the model structure is not chosen appropriately, for example, the model is missing a significant term or terms, the calculated Granger causality may not be reliable, which will be demonstrated in a simulation example. Because this test is based on an accurate model, noise models may need to be estimated to ensure the model is unbiased. For nonlinear relationships this will often require a nonlinear noise model. Fitting complete nonlinear dynamic system and noise models is a significant overload for this test. Moreover, this method can only detect the direction of signal flow, but it is not able to provide quantitative insight into the time shift between the two signals. B. Adaptive-forward–orthogonal least squares

The orthogonal least squares (OLS) algorithm [11–13] is a popular approach that has been widely used in nonlinear system identification where the orthogonal least squares searches through all the possible candidate model terms to select the most significant model terms, which are then included to build models term by term. The significance of each of the selected model terms is measured by an index, called the error reduction ratio, which indicates how much (in percentages) of the variance change in the system response can be accounted for by including the relevant model terms. Complex nonlinear dynamic models and nonlinear noise models can all be identified using this algorithm. This section introduces an adaptive-forward-OLS algorithm which will be used later by modifying the well-known forward-regression version of OLS [11]. Consider the linear regression function y(t) =

N

pi (t)θi , t = 1, . . . ,M,

(2)

i=1

A. Granger causality

A well-established approach to detect causality in both linear and nonlinear systems is the Granger causality test [10]. To calculate the Granger causality of X to Y , a model has to be pre-established to define the relationship between the output Y and its past information Y − and the past information of the input X− , expressed as Y = f (Y − ,X− ). Based on the sampled data the parameters in the model f (Y − ,X− ) have to be estimated and then the predictions of Y based on Y − alone and on Y − and X− are generated. In both cases, the accuracy of prediction may be expressed by the variance of the prediction errors for two-dimensional modeling, var(Y |Y − ) and var(Y |Y − ,X− ). The Granger causality of X to Y , GX→Y , is defined by GX→Y = ln

var(Y |Y − ) var(Y |Y − ,X− )

(1)

where y(t) is the dependent variable or the term to regress upon, pi (t) are regressors, θi are unknown parameters to be estimated, and M denotes the number of data points in the data set. Equation. (2) can be written as Y = P , where ⎤ ⎤ ⎡ T ⎤ ⎡ ⎡ P (1) θ (1) y(1) ⎥ ⎢ ⎥ ⎢ ⎢ . ⎥ .. ⎥ ⎢ ⎥, = ⎢ .. ⎥, (3) Y =⎢ . ⎦ ⎣ . ⎦ ⎣ .. ⎦, P = ⎣ θ (N ) y(M) P T (M) and P T (t) = (p1 (t), . . . ,pN (t)). Matrix P can be decomposed as P = W × A, where ⎤ ⎡ w1 (1) . . . wN (1) ⎥ ⎢ . .. .. ⎥ (4) W =⎢ . . ⎦ ⎣ .. w1 (M) . . . wN (M)

051919-2

TRACKING TIME-VARYING CAUSALITY AND . . .


is an orthogonal matrix because M

M T 2 2 W W = Diag w1 (t), . . . , wN (t) t=1

(5)

t=1

and A is an upper triangular matrix with unity diagonal elements, ⎡ ⎤ 1 a12 a13 · · · a1N ⎢ 1 a23 · · · a2N ⎥ ⎢ ⎥ ⎢ ⎥ .. .. .. ⎥. (6) A=⎢ . . . ⎢ ⎥ ⎢ ⎥ ⎣ 1 aN−1N ⎦ 1 Therefore, Y = P can be rewritten as Y = W G, where G = A = [g1 , . . . ,gN ]T . The estimation of the original parameters can be computed from θˆN = gˆ N , N (7) θˆi = gˆ i − k=i+1 aik θˆk , i = N − 1, . . . ,1. In traditional forward OLS, the cutoff value of ERR, Coff , to stop the search procedure and determine the number of significant terms can be difficult to select, especially when the level of noise is unknown. Recently, several criteria based on ERR have been developed to monitor and stop the search procedure [14–16]. In this paper we introduce an adaptiveforward-OLS algorithm by utilizing the penalized error-tosignal ratio

n 1 χn = ri 1− (8) (1 − λn/M)2 i=1 to monitor the regressor search procedure, where n denotes the number of selected terms, M denotes the total number of sampled data, and ri is the error reduction ratio for each term. The search procedure stops when χn arrives at a minimum. The effect of the adjustable parameter λ on the results is discussed in [17], in which it is suggested that λ should be chosen between 5 and 10. The value of λ is chosen as 6 for all the examples in this paper based on experience, but other values in this range have also been tested and the results remained correct and unchanged. The whole procedure of the adaptive-forward-OLS algorithm can be summarized as follows: M t=1 w1 (t)y(t) . (1) Set a11 = 1, w1 (t) = p1 (t), and gˆ 1 = M w 2 (t) t=1

(2) Set akk = 1 and then calculate aik =

1

M t=1 wi (t)pk (t) M 2 t=1 wi (t)

for k = 2, . . . ,N, where i = 1, . . . ,k − 1. Next, calculate M k−1 t=1 wk (t)y(t) . ERR is wk (t) = pk (t) − i=1 aik wi (t), and gˆ k = M 2 t=1 wk (t) used as a criterion for model structure selection and is defined as gˆ 2 M w 2 (t) rk = kMt=1 k . (9) 2 t=1 y (t) (3) Calculate the value of χk based on (8). The search procedure stops when χk arrives at a minimum. Ideally, the trend of χk takes the shape of a valley if values of χk for all candidate terms are calculated. The search procedure stops

when χk arrives at the valley. Practically, it stops at the point before χk starts to rise. If there is no minimum at all, the search will continue until all candidate terms are scanned. Noise modeling, which will often be required to ensure unbiased models, is described in [18,19]. C. A Granger causality test based on the adaptive-forward-OLS algorithm

In this section a modification of the Granger test will be introduced based on the modeling algorithm in Sec. II B above. When applying the Granger test, the model is either pre-known, which is often impossible especially for real systems, or the model structure has to be detected and a model estimated as the initial step. It has been shown [20] that the estimator introduced in Sec. II B combines structure determination, parameter estimation, and noise modeling, and when coupled with model validity tests, it is particularly powerful in identifying parsimonious models for structure-unknown systems. The adaptive-forward-OLS algorithm therefore can be a part of the Granger method to improve the identification performance and hence enhance the detection capability of the Granger test [21]. D. The ERR-causality test

Our ERR-causality test is introduced in this section by tackling the problem another way to detect causality between two signals without the identification of a full or complete model, which is different with the method introduced in Sec. II C. It is shown that this test has significant advantages compared to existing tests and can be readily applied to linear and nonlinear dynamic systems even with noise-corrupted measurements. Consider a bivariate autoregressive (ARX) model x(t) =

y(t) =

px

ai x(t − i) +

py

i=1

j =1

qy

qx

bi y(t − i) +

cj y(t − j ) + ex (t), (10) dj x(t − j ) + ey (t),

j =1

i=1

where ex (t) and ey (t) denote noise sequences, which can be either white noise or colored noise. Obviously, if cj = 0, j ∈ {1, . . . ,py }, and dj = 0, j ∈ {1, . . . ,qx }, this is a typical bidirectional system, which means X leads Y , and at the same time Y leads X. Consider initially the causality from X to Y . A nonlinear autoregressive with exogenous inputs (NARX) model [22] constructed using basic function expansions using a linear-in-the-parameters form is introduced to express Y as N y(t) = θi φi (t) + e(t), (11) i=1

where θi are unknown parameters, N is the number of the total potential model terms involved, and φi (t) = φi (ϕ(t)) are model terms generated from a candidate terms set; for example, ϕ(t) can be ϕ(t) = {1,y(t − 1), . . . ,y(t − ny ),x(t − 1), . . . ,x(t − nx )}T, (12) which includes some simple linear components from the past information of X and Y .

051919-3



Instead of generating a complete model that has to pass the validity tests, the ERR-causality test can be summarized in the following: Initially, construct a candidate terms set which typically includes past information of Y and past information of X. Apply the adaptive-forward-OLS algorithm and compute ERR and the Penalized Error-to-Signal Ratio (PESR) values. If the significant terms selected by the adaptive-forward-OLS algorithm in Sec. II B include any term from the past information of X, this indicates that signal X leads Y during the considered time duration [t − h/2,t + h/2], where h denotes the sampling window size. The ERR-causality from X to Y at time t, expressed as FX→Y (t), is then defined as 1. If no component from the past information of X is included in the selected significant terms, this indicates that X has no interaction with Y during [t − h/2,t + h/2], and FX→Y (t) is defined to be 0. The strength of FX→Y (t) can be estimated using the summed ERR values of all the selected terms from X− , the maximum strength being 1. The time shift of X → Y is defined as the time lag of X in the first term ranked by the values of ERR. The selection of the candidate terms set can be much more complicated than (12). For example, a possible candidate terms set with some nonlinear components is ϕ(t) = {y(t − 1),y(t − 2), . . . ,y(t − ny ), x(t − 1),x(t − 2), . . . ,x(t − nx ), y(t − 1)x(t − 1), . . . ,y(t − 1)x(t − nx ), y 2 (t − 1),y 2 (t − 2), . . . ,y 2 (t − ny ), x 2 (t − 1),x 2 (t − 2), . . . ,x 2 (t − nx )}T .

y(t) =

wi (t)gi + ζ (t), t = 1, . . . ,M,

y(t) =

Np

N

wi (t)gi +

wj (t)gj + ζ (t).

(15)

j =Np +1

i=1

Now (15) can be rewritten as Np

y(t) =

wi (t)gi + e(t),

(16)

i=1

where N

e(t) =

wj (t)gj + ζ (t)

(17)

j =Np +1

represents missing model terms and can be viewed as colored noise and may not have zero mean. Squaring both sides of (16) and taking the expected value gives Np

Np wi2 (t)gi2 + 2E wi (t)gi e(t) E[y 2 (t)] = E i=1

i=1

+ E[e2 (t)].

(13)

If any significant term that includes any component from the past information of X is chosen in the ERR-causality test, this method, theoretically, is able to observe the causality from X to Y , even though ϕ(t) may not include a complete set of all the correct terms of the system. This advantage is based on the fact that the order of the ERR values or the order of term selection produced by the adaptive-forward-OLS algorithm is correct even when a complete model is not estimated. This important result, which is fundamental to the ERR-causality test, will be proved next. Before modeling, an initial set of candidate terms has to be chosen. For simulation data, traditionally, the set of candidate terms should be chosen to be quite large to try to include all the correct terms. However, a large candidate set always increases the computational time. A new fast Cellular Automata-Orthogonal Least Squares (CA-OLS) method [23] has recently been proposed that is able to adaptively search for the correct neighborhood without any preset tolerance and can considerably reduce the computational complexity and memory usage. For real data, the choice of the initial candidate terms is more challenging as the correct terms or neighborhood will always be unknown. Experience and prior information of the model structure are important in such situations. However, model validity tests can be used in conjunction with the term selection algorithms to determine whether important terms are missing. Also the ERR values should sum close to 100% and this is a good indicator of whether the correct initial terms have been selected. Consider the model N

where the first N terms represent all the correct model terms and ζ (t) is a white noise sequence with zero mean. Assume only Np terms are selected and the other terms are not considered in ϕ(t). Note that noise terms can be included in the N terms in the model (14), which should then reduce ζ (t) to be white. Then (14) can be expressed as

(18)

Obviously, E

Np

2

=E

wi (t)gi

Np

i=1

wi2 (t)gi2

(19)

i=1

because w(i) are orthogonal, i.e., w(i)w(j ) = 0 (i = j ). Then (18) can be rewritten as Np M M 1 2 1 2 y (t) − wi (t)gi2 = α, M t=1 M i=1 t=1

where

α = 2E

Np

(20)

wi (t)gi e(t) + E[e2 (t)].

(21)

i=1

Replacing e(t) in (21) by (17) gives Np

N α = 2E wi (t)gi wj (t)gj + ζ (t)

(14)

i=1

051919-4

j =Np +1

i=1

+E

N

2

wj (t)gj + ζ (t)

j =Np +1

N

= 0+0+E

wj2 (t)gj2

+ E[ζ 2 (t)]

j =Np +1

=

1 M

M

N

t=1 j =Np +1

wj2 (t)gj2 +

M 1 2 ζ (t). M t=1

(22)



M 2 Substituting α back into (20) and dividing M1 t=1 y (t) to both side produces Np 1 M w 2 (t)gi2 1 − i=11 MMt=1 i 2 t=1 y (t) M M N 1 1 M 2 2 2 t=1 j =Np +1 wj (t)gj + M t=1 ζ (t) M = . (23) 1 M 2 t=1 y (t) M

data set was used in this paper [24]. The phase of the Fourier transform of Y was randomized, while the amplitude of the Fourier transform of Y was preserved. Then the surrogate data were generated by applying the inverse Fourier transform. The functions fft and ifft in MATLAB with default settings were used. This procedure was repeated 100 times and then the 95% quantile was determined as the threshold for each time point.

III. SIMULATION STUDIES

Based on the definition of ERR in (9), finally this yields 1−

Np

ri =

i=1

N j =Np +1

rj +

σζ2 σy2

(24)

,

where σζ and σy denote the standard deviation of ζ (t) and y(t), respectively. Equation (24) implies that the ERR values for the selected terms can be calculated quite independently of the unselected terms of the correct term set. Hence, the proposed ERR-causality method can always provide a correct order of significant terms without fitting a complete model. In other words, even when not all the significant terms are included in ϕ(t), this method should still detect the causality. Notice also that only the ERR values are used in the test; there is no requirement to fit a complete model or no requirement to even estimate all the model parameters. In the nonlinear case this means that shorter data windows can be used and that the computation time is faster. The ERR-causality test is therefore a more powerful and robust causality detection method, and, moreover, many time-consuming calculations can be considerably avoided or reduced because the search procedure is monitored by PESR and no further parameter estimation is required. Notice that the method automatically defaults to select just linear terms if the relationship is linear but can also accommodate complete nonlinear dynamics relationships without full model estimation.

This section discusses the efficiency and performance of the proposed algorithm using three numerical examples to demonstrate the advantages of our method in flexibility of term selection and its good performance against noise and also to assess its performance when tackling bidirectional signals. A. Example 1

This example aims to demonstrate the flexibility of our proposed method in term selection and to compare the method with the Granger test. A total of 1000 data points were generated using a time-varying model based on the definition in Table I, where ζy(t) and ζx(t) were white noise sequences with zero mean and a standard deviation σ = 0.1, and r(t) denotes a random data sequence uniformly distributed in [−1,1]. To save space, the notation y(t − 1) is simplified as y −1 , and so on. The true time-varying causality is shown in Fig. 1, which together with the model clearly indicates that signal X leads Y at time 100-300 and that signal Y leads X at time 500–700, with no directional flow at other times. In the first test the Granger test was applied. By considering the causality from X to Y , this method used a nonlinear model y(t) = a(t) +

3

bi (t)y(t − i) +

i=1

+

E. Statistical evaluation by constructing a time-varying threshold for surrogate data

3 3

3

ci (t)x(t − i)

i=1

dij (t)y(t − i)y(t − j )

i=1 j =i

The distribution of the time-varying ERR-causality strength or the Granger causality strength is unknown. To determine the causal influence between two signals, we constructed a time-varying threshold, representing the level of causality above which values had less than a 5% probability of occurring by chance. For this purpose, the following surrogate data technique was used. Assume the signal X caused the signal Y . This kind of causality is destroyed when the Y data are ordered randomly in some way. For this purpose, a phase randomization surrogate

+

3 3

fij (t)x(t − i)x(t − j )

i=1 j =i

+

3 3

hij (t)x(t − i)y(t − j ) + ζy (t),

where parameters a, b, c, d, f , and h are no longer constants but are functions of time. The window size was chosen as 80 and the detected time-varying Granger causality is illustrated in

TABLE I. The time-varying model for Example 1. t 0–100 101–300 301–500 501–700 701–1000

(25)

i=1 j =i

x(t)

y(t)

r(t) r(t) r(t) −0.07y −1 + 0.32y −2 − y −1 y −2 + ζx(t) r(t)

r(t) −0.07x −1 + 0.32x −2 − x −1 x −2 + ζy(t) r(t) r(t) r(t)

051919-5



FIG. 1. The true causality of signal X to Y and signal Y to X over time for example 1, which shows FX→Y = 1 during the interval 100–300 (black color) and FY →X = 1 during the interval 500–700 (light color).

was used and the detected time-varying Granger causalities for both directions are illustrated by Fig. 2(b), where the measurability of causality is significantly decreased and accurate results cannot be achieved in terms of the computed corresponding time-varying threshold. This failure arises due to the large contribution of the term x(t − 1)x(t − 2) in the original model. The absence of this term in the model can dramatically change the signal-to-noise ratio (SNR) in Y and results in a very poor prediction of Y even when the past information of X is considered. This shows the disadvantage of any test based on knowledge of a full and complete model. The second test applied the ERR-causality method. Initially, the candidate terms set was chosen as {1,x(t − 1),x(t − 2),x(t − 3),y(t − 1),y(t − 2),y(t − 3), x 2 (t − 1),x(t − 1)x(t − 2),x(t − 1)x(t − 3),x 2 (t − 2), x(t − 2)x(t − 3),x 2 (t − 3),

Fig. 2(a), where accurate causalities for both directions can be achieved in terms of the computed corresponding time-varying threshold. This is essentially an implementation of the new algorithm described in Sec. II C. Now a scenario when some key model terms are missed in the model structure is considered. By removing all the nonlinear terms in (25), the model y(t) = a(t) +

3

bi (t)y(t − i) +

i=1

3

y 2 (t − 1),y(t − 1)y(t − 2),y(t − 1)y(t − 3),y 2 (t − 2), y(t − 2)y(t − 3),y 2 (t − 3), x(t − 1)y(t − 1),x(t − 1)y(t − 2),x(t − 1)y(t − 3), x(t − 2)y(t − 1),x(t − 2)y(t − 2),x(t − 2)y(t − 3), x(t − 3)y(t − 1),x(t − 3)y(t − 2),x(t − 3)y(t − 3)}T , (27)

ci (t)x(t − i) + ζy (t)

i=1

(26)

which has 28 members. The detected time-varying ERRcausality strength for both directions and corresponding threshold are illustrated in Fig. 3(a), which clearly shows that the strength is more consistent during interaction interval [101,300] and [501,700] than the strength of the Granger test with the same model complexity, and the detected causalities for both directions are accurate. To study the flexibility of the new ERR-causality test in term selection, the candidate terms set was deliberately chosen to be insufficient: {1,x(t − 1),x(t − 2),x(t − 3),y(t − 1),y(t − 2),y(t − 3)}T , (28)

(a)

(b)

FIG. 2. (a) The detected time-varying Granger causality based on the model (25) for example 1. (b) The detected time-varying Granger causality based on the model (26) for example 1, where the causality is not distinctive. GX→Y is depicted by the thick black curve and the corresponding threshold is depicted by the thin dotted black curve; GY →X is depicted by the thick gray curve and the corresponding threshold is depicted by the thin dotted gray curve.

where all nonlinear terms have been removed and now only seven linear terms were considered. The detected time-varying ERR-causality strength for both directions and corresponding threshold, illustrated in Fig. 3(b), shows a better performance even though a significant nonlinear term with a large contribution was not considered, and a complete parameter set was not estimated, compared with the results from the Granger causality test with the same model complexity. The strength of the ERR-causality during interactions is not as strong as shown in Fig. 3(a) due to the absence of the nonlinear term, but it is still well above the threshold to reflect the original causality. This results shows the better robustness of the ERR-causality test compared to the Granger test. It is well understood that the estimation accuracy for both methods can be improved by increasing the number of trials, but for a real system, multitrials are often impossible. Note that the detected start and end positions of Y leading X and X leading Y are not at the exactly the same positions as the true causality, which is not surprising because the window size determines the reaction speed of causality detection. A selection of small window size means a fast reaction to the

051919-6



(a)

(a)

(b)

FIG. 3. (a) The detected time-varying causality strength based on the candidate terms set (27) using the ERR-causality test for example 1. (b) The detected time-varying causality strength based on the candidate terms set (28) using the ERR-causality test for example 1. The strength of X → Y is depicted by the thick black curve and the corresponding threshold is depicted by the thin dotted black curve; the strength of Y → X is depicted by the thick gray curve and the corresponding threshold is depicted by the thin dotted gray curve.

change of causality over time, but it may lead to insufficient data to achieve an accurate result. Conversely, a selection of a large window size can improve the accuracy of causality detection, but it may significantly slow down the reaction to the change of causality over time. B. Example 2

This example aims to explore the application of the proposed method in the detection of causality strength and time shift for a more challenging problem. The time series x1 (t) and x2 (t), t = 1, . . . ,N, describe two bidirectionally coupled Henon systems [9] with identical coupling strength of 0.08, while the time series y(t) is driven by x1 (t), through the coupling parameter C, and by x2 (t), where x1 (t) = 1.4 − x 1 2 (t − 1) + 0.3x1 (t − 2) + 0.08 x 1 2 (t − 1) − x 2 2 (t − 1) , x2 (t) = 1.4 − x 2 2 (t − 1) + 0.3x2 (t − 2) + 0.08 x 2 2 (t − 1) − x 1 2 (t − 1) , y(t) = 1.4 + 0.1x2 (t − 2) − [Cx1 (t − 1) + (1 − C)y(t − 1)]y(t − 1).

(29)

After setting the value of C in the range from 0 to 1, in steps of 0.1, 100 realizations of Eq. (29) were generated by varying the initial conditions and discarding the first 105 iterations as transients. To study the performance of the proposed methods

(b)

FIG. 4. Dependence on noise contamination for example 2 with two different sets of initial candidate terms. The solid plots with scatter depict the average causal coupling over 100 realizations estimated from x1 to y with different levels of noise. The dotted curve depicts the threshold when no noise was involved in the sampled data. (a) px1 = 2, px2 = 2, and py = 2; (b) px1 = 3, px2 = 3, and py = 3.

in the presence of noise, several tests with sampled data length N = 500 were performed without noise and with additive white noise with SNR levels of 10, 3, and 2 dB. Before the implementation of the ERR algorithm, a set of candidate terms was chosen, where px1 = 2, px2 = 2, and py = 2, and the obtained ERR-causality strengths and corresponding threshold when no noise was involved in the sampled data are shown in Fig. 4(a). All values of detected strength are well above the threshold. It is clearly seen that increasing values of C lead to increasing values of the estimated ERR-causality strength. Moreover, increasing levels of noise leads to decreasing values of the estimated ERR-causality strength. However, for all SNR levels below 10 dB, the decrease in performance is not significant, which indicates a very good performance of the proposed methods against noise. The detected time shift for all noise levels and all values of C are 1, as shown in Fig. 5 (square scatter), which are correct based on Eq. (29). In the second case, a larger set of candidate terms was chosen with px1 = 3, px2 = 3, and py = 3. The ERR-causality strengths with corresponding threshold are shown in Fig. 4(b). The main trend of strength by noise level and value of C are similar; however, the values of strength are higher for all C below 0.3. This observation can be explained by observing that x2 (at time lag 2) has a large influence on y when the value of C is small, and x2 (t − 2) is a function of x1 (t − 3). Hence, it is not surprising to detect that the

051919-7



are well above the corresponding threshold, which indicates significant interactions were detected during this time. During the time intervals 1–300 and 901–1200, the detected strength is very close to the threshold, which indicates no significant interaction was detected during this time. Moreover, the observed jump of strength at time 600 reflects the change of the parameter k. IV. APPLICATION TO EEG DATA FIG. 5. Dependence on detected time shifts for example 2 with two different sets of initial candidate terms.

neighbor x1 (t − 3) has the greatest influence to y, which can also be observed in the detected time shift by C, shown in Fig. 5 (triangular scatter). Following the increment of C, the direct influence that x1 takes over the indirect influence and the correct time shifts have been detected. C. Example 3

We now consider a bidirectional model with noise where both directional flows are active at the same time, associated with the following equations: √ x1 (t) = 0.95 2x1 (t − 1) − 0.9025x1 (t − 2) − 0.9x2 (t − 2) + ζ (t), x2 (t) = −1.05x2 (t − 1) − 0.85x2 (t − 2) − kx1 (t − 1) + ζ (t),

(30)

where the parameter k controls the strength of the interaction, and ζ (t) is white noise with zero mean and a standard deviation 0.1. A total of 1200 data points were generated, where during time 1–300 and 901–1200 both x1 (t) and x2 (t) were assigned a random value uniformly distributed in [−1,1]. During time 301–900, the model (30) was applied, and k = 1.3 during time 301–600 and k = 1.8 during time 601–900. The window size was chosen as 80 and the candidate terms were chosen as Eq. (27). The results of the ERR-causality test for both directions are shown in Fig. 6. It is observed that causality strengths during time 300–900 for both directions

FIG. 6. Bidirectional ERR-causality strength between x1 and x2 for example 2. The thick black curve depicts the ERR-causality strength from x1 → x2 and the thin dotted black curve depicts the corresponding threshold; the thick gray curve depicts the ERRcausality strength from x2 → x1 and the thin dotted gray curve depicts the corresponding threshold.

Recently, more and more studies have focused on the problem of causal effects in neural data [25,26], the investigation of which is usually carried out by correlation, coherence [27–29], or phase synchronization measures [30,31]. These methods measure the strength of the interactions between signals, but no insight into the directionality of information flow is produced and the methods cannot accommodate all nonlinear effects. Some nonparametric methods, involving the use of the directed transfer function [32], partial directed coherence [3], and transfer entropy [9] have been used to quantitatively describe the information flow for EEG data. Using the introduced parametric method, in this example, a data set consisting of an epileptic sample of scalp EEGs recorded by the EEG Laboratory of Neurophysiology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, was studied to find out the directional flow of signals collected from different cortical sites and to determine the corresponding quantitative measurements of the time shift to try to better understand the functional organization of the brain during an epileptic seizure. In this example, to simplify the problem, only dominant directional flow is considered at a specific time by comparing the strength of both causalities. A. Data acquisition

Scalp EEG signals are synchronous discharges from cerebral neurons detected by electrodes attached to the scalp. An XLTEK 32-channel headbox (Excel-Tech, Ltd.) with the international 10–20 electrode placement system was used. The sampling rate of the device was 500 Hz. A total of 32 EEG series were recorded in parallel from 32 electrodes located on the scalp of an epileptic patient while he was having a generalized epileptic seizure (typical absence). Both bipolar and referential montages were available. This example considered four bipolar montages: F7-F3, T5-O1, F8-F4, and T6-O2, which were located in different sites of the brain, as illustrated in Fig. 7. The montage F7-F3 represents the voltage difference between the channel F7 and F3. The purpose of this example is to detect the causality associated with the corresponding time shift between the signals from the anterior and posterior brain areas. An entire seizure of a patient was sampled (13 000 data points) starting from the 200th s of the recording and ending at the 226th. Two experiments were conducted. In the first the montages F7-F3 and T5-O1, two signals from the left hemisphere of the brain, were sampled after noise removal and the data are shown in Fig. 8. Figure 8 clearly shows the epileptic seizure where regular oscillation starts at the 203rd s and ends at the 223rd s. To apply the ERR-causality test, the initial

051919-8



(a)

FIG. 7. Distribution of EEG channels in the brain.

candidate terms set was chosen as {1,x(t − t), . . . ,x(t − 20t),y(t − t), . . . ,y(t − 20t)}T (31) and t = 2 ms. Experimentation showed that a window size of 300, a choice dependent on the dominant frequency of the signals, was appropriate. Figure 9(a) shows the contribution of the first term from the past information of the other signal detected by the proposed approach along with the threshold, where the black scattering curve denotes the strength of the signal F7-F3 leading T5-O1, and the gray scattering curve denotes the strength of the signal T5-O1 leading F7-F3. The majority of the strength is above the threshold during the seizure. The threshold is above the strength outside of the seizure, which was not of interest in this study. The corresponding values of ERR-causality between these two signals are shown in Fig. 9(b). Inspection of both figures shows that during the time interval 200.5–202 s, before the epileptic seizure, F7-F3 leading T5-O1 dominates the interactions. During the time interval 202–223 s, T5-O1 leading F7-F3

FIG. 8. The recorded EEG signals from the left brain.

(b)

(c)

FIG. 9. The results produced by the presented approach for the signal F7-F3 and T5-O1. (a) The strength of the ERR-causality, where the black scattering represents F7-F3 causing T5-O1 and the gray scattering represents T5-O1 causing F7-F3. (b) The detected map of the time-varying causality, where black denotes F7-F3 causing T5-O1 and gray denotes T5-O1 causing F7-F3. (c) The detected time-varying time shift of the signal T5-O1 in front of F7-F3.

dominates the interaction, although F7-F3 leading T5-O1 appears occasionally with very short duration, especially during 202–212 s, the strength of T5-O1 leading F7-F3 is relatively higher and the causality is more consistent than that during 212–223 s. During the time interval 223–226 s, after the seizure, two causalities appear alternating with relatively small strength. The detected time shift of the signal T5-O1 in front of F7-F3 is shown in Fig. 9(c). It is observed that, during the stable period of the epileptic seizure (time interval 203–223 s), the detected time shift of the signal T5-O1 in front of F7-F3 is very consistent (with an average value of about 28 ms), although a few gaps appear, indicating when the opposite causality is detected. From the causality point of view, this observation indicates that the signal T5-O1 leads F7-F3 during the seizure with an averaged time shift of about 28 ms. Notice that both T5-O1 and F7-F3 may be the result of or caused by a generator within the brain, but without information on the generator only the relationship between T5-O1 and F7-F3 can be tested. Four more epileptic seizures from the same patient have also been studied. The observations of causality are very 051919-9



TABLE II. The detected averaged time shift for five seizures from the same patient. Interval (s) 14–40 202–223 560–583 1361–1386 1674–1694

τ (T5-O1 → F7-F3) (ms)

τ (T6-O2 → F8-F4) (ms)

27.46 28.03 30.04 31.31 31.16

25.19 22.90 27.90 30.32 29.55

similar, and the averaged time shifts during the seizure are very close, as shown in Table II, which demonstrates that the time shift of the two considered signals is within the range of 25–32 ms. All the above results produced by the ERR-causality test indicate that the signals from the posterior brain areas precede the signals from the anterior brain areas during an epileptic seizure for the studied patient. Moreover, the time shifts of the signal from the anterior brain areas in front of the signal from the left anterior brain areas were observed to be very close to the time shifts of the signal from the right posterior brain areas in front of the signal from the right anterior brain areas. For all five epileptic seizures studied in this example, τ (F7-F3 →T5-O1) values are slightly different but are consistently greater than τ (F8-F4 → T6-O2) values. The observations extracted from EEG data are very interesting and may provide significant potential in future studies of brain activity during various forms of epileptic seizure. V. CONCLUSIONS

method is the ERR-causality test that can detect the timevarying causality between two signals without depending on full knowledge or estimation of a complete and unbiased system model. We have shown that our ERR-causality test can detect the time-varying causality of two signals, a measure of the corresponding coupling strength, and estimate the time shift. Both the ERR-causality test and the Granger test can detect causality accurately if the candidate terms are chosen appropriately. However, the ERR-causality test has a better performance if some model terms are omitted and estimates of the full parameter set in the model are not required. This in turn allows the use of smaller windows and hence increased fidelity in finding time shifts. Example 2 demonstrates that our method has a good performance against noise. Applications in bidirectional signals were also investigated and the results are encouraging. The application of the ERR-causality test to detect the directed interaction between EEG signals has been presented in the last example. By analyzing the detected causality map along with the strength and the estimated time shifts, it has been found that the dominant causality for absence epilepsy in a child is very consistent during epileptic seizure, but the dominant causality before and at the end of seizure are random. Furthermore, the estimated time shifts of the signals from the posterior brain regions leading the signals from the anterior brain areas are in the range of 25–32 ms for the studied patient. These observations show that the proposed method could be a very important tool to help understand the functional organization of the brain during an epileptic seizure by providing insight into directionality of information flow. The results showing the causality between signals from different brain areas are highly encouraging, and a full map of signal flow will be developed in future publications.

Two methods of detecting causality have been presented in this paper. The first method is the Granger causality test based on the adaptive-forward-OLS algorithm that utilizes an advantageous optimal parameter search routine to produce a complete model, and the Granger causality is then computed in terms of the variance of the prediction errors. The second

The authors gratefully acknowledge that part of this work that was financed by the Engineering and Physical Sciences Research Council (EPSRC), UK, and the European Research Council (ERC).

[1] O. David, S. J. Kiebel, L. M. Harrison, J. Mattout, J. M. Kilner, and K. J. Friston, NeuroImage 30, 1255 (2006). [2] L. A. Baccala, K. Sameshima, G. Ballester, A. C. D. Valle, and C. Timo-laria, Appl. Signal Process. 5, 40 (1998). [3] L. A. Baccala and K. Sameshima, Biol. Cybern. 84, 463 (2001). [4] H. Ombao and S. V. Bellegem, IEEE Trans. Signal Process. 55, 2259 (2008). [5] W. Hesse, E. Moller, M. Arnold, and B. Schack, J. Neurosci. Methods 124, 27 (2003). [6] D. Marinazzo, M. Pellicoro, and S. Stramaglia, Phys. Rev. E 73, 066216 (2006). [7] J. P. Hamilton, G. Chen, M. E. Thomason, M. E. Schwartz, and I. H. Gotlib, Mol. Psychiatry 16, 763 (2011). [8] T. Schreiber, Phys. Rev. Lett. 85, 461 (2000). [9] L. Faes, G. Nollo, and A. Porta, Phys. Rev. E 83, 051112 (2011). [10] C. J. Granger, Econometrica 37, 424 (1969). [11] S. A. Billings, M. Korenberg, and S. Chen, Int. J. Syst. Sci. 19, 1559 (1988).

[12] M. Korenberg, S. A. Billings, and Y. P. Liu, Int. J. Control 48, 193 (1988). [13] S. Chen, S. A. Billings, and W. Luo, Int. J. Control 50, 1873 (1989). [14] Y. Zhao and S. A. Billings, IEEE Trans. Syst. Man Cybern. Part B 36, 473 (2006). [15] S. A. Billings and H. L. Wei, IEEE Trans. Neural Networks 18, 306 (2007). [16] H. L. Wei, S. A. Billings, Y. Zhao, and L. Z. Guo, IEEE Trans. Neural Networks 20, 181 (2009). [17] S. A. Billings and H. L. Wei, Int. J. Control 81, 714 (2008). [18] Q. M. Zhu and S. A. Billings, Int. J. Control 64, 871 (1996). [19] K. Z. Mao and S. A. Billings, Int. J. Control 68, 311 (1997). [20] S. A. Billings, S. Chen, and M. J. Korenberg, Int. J. Control 49, 2157 (1989). [21] Y. Li, H. L. Wei, S. A. Billings, and X. F. Liao, Phys. Rev. E 85, 041906 (2012).

ACKNOWLEDGMENT

051919-10



[22] Y. Li, H. L. Wei, S. A. Billings, and P. G. Sarrigiannis, J. Neurosci. Methods 196, 151 (2011). [23] Y. Zhao, H. L. Wei, and S. A. Billings, IEEE Trans. Syst. Man Cybern. Part B 42, 1283 (2012). [24] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer, Physica D 58, 77 (1992). [25] A. Brovelli, M. Z. Ding, A. Ledberg, Y. H. Chen, R. Nakamura, and S. L. Bressler, Proc. Natl. Acad. Sci. USA 101, 9849 (2004). [26] D. W. Gow, J. A. Segawa, S. P. Ahlfors, and F. H. Lin, NeuroImage 43, 614 (2008). [27] S. L. Bressler and J. A. S. Kelso, Trends Cognit. Sci. 5, 26 (2001).

[28] G. Marrelec, J. Daunizeau, M. Pelegrini-Issac, J. Doyon, and H. Benali, IEEE Trans. Signal Process. 53, 3503 (2005). [29] L. Astolfi, F. Cincotti, D. Mattia, F. D. Fallani, G. Vecchiato, S. Salinari, G. Vecchiato, H. Witte, and F. Babiloni, J. Psychophysiol. 24, 93 (2010). [30] F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie, Nat. Rev. Neurosci. 2, 229 (2001). [31] S. Aviyente and A. Y. Mutlu, IEEE Trans. Signal Process. 59, 3086 (2011). [32] M. J. Kaminski and K. J. Blinowska, Biol. Cybern. 65, 203 (1991).

051919-11