
Multichannel Fusion Models for the Parametric Classification of Differential Brain Activity

Lalit Gupta*, Beomsu Chung, Mandyam D. Srinath, Dennis L. Molfese, and Hyunseok Kook

Abstract—This paper introduces parametric multichannel fusion models that exploit the different but complementary brain activity information recorded from multiple channels in order to accurately classify differential brain activity into its respective categories. A parametric weighted decision fusion model and two parametric weighted data fusion models are introduced for the classification of averaged multichannel evoked potentials (EPs). The decision fusion model combines the independent decisions of each channel classifier into a decision fusion vector, and a parametric classifier is designed to determine the EP class from the discrete decision fusion vector. The data fusion models include the weighted EP-sum model, in which the fusion vector is a linear combination of the multichannel EPs, and the EP-concatenation model, in which the fusion vector is a vector-concatenation of the multichannel EPs. The discrete Karhunen–Loeve transform (DKLT) is used to select features for each channel classifier and from each data fusion vector. The difficulty in estimating the probability density function (PDF) parameters from a small number of averaged EPs is identified, and the class-conditional PDFs of the feature vectors of averaged EPs are, therefore, derived in terms of the PDFs of the single-trial EPs. Multivariate parametric classifiers are developed for each fusion strategy, and the performances of the different strategies are compared by classifying 14-channel EPs collected from five subjects involved in making explicit match/mismatch comparisons between sequentially presented stimuli. It is shown that the performance improves by incorporating weights in the fusion rules and that the best performance is obtained using multichannel EP concatenation. The fusion strategies introduced are also applicable to other problems involving the classification of multicategory multivariate signals generated from multiple sources.

Index Terms—Evoked potentials, multisensor fusion, parametric classification.

Manuscript received September 3, 2004; revised March 26, 2005. Asterisk indicates corresponding author. *L. Gupta is with the Department of Electrical and Computer Engineering, Southern Illinois University, Carbondale, IL 62901 USA (e-mail: [email protected]). B. Chung and H. Kook are with the Department of Electrical and Computer Engineering, Southern Illinois University, Carbondale, IL 62901 USA. M. D. Srinath is with the Department of Electrical Engineering, Southern Methodist University, Dallas, TX 75205 USA. D. L. Molfese is with the Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY 40292 USA. Digital Object Identifier 10.1109/TBME.2005.856272

I. INTRODUCTION

NUMEROUS human brain function studies employ brain waveforms to determine relationships between cognitive processes and changes in the brain's electrical activity. Brain waveforms are also used extensively in clinical investigations to study normal and abnormal brain functions. The goal of this paper is to introduce and evaluate multichannel fusion strategies that will automatically classify differential brain activity into its respective categories. The categories may include brain activity correlated with different stimuli or with different stimulus presentation methods that are typical in human cognition studies. The categories in clinical investigations may be the hypothesized dysfunctional states or different stages of medical conditions. The paper focuses on the classification of multichannel evoked potentials (EPs); however, the strategies developed are also applicable to the classification of multichannel electroencephalograms (EEGs) and, in general, to the classification of multisensor signals.

Because of the poor signal-to-noise ratio (SNR), analysis and classification are typically conducted on EPs averaged over a large number of trials. However, repeating an experiment a large number of times in order to collect sufficient single-trial data to form averages is not practical and may not even be possible in some studies and investigations [1]. Consequently, improving the classification accuracies for small-average EPs will be a major breakthrough for the more flexible application of EPs in differential brain waveform studies and clinical investigations.

Analyses of multichannel EPs show that different channels reflect different aspects of the brain activity elicited by the presentation of an external stimulus [2]. For example, consider the ensemble-averaged EPs of six different channels shown in Fig. 1. Each panel shows EPs belonging to two different categories (brain activity classes). The EPs within the same category (responses to the same external input) are clearly different across the six channels. Methods that focus on classifying the EPs of each channel independently do not fully exploit this different but complementary information buried in the multichannel recordings of brain activity. This paper, therefore, focuses on ways by which the information from multiple channels can be combined in order to improve the classification accuracies of averaged EPs.

Classifying multichannel EPs can be formulated as the problem of classifying multisensor data. Several fusion methodologies have been proposed to combine data, features, or decisions of multiple sensors in order to improve the performance over that of single-sensor systems; [3] and [4] contain excellent overviews of multisensor fusion models, issues, methodologies, and applications. Our previous efforts to improve the classification accuracies led to the development of a decision fusion strategy in which multichannel EPs were classified by fusing the classification results of all channels [1]. A two-category classifier was developed independently for each channel and the classification results were fused using a majority decision rule.


Fig. 1. Ensemble-averaged EPs of six channels.

It was shown that the classification accuracy of the majority decision fusion rule classifier was consistently superior to that of the rule that selected the single best channel.

This paper focuses on comparing the performances of several parametric models uniquely formulated for both multichannel decision fusion and multichannel data fusion. Although most studies and investigations typically involve two categories, multicategory formulations are presented to generalize the fusion models and classification strategies. A parametric weighted decision fusion model and two data fusion models are introduced. In the weighted decision fusion model, a classifier is designed independently for each channel and the decisions of the channel classifiers are fused into a single discrete random fusion vector. The channel decisions are weighted using a priori classification accuracy information and a separate parametric classifier is designed to determine the EP class from the weighted decision fusion vector. The two data fusion models introduced are referred to as multichannel EP-sum and multichannel EP-concatenation. In the multichannel EP-sum model, the weighted EPs are summed across all channels to form a single fusion vector which has the same dimension as the individual channel EPs. Without weighting, this is similar to the "grand averaging" operation that is often used in EP analysis. The EP elements are weighted according to their multiclass separations. In the multichannel EP-concatenation model, the EPs of all channels are stacked into a fusion vector whose dimension is larger than that of a single-channel EP by a factor equal to the number of channels. A two-step dimensionality reduction algorithm which takes into account the interclass separation and the correlation of the components in the fusion vector is introduced to facilitate classifier development. In order to compare the performances of the different fusion strategies, multivariate parametric classifiers are developed for each fusion strategy and the performances are compared by classifying 14-channel EPs collected from five subjects involved in picture-word matching tasks.

II. SINGLE-TRIAL EP AND AVERAGED-EP MODELS

In the fusion formulations to follow, $z_c$ represents the $D$-dimensional response vector to an external input at channel $c$, $c = 1, 2, \ldots, C$, where $C$ is the number of channels. Single-trial responses are referred to as 1-EPs and responses averaged over $m$ single-trials are referred to as $m$-EPs. The single-trial and averaged responses elicited by external stimulus $k$ at channel $c$ are represented by $z_{c,k}$ and $\bar{z}_{c,k}$, respectively, $k = 1, 2, \ldots, K$, where $K$ is the total number of external stimuli. It is assumed that each stimulus elicits a different class of brain activity; therefore, the number of EP classes is also $K$.

A. Single-Trial EP Model

The single-trial response vector at channel $c$ in response to an external stimulus $k$ is often modeled as

$$z_{c,k} = s_{c,k} + e_c \qquad (1)$$

where $s_{c,k}$ is the EP elicited by stimulus $k$ and $e_c$ is the ever-present EEG which is regarded as background noise.

B. $m$-Average EP Model

In order to show the improvement in SNR through single-trial averaging, it is assumed in the single-trial model that $s_{c,k}$ and $e_c$ are independent, $s_{c,k}$ is deterministic, and $e_c$ is a zero-mean correlated process [1], [5]–[7]. The model assumed for single-trial EPs averaged over $m$ trials is [1]

$$\bar{z}_{c,k} = s_{c,k} + \bar{e}_c \qquad (2)$$

Fig. 2. Multichannel decision fusion.

where $\bar{e}_c$ is the noise averaged over the $m$ trials. We have shown in [1] that, from the multivariate central limit theorem, $\bar{e}_c$ can be assumed to be a zero-mean Gaussian random vector. That is, $f(\bar{e}_c) = G(\bar{e}_c; 0, \bar{\Sigma}_c)$, where $f(\bar{e}_c)$ is the probability density function (PDF) of the random vector $\bar{e}_c$, $\bar{\Sigma}_c$ is the covariance of $\bar{e}_c$, and $G(\cdot\,; \mu, \Sigma)$ represents the multivariate Gaussian PDF with mean $\mu$ and covariance $\Sigma$. Furthermore, we have shown that the parameters of the smaller $m$-EP design set can be determined from the parameters of the larger 1-EP design set. Specifically, the following was shown.

1) If $\bar{\mu}_{c,k}$ and $\mu_{c,k}$ are the mean vectors of $\bar{z}_{c,k}$ and $z_{c,k}$, respectively, then

$$\bar{\mu}_{c,k} = \mu_{c,k}. \qquad (3)$$

2) If $\bar{\Sigma}_c$ and $\Sigma_c$ are the covariance matrices of $\bar{z}_{c,k}$ and $z_{c,k}$, respectively, then

$$\bar{\Sigma}_c = \frac{1}{m}\Sigma_c. \qquad (4)$$

Therefore

$$f(\bar{z}_{c,k}) = G\!\left(\bar{z}_{c,k};\, \mu_{c,k},\, \frac{1}{m}\Sigma_c\right) \qquad (5)$$

where $f(\bar{z}_{c,k})$ is the PDF of the $m$-EPs of channel $c$ belonging to class $k$. Note that because the noise in each channel is assumed statistically equivalent under the different categories, the covariance matrices of the EPs of each channel are independent of $k$.

3) If $\bar{\lambda}_c$ and $\lambda_c$ are the eigenvalue vectors and $\bar{\Phi}_c$ and $\Phi_c$ are the eigenvector matrices of the covariance matrices of the $m$-EP and 1-EP ensembles, respectively, then

$$\bar{\lambda}_c = \frac{1}{m}\lambda_c \qquad (6)$$

$$\bar{\Phi}_c = \Phi_c. \qquad (7)$$

It must be emphasized that these results are quite important for the design of practical parametric $m$-EP classifiers because a lower bound is placed on the larger number of 1-EPs in the design set and not on the smaller number of $m$-EPs in the design set for covariance estimation. Consequently, an impractically large number of single-trial EPs does not have to be collected to generate enough $m$-EPs for parameter estimation. These results will be used to determine the PDF parameters of the $m$-EP fusion vectors of each fusion model, in terms of the PDF parameters of the 1-EPs of the multiple channels, in order to determine the discriminant functions for the brain activity categories. The specific goal in the next three sections is to develop fusion-based classification strategies to determine, from a set of multichannel $m$-EPs, the class of the stimulus that elicited the $m$-EPs.
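The parameter relations in (3)–(7) translate directly into code. Below is a minimal NumPy sketch, with hypothetical variable names, assuming the 1-EP mean and noise covariance of one channel and class have already been estimated.

```python
import numpy as np

def m_ep_parameters(mu_1ep, cov_1ep, m):
    """Derive m-EP PDF parameters from 1-EP parameters, per (3)-(7).

    mu_1ep  : (D,)  mean of the single-trial EPs of one channel and class
    cov_1ep : (D,D) covariance of the single-trial noise of that channel
    m       : number of single trials averaged to form each m-EP
    """
    mu_mep = mu_1ep                      # (3): averaging leaves the mean unchanged
    cov_mep = cov_1ep / m                # (4): the noise covariance shrinks by 1/m
    evals, evecs = np.linalg.eigh(cov_1ep)
    evals_mep = evals / m                # (6): eigenvalues scale by 1/m
    evecs_mep = evecs                    # (7): eigenvectors are unchanged
    return mu_mep, cov_mep, evals_mep, evecs_mep
```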

III. MULTICHANNEL EP DECISION FUSION MODEL

In the multichannel EP decision fusion model, the outputs of the channel classifiers are fused so as to implement a single decision rule as shown in Fig. 2. Each channel classifier is designed to independently classify the brain activity elicited by the external stimulus at its corresponding location on the scalp. A general weighted fusion rule is now introduced for $K$ categories and it is shown that the simple majority rule is a special case of the weighted fusion rule. In order to be explicit, the formulations are described in terms of the averaging parameter $m$, the channel parameter $c$, and the class parameter $k$. The $K$-class classifier for the $m$-EPs of channel $c$ is developed as follows.


A. DKLT Feature Selection

The features for classification are selected from a set of linear combinations of the samples of $\bar{z}_{c,k}$. The linear combinations are given by

$$y_{c,k} = \Phi_c\, \bar{z}_{c,k} \qquad (8)$$

where $\Phi_c$ is the DKLT of the $m$-EPs of channel $c$. That is, $\Phi_c$ is the eigenvector matrix of the $m$-EP covariance matrix of channel $c$ with the eigenvectors arranged in an order corresponding to decreasing eigenvalues. Let $\Phi_c^*$ be the eigenvector matrix formed by arranging the rows of $\Phi_c$ in a descending order of separation so that the first row yields the linear combination with the highest $K$-class separation and the last row yields the lowest $K$-class separation, where the $K$-class separation of each element in $y_{c,k}$ is found by summing the pairwise interclass separations [8]. That is, if $y_{c,k}(j)$ and $y_{c,l}(j)$ are the $j$th elements of the transformed vectors of classes $k$ and $l$, respectively, then the class-$k$ versus class-$l$ pairwise scalar separation $s_c^{k,l}(j)$ of element $j$ is given by (9), where $\hat{\mu}_{c,k}(j)$ and $\hat{\mu}_{c,l}(j)$ are the means of the $j$th elements of $y_{c,k}$ and $y_{c,l}$, respectively, and $N$ is the number of vectors in each class (assumed equal for convenience). Then, the $K$-class separation of the $j$th element is given by

$$s_c(j) = \sum_{k=1}^{K-1}\sum_{l=k+1}^{K} s_c^{k,l}(j). \qquad (10)$$

Typically, only a small number of linear combinations (eigenvectors) yield high $K$-class separations. We, therefore, select the eigenvectors to form the feature vectors based on their corresponding $K$-class separations. That is, we arrange the eigenvectors in a descending order of $K$-class separation and choose the first $q_c$ eigenvectors that satisfy the threshold condition in (11), where $T$ is a specified separation threshold, to form the truncated DKLT feature selection matrix $\Psi_c$ (the first $q_c$ rows of $\Phi_c^*$) of dimension $q_c \times D$. For the dimension of the feature vectors of all the channels to be the same, the dimension is selected as

$$q = \min_{c} q_c. \qquad (12)$$

The $q$-dimensional feature vector of channel $c$ for class $k$ is given by

$$v_{c,k} = \Psi_c\, \bar{z}_{c,k}. \qquad (13)$$

The mean vectors and covariance matrices of $v_{c,k}$ are given, respectively, by

$$\mu_{v_{c,k}} = \Psi_c\, \bar{\mu}_{c,k} \qquad (14)$$

$$\Sigma_{v_c} = \Psi_c\, \bar{\Sigma}_c\, \Psi_c^T. \qquad (15)$$

Under the linear transformation, $v_{c,k}$ remains Gaussian. The class-conditional density of the feature vector of channel $c$ under class $k$ is, therefore,

$$f(v_c \mid k) = G\bigl(v_c;\, \Psi_c \bar{\mu}_{c,k},\, \Psi_c \bar{\Sigma}_c \Psi_c^T\bigr). \qquad (16)$$

From (7), the feature selection matrix determined from the 1-EP ensemble can also be used for the $m$-EPs. Therefore, the class-conditional density of the feature vector in terms of the single-trial EP parameters is

$$f(v_c \mid k) = G\!\left(v_c;\, \Psi_c \mu_{c,k},\, \frac{1}{m}\Psi_c \Sigma_c \Psi_c^T\right). \qquad (17)$$

The Bayes minimum-error classification rule (0–1 loss function) is to assign a test $m$-EP to class $k$ if

$$P(k)\, f(v_c \mid k) \ge P(l)\, f(v_c \mid l), \quad \text{for all } l \ne k \qquad (18)$$

where $P(k)$ is the a priori probability of the $m$-EPs belonging to class $k$. That is, $P(k)$ is the probability of occurrence of external stimulus $k$. The discriminant function for the $m$-EPs of channel $c$ under class $k$ is given by

$$g_{c,k}(v_c) = \ln f(v_c \mid k) + \ln P(k). \qquad (19)$$

Substituting the PDF in (16) into (19) and taking the natural logarithm, the discriminant function for the class-$k$ EPs of channel $c$ can be written as

$$g_{c,k}(v_c) = -\frac{1}{2}\bigl(v_c - \Psi_c\bar{\mu}_{c,k}\bigr)^T \bigl(\Psi_c\bar{\Sigma}_c\Psi_c^T\bigr)^{-1}\bigl(v_c - \Psi_c\bar{\mu}_{c,k}\bigr) - \frac{1}{2}\ln\left|\Psi_c\bar{\Sigma}_c\Psi_c^T\right| + \ln P(k). \qquad (20)$$

The discriminant function in terms of the single-trial parameters can be written as

$$g_{c,k}(v_c) = -\frac{m}{2}\bigl(v_c - \Psi_c\mu_{c,k}\bigr)^T \bigl(\Psi_c\Sigma_c\Psi_c^T\bigr)^{-1}\bigl(v_c - \Psi_c\mu_{c,k}\bigr) - \frac{1}{2}\ln\left|\frac{1}{m}\Psi_c\Sigma_c\Psi_c^T\right| + \ln P(k) \qquad (21)$$

and the class detected by channel $c$ is, therefore, given by

$$\gamma_c = \arg\max_{k}\ g_{c,k}(v_c). \qquad (22)$$
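The feature-selection and per-channel classification steps of (8)–(22) can be sketched as follows. This is an illustrative implementation under stated assumptions: because the exact pairwise separation measure of (9) is not reproduced here, a squared difference of projected class means normalized by the corresponding eigenvalue is used as a stand-in, and all function and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def dklt_feature_matrix(cov_1ep, class_means_1ep, n_features):
    """Separation-ordered, truncated DKLT matrix (Psi_c) for one channel.

    cov_1ep         : (D,D) pooled single-trial noise covariance of the channel
    class_means_1ep : (K,D) single-trial class means of the channel
    n_features      : q, number of rows (eigenvectors) to keep
    """
    evals, evecs = np.linalg.eigh(cov_1ep)
    order = np.argsort(evals)[::-1]
    phi = evecs[:, order].T                      # DKLT: rows are eigenvectors, decreasing eigenvalue
    lam = evals[order]
    proj_means = class_means_1ep @ phi.T         # (K, D) projected class means
    K = proj_means.shape[0]
    sep = np.zeros(phi.shape[0])
    for k in range(K):                           # sum of pairwise separations, cf. (10)
        for l in range(k + 1, K):
            sep += (proj_means[k] - proj_means[l]) ** 2 / lam
    keep = np.argsort(sep)[::-1][:n_features]    # highest K-class separation first
    return phi[keep]                             # Psi_c, shape (q, D)

def channel_discriminants(psi, class_means_1ep, cov_1ep, m, z_bar, priors):
    """Per-channel discriminants g_{c,k} of (21) for one test m-EP z_bar."""
    cov_feat = psi @ (cov_1ep / m) @ psi.T       # feature covariance, cf. (15) and (4)
    return np.array([
        multivariate_normal.logpdf(psi @ z_bar, mean=psi @ mu_k, cov=cov_feat) + np.log(p)
        for mu_k, p in zip(class_means_1ep, priors)
    ])

# channel decision, cf. (22): int(np.argmax(channel_discriminants(...)))
```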

B. Weighted Decision Fusion Rule

For each channel $c$, the decision $\gamma_c$ takes one of $K$ values; therefore, $\gamma_c$ is a discrete random variable. The outputs of the $C$ channel classifiers can be fused into a single $C$-dimensional decision fusion discrete random vector

$$\Gamma = \bigl[\gamma_1\ \gamma_2\ \cdots\ \gamma_C\bigr]^T. \qquad (23)$$


The channel decisions can be weighted by taking the probabilities of correct and incorrect channel decisions into account. Let $p_c(j \mid k)$, $j, k = 1, 2, \ldots, K$, be the probability that the decision of channel $c$ is class $j$ when the true class is $k$, and let the PDF of $\Gamma$ under class $k$ be $P(\Gamma \mid k)$; then, the Bayes decision function for class $k$ is

$$g_k(\Gamma) = P(\Gamma \mid k)\, P(k) \qquad (24)$$

which can also be written as

$$g_k(\Gamma) = \ln P(\Gamma \mid k) + \ln P(k). \qquad (25)$$

The final decision resulting from decision fusion is given by

$$k^* = \arg\max_{k}\ g_k(\Gamma). \qquad (26)$$

For this general case, the discriminant function $g_k(\Gamma)$, $k = 1, 2, \ldots, K$, can be derived explicitly by noting that the PDF of $\Gamma$ under class $k$ can be written as

$$P(\Gamma \mid k) = P(\gamma_1, \gamma_2, \ldots, \gamma_C \mid k). \qquad (27)$$

Because the classifiers are developed independently for each channel, we assume that the decisions of the $C$ channel classifiers are independent; therefore, the PDF of $\Gamma$ under class $k$ can be written as

$$P(\Gamma \mid k) = \prod_{c=1}^{C} P(\gamma_c \mid k) = \prod_{c=1}^{C}\prod_{j=1}^{K} \bigl[p_c(j \mid k)\bigr]^{\delta(\gamma_c, j)} \qquad (28)$$

where

$$\delta(\gamma_c, j) = \begin{cases} 1 & \text{if } \gamma_c = j \\ 0 & \text{if } \gamma_c \ne j. \end{cases} \qquad (29)$$

By substituting the PDF into (25), it can be shown that the discriminant function for class $k$ can be written as

$$g_k(\Gamma) = \sum_{c=1}^{C}\sum_{j=1}^{K} \delta(\gamma_c, j)\, \ln p_c(j \mid k) + \ln P(k). \qquad (30)$$

C. Majority Decision Fusion Rule

In order to show that the majority decision fusion rule is a special case of the above weighted decision fusion rule, assume that the number of EP classes is $K = 2$, the number of channels $C$ is odd, and the classes are equi-probable. Also let

$$p_c(1 \mid 1) = p_c(2 \mid 2) = p > \tfrac{1}{2}, \quad c = 1, 2, \ldots, C \qquad (31)$$

$$p_c(2 \mid 1) = p_c(1 \mid 2) = 1 - p, \quad c = 1, 2, \ldots, C. \qquad (32)$$

Through substitution of these probabilities into the discriminant function in (30), the Bayes rule, decide class 1 if $g_1(\Gamma) \ge g_2(\Gamma)$ and otherwise decide class 2, reduces to: decide class 1 if

$$\sum_{c=1}^{C} \delta(\gamma_c, 1) > \sum_{c=1}^{C} \delta(\gamma_c, 2) \qquad (33)$$

otherwise decide class 2, which is the majority rule.
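A compact sketch of the weighted decision fusion rule of (24)–(33), together with the validation-set estimate of the decision probabilities described in Section III-D below. Names are hypothetical, and the small constant added inside the logarithm only guards against zero estimated probabilities.

```python
import numpy as np

def estimate_decision_probs(val_decisions, val_labels, n_classes):
    """Estimate p_c(j|k) for one channel from a validation set (Section III-D).
    val_decisions, val_labels: arrays of 0-based class indices."""
    p = np.zeros((n_classes, n_classes))
    for k in range(n_classes):
        dk = val_decisions[val_labels == k]
        for j in range(n_classes):
            p[j, k] = np.mean(dk == j)          # fraction of class-k m-EPs assigned to class j
    return p                                    # p[j, k] = p_c(j | k)

def weighted_decision_fusion(gamma, probs, priors):
    """Fused decision from channel decisions gamma = (gamma_1..gamma_C), using
    the discriminant of (30): sum_c ln p_c(gamma_c | k) + ln P(k)."""
    g = np.array([
        sum(np.log(probs[c][gamma[c], k] + 1e-12) for c in range(len(gamma)))
        + np.log(priors[k])
        for k in range(len(priors))
    ])
    return int(np.argmax(g))                    # cf. (26)

def majority_rule(gamma, n_classes=2):
    """Special case (31)-(33): equal priors, two classes, identical symmetric
    channel accuracies -- the fused decision reduces to a simple vote."""
    return int(np.argmax(np.bincount(gamma, minlength=n_classes)))
```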

D. Estimation of Probabilities

Prior knowledge of the classification results can be used to assign the probabilities $p_c(j \mid k)$. Alternatively, the probabilities can be estimated from a validation set of channel $c$ as

$$\hat{p}_c(j \mid k) = \frac{\text{number of class-}k\ m\text{-EPs of channel } c \text{ assigned to class } j}{\text{total number of class-}k\ m\text{-EPs of channel } c \text{ in the validation set}}.$$

IV. MULTICHANNEL EP-SUM FUSION MODEL

In the EP-sum fusion model, each element $\beta_k(i)$, $i = 1, 2, \ldots, D$, of the $D$-dimensional fusion vector is given by a linear combination of the $i$th elements of the $C$-channel EPs as shown in Fig. 3. That is, $\beta_k(i)$ can be written as

$$\beta_k(i) = \sum_{c=1}^{C} w_c(i)\, \bar{z}_{c,k}(i) \qquad (34)$$

and the fusion vector for this case can, therefore, be written as the linear combination

$$\beta_k = \sum_{c=1}^{C} W_c\, \bar{z}_{c,k} \qquad (35)$$

where $W_c = \mathrm{diag}\bigl[w_c(1), w_c(2), \ldots, w_c(D)\bigr]$ is a diagonal matrix of weights. The dimension of the fusion vector is the same as the dimension of the EPs of each channel. Each component in the sum term of (35) is a Gaussian random vector with PDF

$$f(W_c \bar{z}_{c,k}) = G\!\left(W_c \bar{z}_{c,k};\, W_c \mu_{c,k},\, \frac{1}{m} W_c \Sigma_c W_c^T\right) \qquad (36)$$

so that

$$f(\beta_k) = G\bigl(\beta_k;\, \mu_{\beta_k},\, \Sigma_\beta\bigr) \qquad (37)$$

where

$$\mu_{\beta_k} = \sum_{c=1}^{C} W_c \mu_{c,k}, \qquad \Sigma_\beta = \frac{1}{m}\sum_{c=1}^{C} W_c \Sigma_c W_c^T. \qquad (38)$$

Fig. 3. Multichannel EP-sum fusion.

A. Classifier Development

Using the DKLT approach outlined in Section III-A, the features for class $k$ are selected from

$$v_{\beta,k} = \Psi_\beta\, \beta_k \qquad (39)$$

where $\Psi_\beta$ is the truncated matrix formed by selecting the rows, that satisfy (12), of the DKLT of the fusion vector. The PDF of $v_{\beta,k}$ is, therefore,

$$f(v_\beta \mid k) = G\bigl(v_\beta;\, \Psi_\beta \mu_{\beta_k},\, \Psi_\beta \Sigma_\beta \Psi_\beta^T\bigr). \qquad (40)$$

The Bayes discriminant function for class $k$ is given by

$$g_k(v_\beta) = \ln f(v_\beta \mid k) + \ln P(k) \qquad (41)$$

and a test fusion vector is assigned to the class given by

$$k^* = \arg\max_{k}\ g_k(v_\beta). \qquad (42)$$

Because

$$\mu_{\beta_k} = \sum_{c=1}^{C} W_c \mu_{c,k} \qquad (44)$$

and, furthermore, $\Sigma_\beta = (1/m)\Sigma_\beta^{(1)}$ with $\Sigma_\beta^{(1)} = \sum_{c=1}^{C} W_c \Sigma_c W_c^T$ the single-trial covariance of the sum, the discriminant function for class $k$ can be written in terms of the single-trial EP parameters as

$$g_k(v_\beta) = -\frac{m}{2}\bigl(v_\beta - \Psi_\beta \mu_{\beta_k}\bigr)^T \bigl(\Psi_\beta \Sigma_\beta^{(1)} \Psi_\beta^T\bigr)^{-1}\bigl(v_\beta - \Psi_\beta \mu_{\beta_k}\bigr) - \frac{1}{2}\ln\left|\frac{1}{m}\Psi_\beta \Sigma_\beta^{(1)} \Psi_\beta^T\right| + \ln P(k). \qquad (45)$$

B. Determining the Weighting Matrix

The weight assigned to element $\bar{z}_{c,k}(i)$ in the linear combination that forms $\beta_k(i)$ is $w_c(i)$. The weights can be selected according to the normalized $K$-class separations of the $i$th element of each channel. The highest weight is assigned to the channel element with the highest $K$-class separation across the $C$ channels, which is found through summing the pairwise interclass separations in a manner similar to that described in Section III. Let the $K$-class separation of the $i$th element of channel $c$ be $s_c(i)$. Then, the weight for the $i$th component of channel $c$ is given by

$$w_c(i) = \frac{s_c(i)}{\sum_{c'=1}^{C} s_{c'}(i)}. \qquad (43)$$
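A minimal sketch of the EP-sum construction in (34)–(38) and the separation-based weights of (43). It assumes, as in (38), that the channel noise processes are treated as independent; the names are hypothetical.

```python
import numpy as np

def ep_sum_weights(separations):
    """Element-wise channel weights of (43): separations is a (C, D) array of
    K-class separations s_c(i); each column is normalized to sum to one."""
    return separations / separations.sum(axis=0, keepdims=True)      # (C, D)

def ep_sum_fusion(z_bar, weights):
    """Weighted EP-sum fusion vector of (34)-(35): z_bar is (C, D), one m-EP per channel."""
    return (weights * z_bar).sum(axis=0)                              # (D,)

def ep_sum_parameters(channel_means, channel_covs, weights, m):
    """Gaussian parameters of the fusion vector, cf. (36)-(38), treating the
    channel noise processes as independent."""
    mu = sum(np.diag(w) @ mu_c for w, mu_c in zip(weights, channel_means))
    cov = sum(np.diag(w) @ (cov_c / m) @ np.diag(w).T
              for w, cov_c in zip(weights, channel_covs))
    return mu, cov
```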

V. MULTICHANNEL EP-CONCATENATION FUSION MODEL

In this fusion model, the $C$-channel fusion vector for class $k$ is formed by concatenating the EPs of the $C$ channels into a single multichannel EP of dimension $CD$, as shown in Fig. 4. The resulting $C$-channel fusion vector is given by

$$\zeta_k = \bar{z}_{1,k} \oplus \bar{z}_{2,k} \oplus \cdots \oplus \bar{z}_{C,k} \qquad (46)$$

where we use $\oplus$ to represent the concatenation operation. That is,

$$\zeta_k = \bigl[\bar{z}_{1,k}^T\ \ \bar{z}_{2,k}^T\ \cdots\ \bar{z}_{C,k}^T\bigr]^T. \qquad (47)$$

The elements of $\zeta_k$ are related to the elements of the $C$-channel vectors according to

$$\zeta_k\bigl[(c-1)D + i\bigr] = \bar{z}_{c,k}(i), \quad c = 1, \ldots, C;\ i = 1, \ldots, D. \qquad (48)$$

If $\mu_{\zeta_k}$ and $\Sigma_\zeta$ are the mean and covariance of $\zeta_k$, respectively, then

$$\mu_{\zeta_k} = \bigl[\mu_{1,k}^T\ \ \mu_{2,k}^T\ \cdots\ \mu_{C,k}^T\bigr]^T \qquad (49)$$

$$\Sigma_\zeta = E\bigl[(\zeta_k - \mu_{\zeta_k})(\zeta_k - \mu_{\zeta_k})^T\bigr]. \qquad (50)$$


Fig. 4. Multichannel EP-concatenation fusion.

That is, the covariance matrix of the fusion vector is formed by concatenating the channel covariance matrices and the interchannel cross-covariance matrices according to

$$\Sigma_\zeta = \frac{1}{m}\begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \cdots & \Sigma_{1,C} \\ \Sigma_{2,1} & \Sigma_{2,2} & \cdots & \Sigma_{2,C} \\ \vdots & \vdots & & \vdots \\ \Sigma_{C,1} & \Sigma_{C,2} & \cdots & \Sigma_{C,C} \end{bmatrix} \qquad (51)$$

where $\Sigma_{c,c'}$ is the single-trial cross-covariance matrix of channels $c$ and $c'$. The PDF of each component of the fusion vector is Gaussian and it will be assumed that the PDF of the fusion vector is also Gaussian. That is,

$$f(\zeta_k) = G\bigl(\zeta_k;\, \mu_{\zeta_k},\, \Sigma_\zeta\bigr). \qquad (52)$$

The advantage of concatenation fusion is that the information in the raw EP data from all channels is contained in the multichannel EP fusion vector. However, the drawback is that the dimensionality of the EP vector is increased by a factor $C$. This increase exacerbates even further the dimensionality problem identified in [1]. Note that the DKLT approach cannot be used to decrease the dimension unless at least $CD$ single-trial EPs are collected for the training set to ensure that the covariance matrix of the fusion vector is nonsingular [1]. In practice, collecting such a large number of single-trial EPs is quite prohibitive even when the number of channels is small. In order to facilitate the design of practical parametric classifiers for the EP-concatenation fusion strategy, the dimension of the concatenated fusion vector must be decreased to satisfy the condition that the dimension is less than the number of single-trial EPs in the training set. The challenge, therefore, is to decrease the dimension of the fusion vector without losing useful discriminatory information. A two-step procedure consisting of multiclass separation-based dimensionality reduction and correlation-based dimensionality reduction is introduced to decrease the dimension to the desired value $d$, with $d < CD$. The operations in the two steps are combined into a single dimensionality reduction matrix $A$. The multichannel fusion vector of reduced dimension is, then, given by

$$\alpha_k = A\,\zeta_k \qquad (53)$$

where $A$ is a $d \times CD$ matrix. The PDF of $\alpha_k$ is, therefore,

$$f(\alpha_k) = G\bigl(\alpha_k;\, A\mu_{\zeta_k},\, A\Sigma_\zeta A^T\bigr). \qquad (54)$$

The two-step procedure to determine the dimensionality reduction matrix $A$ is described next.
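The concatenation operations of (46)–(54) amount to simple stacking. A short NumPy sketch under the same assumptions, with hypothetical names; cross_covs is assumed to hold the single-trial cross-covariance blocks of (51).

```python
import numpy as np

def concat_fusion_vector(z_bar):
    """(46)-(48): stack the C channel m-EPs (C, D) into a CD-dimensional vector."""
    return z_bar.reshape(-1)

def concat_parameters(channel_means, cross_covs, m):
    """(49)-(51): stacked mean and block covariance of the concatenated vector.
    channel_means: (C, D); cross_covs[c][c2]: (D, D) single-trial cross-covariance."""
    mu = np.concatenate(channel_means)
    C = len(channel_means)
    cov = np.block([[cross_covs[c][c2] for c2 in range(C)] for c in range(C)]) / m
    return mu, cov

def reduce_dimension(zeta, A):
    """(53): apply the d x CD dimensionality reduction matrix A; the reduced
    vector stays Gaussian with mean A mu and covariance A Sigma A^T, cf. (54)."""
    return A @ zeta
```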

A. Separation-Based Dimensionality Reduction

In the first step of the two-step procedure, the $K$-class separation of each element of $\zeta_k$ is used to select only those elements that have high $K$-class separations and discard those with poor separations. This is achieved by multiplying $\zeta_k$ with a diagonal matrix $\Theta$ whose element $\theta(i,i)$ is given by

$$\theta(i,i) = \begin{cases} 1 & \text{if } \hat{s}(i) \ge T_s \\ 0 & \text{otherwise} \end{cases} \qquad (55)$$

where $T_s$ is a $K$-class separation threshold and $\hat{s}(i)$ is the normalized $K$-class separation of element $i$ in $\zeta_k$. That is, if $s(i)$ is the $K$-class separation of element $i$, found by summing the pairwise interclass separations as in Section III, $\hat{s}(i)$ is given by

$$\hat{s}(i) = \frac{s(i)}{\sum_{i'=1}^{CD} s(i')}. \qquad (56)$$

The separation matrix $\Theta^*$ of dimension $d_1 \times CD$ is obtained by retaining the nonzero rows of $\Theta$. Because the zero rows contribute nothing, they can be dropped from $\Theta$. Then, the separation-based fusion vector of reduced dimension $d_1$ is given by

$$\zeta_k^{(1)} = \Theta^*\, \zeta_k. \qquad (57)$$

B. Correlation-Based Dimensionality Reduction

In the second step, the correlation is exploited to systematically decrease the dimension by combining the most correlated elements of $\zeta_k^{(1)}$. Let

$$\zeta^{(1)} = \bigl\{\zeta_k^{(1)},\ k = 1, 2, \ldots, K\bigr\} \qquad (58)$$

be the $K$-class mixture of the $\zeta_k^{(1)}$.


The dimension is decreased iteratively by combining the most correlated pair of elements in $\zeta^{(1)}$ and replacing the correlated pair with the combination. As a result, the dimension is decreased by one in each iteration. The combining of the most correlated pairs of elements is repeated until the dimension of the vector is $d$. If $x^{(t)}$ is the vector at the $t$th iteration, the procedure for correlation-based dimensionality reduction can be summarized as

$$x^{(0)} = \zeta^{(1)}, \qquad x^{(t)} = B_t\, x^{(t-1)}, \quad t = 1, 2, \ldots, (d_1 - d) \qquad (59)$$

with dimensions

$$\dim\bigl[x^{(t)}\bigr] = d_1 - t. \qquad (60)$$

In iteration $t$, if $x^{(t-1)}(i)$ and $x^{(t-1)}(j)$ were the most correlated pair in $x^{(t-1)}$ and their normalized $K$-class separations were $\hat{s}(i)$ and $\hat{s}(j)$, respectively, then element $i$ of $x^{(t)}$ is given by

$$x^{(t)}(i) = a_i\, x^{(t-1)}(i) + a_j\, x^{(t-1)}(j) \qquad (61)$$

and

$$a_i = \frac{\hat{s}(i)}{\hat{s}(i) + \hat{s}(j)}, \qquad a_j = \frac{\hat{s}(j)}{\hat{s}(i) + \hat{s}(j)}. \qquad (62)$$

The element $x^{(t-1)}(j)$ is removed from $x^{(t)}$; therefore, the dimension of $x^{(t)}$ will be $(d_1 - t)$. The combining of the most correlated pairs is repeated until $x^{(t)}$ has $d$ elements. At the end of each iteration, the elements and the weights of the pairwise linear combinations are noted. Upon termination, the weights assigned to the elements of $\zeta^{(1)}$ that form each linear combination are stored in the corresponding row and columns of a $d \times d_1$ weight matrix; this matrix, multiplied by $\Theta^*$, gives the dimensionality reduction matrix $A$. The concatenated fusion vector of reduced dimension is given by (53).

C. Classifier Development

Let

$$v_{\zeta,k} = \Psi_\zeta\, \alpha_k \qquad (63)$$

where $\Psi_\zeta$ is a matrix formed by selecting the rows, that satisfy (12), of the DKLT matrix of the reduced concatenated fusion vector $\alpha_k$. Then, the PDF of $v_{\zeta,k}$ is

$$f(v_\zeta \mid k) = G\bigl(v_\zeta;\, \Psi_\zeta A \mu_{\zeta_k},\, \Psi_\zeta A \Sigma_\zeta A^T \Psi_\zeta^T\bigr). \qquad (64)$$

Because $m$ is not a factor in determining $\Theta^*$ and is also not a factor in the correlation-based dimensionality reduction procedure, the covariance can be written as $A\Sigma_\zeta A^T = (1/m)A\Sigma_\zeta^{(1)}A^T$, where $\Sigma_\zeta^{(1)}$ is the single-trial covariance of the concatenated vector. Therefore, the discriminant function for class $k$, in terms of the single-trial EP parameters, is given by

$$g_k(v_\zeta) = -\frac{m}{2}\bigl(v_\zeta - \Psi_\zeta A\mu_{\zeta_k}\bigr)^T \bigl(\Psi_\zeta A\Sigma_\zeta^{(1)}A^T\Psi_\zeta^T\bigr)^{-1}\bigl(v_\zeta - \Psi_\zeta A\mu_{\zeta_k}\bigr) - \frac{1}{2}\ln\left|\frac{1}{m}\Psi_\zeta A\Sigma_\zeta^{(1)}A^T\Psi_\zeta^T\right| + \ln P(k) \qquad (65)$$

and a test vector is assigned to the class given by

$$k^* = \arg\max_{k}\ g_k(v_\zeta). \qquad (66)$$
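The two-step dimensionality reduction of (55)–(62) can be sketched as below. The update of the separation value carried by a merged element is not specified in the text and is an assumption of this sketch; all names are hypothetical.

```python
import numpy as np

def separation_selection_matrix(norm_sep, threshold):
    """Step 1, cf. (55)-(57): keep only elements whose normalized K-class
    separation meets the threshold; returns the d1 x CD selection matrix."""
    keep = np.flatnonzero(norm_sep >= threshold)
    theta = np.zeros((keep.size, norm_sep.size))
    theta[np.arange(keep.size), keep] = 1.0
    return theta

def correlation_reduction_matrix(mixture, norm_sep, target_dim):
    """Step 2, cf. (58)-(62): iteratively merge the most correlated pair of
    elements, weighting each by its normalized separation, until target_dim
    elements remain. mixture is (n_vectors, d1); returns a target_dim x d1 matrix."""
    d1 = mixture.shape[1]
    B = np.eye(d1)                 # rows express current elements in terms of zeta^(1)
    x = mixture.copy()
    sep = norm_sep.copy()
    while x.shape[1] > target_dim:
        r = np.corrcoef(x, rowvar=False)
        np.fill_diagonal(r, 0.0)
        i, j = np.unravel_index(np.argmax(np.abs(r)), r.shape)
        wi, wj = sep[i] / (sep[i] + sep[j]), sep[j] / (sep[i] + sep[j])
        x[:, i] = wi * x[:, i] + wj * x[:, j]        # (61)-(62)
        B[i] = wi * B[i] + wj * B[j]
        x = np.delete(x, j, axis=1)
        B = np.delete(B, j, axis=0)
        sep[i] = sep[i] + sep[j]                     # carry-forward of separation (assumption)
        sep = np.delete(sep, j)
    return B                                         # combine with step 1: A = B @ theta
```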

VI. PERFORMANCE EVALUATIONS

The goal in this section is to compare the performances of the six (three unweighted and three weighted) fusion strategies using a real EP data set. The EPs in the data set were collected from individuals engaged in making explicit match/mismatch comparisons between sequentially presented stimuli. The match/mismatch effect [2], [8]–[10], along with the related mismatch negativity [10]–[15], is one of the most studied ERP effects and has been investigated for over 30 years. The data set consisted of 14-channel EPs of two female and three male subjects. The goal of these experiments was to show that EPs can identify when a match occurs between what a subject thinks and sees. The subjects were instructed to "think" about the first video stimulus (picture of an object) and then to respond whether the next video stimulus (printed word of an object) matched or did not match the first stimulus in "meaning." The period between the first and second stimulus varied randomly between 2 and 5 s. The response from each channel, time-locked to the onset of the second stimulus, was recorded as a match category or mismatch category depending on which of two keys was pressed by the subject. Responses due to incorrect key presses were discarded. EP data were collected from electrode sites F7, F8, F3, F4, Fz, T3, T4, T5, T6, C3, C4, P3, P4, and Pz. The 14 electrodes were referenced to linked earlobe leads. The electrooculogram (EOG) was also recorded with two electrodes placed lateral to and below the left eye (bipolar montage). Single-trial EPs were digitized over 1 s using a 10-ms sampling period beginning 100 ms prior to stimulus onset. The ten samples corresponding to the prestimulus period were removed; thus, the dimension of the single-trial EPs was $D = 90$. Trials in which the peak-to-peak amplitude exceeded 100 μV in any one electrode channel or 50 μV in the eye channel were regarded as artifacts and rejected. In order to accommodate the different amplitudes across the channels, each single-trial EP was scale normalized by dividing the samples of the EP by the standard deviation of the EP samples. Additionally, each single-trial EP was de-trended to remove slope variations in the EPs within and across the channels. From the ensembles collected, an equal number of artifact-free single-trial match and mismatch EPs were selected for each subject; however, the number varied across the subjects. The total number of single-trial EPs collected for each category was 71, 71, 82, 71, and 63 for the five subjects.
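A minimal sketch of the single-trial preprocessing described above (artifact rejection, de-trending, and scale normalization); the amplitude limits are assumed to be in microvolts and the function name is hypothetical.

```python
import numpy as np
from scipy.signal import detrend

def preprocess_single_trial(ep, eog, ep_limit=100.0, eog_limit=50.0):
    """Reject trials whose peak-to-peak amplitude exceeds the limits (assumed uV),
    then de-trend and scale-normalize each channel by its standard deviation.
    ep: (C, D) one single-trial EP per channel; eog: (D,) eye channel."""
    if (ep.max(axis=1) - ep.min(axis=1)).max() > ep_limit:
        return None                               # artifact in an electrode channel
    if eog.max() - eog.min() > eog_limit:
        return None                               # artifact in the eye channel
    ep = detrend(ep, axis=1)                      # remove slope variations
    return ep / ep.std(axis=1, keepdims=True)     # scale normalization
```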


TABLE I CLASSIFICATION ACCURACIES AVERAGED ACROSS THE SUBJECTS AND CLASSIFIER RANKINGS

A. Classifier Design

The single-trial covariance matrix for each channel was estimated, using the maximum-likelihood estimate, by pooling the match and mismatch single-trial EPs because it is assumed that the noise under the match and mismatch conditions is statistically equivalent. The channel match and mismatch mean vectors were estimated directly from their respective single-trial training sets. Because the number of single trials in the design-set mixture of the EPs should exceed the dimension of the EP vector for covariance estimation, the EPs of the five subjects were downsampled from 90 to have dimensions 70, 70, 80, 70, and 60, respectively. The EPs were downsampled by assuming a piecewise linear fit to the EP samples and uniformly resampling the resulting piecewise linear curve to obtain the desired dimension. The channel weights for decision fusion were selected based on the individual classification accuracies of the channels. The unweighted results for sum fusion corresponded to selecting equal weights for all channels. The weights introduced in the correlation step of dimensionality reduction for concatenation fusion were selected to be $a_i = a_j = 1/2$ for the unweighted case. The dimension of the concatenation fusion vector was decreased to satisfy the condition that the number of 1-EPs should be greater than or equal to the dimension of the EP vector. For example, for the subjects who had 71 single-trial EPs in each category, the dimension was decreased from $CD = 14 \times 70 = 980$ to 70. The dimensionality reduction matrix was determined by first selecting the threshold $T_s$ in (55) to discard the 50% (490) of the elements with the smallest interclass separations. It must be noted that the 490 elements discarded accounted for only 7.6% of the total separation (the sum of the 980 separations) averaged across the subjects. The correlation algorithm was then applied to decrease the dimension from 490 to 70. Note that the dimension was decreased in the correlation algorithm not by discarding any of the remaining 50% of the elements, but by systematically combining the most correlated elements in a pairwise fashion until the desired dimension was obtained. The features selected in all three fusion strategies were the DKLT linear combinations of the EP samples that accounted for the highest separation amongst the classes. In the two data fusion strategies, the features were computed after the fusion vector was formed. For all cases, the threshold $T$ in (11) for selecting the number of eigenvectors, and thus the number of features, was 0.1. Because the second stimulus matched or did not match the first stimulus with equal probability, the a priori probabilities in the discriminant functions were set to $P(k) = 1/2$, $k = 1, 2$.
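The piecewise-linear downsampling can be sketched with a one-dimensional interpolation; the function name is hypothetical.

```python
import numpy as np

def downsample_piecewise_linear(ep, new_dim):
    """Downsample a D-sample EP (one channel at a time) to new_dim samples by
    fitting a piecewise linear curve through the samples and resampling it uniformly."""
    old_t = np.linspace(0.0, 1.0, ep.shape[-1])
    new_t = np.linspace(0.0, 1.0, new_dim)
    return np.interp(new_t, old_t, ep)

# e.g., reducing a 90-sample EP to 70 samples: downsample_piecewise_linear(ep, 70)
```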

B. Classification Results

For each stimulus, the single-trial EP data of each channel were randomly partitioned into two equal-sized sets: the training set and the test set. In each set, the single-trial responses were randomly selected and averaged to form the $m$-EP training and test sets. A re-sampling approach [1], [16] was employed to generate $V$ design and test set pairs. Each pair is referred to as a partition, and for each partition $v$, $v = 1, 2, \ldots, V$, the probability of classification error was estimated as

$$\hat{P}_e(v) = \frac{1}{K}\sum_{k=1}^{K} \hat{P}_{e,k}(v) \qquad (67)$$

where $\hat{P}_{e,k}(v)$ is the estimated probability of misclassifying the $m$-EPs of class $k$ in partition $v$. The classification rate for class $k$ and the classification accuracy, estimated over the $V$ partitions, are given by

$$\rho_k = 1 - \frac{1}{V}\sum_{v=1}^{V} \hat{P}_{e,k}(v) \qquad (68)$$

and

$$\rho = 1 - \frac{1}{V}\sum_{v=1}^{V} \hat{P}_e(v) \qquad (69)$$

respectively. The classification accuracy was used to compare the performances of the different fusion strategies. For each subject, the classification accuracy was averaged over 200 stimulus presentations for $m$ taking values 2, 4, 8, and 16. The classification accuracies, as a function of $m$, averaged across the five subjects are summarized in Table I. Also included in the table is a ranking, enclosed in parentheses, of the six classifiers for each $m$. The classifier ranked 1 yielded the highest classification accuracy and the classifier ranked 6 yielded the lowest classification accuracy. The two columns under decision fusion compare the majority decision rule (unweighted) and weighted decision fusion rules. The best result for a given $m$ is presented in boldface.
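A small sketch of the resampling-based estimates in (67)–(69) as reconstructed above, with hypothetical names.

```python
import numpy as np

def classification_accuracy(errors):
    """errors[v, k] = estimated probability of misclassifying the m-EPs of class k
    in partition v.  Returns the per-class classification rates (68) and the
    overall classification accuracy (69)."""
    per_partition_error = errors.mean(axis=1)     # (67): average over classes
    class_rates = 1.0 - errors.mean(axis=0)       # (68): averaged over partitions
    accuracy = 1.0 - per_partition_error.mean()   # (69)
    return class_rates, accuracy
```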

Fig. 5. Match and mismatch classification rates averaged across the subjects.

Fig. 6. Classification accuracies, averaged across the subjects, as a function of the number of channels.

Fig. 5 shows the match and mismatch classification rates corresponding to the classification accuracies of the EP-concatenation strategy shown in Table I. In order to investigate the improvement in performance by increasing the number of channels, the concatenation fusion strategy was implemented by selecting $C = 2$, 4, 6, 8, 10, 12, and 14 channels. The classification accuracies, averaged across the five subjects, for two values of $m$ (including $m = 8$) are shown in Fig. 6. The following conclusions can be drawn from the results in Table I and the results in Figs. 5 and 6.

1) The classification accuracies increased when $m$ was increased for all fusion methods. This confirms an expected result.

2) Also as expected, the match and mismatch classification rates increased when $m$ was increased. The rates were not significantly different for the smaller values of $m$ (2 and 4) and there was a small increase in the match rate for the higher values (8 and 16).

3) The performance improved by incorporating weights into the fusion rules. In some decision-fusion and sum-fusion cases, the improvement was quite dramatic.

4) The EP-concatenation fusion strategy consistently yielded the highest classification accuracies. If the majority rule (unweighted decision fusion) is used as a benchmark, the improvement in classification accuracy resulting from weighted concatenation fusion is quite significant.

5) The rankings of the fusion strategies are consistent across all the values of $m$.

6) The performance improved when the number of channels concatenated was increased. This is in spite of the fact that the dimensionality reduction ratio, defined as the ratio of the dimension of the concatenated fusion vector to the reduced dimension, increased proportionately when the number of channels concatenated was increased.

The individual classification accuracies of the five subjects and an alternate method to rank the six classifiers are presented in the Appendix.

VII. CONCLUSION

The goal of this paper was to develop fusion strategies to improve the classification accuracies of averaged EPs by exploiting the brain-activity information from multiple channels. A multichannel decision fusion model and two multichannel data fusion models were introduced, and parametric classifiers were developed for each fusion model. The discriminant functions of the resulting fusion classifiers were derived in terms of the parameters of the single-trial EPs, thus making it possible to develop and test $m$-EP classifiers even when the number of $m$-EPs was smaller than the EP dimension.

Results from the classification experiments showed that the performance improved by incorporating weights in the fusion rules.


TABLE II INDIVIDUAL CLASSIFICATION ACCURACIES OF THE FIVE SUBJECTS AND THE CLASSIFIER RANKINGS

The channels were weighted according to their respective classification accuracies and the EP elements were weighted according to the multiclass separations in the decision fusion and data fusion strategies, respectively. The best results were obtained through weighted EP-concatenation fusion. It could be argued that this is an expected result because the information from all the channels is initially contained in the $C$-channel EP fusion vector. However, the dimension of the fusion vector increases by a factor of $C$, thus exacerbating even further the problems in estimating the classifier parameters. The challenge, therefore, was to decrease the dimension of the fusion vector without losing useful discriminatory information. A two-step dimensionality reduction algorithm which took into account the interclass separation and the correlation of the components in the fusion vector was developed. Furthermore, it was shown that the performance improved when the number of channels was increased even though the dimensionality reduction ratio increased proportionately. It could, therefore, be expected that the performance would increase even further if a larger number of channels were used, provided that the most useful discriminatory information is selected without losing too much information during dimensionality reduction.

The dimension of the data fusion vector for EP-sum fusion was the same as that of the channel EPs. However, summing the EPs of different channels involves convolving the statistical information from the individual channels. As a result, the performance of the EP-sum data fusion strategy was not as impressive as that of concatenation fusion. An alternative approach to fuse multichannel EPs without increasing the dimension is to pool the multichannel EPs of each class to form a multichannel EP mixture for each class. We refer to this strategy as "mixture-fusion." The mixture components can be weighted according to their respective classification accuracies. The discriminant function for each class can be derived by noting that the PDF of the mixture is given by the sum of the weighted component PDFs. Our preliminary studies showed that the mixture-fusion results were worse than the sum-fusion results due to the increase of the scatter within each class (intraclass scatter) resulting from the variations of the EPs across the multiple channels. Because this approach was not promising, it was not pursued any further.


problems involving the classification of multicategory-multivariate signals collected from multiple sensors.

APPENDIX

The individual classification accuracies of the five subjects are shown in Table II. The classifier rankings for each $m$ are enclosed in parentheses. In order to determine the ranks of the six classifiers across the subjects and across $m$ simultaneously, the ranks of each classifier were first summed along each column to give the rank-sum as shown in the row labeled "Rank-Sum." Next, the classifiers were ranked by ranking the rank-sums as shown in the row labeled "Rank of Rank-Sum." It is interesting to observe that although there were variations in the classifier rankings, the final rankings are identical to the ranking that would be obtained by ranking the rank-sums of the classifiers using the averaged (across subjects) classification accuracies shown in Table I.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their comments and their suggestions for improving this paper.

REFERENCES

[1] L. Gupta, J. Phegley, and D. L. Molfese, "Parametric classification of multichannel evoked potentials," IEEE Trans. Biomed. Eng., vol. 49, no. 8, pp. 905–911, Aug. 2002.
[2] P. G. Simos and D. L. Molfese, "Event-related potentials in a two-choice task involving within-form comparisons of pictures and words," Int. J. Neurosci., vol. 90, no. 3–4, pp. 233–254, 1997.
[3] P. K. Varshney, "Multisensor data fusion," Electron. Commun. Eng. J., pp. 245–253, Dec. 1997.
[4] D. L. Hall, "An introduction to multisensor data fusion," Proc. IEEE, vol. 85, no. 1, pp. 6–23, Jan. 1997.
[5] L. Gupta, D. L. Molfese, R. Tammana, and P. G. Simos, "Nonlinear alignment and averaging for estimating the evoked potential," IEEE Trans. Biomed. Eng., vol. 43, no. 4, pp. 348–356, Apr. 1996.
[6] C. E. Davila and R. Srebro, "Subspace averaging of steady-state visual evoked potentials," IEEE Trans. Biomed. Eng., vol. 47, no. 6, pp. 720–728, Jun. 2000.
[7] M. Hansson, T. Gansler, and G. Salomonsson, "A system for tracking changes in the mid-latency evoked potential during anesthesia," IEEE Trans. Biomed. Eng., vol. 45, no. 3, pp. 323–334, Mar. 1998.
[8] S. E. Barret and M. D. Rugg, "Event-related potentials and the semantic matching of picture names," Brain Cogn., vol. 14, pp. 201–212, 1990.
[9] A. F. Kramer and E. Donchin, "Brain potentials as indexes of orthographic and phonological interaction during word matching," J. Exp. Psychol., vol. 13, pp. 76–86, 1987.
[10] T. Hinterberger, B. Wilhelm, J. Mellinger, B. Kotchoubey, and N. Birbaumer, "A device for the detection of cognitive brain functions in completely paralyzed and unresponsive patients," IEEE Trans. Biomed. Eng., vol. 52, no. 2, pp. 211–220, Feb. 2005.
[11] L. A. Farwell and E. Donchin, "Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials," Electroenceph. Clin. Neurophysiol., vol. 70, 1988.
[12] J. Polich, T. Brock, and M. W. Geisler, "P300 from somatosensory stimuli: Probability and interstimulus interval," Int. J. Psychophysiol., vol. 11, pp. 1–5, 1991.
[13] C. Ogura, Y. Koga, and M. Shimokochi, Recent Advances in Event-Related Brain Potential Research. New York: Elsevier, 1996.
[14] S. Bentin, "Event-related potentials, semantic processes, and expectancy factors in word recognition," Brain Lang., vol. 31, pp. 308–327, 1987.
[15] J. Polich, "Semantic categorization and event-related potentials," Brain Lang., vol. 26, pp. 304–321, 1985.
[16] J. Phegley, K. Perkins, L. Gupta, and L. Hughes, "Multi-category prediction of multifactorial diseases through risk factor fusion and rank sum selection," IEEE Trans. Syst., Man, Cybern. A, to be published.

[17] C. W. Anderson, E. A. Stolz, and S. Shamsunder, "Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks," IEEE Trans. Biomed. Eng., vol. 45, no. 3, pp. 277–286, Mar. 1998.
[18] A. A. Dingle, R. D. Jones, G. J. Carroll, and W. R. Fright, "A multistage system to detect epileptiform activity in the EEG," IEEE Trans. Biomed. Eng., vol. 40, no. 12, pp. 1260–1268, Dec. 1993.
[19] D. L. Molfese, "Predicting dyslexia at 8 years of age using neonatal brain responses," Brain Lang., vol. 72, pp. 238–245, 2000.
[20] U. Hoppe, S. Weiss, R. W. Stewart, and U. Eysholdt, "An automatic sequential recognition method for cortical auditory evoked potentials," IEEE Trans. Biomed. Eng., vol. 48, no. 2, pp. 154–164, Feb. 2001.
[21] A. Ademoglu, E. M-Tzanakou, and Y. Istefanopulos, "Analysis of pattern reversal visual evoked potentials (PRVEPs) by spline wavelets," IEEE Trans. Biomed. Eng., vol. 44, pp. 881–890, 1997.

Lalit Gupta received the B.E. (Hons) degree in electrical engineering from the Birla Institute of Technology and Science, Pilani, India, in 1976, the M.S. degree in digital systems from Brunel University, Middlesex, U.K., in 1981, and the Ph.D. degree in electrical engineering from Southern Methodist University, Dallas, TX, in 1986. He is currently a Professor of electrical and computer engineering at Southern Illinois University, Carbondale. His research interests are in pattern recognition, signal processing, neuroinformatics, and neural networks. He has served as a Consultant for the Army Research office (ARO) on smart munitions design and for Seagate Technology on image compression. He is currently a signal processing and pattern recognition Consultant for Think-A-Move, Neuronetrix Inc., and Cleveland Medical Devices on NIH funded projects. He has numerous publications in the fields of pattern recognition, neural networks, and evoked potentials. He is an Associate Editor of the Pattern Recognition Journal. Dr. Gupta is listed in Marquis Who’s Who in America (2002).

Beomsu Chung received the B.E. and M.S. degrees in electrical engineering from Southern Illinois University at Carbondale (SIUC) in 1997 and 1999, respectively. He received the Ph.D. degree in engineering systems (electrical engineering) from SIUC in 2004. He is currently a Senior Engineer in the Terminal Technology Laboratory, Telecommunication R&D Center, Samsung, Suwon, Korea. His research interests are in pattern recognition, signal processing, and neuroinformatics.

Mandyam D. Srinath received the B.Sc. degree from the University of Mysore, India, in 1954, the Diploma in electrical technology from the Indian Institute of Science, Bangalore, in 1957, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois, Urbana, in 1959 and 1962, respectively. He has been on the Electrical Engineering faculties at the University of Kansas, Lawrence and the Indian Institute of Science, Bangalore, India. He is currently Professor of electrical engineering at Southern Methodist University, Dallas, TX, where he has been since 1967. He has published numerous papers in signal and image processing, control and estimation theory, and video coding. He is principal author of the book Introduction to Statistical Signal Processing with Applications (with P. K. Rajasekaran and R. Viswanathan) (Englewood Cliffs, NJ: Prentice-Hall, 1996) and coauthor of Continuous and Discrete Signals and Systems (with S. Soliman) (Englewood Cliffs, NJ: Prentice-Hall, 1990). His current research interests are in image processing, and video data compression. Dr. Srinath is a Registered Professional Engineer in Texas.


Dennis L. Molfese received the Ph.D. in psychology from the Pennsylvania State University, University Park, in 1972. He is an internationally recognized expert on the use of brain electrical recording techniques to study the emerging relationship between brain development and cognitive processes. He is currently Chair of the Department of Psychological and Brain Sciences at the University of Louisville, Kentucky. He has served as the Chair of a number of national panels on Learning Disabilities as well as on numerous NIH and NIMH grant review panels. He is co-director of one of 16 national laboratories that make up the National Institutes of Health Reading and Learning Disabilities Research Network. He serves as the Editor-in-Chief of Developmental Neuropsychology. He has obtained research funding for his work from the National Institutes of Health, the National Science Foundation, The National Foundation/March of Dimes, the MacArthur Foundation, the Kellogg Foundation, NATO, and NASA. He has organized over 30 national and international conferences on brain processes and language and has given over 80 invited lectures and 110 conference presentations here and abroad on this topic. He has published 120 books, journal articles, and book chapters on the relationship between developing brain functions and cognitive processes. Dr. Molfese is the recipient of a number of honors for outstanding research contributions from societies such as Sigma Xi and Phi Kappa Phi. He also received the Outstanding Scholar Award from Southern Illinois University, and is a Distinguished University Scholar at the University of Louisville. He is a Fellow of both the American Psychological Association and the American Psychological Society.


Hyunseok Kook received the B.E. degree in electronics engineering from Wonkwang University, Korea, in 1995, the M.S. degree in electronics engineering from Wonkwang University, Korea, in 1997, and the M. S. degree in electrical engineering from Southern Illinois University at Carbondale in 2000. He is currently working towards the Ph.D. degree in the Department of Electrical and Computer Engineering at Southern Illinois University. His research interests are in pattern recognition, signal processing, neuroinformatics, and image processing.