{"title": "Aggregating Classification Accuracy across Time: Application to Single Trial EEG", "book": "Advances in Neural Information Processing Systems", "page_first": 825, "page_last": 832, "abstract": null, "full_text": "Aggregating Classification Accuracy across Time: Application to Single Trial EEG\n\nSteven Lemm Intelligent Data Analysis Group, Fraunhofer Institute FIRST, Kekulestr. 7 12489 Berlin, Germany\n\nChristin Schfer a Intelligent Data Analysis Group, Fraunhofer Institute FIRST, Kekulestr. 7 12489 Berlin, Germany\n\nGabriel Curio Neurophysics Group, Dept. of Neurology, Campus Benjamin Franklin, Charit,University Medicine Berlin, e Hindenburgdamm 20, 12200 Berlin, Germany\n\nAbstract\nWe present a method for binary on-line classification of triggered but temporally blurred events that are embedded in noisy time series in the context of on-line discrimination between left and right imaginary hand-movement. In particular the goal of the binary classification problem is to obtain the decision, as fast and as reliably as possible from the recorded EEG single trials. To provide a probabilistic decision at every time-point t the presented method gathers information from two distinct sequences of features across time. In order to incorporate decisions from prior time-points we suggest an appropriate weighting scheme, that emphasizes time instances, providing a higher discriminatory power between the instantaneous class distributions of each feature, where the discriminatory power is quantified in terms of the Bayes error of misclassification. The effectiveness of this procedure is verified by its successful application in the 3rd BCI competition. 
Disclosure of the data after the competition revealed this approach to be superior, with single trial error rates as low as 10.7, 11.5 and 16.7% for the three different subjects under study.\n\n1 Introduction\n\nThe ultimate goal of brain-computer interfacing (BCI) is to translate human intentions into a control signal for a device, such as a computer application, a wheelchair or a neuroprosthesis (e.g. [20]). Most pursued approaches utilize the accompanying EEG-rhythm perturbation in order to distinguish between single trials (STs) of left and right hand imaginary movements, e.g. [8, 11, 14, 21]. Up to now there are just a few published approaches utilizing additional features, such as slow cortical potentials, e.g. [3, 4, 9]. This paper describes the algorithm that has been successfully applied in the 2005 international data analysis competition on BCI-tasks [2] (data set IIIb) for the on-line discrimination between imagined left and right hand movement. (Contact: steven.lemm@first.fhg.de) The objective of the competition was to detect the respective motor intention as early and as reliably as possible. Consequently, the competing algorithms have to solve the on-line discrimination task based on information about the event onset; it is thus not within the scope of the competition to detect the event onset itself. We approach this problem with an algorithm that combines the different characteristics of two features: the modulations of the ongoing rhythmic activity and the slow cortical Movement Related Potential (MRP). Both features are differently pronounced over time and exhibit a large trial-to-trial variability, and can therefore be considered as temporally blurred. Consequently, the proposed method on the one hand combines the MRP with the oscillatory feature, and on the other hand gathers information across time as introduced in [8, 16]. 
More precisely, at each time point we estimate probabilistic models on the labeled training data - one for each class and feature - yielding a sequence of weak instantaneous classifiers, i.e. posterior class distributions. The classification of an unlabeled ST is then derived by a weighted linear combination of these weak probabilistic classifiers according to their instantaneous discriminatory power. The paper is organized as follows: Section 2 describes the features and their extraction; Section 3 introduces the probabilistic model as well as the framework for combining information from the different features across time; Section 4 presents the results on the competition data, followed by a brief conclusion.\n\n2 Features\n\n2.1 Neurophysiology\n\nThe human perirolandic sensorimotor cortices show rhythmic macroscopic EEG oscillations (µ-rhythm) [6], with spectral peak energies around 10 Hz (localized predominantly over the postcentral somatosensory cortex) and 20 Hz (over the precentral motor cortex). Modulations of the µ-rhythm have been reported for different physiological manipulations, e.g., by motor activity, both actual and imagined [7, 13, 18], as well as by somatosensory stimulation [12]. Standard trial averages of µ-rhythm power show a sequence of attenuation, termed event-related desynchronization (ERD) [13], followed by a rebound (event-related synchronization: ERS) which often overshoots the pre-event baseline level [15]. For sensorimotor cortical processes accompanying finger movements, Babiloni et al. [1] demonstrated that movement related potentials (MRPs) and ERD indeed show up with different spatio-temporal activation patterns across primary (sensori-)motor cortex (M1), the supplementary motor area (SMA) and the posterior parietal cortex (PP). Most importantly, the ERD response magnitude did not correlate with the amplitude of the negative MRP slope. In the following we will combine both features. 
Thus, in order to extract the rhythmic information we map the EEG to the time-frequency domain by means of Morlet wavelets [19], whereas the slow cortical MRP is extracted by the application of a low pass filter, in the form of a simple moving average filter.\n\n2.2 Extraction\n\nLet X = [x[1], . . . , x[T]] denote the EEG signal of one single trial (ST) of length T, recorded from the two bipolar channels C3 and C4, i.e. x[t] = [C3[t], C4[t]]^T. The label information about the corresponding motor intention of a ST is denoted by Y ∈ {L, R}. For information obtained from observations until time s ≤ T, we will make use of the subscript |s throughout this paper, e.g. X_|s refers to [x[1], . . . , x[s]]. This observational horizon becomes important with respect to the causality of the feature extraction process; in particular, in order to ensure the causality of filter operations we have to restrict the algorithm to a certain observational horizon. Note that X_|T denotes a completely observed ST; for notational convenience we will omit the index |T in case of complete observations.\n\nConsidering ERD as a feature for ST classification, we model the hand-specific time course of absolute µ-rhythm amplitudes over both sensorimotor cortices. Therefore we utilize the time-frequency representations of the ST at two different frequency bands (α, β), obtained by convolution of the EEG signal with complex Morlet wavelets [19]. Using the notation ψ_α and ψ_β for wavelets centered at the individual spectral peaks in the alpha (8-12 Hz) and the beta (16-24 Hz) frequency domain, the ERD feature of a ST observed until time s is calculated as\n\nERD_|s = ( erd_|s[1], . . . , erd_|s[s] ), with\nerd_|s[t] = ( |(ψ_α ∗ C3_|s)[t]|, |(ψ_α ∗ C4_|s)[t]|, |(ψ_β ∗ C3_|s)[t]|, |(ψ_β ∗ C4_|s)[t]| )^T.  (1)\n\nIn a similar manner we define the ST feature for the MRP by convolution with a moving average filter of length 11, abbreviated as MA(11):\n\nMRP_|s = ( mrp_|s[1], . . . , mrp_|s[s] ), with\nmrp_|s[t] = ( (C3_|s ∗ MA(11))[t], (C4_|s ∗ MA(11))[t] )^T.  (2)
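As an illustration, the two feature maps in (1) and (2) can be sketched as follows. This is a minimal, non-causal sketch using NumPy; the function names, the 7-cycle wavelet width, and the use of symmetric ('same') convolution in place of the paper's strictly causal filtering on the observational horizon are our own assumptions.

```python
import numpy as np

def morlet_kernel(fc, fs, n_cycles=7):
    # complex Morlet wavelet centered at fc Hz, sampled at fs Hz;
    # the 7-cycle width is an assumed, typical choice
    sigma = n_cycles / (2 * np.pi * fc)
    t = np.arange(-3 * sigma, 3 * sigma, 1.0 / fs)
    return np.exp(2j * np.pi * fc * t) * np.exp(-t ** 2 / (2 * sigma ** 2))

def erd_feature(c3, c4, fs, f_alpha=10.0, f_beta=20.0):
    # eq. (1): instantaneous alpha/beta amplitudes at C3 and C4,
    # i.e. one vector in R^4 per sample
    feats = []
    for fc in (f_alpha, f_beta):
        k = morlet_kernel(fc, fs)
        for ch in (c3, c4):
            feats.append(np.abs(np.convolve(ch, k, mode='same')))
    return np.stack(feats, axis=1)          # shape (T, 4)

def mrp_feature(c3, c4, n=11):
    # eq. (2): moving-average (MA(11)) low-pass of each channel, R^2 per sample
    ma = np.ones(n) / n
    return np.stack([np.convolve(c3, ma, mode='same'),
                     np.convolve(c4, ma, mode='same')], axis=1)
```

In an on-line setting the convolutions would be truncated to past samples only, as required by the causality discussion above.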
According to (1) and (2), the k-th labeled observed training ST, i.e. (X^(k), Y^(k)), maps to a ST in feature space, namely (MRP^(k), ERD^(k)).\n\n3 Probabilistic Classification Model\n\nBefore we start with the model description, we briefly introduce two concepts from Bayesian decision theory. Let p(x | μ_y, Σ_y), y ∈ {L, R}, denote the PDFs of two multivariate Gaussian distributions with different means and covariance matrices (μ_y, Σ_y) for the two classes, denoted by L and R. Given the two class-conditional distribution models, the assumption of a class prior P(y) = 1/2, y ∈ {L, R}, and an observation x, the posterior class distribution according to Bayes' formula is given by\n\np(y | x, μ_L, Σ_L, μ_R, Σ_R) = p(x | μ_y, Σ_y) / ( p(x | μ_L, Σ_L) + p(x | μ_R, Σ_R) ).  (3)\n\nFurthermore, the discriminative power between these two distributions can be estimated using the Bayes error of misclassification [5]. In case of distinct class covariance matrices the Bayes error cannot be calculated directly. However, by using the Chernoff bound [5] we can derive an upper bound and finally approximate the discriminative power w between the two distributions by\n\n2w = 1 - min_{0≤s≤1} ∫ p(x | μ_L, Σ_L)^s p(x | μ_R, Σ_R)^(1-s) dx.  (4)\n\nIn case of Gaussian distributions the above integral can be expressed in closed form [5], such that the minimum solution can easily be obtained (see also [16]). Based on these two concepts, we now introduce our probabilistic classification method. We first model the class-conditional distribution of each feature at each time instance as a multivariate Gaussian. Hence at each time instance we estimate the class means and the class covariance matrices in feature space, based on the mapped training STs ERD^(k) and MRP^(k). Thus from erd^(k)[t] we obtain the following two class-conditional sets of parameters:\n\nμ_y[t] = E[ erd^(k)[t] | Y^(k) = y ],  y ∈ {L, R},  (5)\nΣ_y[t] = Cov[ erd^(k)[t] | Y^(k) = y ],  y ∈ {L, R}.  (6)
For convenience we summarize the estimated model parameters for the ERD feature as θ[t] := (μ_L[t], Σ_L[t], μ_R[t], Σ_R[t]), whereas φ[t] denotes the class means and the covariance matrices obtained in the same manner from mrp^(k)[t]. Given an arbitrary observation x from the appropriate domain, applying Bayes' formula as introduced in (3) yields a posterior distribution for each feature:\n\np(y | erd, θ[t]),  erd ∈ R^4,  (7)\np(y | mrp, φ[t]),  mrp ∈ R^2.  (8)\n\nAdditionally, according to (4) we obtain approximations of the discriminative power w[t] and v[t] of the ERD resp. MRP feature at every time instance. In order to finally derive the classification of an unlabeled single trial at a certain time s ≤ T, we incorporate knowledge from all preceding samples t ≤ s, i.e. we base the classification on the causally extracted features ERD_|s and MRP_|s. Therefore we first apply (7) and (8) to the observations erd_|s[t] resp. mrp_|s[t] in order to obtain the class posteriors based on observations until s ≤ T. Secondly, we combine these class posteriors with one another across time by taking the expectation under the weight distributions w and v, i.e.\n\nc(y, s) = ( ∑_{t≤s} w[t] p(y | erd_|s[t], θ[t]) + v[t] p(y | mrp_|s[t], φ[t]) ) / ( ∑_{t≤s} w[t] + v[t] ).  (9)\n\nAs described in [16] this yields an evidence accumulation over time about the decision process. Strictly speaking, Eq. (9) gives the expectation value that the ST, observed until time s, is generated by either one of the class models (L or R). Due to the submission requirements of the competition the final decision at time s is\n\nC[s] = 1 - 2 c(L, s),  (10)\n\nwhere a positive or negative sign refers to right or left movement, while the magnitude indicates the confidence in the decision on a scale between 0 and 1.\n\n4 Application\n\n4.1 Competition data\n\nThe EEG from two bipolar channels (C3, C4) was provided with band-pass filter settings of 0.5 to 30 Hz and sampled at 128 Hz. 
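To make the model of Section 3 concrete, the posterior (3), the Chernoff-based weight (4) and the temporal combination (9)/(10) can be sketched as follows. This is a sketch under our own naming; the closed-form Chernoff distance for Gaussians follows Duda et al. [5], and the grid search over s is our simplification of the minimization in (4).

```python
import numpy as np

def gauss_pdf(x, mu, cov):
    # multivariate Gaussian density
    d = x - mu
    k = len(mu)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    return np.exp(-0.5 * (d @ np.linalg.solve(cov, d))) / norm

def posterior_L(x, mu_L, cov_L, mu_R, cov_R):
    # eq. (3): posterior of class L under equal class priors
    pL = gauss_pdf(x, mu_L, cov_L)
    pR = gauss_pdf(x, mu_R, cov_R)
    return pL / (pL + pR)

def chernoff_weight(mu_L, cov_L, mu_R, cov_R, n_grid=101):
    # eq. (4): 2w = 1 - min over s of the overlap integral, evaluated via the
    # closed-form Chernoff distance k(s) for two Gaussians [5]
    d = mu_L - mu_R
    best = 1.0
    for s in np.linspace(0.0, 1.0, n_grid):
        C = s * cov_L + (1.0 - s) * cov_R
        k = (s * (1.0 - s) / 2.0) * (d @ np.linalg.solve(C, d)) + 0.5 * np.log(
            np.linalg.det(C)
            / (np.linalg.det(cov_L) ** s * np.linalg.det(cov_R) ** (1.0 - s)))
        best = min(best, np.exp(-k))
    return 0.5 * (1.0 - best)

def decide(post_erd, post_mrp, w, v, s):
    # eqs. (9)/(10): weighted evidence accumulation up to sample s;
    # post_* hold the instantaneous class-L posteriors per time point
    num = (w[:s] * post_erd[:s] + v[:s] * post_mrp[:s]).sum()
    c_L = num / (w[:s] + v[:s]).sum()
    return 1.0 - 2.0 * c_L   # sign encodes the class, magnitude the confidence
```

Identical class distributions yield a weight of zero, so uninformative time instances drop out of the sum in (9) automatically.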
The data consist of recordings from three different healthy subjects. Except for the first data set, which provides just 320 trials of each kind, each data set contains 540 labeled (training) and 540 unlabeled (competition) trials of imaginary hand movements, with an equal number of left and right hand trials. Each trial has a duration of 7 s: after a 3 s preparation period a visual cue is presented for one second, indicating the demanded motor intention. This is followed by another 3 s for performing the imagination task (for details see [2]). The particular competition data was provided by the Dept. of Med. Informatics, Inst. for Biomed. Eng., Univ. of Techn. Graz. The specific competition task is to provide an on-line discrimination between left and right movements for the unlabeled STs of each subject, based on the information obtained from the labeled trials. More precisely, at every time instance in the interval from 3 to 7 seconds a strictly causal decision about the intended motor action and its confidence must be supplied. After the competition deadline, based on the disclosure of the labels Y^(k) of the previously unlabeled STs, the outputs C^(k)[t] of the methods were evaluated using the time course of the mutual information (MI) [17], i.e.\n\nMI[t] = 1/2 log_2( SNR[t] + 1 ),  (11)\nSNR[t] = ( E[ C^(k)[t] | Y^(k) = L ] - E[ C^(k)[t] | Y^(k) = R ] )^2 / ( Var[ C^(k)[t] | Y^(k) = L ] + Var[ C^(k)[t] | Y^(k) = R ] ).  (12)\n\nSince the general objective of the competition was to obtain the single trial classification as fast and as accurately as possible, the maximum steepness of the MI was considered as the final evaluation criterion, i.e.\n\nmax_{t ≥ 3.5 s} MI[t] / (t - 3 s).  (13)\n\nNote that the feature extraction relies on a few hyperparameters, i.e. the center frequencies and the widths of the wavelets, as well as the length of the MA filter. 
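The evaluation criterion in (11)-(13) can be sketched as follows; the function names and the array layout (trials by time points) are our own assumptions.

```python
import numpy as np

def mutual_information(C, y):
    # eqs. (11)/(12): squared difference of the class-conditional means of the
    # classifier output, normalized by the summed variances, converted to bits
    left, right = C[y == 'L'], C[y == 'R']
    snr = (left.mean(axis=0) - right.mean(axis=0)) ** 2 / (
        left.var(axis=0) + right.var(axis=0))
    return 0.5 * np.log2(snr + 1.0)

def mi_steepness(mi, times, t0=3.0, t_min=3.5):
    # eq. (13): maximum of MI[t] / (t - 3 s) over t >= 3.5 s
    mask = times >= t_min
    return np.max(mi[mask] / (times[mask] - t0))
```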
All these parameters were obtained by model selection using a leave-one-out cross-validation scheme on the classification performance on the training data.\n\n4.2 Results and Discussion\n\nAs proposed in Section 3 we estimated the class-conditional Gaussian distributions, cf. (5)-(8). The resulting posterior distributions were then combined according to (9) in order to obtain the final classification of the unlabeled STs. After disclosure of the label information our method turned out to succeed with a MI steepness (cf. (13)) of 0.17, 0.44 and 0.35 for the individual subjects. Table 1 summarizes the results in terms of the achieved minimum binary classification error, the maximum MI, and the maximum steepness of the MI for each subject and each competitor in the competition.\n\n      min. error rate [%]        max. MI [bit]               max. MI/t [bit/s]\n      O3     S4     X11         O3      S4      X11         O3      S4      X11\n1.    10.69  11.48  16.67       0.6027  0.6079  0.4861      0.1698  0.4382  0.3489\n2.    14.47  22.96  22.22       0.4470  0.2316  0.3074      0.1626  0.4174  0.1719\n3.    13.21  17.59  16.48       0.5509  0.3752  0.4675      0.2030  0.0936  0.1173\n4.    23.90  24.44  24.07       0.2177  0.2387  0.2173      0.1153  0.1218  0.1181\n5.    11.95  21.48  18.70       0.4319  0.3497  0.3854      0.1039  0.1490  0.0948\n6.    10.69  13.52  25.19       0.5975  0.5668  0.2437      0.1184  0.1516  0.0612\n7.    34.28  38.52  28.70       0.0431  0.0464  0.1571      0.0704  0.0229  0.0489\n\nTable 1: Overall ranked results of the competing algorithms (the first row corresponds to the proposed method) on the competition test data. For the three different subjects (O3, S4 and X11) the table states different performance measures of classification accuracy (min. error rate, max. MI, steepness of the MI), where the steepness of the MI was used as the objective in the competition. For a description of algorithms 2-7 please refer to [2].\n\nThe resulting time courses of the MI and the steepness of the MI are presented in the left panel of Fig. 1. 
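The leave-one-out model selection mentioned above can be sketched as follows; the grid contents and the fit/predict interface are hypothetical stand-ins for the class-conditional Gaussian model of Section 3.

```python
import numpy as np

def loo_error(F, y, fit, predict):
    # leave-one-out estimate of the single-trial error rate for one
    # hyperparameter setting; F holds one feature vector per trial
    n = len(y)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(F[mask], y[mask])
        errors += int(predict(model, F[i]) != y[i])
    return errors / n

def select_hyperparameters(grid, make_features, X, y, fit, predict):
    # pick the setting (e.g. wavelet center frequency and width, MA length)
    # with the lowest leave-one-out error on the training trials
    return min(grid, key=lambda p: loo_error(make_features(X, p), y, fit, predict))
```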
For subjects two and three, during the first 3.5 seconds (0.5 seconds after cue presentation) the classification is rather at chance level; after 3.5 seconds a steep ascent in the classification accuracy can be observed, reflected by the rising MI. The maximum steepness for these two subjects is obtained quite early, between 3.6 and 3.8 s. In contrast, for subject one the maximum is achieved at 4.9 seconds, yielding a low steepness value. However, a low value is also found for the submissions of all other competitors. Nevertheless, the MI constantly increases up to 0.64 bit per trial at 7 seconds, which might indicate a delayed performance of subject one. The right panel in Fig. 1 provides the weights w[t] and v[t], reflecting the Bayes error of misclassification (cf. (4)), that were used for the temporal integration process. For subject two one can clearly observe a switch in the regime between the ERD and the MRP feature at 5 seconds, as indicated by a crossing of the two weighting functions. From this we conclude that the steep increase in MI for this subject between 3 and 5 seconds is mainly due to the MRP feature, whereas the further improvement of the MI relies primarily on the ERD feature. Subject one provides nearly no discriminative MRP and the classification is almost exclusively based on the ERD feature. For subject three the constantly low weights at all time instances reveal the weak discriminative power of the estimated class-conditional distributions. However, in Fig. 1 the advantage of the integration process across time can clearly be observed, as the MI is continuously increasing and the steepness of the MI is surprisingly high even for this subject.\n\nFigure 1: Left panel: time courses of the mutual information (light, dashed) and of the competition criterion - the steepness of the mutual information (thin, solid), cf. (13) - for the classification of the unlabeled STs. 
Right panel: time courses of the weights reflecting the discriminative power (cf. (4)) at every time instance for the two different features (ERD - dark, solid; MRP - light, dashed). In each panel the subjects O3, S4, X11 are arranged top down.\n\nA comprehensive comparison of all techniques submitted for data set IIIb of the BCI competition is provided in [2] or available on the web 1 . Basically, this evaluation reveals that the proposed algorithm outperforms all competing approaches.\n\n5 Conclusion\n\nWe proposed a general Bayesian framework for the temporal combination of sets of simple classifiers based on different features, which is applicable to any kind of sequential data posing a binary classification problem. Moreover, an arbitrary number of features can be combined in the proposed way of temporal weighting, by utilizing the estimated discriminative power over time. Furthermore, the estimation of the Bayes error of misclassification is not strictly linked to the chosen parametric form of the class-conditional distributions. For arbitrary distributions the Bayes error can be obtained, for instance, by statistical resampling approaches such as Monte Carlo methods. However, for the successful application in the BCI competition 2005 we chose Gaussian distributions for the sake of simplicity concerning two issues: estimating their parameters and obtaining their Bayes error. Note that although the combination of the classifiers across time is linear, the final classification model is non-linear, as the individual classifiers at each time instance are non-linear; more precisely, due to the distinct covariance matrices of the Gaussian distributions the individual decision boundaries are of quadratic form. For a discussion of linear vs. non-linear methods in the context of BCI see [10]. In particular, to solve the competition task we combined classifiers based on the temporal evolution of different neurophysiological features, i.e. 
ERD and MRP. The resulting on-line classification model succeeded in the single trial on-line classification of imagined hand movements in the BCI competition 2005.\n\nAcknowledgement: This work was supported in part by the Bundesministerium für Bildung und Forschung (BMBF) under grant FKZ 01GQ0415 and by the DFG under grant SFB 618-B4. S. Lemm thanks Stefan Harmeling for valuable discussions.\n\n1 ida.first.fhg.de/projects/bci/competition_iii/\n\nReferences\n[1] C. Babiloni, F. Carducci, F. Cincotti, P. M. Rossini, C. Neuper, Gert Pfurtscheller, and F. Babiloni. Human movement-related potentials vs desynchronization of EEG alpha rhythm: A high-resolution EEG study. NeuroImage, 10:658-665, 1999.\n[2] Benjamin Blankertz, Klaus-Robert Müller, Dean Krusienski, Gerwin Schalk, Jonathan R. Wolpaw, Alois Schlögl, Gert Pfurtscheller, José del R. Millán, Michael Schröder, and Niels Birbaumer. The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE Trans. Neural Sys. Rehab. Eng., 14(2):153-159, 2006.\n[3] Guido Dornhege, Benjamin Blankertz, Gabriel Curio, and Klaus-Robert Müller. Combining features for BCI. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Inf. Proc. Systems (NIPS 02), volume 15, pages 1115-1122, 2003.\n[4] Guido Dornhege, Benjamin Blankertz, Gabriel Curio, and Klaus-Robert Müller. Increase information transfer rates in BCI by CSP extension to multi-class. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems, volume 16, pages 733-740. MIT Press, Cambridge, MA, 2004.\n[5] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John Wiley & Sons, New York, 2nd edition, 2001.\n[6] R. Hari and R. Salmelin. Human cortical oscillations: a neuromagnetic view through the skull. Trends in Neuroscience, 20:44-49, 1997.\n[7] H. Jasper and W. Penfield. 
Electrocorticograms in man: Effect of voluntary movement upon the electrical activity of the precentral gyrus. Arch. Psychiatrie Zeitschrift Neurol., 183:163-174, 1949.\n[8] Steven Lemm, Christin Schäfer, and Gabriel Curio. Probabilistic modeling of sensorimotor rhythms for classification of imaginary hand movements. IEEE Trans. Biomed. Eng., 51(6):1077-1080, 2004.\n[9] B.D. Mensh, J. Werfer, and H.S. Seung. Combining gamma-band power with slow cortical potentials to improve single-trial classification of electroencephalographic signals. IEEE Trans. Biomed. Eng., 51(6):1052-1056, 2004.\n[10] Klaus-Robert Müller, Charles W. Anderson, and Gary E. Birch. Linear and non-linear methods for brain-computer interfaces. IEEE Trans. Neural Sys. Rehab. Eng., 11(2):165-169, 2003.\n[11] C. Neuper, A. Schlögl, and G. Pfurtscheller. Enhancement of left-right sensorimotor EEG differences during feedback-regulated motor imagery. Journal Clin. Neurophysiol., 16:373-382, 1999.\n[12] V. Nikouline, K. Linkenkaer-Hansen, H. Wikström, M. Kesäniemi, E. Antonova, R. Ilmoniemi, and J. Huttunen. Dynamics of mu-rhythm suppression caused by median nerve stimulation: a magnetoencephalographic study in human subjects. Neurosci. Lett., 294, 2000.\n[13] G. Pfurtscheller and A. Aranibar. Evaluation of event-related desynchronization preceding and following voluntary self-paced movement. Electroencephalogr. Clin. Neurophysiol., 46:138-146, 1979.\n[14] G. Pfurtscheller, C. Neuper, D. Flotzinger, and M. Pregenzer. EEG-based discrimination between imagination of right and left hand movement. Electroenceph. clin. Neurophysiol., 103:642-651, 1997.\n[15] S. Salenius, A. Schnitzler, R. Salmelin, V. Jousmäki, and R. Hari. Modulation of human cortical rolandic rhythms during natural sensorimotor tasks. NeuroImage, 5:221-228, 1997.\n[16] Christin Schäfer, Steven Lemm, and Gabriel Curio. Binary on-line classification based on temporally integrated information. 
In Claus Weihs and Wolfgang Gaul, editors, Proceedings of the 28th Annual Conference of the Gesellschaft für Klassifikation, pages 216-223, 2005.\n[17] A. Schlögl, C. Keinrath, R. Scherer, and G. Pfurtscheller. Information transfer of an EEG-based brain-computer interface. In Proc. First Int. IEEE EMBS Conference on Neural Engineering, pages 641-644, 2003.\n[18] A. Schnitzler, S. Salenius, R. Salmelin, V. Jousmäki, and R. Hari. Involvement of primary motor cortex in motor imagery: a neuromagnetic study. NeuroImage, 6:201-208, 1997.\n[19] C. Torrence and G.P. Compo. A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc., 79:61-78, 1998.\n[20] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan. Brain-computer interfaces for communication and control. Clin. Neurophysiol., 113:767-791, 2002.\n[21] J.R. Wolpaw and D.J. McFarland. Multichannel EEG-based brain-computer communication. Electroenceph. clin. Neurophysiol., 90:444-449, 1994.\n", "award": [], "sourceid": 3157, "authors": [{"given_name": "Steven", "family_name": "Lemm", "institution": null}, {"given_name": "Christin", "family_name": "Sch\u00e4fer", "institution": null}, {"given_name": "Gabriel", "family_name": "Curio", "institution": null}]}