{"title": "Increase Information Transfer Rates in BCI by CSP Extension to Multi-class", "book": "Advances in Neural Information Processing Systems", "page_first": 733, "page_last": 740, "abstract": "", "full_text": "Increase information transfer rates in BCI\n\nby CSP extension to multi-class\n\nGuido Dornhege1, Benjamin Blankertz1, Gabriel Curio2, Klaus-Robert M\u00fcller1,3\n\n1Fraunhofer FIRST.IDA, Kekul\u00e9str. 7, 12489 Berlin, Germany\n\n2Neurophysics Group, Dept. of Neurology, Klinikum Benjamin Franklin,\nFreie Universit\u00e4t Berlin, Hindenburgdamm 30, 12203 Berlin, Germany\n3University of Potsdam, August-Bebel-Str. 89, 14482 Potsdam, Germany\n\n{dornhege,blanker,klaus}@first.fraunhofer.de,\n\ncurio@zedat.fu-berlin.de\n\nAbstract\n\nBrain-Computer Interfaces (BCI) are an interesting emerging technology\nthat is driven by the motivation to develop an effective communication interface\ntranslating human intentions into a control signal for devices like\ncomputers or neuroprostheses. If this can be done bypassing the usual human\noutput pathways like peripheral nerves and muscles it can ultimately\nbecome a valuable tool for paralyzed patients. Most activity in BCI research\nis devoted to finding suitable features and algorithms to increase\ninformation transfer rates (ITRs). The present paper studies the implications\nof using more classes, e.g., left vs. right hand vs. foot, for operating\na BCI. 
We contribute by (1) a theoretical study showing under some mild\nassumptions that it is practically not useful to employ more than three\nor four classes, (2) two extensions of the common spatial pattern (CSP)\nalgorithm, one interestingly based on simultaneous diagonalization, and\n(3) controlled EEG experiments that underline our theoretical findings\nand show excellent improved ITRs.\n\n1 Introduction\n\nThe goal of a Brain-Computer Interface (BCI) is to establish a communication channel for\ntranslating human intentions \u2013 reflected by suitable brain signals \u2013 into a control signal for,\ne.g., a computer application or a neuroprosthesis (cf. [1]). If the brain signal is measured\nnon-invasively by an electroencephalogram (EEG), if short training and preparation times\nare feasible and if it is possible to achieve high information transfer rates (ITRs), this interface\ncan become a useful tool for disabled patients or an interesting gadget in the context\nof computer games. Recently, some approaches have been presented (cf. [1, 2]) which are\ngood candidates for successfully implementing such an interface.\nIn a BCI system a subject tries to convey her/his intentions by behaving according to well-defined\nparadigms, like imagination of specific movements. An effective discrimination\nof different brain states is important in order to implement a suitable system for human\nsubjects. Therefore appropriate features have to be chosen by signal processing techniques\naccording to the selected paradigm. These features are translated into a control signal,\neither by simple threshold criteria (cf. [1]), or by machine learning techniques where the\ncomputer learns a decision function from some training data [1, 3, 4, 5, 6].\nFor non-invasive BCI systems that are based on discrimination of voluntarily induced brain\nstates three approaches are characteristic. 
(1) The T\u00fcbingen Thought Translation Device\n(TTD) [7] enables subjects to learn self-regulation of slow cortical potentials (SCP), i.e.,\nelectrocortical positivity and negativity. After some training in experiments with vertical\ncursor movement as feedback navigated by the SCP from central scalp position, patients\nare able to generate binary decisions in a 4-6 second pace with an accuracy of up to 85 %.\n(2) Users of the Albany BCI system [8] are able to control a cursor movement by their oscillatory\nbrain activity into one of two or four possible targets on the computer screen and to\nachieve over 90 % hit rates after adapting to the system during many feedback sessions with\na selection rate of 4 to 5 seconds in the binary decision problem. And (3), based on event-related\nmodulations of the pericentral \u03bc- and/or \u03b2-rhythms of sensorimotor cortices (with\na focus on motor preparation and imagination) the Graz BCI system [9] obtains accuracies\nof over 96 % in a ternary classification task with a trial duration of 8 seconds by evaluation\nof adaptive auto-regressive models (AAR). Note that there are other BCI systems which\nrely on stimulus/response paradigms, e.g. P300, see [1] for an overview.\nIn [10] an approach called Common Spatial Patterns (CSP) was suggested for use in a\nBCI context. This algorithm extracts event-related desynchronization (ERD) effects, i.e.,\nevent-related attenuations in some frequency bands, e.g., \u03bc/\u03b2-rhythm. However, the CSP\nalgorithm can be used more generally, e.g., in [11] a suitable modification to movement-related\npotentials was presented. Further in [12] a first multi-class extension of CSP is\npresented which is based on pairwise classification and voting. 
In this paper we present\nfurther ways to extend this approach to many classes and compare to prior work.\nBy extending a BCI system to many classes a gain in performance can be obtained since\nthe ITR can increase even if the percentage of correct classifications decreases. In [13] a\nfirst study for increasing the number of classes is demonstrated based on a hidden Markov\nmodel approach. The authors conclude that three classes should be used, since this attains the highest ITR.\nWe are focussing here on the same problem but using CSP-extracted features and arrive at\nsimilar results. However, in a theoretical part we show that using more classes can be worth\nthe effort if a suitable accuracy of all pairwise classifications is available. Consequently,\nextensions to multi-class settings are worthwhile for a BCI system, if and only if a suitable\nnumber of effectively separable human brain states can be assigned.\n\n2 How many brain states should be chosen?\n\nOut of many different brain states (classes) our task is to find a subset of classes which is\nmost profitable for the user of a BCI system. In this part we only focus on the information-theoretical\nperspective. Using more classes holds the potential to increase ITR, although\nthe rate of correct classifications decreases. For the subsequent theoretical considerations\nwe assume Gaussian distributions with equal covariance matrices for all classes which is a\nreasonable assumption for a wide range of EEG features, see Section 4.3. Furthermore we\nassume equal priors between all classes. For three classes and equal pairwise classification\nerrors err, bounds for the expected classification error can be calculated in the following\nway: Let $(X, Y) \\in \\mathbb{R}^n \\times \\mathcal{Y}$ ($\\mathcal{Y} = \\{1, 2, 3\\}$) be random variables and $P \\sim \\mathcal{N}(\\mu_{1,2,3}, \\Sigma)$ the\nprobability distribution. Scaling appropriately we can assume $\\Sigma = I$. We define the optimal\nclassifier by $f^* : \\mathbb{R}^n \\to \\mathcal{Y}$ with $f^* = \\arg\\min_{f \\in \\mathcal{F}} P(f(X) \\neq Y)$, where $\\mathcal{F}$ is some class of\nfunctions1. Similarly $f^*_{i,j}$ describes the optimal classifier between classes $i$ and $j$. Directly\nwe get $\\mathrm{err} := P(f^*_{i,j}(X) \\neq Y) = G(\\|\\mu_i - \\mu_j\\| / 2)$ with $G(x) := \\frac{1}{\\sqrt{2\\pi}} \\int_x^\\infty \\exp(-x^2/2)\\,dx$ and\n$i \\neq j$. Therefore we get $\\|\\mu_j - \\mu_i\\|_2 = F$ for all $i \\neq j$ with some $F > 0$ and finally due to\nsymmetry and equal priors $P(f^*(X) \\neq Y) = Q(\\|X\\|_2 \\geq \\min_{j=2,3}(\\|X - \\mu_j + \\mu_1\\|_2 / 2))$ where\n$Q \\sim \\mathcal{N}(0, I)$. Since evaluation of probabilities for polyhedrons in the Gaussian space is\nhard, we only estimate lower and upper bounds. We can directly reduce the problem to a 2-dimensional\nspace by shifting and rotating and by Fubini\u2019s theorem. Since $\\|\\mu_j - \\mu_i\\|_2 = F$\nfor all $i \\neq j$ the means lie at corners of an equilateral triangle (see Figure 1). 
We define\n$R := \\{x \\in \\mathbb{R}^2 \\mid \\|x\\|_2 \\geq \\|x - \\mu_j + \\mu_1\\|_2,\\ j = 2, 3\\}$ and we can see after some calculation or by\nFigure 1 (left) with the sets defined there, that $A \\cup B \\cup C_l \\subset R \\subset A \\cup B \\cup C_u$. Due to symmetry,\nthe equilateral triangle and polar coordinates transformation we get finally\n\n$\\mathrm{err} + \\frac{\\exp(-F^2/6)}{6} \\leq P(f^*(X) \\neq Y) \\leq \\mathrm{err} + \\frac{\\exp(-F^2/8)}{6}$. (1)\n\n1For the moment we pay no attention to whether such a function exists. In the current set-up $\\mathcal{F}$\nis usually the space of all linear classifiers, and under the probability assumptions mentioned above\nsuch a minimum exists.\n\n[Figure 1, left panel labels: R = A+B+Cl+D, Cu = Cl+D+E.]\nFigure 1: The figure on the left visualizes a method to estimate bounds for the ITR depending on\nthe expected pairwise misclassification risk for three classes. The figure on the right shows the ITR\n[bits per decision] depending on the classification error [%] for simulated data for different number\nof classes (3-6 sim) and for 2 classes the real values (2 calc). Additionally the expected range (see\n(1)) (3 range) for three classes is visualized.\n\nTo compare classification performances involving different numbers of classes, we use\nthe ITR quantified as bit rate per decision $I$ as defined due to Shannon\u2019s theorem: $I := \\log_2 N + p \\log_2(p) + (1 - p) \\log_2((1 - p)/(N - 1))$ per decision with number of classes $N$\nand classification accuracy $p$ (cf. [14]). Figure 1 (right) shows the bounds in (1) for the ITR\nas a function of the expected pairwise misclassification errors. Additionally the same values\non simulated data (100000 data points for each class) under the assumptions described\nabove (equal pairwise performance, Gaussian distributed ...) are visualized for $N = 2, \\ldots, 6$\nclasses. First of all, the figure confirms our estimated bounds. Furthermore the figure shows\nthat under these strong assumptions extensions to multi-class are worthwhile. However, the\ngain of using more than 4 classes is tiny if the pairwise classification error is about 10 %\nor more. Under more realistic assumptions, i.e., more classes have increasing pairwise\nclassification error compared to a wisely chosen subset, it is improbable to increase the bit\nrate by increasing the number of classes beyond three or four. However, this depends\nstrongly on the pairwise errors. 
If a suitable number of different brain states can be\ndiscriminated well, then extensions to more classes are indeed useful.\n\n3 CSP and some multi-class extension\n\nThe CSP algorithm in its original form can be utilized for brain states that are characterized\nby a decrease or increase of a cortical rhythm with a characteristic topographic pattern.\n\n3.1 CSP in a binary problem\nLet $\\Sigma_{1,2}$ be the centered covariance matrices calculated in the standard way of a trial-concatenated\nvector of dimension [channels \u00d7 concatenated timepoints] belonging to the\nrespective label. The computation of $\\Sigma_{1,2}$ needs to be adapted to the paradigm, e.g., for\nslow cortical features such as the lateralized readiness potential (cf. [11]). The original\nCSP algorithm calculates a matrix $R$ and diagonal matrix $D$ with elements in $[0, 1]$ with\n\n$R \\Sigma_1 R^T = D$ and $R \\Sigma_2 R^T = 1 - D$ (2)\n\nwhich can easily be obtained by whitening and spectral theory. Only a few projections\nwith the highest ratio between their eigenvalues (lowest and highest ratios) are selected.\nIntuitively the CSP projections provide the scalp patterns which are most discriminative\n(see e.g. Figure 4).\n\n3.2 Multi-Class extensions\n\nUsing CSP within the classifier (IN): This algorithm reduces a multi-class problem to several binary\nproblems (cf. [15]) and was suggested in [12] for CSP in a BCI context. For all combinations\nof two different classes the CSP patterns are calculated as described in Eq. (2).\nThe variances of the projections to CSP of every channel are used as input for an LDA classifier\nfor each 2-class combination. New trials are projected on these CSP patterns and\nare assigned to the class for which most classifiers are voting.\nOne versus rest CSP (OVR): We suggest a subtle modification of the approach above which\npermits computing the CSP patterns before the classification. We compute spatial patterns\nfor each class against all others2. 
Then we project the EEG signals on all these CSP\npatterns, calculate the variances as before and then perform an LDA multi-class classification.\nThe approach OVR appears rather similar to the approach IN, but there is in fact a\nlarge practical difference (in addition to the one-versus-rest strategy as opposed to pairwise\nbinary subproblems). In the approach IN classification is only done binary on the\nCSP patterns according to the binary choice. OVR does multi-class classification on all\nprojected signals.\nSimultaneous diagonalization (SIM): The main trick in the binary case is that the CSP\nalgorithm finds a simultaneous diagonalization of both covariance matrices whose eigenvalues\nsum to one. Thus a possible extension to many classes, i.e., many covariances\n$(\\Sigma_i)_{i=1,\\ldots,N}$, is to find a matrix $R$ and diagonal matrices $(D_i)_{i=1,\\ldots,N}$ with elements in $[0, 1]$\nand with $R \\Sigma_i R^T = D_i$ for all $i = 1, \\ldots, N$ and $\\sum_{i=1}^N D_i = I$. Such a decomposition can only\nbe approximated for $N > 2$. There are several algorithms for approximate simultaneous\ndiagonalization (cf. [16, 17]) and we are using the algorithm described in [18] due to its\nspeed and reliability. As opposed to the two-class problem there is no canonical way to\nchoose the relevant CSP patterns. We explored several options such as using the highest\nor lowest eigenvalues. Finally, the best strategy was based on the assumption that two\ndifferent eigenvalues for the same pattern have the same effect if their ratios to the mean\nof the eigenvalues of the other classes are multiplicatively inverse to each other, i.e., their\nproduct is 1. Thus all eigenvalues $\\lambda$ are mapped to $\\max(\\lambda, (1 - \\lambda)/(1 - \\lambda + (N - 1)^2 \\lambda))$\nand a specified number $m$ of highest eigenvalues for each class are used as CSP patterns.\nIt should be mentioned that each pattern is only used once, namely for the class which has\nthe highest modified eigenvalue. 
If a second class would choose this pattern it is left out\nfor this class and the next one is chosen. Finally variances are computed on the projected\ntrials as before and conventional LDA multi-class classification is done.\n\n2Note that this can be done similarly with pairwise patterns, but in our studies no substantial\ndifference was observable and therefore one-versus-rest is favourable, since it chooses fewer patterns.\n\n4 Data acquisition and analysis methods\n\n4.1 Experiments\n\nWe recorded brain activity from 4 subjects (codes aa, af, ak and ar) with multi-channel\nEEG amplifiers using 64 (128 for aa) channels band-pass filtered between 0.05 and 200 Hz\nand sampled at 1000 Hz. For offline analysis all signals were downsampled to 100 Hz.\nSurface EMG at both forearms and one leg, as well as horizontal and vertical EOG signals,\nwere recorded to check for muscle activation and eye movements, but no trial was rejected.\nThe subjects in this experiment were sitting in a comfortable chair with arms lying relaxed\non the armrests. Every 4.5 seconds one of 6 different letters appeared on the computer\nscreen for 3 seconds. During this period the subject should imagine one of 6 different actions\naccording to the displayed letter: imagination of left (l) or right (r) hand or foot (f) movement,\nor imagination of a visual (v), auditory (a) or tactile (t) sensation. Subject aa only took part in an\nexperiment with the 3 classes l, r and f. 200 (resp. 160 for aa) trials for each class were\nrecorded.\nThe aim of classification in these experiments is to discriminate trials of different classes\nusing the whole period of imagination. A further reasonable objective, detecting a new\nbrain state as fast as possible, was not a goal of this particular study. Note that the classes\nv, a and t were originally not intended to be BCI paradigms. 
Rather, these experiments were\nincluded to explore multi-class single-trial detection for brain states related to different\nsensory modalities for which it can reasonably be assumed that the regional activations can\nbe well differentiated at a macroscopic scale of several centimeters.\n\n4.2 Feature Extraction\nDue to the fact that we focus on desynchronization effects (here the \u03bc-rhythm) we first apply\na causal frequency filter of 8\u201315 Hz to the signals. Further, each trial consists of a\ntwo-second window starting 500 ms after the visual stimulus. Then, the CSP algorithm is\napplied and finally variances of the projected trials are calculated to acquire the feature\nvectors. Alternatively, to see how effective the CSP algorithm is, the projection is left out\nfor the binary classification task and we use instead techniques like Laplace filtering or\ncommon average reference (CAR) with a regularized LDA classifier on the variances.\nThe frequency band and the time period should be chosen individually by closer analysis\nof each data set. However, we are not focussing on this effect here, therefore we choose a\nsetting which works well for all subjects. The number of chosen CSP patterns is a further\nvariable. Extended search for different values can be done, but is omitted here. To have\na similar number of patterns for each algorithm we choose for IN 2 patterns from each side in\neach pairwise classification (resulting in 2N(N\u22121) patterns), for OVR 2 patterns from each\nside in each one-versus-rest choice and for SIM 4 patterns for each class (both resulting in\n4N patterns).\n\n4.3 Classification and Validation\n\nAccording to our studies the assumption that the features we are using are Gaussian distributed\nwith equal covariance matrices holds well [2]. In this case Linear Discriminant\nAnalysis (LDA) is optimal for classification in the sense that it minimizes the risk of misclassifications. 
Due to the low dimensionality of the CSP features regularization is not\nrequired.\nTo assess the classification performance, the generalization error was estimated by 10\u00d710-fold\ncross-validation. Since the CSP algorithm depends on the class labels, the calculation\nof this projection is done in the cross-validation on each training set. Doing it on the whole\ndata set beforehand can result in overfitting, i.e., underestimating the generalization error.\nFor the purpose of this paper the best configuration of classes should be found. The most\nsophisticated way in BCI context would consist in doing many experiments with\ndifferent sets of classes. Unfortunately this is very time consuming and not of interest\nfor the BCI user. A more useful way is to do in a preliminary step experiments with\nmany classes and choose within an offline analysis which is the best subset by testing all\ncombinations. With the best chosen class configuration the experiment should be repeated\nto confirm the results. However, in this paper we present results of this simpler experiment,\nin fact following the setting in [13].\n\nFigure 2: In the scatter plot the ITRs [bits per decision] for all 2-class combinations for\nall subjects obtained by CSP are shown on the x-axis while those by LAPLACE (dark points)\nresp. CAR (light points) are on the y-axis. That means for marks below the diagonal CSP\noutperforms LAPLACE resp. CAR.\n\n5 Results\nIn Figure 2 the bit rates for all binary combinations of\ntwo classes and for all subjects are shown. The results\nfor the CSP algorithm are contrasted in the plot with the\nresults of LAPLACE/CAR in such a way that for points\nbelow the diagonal CSP is better and for points above\nthe other algorithms are better. 
We can conclude that it\nis usually advantageous to use the CSP algorithm. Furthermore\nit is observable that the pairwise classification\nperformances differ strongly. According to our theoretical\nconsiderations we should therefore assume that in\nthe multi-class case a configuration with 3 classes will\nperform best.\nFigure 3 shows the ITRs for all multi-class configurations\n($N = 3, \\ldots, 6$) for different subjects. Results for\nbaseline method IN are compared to the new methods\nSIM and OVR. The latter methods are superior for those\nconfigurations whose results are below the diagonal in the scatter plot. For an overview\nthe upper plots show histograms of the differences in ITR between SIM/OVR and IN and a\nGaussian approximation. We can conclude from these figures that no algorithm is generally\nthe best. SIM shows the best mean performance for subjects ak and ar but the performance\nfalls off for subject af. Since for aa only one three-class combination is available, we omit\na visualization. However, SIM again performs best for this subject.\nStatistical tests of significance are omitted since the classification results are generally not\nindependent, e.g., classification of {l,r,f} and {l,a,t} are dependent since the trials of class\nl are involved in both. For a given number of classes Figure 4 shows the ITR obtained for\nthe optimal subset of brain states by the best of the presented algorithms. As conjectured\nfrom fluctuations in pairwise discriminability, the bit rates decrease when using more than\nthree classes. In three out of four subjects the peak ITR is obtained with three classes,\nonly for subject aa pairwise classification is better. Here one further strategy is helpful.\nIn addition to the variance, autoregressive parameters can be calculated on the projections\non the CSP patterns filtered here at 7\u201330 Hz and used for classification. 
In this case the\npairwise classification errors are more balanced such that we finally acquire an ITR of 0.76\nper decision, whereas the best binary combination has 0.6 bits per decision. The benefit of\nusing AR for this subject is caused by the different frequency bands in which the discriminative\ninformation lies. For the other subjects similar gains could not be observed by using AR\nparameters.\n\n[Figure 3 panels: subjects af, ak, ar.]\nFigure 3: In the scatter plot the ITRs [bits per decision] obtained by the baseline method IN are shown\non the y-axis while those by SIM (+) and OVR (\u2218) are on the x-axis. That means for marks below the\ndiagonal SIM resp. OVR outperforms IN. For an overview the upper plots show histograms\nof the differences in ITR between SIM/OVR and IN and a Gaussian approximation of them.\nHere positive values belong to good performances of SIM and OVR.\n\n[Figure 4 panels: left, subjects aa, af, ak, ar with 2 to 6 classes; right, patterns for left, right, foot.]\nFigure 4: The figure on the left shows the ITR per trial for different number of classes with the best\nalgorithm described above. The figure on the right visualizes the first pattern chosen by SIM for each\nclass for aa.\n\nFinally the CSP algorithm has a further useful feature, namely that the spatial patterns\ncan be plotted as scalp topographies. In Figure 4 the first pattern for each class of algorithm\nSIM is shown for subject aa. 
Evidently, this algorithm can reproduce neurophysiological\nprior knowledge about the location of ERD effects because for each activated limb the\nappropriate region of motor cortex is activated, e.g., a left (right) lateral site for the right\n(left) hand and an area closer to the central midline for the foot.\nPsychological perspective. In principle, multi-class decisions can be derived from a\ndecision space natural to human subjects. In a BCI context such a set of decisions will be\nperformed most \u2018intuitively\u2019, i.e., without a need for prolonged training, if the differential\nbrain states are naturally related to a set of intended actions. This is the case, e.g., for\nmovements of different body parts which have a somatotopically ordered lay-out in the\nprimary motor cortex resulting in spatially discriminable patterns of EEG signals, such as\nreadiness potentials or event-related desynchronizations specific for finger, elbow or shoulder\nmovement intentions. In contrast, having to imagine a tune in order to move a cursor\nupwards vs imagining a visual scene to induce a downward movement will produce spatially\ndiscriminable patterns of EEG signals related to either auditory or visual imagery, but its\naction-effect-contingency would be counter-intuitive. While humans are able to adapt and\nto learn such complex tasks, this could take weeks of training before it would be performed\nfast, reliably and \u2018automatically\u2019. Another important aspect of multi-class settings is that\nthe use of more classes which are discriminated by the BCI device only at lower accuracy is\nlikely to confuse the user.\n\n6 Concluding discussion\n\nCurrent BCI research strives for enhanced information transfer rates. 
Several options are\navailable: (1) training of the BCI users, which can be somewhat tedious if up to 300 hours\nof training would be necessary, (2) invasive BCI techniques, which we consider not applicable\nfor healthy human test subjects, (3) improved machine learning and signal processing\nmethods where, e.g., new filtering, feature extraction and sophisticated classifiers are constantly\ntuned and improved3, (4) faster trial speeds and finally (5) more classes among\nwhich the BCI user is choosing. This work analysed the theoretical and practical implications\nof using more than two classes, and also psychological issues were briefly discussed.\nIn essence we found that a higher ITR is achieved with three classes; however, it seems\nunlikely that it can be increased by moving above four classes. This finding is confirmed in\nEEG experiments. As a further, more algorithmic, contribution we suggested two modifications\nof the CSP method for the multi-class case. As a side remark: our multi-class CSP\nalgorithms also allow a significant speed-up in a real-time feedback experiment as\nfiltering operations only need to be performed on very few CSP components (as opposed\nto on all channels). Since this corresponds to an implicit dimensionality reduction, good\nresults can also be achieved with CSP using fewer patterns/trials.\n\n3See 1st and 2nd BCI competition: http://ida.first.fraunhofer.de/~blanker/competition/\n\nComparing the results of SIM, OVR and IN we find that for most of the subjects SIM\nor OVR provide better results. Reassuringly the algorithms SIM, OVR and IN allow the extraction\nof scalp patterns for the classification that match well with neurophysiological textbook\nknowledge (cf. Figure 4). In this paper the beneficial role of a third class was confirmed\nby an offline analysis. 
Future studies will therefore target online experiments with more\nthan two classes; first experimental results are promising. Another line of study will explore\ninformation from complementary neurophysiological effects in the spirit of [19] in\ncombination with multi-class paradigms.\nFinally it would be useful to explore configurations with more than two classes which\nare more natural and also more user-friendly from the psychological perspective discussed\nabove.\n\nAcknowledgments We thank S. Harmeling, M. Kawanabe, A. Ziehe, G. R\u00e4tsch, S. Mika,\nP. Laskov, D. Tax, M. Kirsch, C. Sch\u00e4fer and T. Zander for helpful discussions. The studies were\nsupported by BMBF-grants FKZ 01IBB02A and FKZ 01IBB02B.\n\nReferences\n[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, \u201cBrain-computer interfaces for communication\nand control\u201d, Clin. Neurophysiol., 113: 767\u2013791, 2002.\n\n[2] B. Blankertz, G. Dornhege, C. Sch\u00e4fer, R. Krepki, J. Kohlmorgen, K.-R. M\u00fcller, V. Kunzmann, F. Losch, and G. Curio,\n\u201cBoosting Bit Rates and Error Detection for the Classification of Fast-Paced Motor Commands Based on Single-Trial EEG\nAnalysis\u201d, IEEE Trans. Neural Sys. Rehab. Eng., 11(2): 127\u2013131, 2003.\n\n[3] B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cClassifying Single Trial EEG: Towards Brain Computer Interfacing\u201d, in: T. G.\nDiettrich, S. Becker, and Z. Ghahramani, eds., Advances in Neural Inf. Proc. Systems (NIPS 01), vol. 14, 157\u2013164, 2002.\n\n[4] L. Trejo, K. Wheeler, C. Jorgensen, R. Rosipal, S. Clanton, B. Matthews, A. Hibbs, R. Matthews, and M. Krupka, \u201cMultimodal\nNeuroelectric Interface Development\u201d, IEEE Trans. Neural Sys. Rehab. Eng., 2003, accepted.\n\n[5] L. Parra, C. Alvino, A. C. Tang, B. A. Pearlmutter, N. Yeung, A. Osman, and P. 
Sajda, \u201cLinear spatial integration for single\ntrial detection in encephalography\u201d, NeuroImage, 2002, to appear.\n\n[6] W. D. Penny, S. J. Roberts, E. A. Curran, and M. J. Stokes, \u201cEEG-Based Communication: A Pattern Recognition Approach\u201d,\nIEEE Trans. Rehab. Eng., 8(2): 214\u2013215, 2000.\n\n[7] N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. K\u00fcbler, J. Perelmouter, E. Taub, and H. Flor, \u201cA\nspelling device for the paralysed\u201d, Nature, 398: 297\u2013298, 1999.\n\n[8] J. R. Wolpaw, D. J. McFarland, and T. M. Vaughan, \u201cBrain-Computer Interface Research at the Wadsworth Center\u201d, IEEE\nTrans. Rehab. Eng., 8(2): 222\u2013226, 2000.\n\n[9] B. O. Peters, G. Pfurtscheller, and H. Flyvbjerg, \u201cAutomatic Differentiation of Multichannel EEG Signals\u201d, IEEE Trans.\nBiomed. Eng., 48(1): 111\u2013116, 2001.\n\n[10] H. Ramoser, J. M\u00fcller-Gerking, and G. Pfurtscheller, \u201cOptimal spatial filtering of single trial EEG during imagined hand\nmovement\u201d, IEEE Trans. Rehab. Eng., 8(4): 441\u2013446, 2000.\n\n[11] G. Dornhege, B. Blankertz, and G. Curio, \u201cSpeeding up classification of multi-channel Brain-Computer Interfaces: Common\nspatial patterns for slow cortical potentials\u201d, in: Proceedings of the 1st International IEEE EMBS Conference on Neural\nEngineering. Capri 2003, 591\u2013594, 2003.\n\n[12] J. M\u00fcller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, \u201cDesigning optimal spatial filters for single-trial EEG classification\nin a movement task\u201d, Clin. Neurophysiol., 110: 787\u2013798, 1999.\n\n[13] B. Obermaier, C. Neuper, C. Guger, and G. Pfurtscheller, \u201cInformation Transfer Rate in a Five-Classes Brain-Computer\nInterface\u201d, IEEE Trans. Neural Sys. Rehab. Eng., 9(3): 283\u2013288, 2001.\n\n[14] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, D. J. McFarland, P. H. Peckham, G. Schalk, E. Donchin, L. A. Quatrano, C. 
J. Robinson, and T. M. Vaughan, \u201cBrain-Computer Interface Technology: A review of the First International Meeting\u201d, IEEE\nTrans. Rehab. Eng., 8(2): 164\u2013173, 2000.\n\n[15] E. Allwein, R. Schapire, and Y. Singer, \u201cReducing multiclass to binary: A unifying approach for margin classifiers\u201d, Journal\nof Machine Learning Research, 1: 113\u2013141, 2000.\n\n[16] J.-F. Cardoso and A. Souloumiac, \u201cJacobi angles for simultaneous diagonalization\u201d, SIAM J.Mat.Anal.Appl., 17(1): 161 ff.,\n1996.\n\n[17] D.-T. Pham, \u201cJoint Approximate Diagonalization of Positive Definite Matrices\u201d, SIAM J. on Matrix Anal. and Appl., 22(4):\n1136\u20131152, 2001.\n\n[18] A. Ziehe, P. Laskov, K.-R. M\u00fcller, and G. Nolte, \u201cA Linear Least-Squares Algorithm for Joint Diagonalization\u201d, in: Proc.\n4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), 469\u2013474, Nara,\nJapan, 2003.\n\n[19] G. Dornhege, B. Blankertz, G. Curio, and K.-R. M\u00fcller, \u201cCombining Features for BCI\u201d, in: S. Becker, S. Thrun, and\nK. Obermayer, eds., Advances in Neural Inf. Proc. Systems (NIPS 02), vol. 15, MIT Press: Cambridge, MA, 2003.", "award": [], "sourceid": 2384, "authors": [{"given_name": "Guido", "family_name": "Dornhege", "institution": null}, {"given_name": "Benjamin", "family_name": "Blankertz", "institution": null}, {"given_name": "Gabriel", "family_name": "Curio", "institution": null}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": null}]}