{"title": "Classification of Electroencephalogram using Artificial Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 1151, "page_last": 1158, "abstract": null, "full_text": "Classification of Electroencephalogram using Artificial Neural Networks \n\nA C Tsoi*, D S C So*, A Sergejew** \n*Department of Electrical Engineering \n**Department of Psychiatry \nUniversity of Queensland \nSt Lucia, Queensland 4072 \nAustralia \n\nAbstract \n\nIn this paper, we consider the problem of classifying electroencephalogram (EEG) signals of normal subjects and of subjects suffering from psychiatric disorders, e.g., obsessive compulsive disorder and schizophrenia, using a class of artificial neural networks, viz., the multi-layer perceptron. It is shown that the multi-layer perceptron is capable of classifying unseen test EEG signals to a high degree of accuracy. \n\n1 Introduction \n\nThe spontaneous electrical activity of the brain was first observed by Caton in 1875. Although considerable investigations of the electrical activity of the non-human brain had been undertaken, it was not until 1929 that the German neurologist Hans Berger first published studies of the electroencephalogram (EEG) recorded from the human scalp. He laid the foundation of the clinical and experimental applications of EEG between 1929 and 1938. \n\nSince then, EEG signals have been used in both clinical and experimental work to determine the state of the brain (see e.g., Herrmann, 1982; Kolb and Whishaw, 1990; Lindsay and Holmes, 1984). The EEG serves as a direct indication of brain activity, and it is routinely used in the clinical diagnosis of epilepsy (see e.g., Basar, 1980; Cooper, 1980). 
\n\nDespite advances in technology, the classification of EEG signals at present requires trained personnel who either \"eyeball\" the direct EEG recordings over time or study the contour maps representing the potentials generated from the \"raw\" electrical signal (see e.g., Cooper, 1980). This is both a highly skilled and a laborious task for a neurologist. With the current advances in computers, a logical question to ask is: can we use the computer to classify EEG signals automatically into classes denoting the psychiatric states of the subjects? \n\nThis type of classification study is not new. In fact, in the late 1960s there were a number of attempts at automatic classification using discriminant analysis techniques. However, this work was largely abandoned, as most researchers concluded that classification based on discriminant techniques does not generalise well: while it classifies the data used to train the automatic classification system very accurately, it may not classify unseen data, i.e., data not used to train the system in the first instance, with high accuracy. \n\nRecently, a class of classification techniques based on nonlinear models, called artificial neural networks (ANNs), has become very popular (see e.g., Touretzky, 1989, 1990; Lippmann et al., 1991). These networks are claimed to be inspired by biological neurons and their many interconnections, and they have demonstrated useful pattern recognition capabilities. Applications so far include sonar signal classification (see e.g., Touretzky, 1989), handwritten character recognition (see e.g., Touretzky, 1990), and facial expression recognition (see e.g., Lippmann et al., 1991). 
\n\nIn this paper, we investigate the possibility of using an ANN for EEG classification. Features can be extracted from the time series using either time domain or frequency domain techniques; some preliminary work showed that the time domain techniques give much better results. \n\nThe structure of this paper is as follows: In Section 2, we briefly discuss a popular class of ANNs, viz., multi-layer perceptrons (MLPs). In Section 3, we discuss feature extraction using time domain techniques. In Section 4, we present results on classifying a set of unseen EEG signals. \n\n2 Multi-layer Perceptrons \n\nAn artificial neural network (ANN) consists of a number of artificial neurons interconnected by synaptic weights to form a network (see e.g., Lippmann, 1987). Each neuron is described by the following mathematical model: \n\n$$y = f\\left(\\sum_{i=1}^{n} w_i x_i + \\theta\\right) \\qquad (1)$$ \n\nwhere $y$ is the output of the neuron, $w_i$, $i = 1, 2, \\ldots, n$, are the synaptic weights, $x_i$, $i = 1, 2, \\ldots, n$, are the inputs, and $\\theta$ is a threshold. The nonlinear function $f(\\cdot)$ can be a sigmoid function or a hyperbolic tangent function. An ANN is a network of neurons interconnected by synapses (Hertz, Krogh and Palmer, 1991). \n\nThere are many possible ANN architectures (Hertz, Krogh and Palmer, 1991). A popular architecture is the multi-layer perceptron (MLP) (see e.g., Lippmann, 1987). In this class of ANN, signals travel only in the forward direction; hence it is also known as a feedforward network. 
Mathematically, it can be described as follows: \n\n$$y = f(Az + \\theta_y) \\qquad (2)$$ \n\n$$z = f(Bu + \\theta_z) \\qquad (3)$$ \n\nwhere $y$ is an $m \\times 1$ vector representing the outputs of the output layer neurons; $z$ is a $p \\times 1$ vector representing the outputs of the hidden layer neurons; $u$ is an $n \\times 1$ vector representing the input feature vector; $\\theta_y$ is an $m \\times 1$ vector, known as the threshold vector of the output layer neurons; $\\theta_z$ is a $p \\times 1$ vector representing the threshold vector of the hidden layer neurons; and $A$ and $B$ are $m \\times p$ and $p \\times n$ matrices respectively. The matrix $A$ holds the synaptic weights connecting the hidden layer neurons to the output layer neurons, and $B$ those connecting the input layer neurons to the hidden layer neurons. The nonlinearity $f(\\cdot)$ is applied element by element; for simplicity's sake, we assume it to be a sigmoid function, i.e., \n\n$$f(a) = \\frac{1}{1 + e^{-a}} \\qquad (4)$$ \n\nThe unknown parameters $A$, $B$, $\\theta_y$, $\\theta_z$ can be obtained by minimizing the error criterion \n\n$$J = \\sum_{i=1}^{P} \\| d_i - y_i \\|^2 \\qquad (5)$$ \n\nwhere $P$ is the total number of exemplars, $d_i$, $i = 1, 2, \\ldots, P$, are the desired outputs which we wish the MLP to learn, and $y_i$ are the corresponding MLP outputs. By differentiating the error criterion $J$ with respect to the unknown parameters, learning algorithms can be obtained. \n\nThe learning rules are as follows: \n\n$$A_{new} = A + \\eta \\Lambda(y) e z^T \\qquad (6)$$ \n\nwhere $A_{new}$ is the next estimate of the matrix $A$, $T$ denotes the transpose of a vector or a matrix, and $\\eta$ is a learning constant. $\\Lambda(y)$ is an $m \\times m$ diagonal matrix whose diagonal elements are the derivatives $f'(\\cdot)$ of the output layer neurons, $i = 1, 2, \\ldots, m$; for the sigmoid (4) these equal $y_i(1 - y_i)$. The vector $e$ is $m \\times 1$ and is given by $e = [(d_1 - y_1), (d_2 - y_2), \\ldots, (d_m - y_m)]^T$. \n\nThe updating equation for the $B$ matrix is given by \n\n$$B_{new} = B + \\eta \\Lambda(z) \\delta u^T \\qquad (7)$$ \n\nwhere $\\Lambda(z)$ is the $p \\times p$ diagonal matrix defined analogously for the hidden layer neurons (with diagonal elements $z_j(1 - z_j)$ for the sigmoid), and $\\delta$ is a $p \\times 1$ vector given by \n\n$$\\delta = A^T \\Lambda(y) e \\qquad (8)$$ \n\nand the other parameters are as defined above. 
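As an illustration only, the sketch below performs one step of the learning rules for a single exemplar in pure Python: A is moved by eta Lambda(y) e z^T and B by eta Lambda(z) delta u^T, with delta = A^T Lambda(y) e computed from the old A. It assumes the sigmoid nonlinearity of equation (4), so that the diagonal entries of Lambda(y) and Lambda(z) are y_i(1 - y_i) and z_j(1 - z_j), and it keeps the threshold vectors fixed for brevity. The helper names (sigmoid, matvec, forward, sq_error, train_step) are hypothetical; this is a sketch, not the authors' implementation.

```python
import math

def sigmoid(a):
    # Equation (4): f(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + math.exp(-a))

def matvec(M, v):
    # Product of a matrix M (list of rows) and a vector v
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def forward(A, B, theta_y, theta_z, u):
    # Equation (3): hidden layer outputs z = f(B u + theta_z)
    z = [sigmoid(s + t) for s, t in zip(matvec(B, u), theta_z)]
    # Equation (2): output layer outputs y = f(A z + theta_y)
    y = [sigmoid(s + t) for s, t in zip(matvec(A, z), theta_y)]
    return z, y

def sq_error(A, B, theta_y, theta_z, u, d):
    # Contribution of one exemplar (u, d) to the criterion J of equation (5)
    _, y = forward(A, B, theta_y, theta_z, u)
    return sum((di - yi) ** 2 for di, yi in zip(d, y))

def train_step(A, B, theta_y, theta_z, u, d, eta):
    # One update of A and B for a single exemplar; thresholds are held fixed.
    z, y = forward(A, B, theta_y, theta_z, u)
    m, p = len(y), len(z)
    e = [di - yi for di, yi in zip(d, y)]
    # Lambda(y) e, using the sigmoid derivative y (1 - y)
    g = [yi * (1.0 - yi) * ei for yi, ei in zip(y, e)]
    # delta = A^T Lambda(y) e, computed with the old A
    delta = [sum(A[i][j] * g[i] for i in range(m)) for j in range(p)]
    # Lambda(z) delta, using the sigmoid derivative z (1 - z)
    h = [zj * (1.0 - zj) * dj for zj, dj in zip(z, delta)]
    # A_new = A + eta Lambda(y) e z^T
    A_new = [[A[i][j] + eta * g[i] * z[j] for j in range(p)] for i in range(m)]
    # B_new = B + eta Lambda(z) delta u^T
    B_new = [[B[j][k] + eta * h[j] * u[k] for k in range(len(u))] for j in range(p)]
    return A_new, B_new
```

With the network sizes reported in Section 4 (8 inputs, 15 hidden neurons, 3 outputs), one would call train_step repeatedly over the training exemplars with eta = 0.01. Each step moves the weights along the negative gradient of the squared error, with the constant factor 2 from differentiating equation (5) absorbed into eta.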
\n\nThe threshold vectors can be obtained as follows: \n\n$$\\theta_{y,new} = \\theta_y + \\eta \\Lambda(y) e \\qquad (9)$$ \n\nand \n\n$$\\theta_{z,new} = \\theta_z + \\eta \\Lambda(z) \\delta \\qquad (10)$$ \n\nThus, once a set of initial values for the unknown parameters is given, this algorithm iteratively adjusts the parameters until they converge to a value which may represent only a local minimum of the error criterion. \n\n3 Pre-processing of the EEG signal \n\nA cursory glance at a typical EEG signal of a normal subject or of a psychiatrically ill subject would convince anyone that one cannot hope to distinguish the classes from the raw data alone. Consequently, considerable feature extraction (data pre-processing) must be performed before classification can be made. There are two types of simple feature extraction techniques, viz., frequency domain and time domain techniques (see e.g., Kay, 1988; Marple, 1987). In the frequency domain, one performs a fast Fourier transform (FFT) on the data. Often it is advantageous to multiply the signal by a window function first, which reduces sidelobe leakage (Kay and Marple, 1981; Harris, 1978). The average spectrum, obtained by averaging the spectrum over a number of frames, can then be used as the input feature vector to the MLP. \n\nIn the time domain, one way to pre-process the data is to fit a parametric model to the underlying data. There are a number of parametric models, e.g., the autoregressive (AR) model and the autoregressive moving average (ARMA) model (see e.g., Kay, 1988; Marple, 1987). \n\nThe autoregressive model can be described as follows: \n\n$$s_t = \\sum_{j=1}^{N} a_j s_{t-j} + \\epsilon_t \\qquad (11)$$ \n\nwhere $s_t$ is the signal at time $t$, and $\\epsilon_t$ is assumed to be a zero-mean Gaussian variable with variance $\\sigma^2$. The unknown parameters $a_j$, $j = 1, 2, \\ldots, N$, describe the spectrum of the signal. They can be obtained using standard methods, e.g., by solving the Yule-Walker equations efficiently with the Levinson algorithm (Kay, 1988; Marple, 1987). 
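To make the AR feature extraction concrete, here is a minimal pure-Python sketch that estimates the coefficients a_j of equation (11) by solving the Yule-Walker equations with the Levinson-Durbin recursion, using the biased sample autocorrelation. It also sketches the per-frame procedure described in Section 4: remove the mean, cut the series into one-second frames at 128 Hz, fit an AR model to each frame, and average the coefficient vectors over the first 250 frames. The function names, the choice of autocorrelation estimator, and the default arguments are our assumptions, not the authors' code.

```python
def autocorr(x, max_lag):
    # Biased sample autocorrelation r[k] = (1/L) * sum_t x[t] x[t+k], k = 0..max_lag
    L = len(x)
    return [sum(x[t] * x[t + k] for t in range(L - k)) / L for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    # Solve the Yule-Walker equations for s_t = sum_j a_j s_{t-j} + eps_t (eq. 11).
    # Returns [a_1, ..., a_N] and the prediction error variance sigma^2.
    a = []
    err = r[0]  # assumes a non-constant frame, so r[0] > 0
    for k in range(1, order + 1):
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err  # reflection coefficient of order k
        a = [a[j] - kappa * a[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return a, err

def ar_features(signal, fs=128, order=8, n_frames=250):
    # Hypothetical feature extractor: remove the mean of the whole series,
    # split it into one-second frames of fs samples, fit an AR model per frame,
    # and average the coefficient vectors over the first n_frames frames.
    mean = sum(signal) / len(signal)
    x = [v - mean for v in signal]
    count = min(n_frames, len(x) // fs)
    feats = []
    for i in range(count):
        frame = x[i * fs:(i + 1) * fs]
        a, _ = levinson_durbin(autocorr(frame, order), order)
        feats.append(a)
    return [sum(col) / count for col in zip(*feats)]
```

Called with order=8 on a recording of at least 250 seconds sampled at 128 Hz, ar_features returns an 8-element vector, which would match the 8 input neurons of the MLP used in Section 4. The biased autocorrelation estimator is used here because it yields a positive semidefinite autocorrelation sequence, so the fitted AR model is guaranteed to be stable.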
\n\nThe autoregressive moving average (ARMA) model can be seen as a parsimonious representation of an AR model with a large $N$. Hence, as long as we are not concerned with interpreting the parameters of the model obtained, there is little advantage in using the more complicated ARMA model. Consequently, in this paper, we consider only AR models. \n\nOnce the AR parameters are determined, they can be used as the input features to the MLP. It is known that the AR parametric model produces a smoothed spectral envelope (Kay, 1988; Marple, 1987). Thus, the AR model parameters are another way to convey spectral information to the MLP. This information differs in quality from that given by the FFT technique: the FFT transforms signal and noise alike, while the parametric models tend to favour the signal and are more effective in suppressing the effect of noise. \n\nIn some preliminary work, we found that features extracted in the frequency domain do not give rise to good classification results using the MLP. Henceforth we will consider only the AR parameters as input feature vectors. \n\n4 Classification Results \n\nIn this section, we summarise the results of experiments using the AR parametric method of feature extraction to provide the input parameters to the MLP. \n\nWe obtained EEG data pertaining to normal subjects, subjects who have been diagnosed as suffering from severe obsessive compulsive disorder (OCD), and subjects who have been diagnosed as suffering from severe schizophrenia. Both the OCD and the schizophrenic subjects are under medication. The subjects are chosen so that their medication as well as their medical conditions are in a steady state, i.e., they have not changed over a long period of time. The diagnosis is made by a number of trained neurologists. 
The data files are chosen only if the experts' diagnoses concur. \n\nWe use the standard 10-20 recording system (Cooper, 1980), i.e., there are 19 channels of EEG recording, each sampled at 128 Hz. The recordings were obtained while the subject was at rest. Some data screening has been performed to screen out segments of data which contain any artifacts. In addition, the data is anti-aliased by a low pass filter before being sampled. The sampled data is then low pass filtered at 30 Hz to remove any higher frequency components. \n\nWe have chosen one channel, viz., the Cz channel (the recording of the signal at the vertex of the scalp). This channel can be assumed to be representative of the brain state as reflected in the overall EEG recording of the scalp.[1] This time series is employed for feature extraction purposes. \n\n[1] From some preliminary work, it can be shown that this channel can be considered as a linear combination of the other channels, in the sense that the prediction error variance is small. \n\nFor time domain feature extraction, we first subtract the mean from the time series so that it has zero mean. Then a data frame of one second duration (128 samples) is chosen as the basic time segment of the series.[2] An AR model is fitted to each one second frame to extract a feature vector formed by the resulting AR coefficients. \n\n[2] It has been found that the EEG signal is approximately stationary over a signal length of one second. Hence employing a data frame width of one second ensures that the underlying assumptions of the AR modelling technique are valid (Marple, 1987). \n\nAn average feature vector is acquired from the first 250 seconds because, in practice, the first 250 seconds usually represent a state of calm in the patient, and therefore the EEG is less noisy. 
After the first 250 seconds, the patient may enter an unstable condition, e.g., breathing faster or contracting muscles, which can introduce artifacts. We use an AR model of order between 8 and 15. \n\nWe have chosen 15 such data files to form our training data set. This consists of 5 data files from normal subjects, 5 from OCD subjects, and 5 from subjects suffering from schizophrenia. \n\nWith the time domain extracted feature vectors, we use an MLP with 8 input neurons, 15 hidden layer neurons, and 3 output neurons, trained with a learning constant $\\eta$ of 0.01. Once trained, the network is used to classify unseen data files. These unseen data files were pre-classified by human experts, so their desired classification is known; this can be used to check how well the MLP generalises to unseen data files. \n\nThe results[3] are shown in Table 1. \n\nThe unseen data set consists of 6 normal subjects, 8 schizophrenic subjects, and 10 obsessive compulsive disorder subjects. It can be observed that the network correctly classifies all the normal cases, makes one mistake in classifying the schizophrenia cases, and makes one mistake in classifying the OCD cases, i.e., 22 of the 24 unseen data files are classified correctly. \n\nWe have also experimented with varying the number of hidden neurons. It is found that the classification accuracy does not vary much as the number of hidden layer neurons is varied from 15 to 50. \n\nWe have also applied the MLP to the frame by frame data, i.e., before the features are averaged over the 250 second interval. However, the classification results are not as good as the ones presented. We were puzzled by this result, as intuitively we would expect the frame by frame results to be better than the ones presented. \n\nA plausible explanation for this puzzle is as follows: the EEG data is in general quite noisy. 
In the frame by frame analysis, the extracted features may vary considerably over a short time interval, while in the approach taken here, the noise effect is smoothed out by the averaging process. \n\nOriginal class  normal   schiz    ocd      Predicted class \nnormal1         0.905    0.008    0.201    normal \nnormal2         0.963    0.006    0.103    normal \nnormal3         0.896    0.021    0.086    normal \nnormal4         0.870    0.057    0.020    normal \nnormal5         0.760    0.237    0.000    normal \nnormal6         0.752    0.177    0.065    normal \nschiz1          0.000    0.981    0.042    schiz \nschiz2          0.000    0.941    0.163    schiz \nschiz3          0.002    0.845    0.050    schiz \nschiz4          0.015    0.989    0.004    schiz \nschiz5          0.000    0.932    0.061    schiz \nschiz6          0.377    0.695    0.014    schiz \nschiz7          0.062    0.898    0.000    schiz \nschiz8          0.006    0.086    0.921    ocd \nocd1            0.017    0.134    0.922    ocd \nocd2            0.027    0.007    0.940    ocd \nocd3            0.000    0.033    0.993    ocd \nocd4            0.000    0.014    0.997    ocd \nocd5            0.015    0.138    0.889    ocd \nocd6            0.000    0.150    0.946    ocd \nocd7            0.002    0.034    0.985    ocd \nocd8            0.006    0.960    0.003    schiz \nocd9            0.045    0.005    0.940    ocd \nocd10           0.085    0.046    0.585    ocd \n\nTable 1: Classification of unseen EEG data files (the normal, schiz, and ocd columns give the activations of the corresponding output neurons) \n\n[3] The results shown are typical results. We have used different data files for training and testing. In most cases, the classification errors on the unseen data files are small, similar to those presented here. \n\nOne may ask: why would the methods presented work at all? In traditional EEG analysis (Lindsay and Holmes, 1984), the FFT technique is used to extract the frame by frame frequency responses, and the averaged frequency response is then obtained over this interval. Traditionally, only four dominant frequency bands are observed, viz., the \"alpha\", \"beta\", \"delta\", and \"theta\" bands. It is a basic result in EEG research that these frequencies describe the underlying state of the subject; for example, it is known that the \"alpha\" wave indicates that the subject is at rest. An EEG technologist uses data in this form to assist in the diagnosis of the subject. On the other hand, it is relatively well known in the signal processing literature (Kay, 1988; Marple, 1987) that the AR model can be viewed as indicative of the underlying frequency content of the signal. In fact, an 8th order AR model indicates that the signal can be considered to consist of 4 underlying frequencies, since its 8 poles can form up to 4 complex conjugate pairs, each of which produces a spectral peak. Thus, intuitively, the 8th order AR model averaged over the first 250 seconds represents the underlying dominant frequencies in the signal. Given this interpretation, it is not surprising that the results are so good: the features extracted are similar to those used in the diagnosis of the subjects. Moreover, the classifier, in this case the MLP, is known to have good generalisation capabilities (Hertz, Krogh and Palmer, 1991). 
This contrasts with the techniques used in previous attempts in the 1960s, e.g., discriminant analysis, which is known to have poor generalisation capabilities. Thus, one of the reasons why this approach works may be attributed to the generalisation capabilities of the MLP. \n\n5 Conclusions \n\nIn this paper, a method has been developed for classifying EEG data obtained from subjects who are normal or who suffer from OCD or schizophrenia, using the AR parameters as input feature vectors to an MLP. It is found that such a network has good generalisation capabilities. \n\n6 Acknowledgments \n\nThe first and third authors wish to acknowledge partial financial support from the Australian National Health and Medical Research Council. In addition, the first author wishes to acknowledge partial financial support from the Australian Research Council. \n\n7 References \n\nBasar, E. (1980). EEG-Brain Dynamics - Relation between EEG and Brain Evoked Potentials. Elsevier/North Holland Biomedical Press. \nCooper, R. (1980). EEG Technology. Butterworths. Third Edition. \nHarris, F.J. (1978). \"On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform\". Proceedings of the IEEE. Vol. 66, pp 51-83. \nHerrmann, W.M. (1982). Electroencephalography in Drug Research. Butterworths. \nHertz, J., Krogh, A., Palmer, R. (1991). Introduction to the Theory of Neural Computation. Addison Wesley, Redwood City, Calif. \nKay, S.M., Marple, S.L., Jr. (1981). \"Spectrum Analysis - A Modern Perspective\". Proceedings of the IEEE. Vol. 69, No. 11, Nov. pp 1380-1417. \nKay, S.M. (1988). Modern Spectral Estimation - Theory and Applications. Prentice Hall. \nKolb, B., Whishaw, I.Q. (1990). Fundamentals of Human Neuropsychology. Freeman, New York. \nLindsay, D.F., Holmes, J.E. (1984). Basic Human Neurophysiology. Elsevier. \nLippmann, R.P. 
(1987). \"An introduction to computing with neural nets\". IEEE Acoustics, Speech and Signal Processing Magazine. Vol. 4, No. 2, pp 4-22. \nLippmann, R.P., Moody, J., Touretzky, D.S. (Eds.) (1991). Advances in Neural Information Processing Systems 3. Morgan Kaufmann, San Mateo, Calif. \nMarple, S.L., Jr. (1987). Digital Spectral Analysis with Applications. Prentice Hall. \nTouretzky, D.S. (Ed.) (1989). Advances in Neural Information Processing Systems 1. Morgan Kaufmann, San Mateo, Calif. \nTouretzky, D.S. (Ed.) (1990). Advances in Neural Information Processing Systems 2. Morgan Kaufmann, San Mateo, Calif. ", "award": [], "sourceid": 855, "authors": [{"given_name": "A C", "family_name": "Tsoi", "institution": null}, {"given_name": "D S C", "family_name": "So", "institution": null}, {"given_name": "A", "family_name": "Sergejew", "institution": null}]}