{"title": "Using a Translation-Invariant Neural Network to Diagnose Heart Arrhythmia", "book": "Advances in Neural Information Processing Systems", "page_first": 240, "page_last": 247, "abstract": null, "full_text": "240 \n\nLee \n\nUsing A Translation-Invariant Neural Network \n\nTo Diagnose Heart Arrhythmia \n\nSusan Ciarrocca Lee \n\nThe Johns Hopkins University \n\nApplied Physics Laboratory \n\nLaurel, Maryland 20707 \n\nABSTRACT \n\nDistinctive electrocardiogram (ECG) patterns are created when the heart \nis beating normally and when a dangerous arrhythmia is present. Some \ndevices which monitor the ECG and react to arrhythmias parameterize \nthe ECG signal and make a diagnosis based on the parameters. The \nauthor discusses the use of a neural network to classify the ECG signals \ndirectly, without parameterization. The input to such a network must \nbe translation-invariant, since the distinctive features of the ECG may \nappear anywhere in an arbitrarily chosen ECG segment. The input \nmust also be insensitive to the episode-to-episode and patient-to-patient \nvariability in the rhythm pattern. \n\n1 INTRODUCTION \nFigure 1 shows internally-recorded transcardiac ECG signals for one patient. The top \ntrace is an example of normal sinus rhythm (NSR). The others are examples of two \narrhythmias: ventricular tachycardia (VT) and ventricular fibrillation (VF). Visually, the \npatterns are quite distinctive. Two problems make recognition of these patterns with a \nneural net interesting. \nThe first problem is illustrated in Figure 2. All traces in Figure 2 are one-second samples \nof NSR, but the location of the QRS complex relative to the start of the sample is \nshifted. Ideally, one would like a neural network to recognize each of these presentations \nas NSR, without preprocessing the data to \"center\" it. The second problem can be \ndiscerned by examining the two VT traces in Figure 1. Although quite similar, 
the two \npatterns are not exactly the same. Substantial variation in signal shape and repetition rate \nfor NSR and VT (VF is inherently random) can be expected, even among rhythms \ngenerated by a single patient. Patient-to-patient variations are even greater. The neural \nnetwork must ignore variations within rhythm types, while retaining the distinctions \nbetween rhythms. This paper discusses a simple transformation of the ECG time series \ninput which is both translation-invariant and fairly insensitive to rate and shape changes \nwithin rhythm types. \n\nFigure 1: ECG Rhythm Examples (horizontal axis: time in seconds) \n\nFigure 2: Five Examples of NSR (horizontal axis: time in seconds) \n\n2 DISCUSSION \nIf test input to a first-order neural network is rescaled, rotated, or translated with respect to \nthe training data, it generally will not be recognized. A second- or higher-order network \ncan be made invariant to these transformations by constraining the weights to meet \ncertain requirements [Giles, 1988]. The input to the jth hidden unit in a second-order \nnetwork with N inputs is: \n\n$$\\sum_{i=1}^{N} w_{ij} x_i + \\sum_{i=1}^{N-1} \\sum_{k=1}^{N-i} w_{(i,i+k)j} x_i x_{i+k} \\qquad (1)$$ \n\nTranslation invariance is introduced by constraining the weights on the first-order inputs \nto be independent of input position, and the second-order weights to depend only on the \ndifference between indices (k), rather than on the index pairs (i, i+k) [Giles, 1988]. \nRewriting equation (1) with these constraints gives: \n\n$$w_j \\sum_{i=1}^{N} x_i + \\sum_{k=1}^{N-1} w_{kj} \\sum_{i=1}^{N-k} x_i x_{i+k} \\qquad (2)$$ \n\nThis is equivalent to a first-order neural network where the original inputs, x_i, have been \nreplaced by new inputs, y_k, consisting of the following sums: \n\n$$y_0 = \\sum_{i=1}^{N} x_i, \\qquad y_k = \\sum_{i=1}^{N-k} x_i x_{i+k}, \\quad k = 1, 2, \\ldots, N-1 \\qquad (3)$$ \n\nWhile a network with inputs in the form of equation (3) is translation invariant, it is \nquite sensitive to shape and rate variations in the ECG input data. For ECG recognition, \na better function to compute is: \n\n$$y_0 = \\sum_{i=1}^{N} |x_i|, \\qquad y_k = \\sum_{i=1}^{N-k} |x_i - x_{i+k}|, \\quad k = 1, 2, \\ldots, N-1 \\qquad (4)$$ \n\nBoth equations (3) and (4) produce translation-invariant outputs, as long as the input time \nseries contains a \"shape\" which occupies only part of the input window, for example, the \nsingle cycle of the sine function in Figure 3a. A periodic time series, like the sine wave \nin Figure 3b, will not produce a truly translation-invariant output. Fortunately, the \ntranslation sensitivity introduced by applying equations (3) or (4) to periodic time series \nis small for small k, and only becomes important when k becomes large. One can see \nthis by considering the extreme case, when k=N-1, and the final \"sum\" in equation (4) \nbecomes the absolute value of the difference between the first and the last point in the \ninput time series; clearly, this value will vary as the sine wave in Figure 3b is moved \nthrough the input window. If the upper limit on the sum over k is kept no larger than N/2, \nequations (3) and (4) provide a neural network input which is nearly translation-invariant \nfor realistic time series. Additionally, the output of equation (4) can be used to \ndiscriminate among NSR, VT, and VF, but is not unduly sensitive to variations within \neach rhythm type. \n\nFigure 3: Examples of signals which will (a) and will not (b) have invariant transforms \n\nThe ECG signals used in this experiment were drawn from a data set of internally recorded \ntranscardiac ECG signals digitized at 100 Hz. The data set comprised 203 segments, each 10 to 45 seconds \nlong, obtained from 52 different patients. 
At least one segment of NSR and one \nsegment of an arrhythmia were available for each patient. In addition, an \"exercise\" NSR \nat 150 BPM was artificially constructed by cutting baseline out of the natural resting \nNSR segment. Arrhythmia detection systems which parameterize the ECG can have \ndifficulty distinguishing high-rate NSRs from slow arrhythmias. \n\nTo obtain a training data set for the neural network, short pieces were extracted from the \noriginal rhythm segments. Since the rhythms are basically periodic, it was possible to \nchoose the endpoints so that the short, extracted piece could be repeated to produce a \nfacsimile of the original signal. The upper trace in Figure 4 shows an original VT \nsegment. The boxed area is the extracted piece. The lower trace shows the extracted piece \nchained end-to-end to construct a segment as long as the original. The segments \nconstructed from the short, extracted pieces were used as training input. Typically, the \ntraining data segment contained less than 25% of the original data. \n\nFigure 4: Original and Artificially-Constructed Training Segments (panels: full arrhythmia segment, constructed training segment; horizontal axis: time in seconds) \n\nThe length of the input window was arbitrarily set at 1.35 seconds (135 points); by \nchoosing this window, all NSR inputs were guaranteed to include at least one QRS \ncomplex. The upper limit on the sum over k in equation (4) was set to 50. The \nresulting 51 inputs (y_0 through y_50) were presented to a standard back propagation network with seven \nhidden units and four outputs. Although one output is sufficient to discriminate between \nNSR and an arrhythmia, the networks were trained to differentiate among two types of VT \n(generally distinguished by rate), and VF as well. 
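The input computation described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it evaluates equation (4) for a 135-point window with lags up to 50, which gives 51 values. The names difference_features and normalize are hypothetical, the peak normalization is an assumption (the paper does not say how the inputs were normalized), and the two test windows are synthetic rather than ECG data.

```python
WINDOW = 135   # 1.35 seconds at 100 Hz, as in the paper
MAX_LAG = 50   # upper limit on k, as in the paper


def difference_features(window, max_lag=MAX_LAG):
    # Equation (4): y_0 is the summed absolute amplitude, and
    # y_k sums |x_i - x_(i+k)| over every pair that fits in the window.
    n = len(window)
    features = [sum(abs(x) for x in window)]
    for k in range(1, max_lag + 1):
        features.append(sum(abs(window[i] - window[i + k]) for i in range(n - k)))
    return features


def normalize(features):
    # Assumed normalization: scale by the largest feature so that all
    # network inputs lie in [0, 1]. An all-zero window is left unchanged.
    peak = max(features) or 1.0
    return [f / peak for f in features]


# Synthetic check of translation invariance: the same short shape placed
# at two different offsets, each at least MAX_LAG points from either edge.
shape = [1, 3, -2, 5, 1]
window_a = [0] * 50 + shape + [0] * 80
window_b = [0] * 70 + shape + [0] * 60

features_a = difference_features(window_a)
features_b = difference_features(window_b)
shift_invariant = features_a == features_b
network_inputs = normalize(features_a)
```

Because the shape sits at least 50 points from either edge of both windows, every lagged pair that touches it appears in both sums, so the two feature vectors agree exactly. For a periodic signal that fills the window the agreement is only approximate, which is why the lag limit is kept well below the window length.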
\nA separate training set was constructed and a separate network was trained for each patient. \nThe weights thus derived for a given patient were then tested on that patient's original \nrhythm segments. To test the translation invariance of the network, every possible \npresentation of an input rhythm segment was tested. To do this, a sliding window of 135 \npoints was moved through the input data stream one point (1/100th of a second) at a \ntime. At each point, the output of equation (4) (appropriately normalized) was presented \nto the network, and the resulting diagnosis recorded. \n\n3 RESULTS \nA percentage of correct diagnoses was calculated for each segment of data. For a segment \nT seconds long, there are 100 x (T - 1.35) different presentations of the rhythm. \nPresentations which included countershock, burst pacing, gain changes on the recording \nequipment, post-shock rhythms, etc. were excluded, since the network had not been \ntrained to recognize these phenomena. The percentage correct was then calculated for the \nremaining presentations as: \n\n100 x (Number of correct diagnoses) / (Number of presentations) \n\nThe percentage of correct diagnoses for each patient was calculated similarly, except that \nall segments for a particular patient were included in the count. Table 1 presents these \nresults. \n\nTable 1: Results \n\nResult | Patients | Segments \n100% Correct | 29 | 163 \n99%-90% Correct | 19 | 23 \n90%-80% Correct | 3 | 6 \n80%-70% Correct | 0 | 4 \n<70% Correct | 0 | 1 \nCould Not Be Trained | 1 | 6 \nTotal | 52 | 203 \n\nThe network could not be trained for one patient. This patient had two arrhythmia \nsegments, one identified as VT and the other as VF. Visually, the two traces were \nextremely similar; after twenty thousand iterations, the network could not distinguish \nthem. 
The network could certainly have been trained to distinguish between NSR and \nthose two rhythms, but this was not attempted. \nThe number of segments (163 of 203) for which all possible presentations of the rhythm were \ndiagnosed correctly clearly establishes the translation invariance of the input. The \nnetwork was also quite successful in distinguishing among NSR and various arrhythmias. \nUnfortunately, for application in implantable defibrillators or even critical care \nmonitoring, the network must be more nearly perfect. \nThe errors the network made could be separated into two broad classes. First, short \nsegments of very erratic arrhythmias were misdiagnosed as NSR. Figure 5 illustrates this \ntype of error. The error occurs because NSR is mainly characterized by a lack of \ncorrelation. Typically, the misdiagnosed segment is quite short, 1 second or less. This \ntype of error might be avoided by using longer (longer than 1.35 seconds) input windows \nwhich could bridge the erratic segments. Also, a more responsive automatic gain control \non the signal might help, since the erratic segments generally had a smaller amplitude \nthan the surrounding segments. The network response to input windows containing large \nshifts in the amplitude of the input signal (for example, countershock and gain changes) \nwas usually NSR. \n\nFigure 5: Ventricular Fibrillation Segment Misdiagnosed as NSR (panels: transcardiac ECG, network diagnosis; diagnosis levels: VF, VT No. 2, VT No. 1, NSR, Can't ID; horizontal axis: time in seconds) \n\nThe second class of errors occurred when the network misdiagnosed rhythms which were \nnot included in the training set. For example, one patient had a few beats of a very slow \nVT in his NSR segment. This slow VT was not extracted for training. Only a fast (200 \nBPM) VT and VF were presented to this network as possible arrhythmias. Consequently, \nduring testing, 
the network identified the slow VT as NSR. The network did identify \nsome rhythms it was not trained on, but only if these rhythms did not vary too much \nfrom the training rhythms. Generally, the rate of the \"unknown\" rhythm had to be within \n20 BPM of a training rhythm to be recognized. Morphology is also important, in that \nvery regular rhythms, such as the top trace in Figure 6, and noisier rhythms, like the \nbottom trace, appear quite different to the network. \n\nI \ne \n\nI \n\ne.5 \n\nI \n1 \n\nI \n\n1.6 \n\nI \n2 \n\nI \n\n2.6 \n\nI \n\nI \n3 \nTIME