{"title": "Adaptive Neural Net Preprocessing for Signal Detection in Non-Gaussian Noise", "book": "Advances in Neural Information Processing Systems", "page_first": 124, "page_last": 132, "abstract": null, "full_text": "ADAPTIVE NEURAL NET PREPROCESSING \nFOR SIGNAL DETECTION \nIN NON-GAUSSIAN NOISE1 \n\nRichard P. Lippmann and Paul Beckman \n\nMIT Lincoln Laboratory \n\nLexington, MA 02173 \n\nABSTRACT \n\nA nonlinearity is required before matched filtering in minimum error receivers when additive noise is present which is impulsive and highly non-Gaussian. Experiments were performed to determine whether the correct clipping nonlinearity could be provided by a single-input single-output multi-layer perceptron trained with back-propagation. It was found that a multi-layer perceptron with one input and output node, 20 nodes in the first hidden layer, and 5 nodes in the second hidden layer could be trained to provide a clipping nonlinearity with fewer than 5,000 presentations of noiseless and corrupted waveform samples. A network trained at a relatively high signal-to-noise (S/N) ratio and then used as a front end for a linear matched filter detector greatly reduced the probability of error. The clipping nonlinearity formed by this network was similar to that used in current receivers designed for impulsive noise and provided similar substantial improvements in performance. \n\nINTRODUCTION \n\nThe most widely used neural net, the adaptive linear combiner (ALC), is a single-layer perceptron with linear input and output nodes. It is typically trained using the LMS algorithm and forms one of the most common components of adaptive filters. ALCs are used in high-speed modems to construct equalization filters, in telephone links as echo cancelers, and in many other signal processing applications where linear filtering is required [9]. 
The purpose of this study was to determine whether multi-layer perceptrons with linear input and output nodes but with sigmoidal hidden nodes could be as effective for adaptive nonlinear filtering as ALCs are for linear filtering. \n\n1 This work was sponsored by the Defense Advanced Research Projects Agency and the Department of the Air Force. The views expressed are those of the authors and do not reflect the policy or position of the U.S. Government. \n\nThe task explored in this paper is signal detection with impulsive noise where an adaptive nonlinearity is required for optimal performance. Impulsive noise occurs in underwater acoustics and in extremely low frequency communications channels where impulses caused by lightning strikes propagate many thousands of miles [2]. This task was selected because a nonlinearity is required in the optimal receiver, the structure of the optimal receiver is known, and the resulting signal detection error rate provides an objective measure of performance. The only other previous studies of the use of multi-layer perceptrons for adaptive nonlinear filtering that we are aware of [6,8] appear promising but provide no objective performance comparisons. \n\nIn the following we first present examples which illustrate that multi-layer perceptrons trained with back-propagation can rapidly form clipping and other nonlinearities useful for signal processing with deterministic training. The signal detection task is then described and theory is presented which illustrates the need for nonlinear processing with non-Gaussian noise. Nonlinearities formed when the input to a net is a corrupted signal and the desired output is the uncorrupted signal are then presented for no noise, impulsive noise, and Gaussian noise. 
Finally, signal detection performance results are presented that demonstrate large improvements in performance with an adaptive nonlinearity and impulsive noise. \n\nFORMING DETERMINISTIC NONLINEARITIES \n\nA theorem proven by Kolmogorov and described in [5] demonstrates that single-input single-output continuous nonlinearities can be formed by a multi-layer perceptron with two layers of hidden nodes. This proof, however, requires complex nonlinear functions in the hidden nodes that are very sensitive to the desired input/output function and may be difficult to realize. More recently, Lapedes [4] presented an intuitive description of how multi-layer perceptrons with sigmoidal nonlinearities could produce continuous nonlinear mappings. A careful mathematical proof was recently developed by Cybenko [1] which demonstrated that continuous nonlinear mappings can be formed using sigmoidal nonlinearities and a multi-layer perceptron with one layer of hidden nodes. This proof, however, is not constructive and does not indicate how many nodes are required in the hidden layer. The purpose of our study was to determine whether multi-layer perceptrons with sigmoidal nonlinearities trained using back-propagation could adaptively and rapidly form clipping nonlinearities. \n\nInitial experiments were performed to determine the difficulty of learning complex mappings using multi-layer perceptrons trained using back-propagation. Networks with 1 and 2 hidden layers and from 1 to 50 hidden nodes per layer were evaluated. Input and output nodes were linear and all other nodes included sigmoidal nonlinearities. Best overall performance was provided by the three-layer perceptron shown in Fig. 1. It has 20 nodes in the first and 5 nodes in the second hidden layer. This network could form a wide variety of mappings and required only slightly more training than other networks. 
It was used in all experiments. \n\nFigure 1: The multi-layer perceptron with linear input and output nodes that was used in all experiments. \n\nThe three-layer network shown in Fig. 1 was used to form clipping and other deterministic nonlinearities. Results in Fig. 2 demonstrate that a clipping nonlinearity could be formed with fewer than 1,000 input samples. Input/output point pairs were determined by selecting the input at random over the range plotted and using the deterministic clipping function shown as a solid line in Fig. 2. Back-propagation training [7] was used with the gain term (η) equal to 0.1 and the momentum term (α) equal to 0.5. These values provide good convergence rates for the clipping function and all other functions tested. Initial connection weights were set to small random values. \n\nThe multi-layer perceptron from Fig. 1 was also used to form the four nonlinear functions shown in Fig. 3. The \"Hole Punch\" is useful in nonlinear signal processing. It performs much the same function as the clipper but completely eliminates amplitudes above a certain threshold level. Accurate approximation of this function required more than 50,000 input samples. The \"Step\" has one sharp edge and could be roughly approximated after 2,000 input samples. The \"Double Pulse\" requires approximation of two close \"pulses\" and is the nonlinear function analogy of the disjoint region problem studied in [3]. In this example, back-propagation training approximated the rightmost pulse first after 5,000 input samples. Both pulses were then approximated fairly well after 50,000 input samples. The \"Gaussian Pulse\" is a smooth curve that could be approximated well after only 2,000 input samples. 
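The training setup just described can be sketched in code. The following is a minimal NumPy illustration (ours, not the authors' implementation): a 1-20-5-1 perceptron with linear input/output nodes and sigmoidal hidden nodes, trained online by back-propagation with gain 0.1 and momentum 0.5 on the clipping function. The initial weight scale, the input range, and the number of samples are assumptions chosen for illustration.

```python
# Sketch (not the authors' code) of back-propagation training of the 1-20-5-1
# network on a clipping function. Gain (eta) 0.1 and momentum (alpha) 0.5 as in
# the paper; init scale 0.3 and input range [-2, 2] are our assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clip_fn(x, level=1.0):
    # Desired clipping nonlinearity: linear in [-level, level], flat outside.
    return np.clip(x, -level, level)

sizes = [1, 20, 5, 1]
W = [rng.normal(0.0, 0.3, (sizes[i], sizes[i + 1])) for i in range(3)]
b = [np.zeros(sizes[i + 1]) for i in range(3)]
vW = [np.zeros_like(w) for w in W]            # momentum buffers
vb = [np.zeros_like(v) for v in b]
eta, alpha = 0.1, 0.5                         # gain and momentum terms

def forward(x):
    h1 = sigmoid(x @ W[0] + b[0])
    h2 = sigmoid(h1 @ W[1] + b[1])
    y = h2 @ W[2] + b[2]                      # linear output node
    return h1, h2, y

def train_step(x, d):
    h1, h2, y = forward(x)
    e = y - d                                 # output error (linear output node)
    gh2 = (e @ W[2].T) * h2 * (1.0 - h2)      # back-propagate through sigmoids
    gh1 = (gh2 @ W[1].T) * h1 * (1.0 - h1)
    grads_W = [x.T @ gh1, h1.T @ gh2, h2.T @ e]
    grads_b = [gh1.sum(0), gh2.sum(0), e.sum(0)]
    for i in range(3):
        vW[i] = alpha * vW[i] - eta * grads_W[i]
        vb[i] = alpha * vb[i] - eta * grads_b[i]
        W[i] += vW[i]
        b[i] += vb[i]

# One weight update per presented sample, as in the paper.
xs = rng.uniform(-2.0, 2.0, (5000, 1))
for x in xs:
    train_step(x[None, :], clip_fn(x[None, :]))

grid = np.linspace(-2.0, 2.0, 101)[:, None]
_, _, y_fit = forward(grid)
rms = float(np.sqrt(np.mean((y_fit - clip_fn(grid)) ** 2)))
```

After a few thousand presentations the rms error against the desired clipper drops well below the error of a constant predictor, consistent with the qualitative behavior in Fig. 2.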
\nThese results demonstrate that back-propagation training with sigmoidal nonlinearities can form many different nonlinear functions. Qualitative results on training times are similar to those reported in [3]. \n\nFigure 2: Clipping nonlinearities formed using back-propagation training and the multi-layer perceptron from Fig. 1 (top) and the rms error produced by these nonlinearities versus training time (bottom). \n\nIn this previous study it was demonstrated that simple half-plane decision regions could be formed for classification problems with little training while complex disjoint decision regions required long training times. These new results suggest that complex nonlinearities with many sharp discontinuities require much more training time than simple smooth curves. \n\nTHE SIGNAL DETECTION TASK \n\nThe signal detection task was to discriminate between two equally likely input signals as shown in Fig. 4. One signal (s0(t)) corresponds to no input and the other signal (s1(t)) was a sinewave pulse with fixed duration and known amplitude, frequency, and phase. Noise was added to these inputs, the resultant signal was passed through a memoryless nonlinearity, and a matched filter was then used to select hypothesis H0 corresponding to no input or H1 corresponding to the sinewave pulse. 
\nThe matched filter multiplied the output of the nonlinearity by the known time-aligned signal waveform, integrated this product over time, and decided H1 if the result was greater than a threshold and H0 otherwise. The threshold was selected to provide a minimum overall error rate. The optimum nonlinearity used in the detector depends on the noise distribution. If the signal levels are small relative to the noise levels, then the optimum nonlinearity is approximated by f(x) = -d/dx ln(f_n(x)), where f_n(x) is the instantaneous probability density function of the noise [2]. This function is linear for Gaussian noise but has a clipping shape for impulsive noise. \n\nFigure 3: Four deterministic nonlinearities formed using the multi-layer perceptron from Fig. 1. Desired functions are plotted as solid lines while functions formed using back-propagation with different numbers of input samples are plotted using dots and dashes. \n\nExamples of the signal, impulsive noise and Gaussian noise are presented in Fig. 5. The signal had a fixed duration of 250 samples and peak amplitude of 1.0. The impulsive noise was defined by its amplitude distribution and inter-arrival time. Amplitudes had a zero-mean Laplacian distribution with a standard deviation (σ) of 14.1 in all experiments. The standard deviation was reduced to 2.8 in Fig. 5 for illustrative purposes. 
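The weak-signal optimum nonlinearity f(x) = -d/dx ln(f_n(x)) can be checked numerically. The short sketch below (ours, not from the paper) differentiates the log-density of a Gaussian and of a Laplacian of equal variance: the Gaussian yields the linear function x/σ², while the Laplacian yields the constant-magnitude clipper sign(x)/b, matching the claim that Gaussian noise calls for linear processing and impulsive noise for clipping.

```python
# Numerical check (illustrative) of f(x) = -d/dx ln(f_n(x)) for Gaussian and
# Laplacian noise densities of equal variance.
import numpy as np

def optimum_nonlinearity(pdf, x, h=1e-5):
    # Central-difference estimate of -d/dx ln(pdf(x)).
    return -(np.log(pdf(x + h)) - np.log(pdf(x - h))) / (2.0 * h)

sigma = 14.1                                     # noise std dev from the paper
gauss = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
b = sigma / np.sqrt(2.0)                         # Laplacian scale, same variance
laplace = lambda x: np.exp(-np.abs(x) / b) / (2.0 * b)

x = np.linspace(-50.0, 50.0, 101)
f_gauss = optimum_nonlinearity(gauss, x)         # linear: x / sigma**2
f_lap = optimum_nonlinearity(laplace, x)         # hard clipper: sign(x) / b
```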
Inter-arrival times (ΔT) between noise impulses had a Poisson distribution. The mean inter-arrival time was varied in experiments to obtain different S/N ratios after adding noise. For example, varying inter-arrival times from 500 to 2 samples results in S/N ratios that vary from roughly 1 dB to -24 dB. Additive Gaussian noise had zero mean and a standard deviation (σ) of 0.1 in all experiments. \n\nADAPTIVE TRAINING WITH NOISE \n\nThe three-layer perceptron was trained as shown in Fig. 6 using the signal plus noise as the input and the uncorrupted signal as the desired output. Network weights were adapted after every sample input using back-propagation training. \n\nFigure 4: The signal detection task was to discriminate between a sinewave pulse and a no-input condition with additive impulsive noise. \n\nFigure 5: The input to the nonlinearity with no noise, additive impulsive noise, and additive Gaussian noise. \n\nAdaptive nonlinearities formed during training are shown in Fig. 7. These are similar to those required by theory. No noise results in a nonlinearity that is linear over the range of the input sinewave (-1 to +1) after fewer than 3,000 input samples. Impulsive noise at a high S/N ratio (ΔT = 125 or S/N = -5 dB) results in a nonlinearity that clips above the signal level after roughly 5,000 input samples and then slowly forms a \"Hole Punch\" nonlinearity as the number of training samples increases. 
\nGaussian noise results in a nonlinearity that is roughly linear over the range of the input sinewave after fewer than 5,000 input samples. \n\nFigure 6: The procedure used for adaptive training. \n\nFigure 7: Nonlinearities formed with adaptive training with no additive noise, with additive impulsive noise at a S/N level of -5 dB, and with additive Gaussian noise. \n\nSIGNAL DETECTION PERFORMANCE \n\nSignal detection performance was measured using a matched filter detector and the nonlinearity shown in the center of Fig. 7 for 10,000 input training samples. The error rate with a minimum-error matched filter is plotted in Fig. 8 for impulsive noise at S/N ratios ranging from roughly 5 dB to -24 dB. This error rate was estimated from 2,000 signal detection trials. Signal detection performance always improved with the nonlinearity and sometimes the improvement was dramatic. The error rate provided with the adaptively-formed nonlinearity is essentially identical to that provided by a clipping nonlinearity that clips above the signal level. This error rate is roughly zero down to -24 dB and then rises rapidly with higher levels of impulsive noise. This rapid increase in error rate below -24 dB is not shown in Fig. 8. The error rate with linear processing rises slowly as the S/N ratio drops and reaches roughly 36% when the S/N ratio is -24 dB. 
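The detection experiment can be re-created in miniature. The sketch below (ours, not the authors' code) generates a 250-sample unit-amplitude sinewave pulse, adds Laplacian-amplitude impulses at Poisson inter-arrival times, and applies the matched filter either directly or after a fixed clipping front end. The sinewave frequency, the clip level of 1.0 (just above the signal level), and the midpoint decision threshold are our assumptions.

```python
# Illustrative re-creation of the detection experiment: sinewave pulse vs.
# silence in impulsive noise, matched filtering with and without a clipper.
# Frequency, clip level, and threshold choice are assumptions.
import numpy as np

rng = np.random.default_rng(1)
N = 250                                        # pulse duration in samples
t = np.arange(N)
signal = np.sin(2 * np.pi * 8 * t / N)         # unit amplitude; frequency assumed

def impulsive_noise(mean_iat, sigma=14.1):
    """Laplacian-amplitude impulses; exponential (Poisson-process) gaps."""
    n = np.zeros(N)
    pos = rng.exponential(mean_iat)
    while pos < N:
        n[int(pos)] += rng.laplace(0.0, sigma / np.sqrt(2.0))
        pos += rng.exponential(mean_iat)
    return n

def matched_filter(x, clip_level=None):
    if clip_level is not None:                 # optional memoryless front end
        x = np.clip(x, -clip_level, clip_level)
    return float(np.dot(x, signal))            # correlate with known waveform

def error_rate(mean_iat, clip_level, trials=400):
    threshold = 0.5 * float(np.dot(signal, signal))   # midpoint threshold
    errors = 0
    for _ in range(trials):
        present = rng.random() < 0.5
        x = impulsive_noise(mean_iat) + (signal if present else 0.0)
        errors += (matched_filter(x, clip_level) > threshold) != present
    return errors / trials

p_linear = error_rate(mean_iat=10, clip_level=None)
p_clipped = error_rate(mean_iat=10, clip_level=1.0)
```

With frequent large impulses, the clipped detector's error rate is far below that of purely linear processing, in line with the comparison in Fig. 8.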
\n\nFurther exploratory experiments demonstrated that the nonlinearity formed by back-propagation was not robust to the S/N ratio used during training. A clipping nonlinearity is only formed when the number of samples of uncorrupted sinewave input is high enough to form the linear section of the curve and the number of samples of noise pulses is low, but sufficient to form the nonlinear clipping section of the nonlinearity. At high noise levels the resulting nonlinearity is not linear over the range of the input signal. It instead resembles a curve that interpolates between a flat horizontal input-output curve and the desired clipping curve. \n\nFigure 8: The signal detection error rate with impulsive noise when the S/N ratio after adding the noise ranges from 5 dB to -24 dB. \n\nSUMMARY AND DISCUSSION \n\nIn summary, it was first demonstrated that multi-layer perceptrons with linear input and output nodes could approximate prespecified clipping nonlinearities required for signal detection with impulsive noise with fewer than 1,000 trials of back-propagation training. More complex nonlinearities could also be formed but required longer training times. Clipping nonlinearities were also formed adaptively using a multi-layer perceptron with the corrupted signal as the input and the noise-free signal as the desired output. Nonlinearities learned using this approach at high S/N ratios were similar to those required by theory and improved signal detection performance dramatically at low S/N ratios. 
Further work is necessary to explore the utility of this technique for forming adaptive nonlinearities. This work should explore the robustness of the nonlinearity formed to variations in the input S/N ratio. It should also explore the use of multi-layer perceptrons and back-propagation training for other adaptive nonlinear signal processing tasks such as system identification, noise removal, and channel modeling. \n\nReferences \n\n[1] G. Cybenko. Approximation by superpositions of a sigmoidal function. Research note, Department of Computer Science, Tufts University, October 1988. \n\n[2] J. E. Evans and A. S. Griffiths. Design of a Sanguine noise processor based upon world-wide extremely low frequency (ELF) recordings. IEEE Transactions on Communications, COM-22:528-539, April 1974. \n\n[3] W. M. Huang and R. P. Lippmann. Neural net and traditional classifiers. In D. Anderson, editor, Neural Information Processing Systems, pages 387-396, New York, 1988. American Institute of Physics. \n\n[4] A. Lapedes and R. Farber. How neural nets work. In D. Anderson, editor, Neural Information Processing Systems, pages 442-456, New York, 1988. American Institute of Physics. \n\n[5] G. G. Lorentz. The 13th problem of Hilbert. In F. E. Browder, editor, Mathematical Developments Arising from Hilbert Problems. American Mathematical Society, Providence, R.I., 1976. \n\n[6] D. Palmer and D. DeSieno. Removing random noise from EKG signals using a back propagation network, 1987. HNC Inc., San Diego, CA. \n\n[7] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1: Foundations, chapter 8. MIT Press, Cambridge, MA, 1986. \n\n[8] S. Tamura and A. Waibel. Noise reduction using connectionist models. 
In Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1: Speech Processing, pages 553-556, April 1988. \n\n[9] B. Widrow and S. D. Stearns. Adaptive Signal Processing. Prentice-Hall, NJ, 1985. \n", "award": [], "sourceid": 148, "authors": [{"given_name": "Richard", "family_name": "Lippmann", "institution": null}, {"given_name": "Paul", "family_name": "Beckman", "institution": null}]}