{"title": "A Neural Network for Real-Time Signal Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 248, "page_last": 255, "abstract": null, "full_text": "248 MalkofT \n\nA Neural Network for Real-Time Signal Processing \n\nDonald B. Malkoff \n\nGeneral Electric / Advanced Technology Laboratories \n\nMoorestown Corporate Center \n\nBuilding 145-2, Route 38 \nMoorestown, NJ 08057 \n\nABSTRACT \n\nThis paper describes a neural network algorithm that (1) performs \ntemporal pattern matching in real-time, (2) is trained on-line, with \na single pass, (3) requires only a single template for training of each \nrepresentative class, (4) is continuously adaptable to changes in \nbackground noise, (5) deals with transient signals having low signal(cid:173)\nto-noise ratios, (6) works in the presence of non-Gaussian noise, (7) \nmakes use of context dependencies and (8) outputs Bayesian proba(cid:173)\nbility estimates. The algorithm has been adapted to the problem of \npassive sonar signal detection and classification. It runs on a Con(cid:173)\nnection Machine and correctly classifies, within 500 ms of onset, \nsignals embedded in noise and subject to considerable uncertainty. \n\nINTRODUCTION \n\n1 \nThis paper describes a neural network algorithm, STOCHASM, that was developed \nfor the purpose of real-time signal detection and classification. Of prime concern \nwas capability for dealing with transient signals having low signal-to-noise ratios \n(SNR). \nThe algorithm was first developed in 1986 for real-time fault detection and diagnosis \nof malfunctions in ship gas turbine propulsion systems (Malkoff, 1987). It subse(cid:173)\nquently was adapted for passive sonar signal detection and classification. Recently, \nversions for information fusion and radar classification have been developed. 
\n\nCharacteristics of the algorithm that are of particular merit include the following: \n\n\fA Neural Network for Real-Time Signal Processing \n\n249 \n\n\u2022 It performs well in the presence of either Gaussian or non-Gaussian noise, \n\neven where the noise characteristics are changing. \n\n\u2022 Improved classifications result from temporal pattern matching in real-time, \n\nand by taking advantage of input data context dependencies. \n\n\u2022 The network is trained on-line. Single exposures of target data require one \npass through the network. Target templates, once formed, can be updated \non-line. \n\n\u2022 Outputs consist of numerical estimates of closeness for each of the template \n\nclasses, rather than nearest-neighbor \"all-or-none\" conclusions. \n\n\u2022 The algorithm is implemented in parallel code on a Connection Machine. \n\nSimulated signals, embedded in noise and subject to considerable uncertainty, are \nclassified within 500 ms of onset. \n\n2 GENERAL OVERVIEW OF THE NETWORK \n2.1 REPRESENTATION OF THE INPUTS \n\nSonar signals used for training and testing the neural network consist of pairs of \nsimulated chirp signals that are superimposed and bounded by a Gaussian enve(cid:173)\nlope. The signals are subject to random fluctuations and embedded in white noise. \nThere is considerable overlapping (similarity) of the signal templates. Real data \nhas recently become available for the radar domain. \n\nOnce generated, the time series of the sonar signal is subject to special transforma(cid:173)\ntions. The outputs of these transformations are the values which are input to the \nneural network. In addition, several higher-level signal features, for example, zero \ncrossing data, may be simultaneously input to the same network, for purposes of \ninformation fusion. The transformations differ from those used in traditional sig(cid:173)\nnal processing. 
They contribute to the real-time performance and temporal pattern \nmatching capabilities of the algorithm by possessing all the following characteristics: \n\n\u2022 Time-Origin Independence: The sonar input signal is transformed so the \nresulting time-frequency representation is independent of the starting time \nof the transient with respect to its position within the observation window \n(Figure 1). \"Observation window\" refers to the most recent segment of the \nsonar time series that is currently under analysis. \n\n\u2022 Translation Independence: The time-frequency representation obtained \nby transforming the sonar input transient does not shift from one network \ninput node to another as the transient signal moves across most of the obser(cid:173)\nvation window (Figure 1). In other words, not only does the representation \nremain the same while the transient moves, but its position relative to specific \nnetwork nodes also does not change. Each given node continues to receive its \n\n\f250 Malkoff \n\nusual kind of information about the sonar transient, despite the relative posi(cid:173)\ntion of the transient in the window. For example, where the transform is an \nFFT, a specific input layer node will always receive the output of one specific \nfrequency bin, and none other. \nWhere the SNR is high, translation independence could be accomplished by \na simple time-transformation of the representation before sending it to the \nneural network. This is not possible in conditions where the SNR is sufficiently \nlow that segmentation of the transient becomes impossible using traditional \nmethods such as auto-regressive analysis; it cannot be determined at what \ntime the transient signal originated and where it is in the observation window . \n\n\u2022 The representation gains time-origin and translation .ndependence without \n\nsacrificing knowledge about the signal's temporal characteristics or its com(cid:173)\nplex infrastructure. 
This is accomplished by using (1) the absolute value of the Fourier transform (with respect to time) of the spectrogram of the sonar input, or (2) the radar Woodward Ambiguity Function. The derivation and characterization of these methods for representing data is discussed in a separate paper (Malkoff, 1990). \n\nFigure 1: Despite passage of the transient, encoded data enters the same network input nodes (translation independence) and has the same form and output classification (time-origin independence). Different aspects of the transformation outputs must always enter the same input nodes of the network and result in the same classification. \n\n2.2 THE NETWORK ARCHITECTURE \n\nSonar data, suitably transformed, enters the network input layer. The input layer serves as a noise filter, or discriminator. The network has two additional layers, the hidden and output layers (Figure 2). Learning of target templates, as well as classification of unknown targets, takes place in a single \"feed-forward\" pass through these layers. Additional exposures to the same target lead to further enhancement of the template, if training, or refinement of the classification probabilities, if testing. \n\nThe hidden layer deals only with data that passes through the input filter. This data predominantly represents a target. Some degree of context-dependency evaluation of the data is achieved. Hidden layer data and its permutations are distributed and maintained intact, separate, and transparent. Because of this, credit (error) assignment is easily performed. \n\nIn the output layer, evidence is accumulated, heuristically evaluated, and transformed into figures of merit for each possible template class. \n\nFigure 2: STOCHASM network architecture. 
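The translation-independence property of representation (1) can be illustrated with a toy numerical check. This is our sketch, not code from the paper; the array sizes are arbitrary, and the circular shift is an idealized stand-in for the transient moving within the observation window:

```python
import numpy as np

def invariant_representation(spectrogram):
    # Magnitude of the Fourier transform taken along the time axis of a
    # spectrogram.  A circular time shift only changes the phase of this
    # transform, so the magnitude is unchanged.
    return np.abs(np.fft.fft(spectrogram, axis=0))

rng = np.random.default_rng(0)
S = rng.random((64, 16))            # toy spectrogram: time frames x frequency bins
S_shifted = np.roll(S, 10, axis=0)  # same transient, 10 frames later in the window

print(np.allclose(invariant_representation(S),
                  invariant_representation(S_shifted)))  # -> True
```

Each input-layer node thus keeps receiving the same value regardless of where the transient sits in the window, which is exactly the property Figure 1 depicts.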
\n\n2.2.1 The Input Layer \n\nEach input layer node receives a succession of samples of a unique part of the sonar representation. This series of samples is stored in a first-in, first-out queue. \n\nWith the arrival of each new input sample, the mean and standard deviation of the values in the queue are recomputed at every node. These statistical parameters are used to detect and extract a signal from the background noise by computing a threshold for each node. Arriving input values that exceed the threshold are passed to the hidden layer and not entered into the queues. Passed values are expressed in terms of z-values (the number of standard deviations by which the input value differs from the mean of the queued values). Hidden layer nodes receive only data exceeding thresholds; they are otherwise inactive. \n\n2.2.2 The Hidden Layer \n\nThere are three basic types of hidden layer nodes: \n\n\u2022 The first type receives values from only a single input layer node; these nodes reflect absolute changes in an input layer parameter. \n\n\u2022 The second type receives values from a pair of inputs where each of those values simultaneously deviates from normal in the same direction. \n\n\u2022 The third type receives values from a pair of inputs where each of those values simultaneously deviates from normal in opposite directions. \n\nFor N data inputs, there are a total of N\u00b2 hidden layer nodes. \n\nValues are passed to the hidden layer only when they exceed the threshold levels determined by the input node queue. The hidden layer values are stored in first-in, first-out queues, like those of the input layer. If the network is in the testing mode, these values represent signals awaiting classification. The mean and standard deviation are computed for each of these queues and used for subsequent pattern matching. 
If, instead, the network is in the training mode, the passed values and their statistical descriptors are stored as templates at their corresponding nodes. \n\n2.2.3 Pattern Matching in the Output Layer \n\nPattern matching consists of computing Bayesian likelihoods for the undiagnosed input relative to each template class. The computation assumes a normal distribution of the values contained within the queue of each hidden layer node. The statistical parameters of the queue representing undiagnosed inputs are matched with those of each of the templates. For example, the number of standard deviations between the means of the \"undiagnosed\" queue and a template queue may be used to demarcate an area under a normal probability distribution. This area is then used as a weight, or measure, of their closeness of match. Note that this computation has a non-linear, sigmoid-shaped output. \n\nThe weights for each template are summed across all nodes. Likelihood values are computed for each template. A priori data is used where available, and the results are normalized for the final outputs. The number of computations is minimal, and they are done in parallel; they scale linearly with the number of templates per node. If more processing hardware were available, separate processors could be assigned to each template of every node, and computational time would be of constant complexity. \n\n3 PERFORMANCE \n\nThe sonar version was tested against three sets of totally overlapping double chirp signals, the worst possible case for this algorithm. Where training and testing SNRs differed by a factor of anywhere from 1 to 8, 46 of 48 targets were correctly recognized. \n\nIn extensive simulated testing against radar and jet engine modulation data, classifications were better than 95% correct down to -25 dB using the unmodified sonar algorithm. 
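The pattern-matching computation of Section 2.2.3 can be sketched in a few lines. This is our illustration, not the Connection Machine implementation; the class names, node count, and use of the two-tailed normal area as the closeness weight are assumptions consistent with, but not dictated by, the text:

```python
import math

def match_weight(mean_u, mean_t, std_t):
    # Two-tailed area under a normal curve beyond the z-distance between
    # the undiagnosed queue mean and the template mean: near 1 for a
    # close match, falling off with a sigmoid shape as the means separate.
    z = abs(mean_u - mean_t) / std_t
    return 1.0 - math.erf(z / math.sqrt(2.0))

def classify(node_means, templates):
    # Sum the per-node weights for each template class, then normalize
    # the totals into final figures of merit across all classes.
    scores = {name: sum(match_weight(m, tm, ts)
                        for m, (tm, ts) in zip(node_means, stats))
              for name, stats in templates.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Hypothetical two-class example with three hidden-layer nodes;
# each template stores a (mean, std) pair per node.
templates = {"chirp_A": [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)],
             "chirp_B": [(5.0, 1.0), (0.0, 1.0), (1.0, 1.0)]}
print(classify([0.1, 2.2, 3.9], templates))  # chirp_A dominates
```

Because each node's weight is independent of the others, the per-node terms can be computed in parallel, which is what makes the linear (or, with one processor per template per node, constant) scaling claimed above plausible.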
\n\n4 DISCUSSION \n\nDistinguishing features of this algorithm include the following capabilities: \n\n\u2022 Information fusion. \n\n\u2022 Improved classifications. \n\n\u2022 Real-time performance. \n\n\u2022 Explanation of outputs. \n\n4.1 INFORMATION FUSION \n\nIn STOCHASM, normalization of the input data facilitates the comparison of separate data items that are diverse in type. This is followed by the fusion, or combination, of all possible pairs of the set of inputs. The resulting combinations are transferred to the hidden layer, where they are evaluated and matched with templates. This allows the combining of different features derived either from the same sensor suite or from several different sensor suites. The latter is often one of the most challenging tasks in situation assessment. \n\n4.2 IMPROVED CLASSIFICATIONS \n\n4.2.1 Multiple Output Weights per Node \n\nIn STOCHASM, each hidden layer node receives a single piece of data representing some key feature extracted from the undiagnosed target signal. In contrast, the node has many separate output weights, one for every target template. Each of those output weights represents an actual correlation between the undiagnosed feature data and one of the individual target templates. STOCHASM optimizes the correlations of an unknown input with each possible class. In so doing, it also generates figures of merit (numerical estimates of closeness of match) for ALL the possible target classes, instead of a single \"all-or-none\" classification. \n\nIn more popularized networks, there is only one output weight for each node. Its effectiveness is diluted by having to contribute to the correlation between one undiagnosed feature data item and MANY different templates. In order to achieve reasonable classifications, an extra set of input connection weights is employed. 
The connection weights provide a somewhat watered-down numerical estimate of the contribution of their particular input data feature to the correct classification, ON THE AVERAGE, of targets representing all possible classes. Such networks employ iterative procedures to compute values for those weights, which prevents real-time training and generates sub-optimal correlations. Moreover, because all of this results in only a single output for each hidden layer node, another set of connection weights between the hidden layer node and each node of the output layer is required to complete the classification process. Since these tend to be fully connected layers, the number of weights and computations is prohibitively large. \n\n4.2.2 Avoidance of Nearest-Neighbor Techniques \n\nSome popular networks are sensitive to initial conditions. The determination of the final values of their weights is influenced by the initial values assigned to them. These networks require that, before the onset of training, the values of the weights be randomly assigned. Moreover, the classification outcomes of these networks are often altered by changing the order in which training samples are submitted to the network. Networks of this type may be unable to express their conclusions as figures of merit for all possible classes. When inputs to the network share characteristics of more than one target class, these networks tend to gravitate to the classification that initially most closely resembles the input, for an \"all-or-none\" classification. STOCHASM has none of these drawbacks. \n\n4.2.3 Noisy Data \n\nThe algorithm handles SNRs lower than one and situations where training and testing SNRs differ. Segmentation of one-dimensional patterns buried in noise is done automatically. Even the noise itself can be classified. The algorithm can adapt on-line to changing background noise patterns. 
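The adaptive thresholding behind this noise handling (Section 2.2.1) can be sketched for a single input-layer node. This is our sketch; the class name, queue length, and threshold value are illustrative assumptions, not the paper's parameters:

```python
from collections import deque
import statistics

class InputNode:
    # One input-layer node: a FIFO queue of recent background samples.
    # Values whose z-score exceeds the threshold are forwarded (as
    # z-values) to the hidden layer and NOT queued, so the background
    # statistics keep tracking the current noise.
    def __init__(self, maxlen=20, z_threshold=3.0):
        self.queue = deque(maxlen=maxlen)
        self.z_threshold = z_threshold

    def feed(self, x):
        if len(self.queue) >= 2:
            mu = statistics.mean(self.queue)
            sigma = statistics.pstdev(self.queue) or 1e-9
            z = (x - mu) / sigma
            if abs(z) > self.z_threshold:
                return z          # signal: pass the z-value on
        self.queue.append(x)      # noise: update the background queue
        return None

node = InputNode()
for x in [0.0, 1.0] * 5:
    node.feed(x)                  # steady background noise is absorbed
print(node.feed(50.0))            # a loud transient -> 99.0
```

Because outliers never enter the queue, a slowly drifting noise floor is tracked on-line while a transient is held out for classification, matching the behavior described above.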
\n\n4.3 REAL-TIME PERFORMANCE \n\nThere is no need for back-propagation/gradient-descent methods to set the weights during training. Therefore, no iterations or recursions are required. Only a single feed-forward pass of data through the network is needed for either training or classification. Since the number of nodes, connections, layers, and weights is relatively small, and the algorithm is implemented in parallel, the compute time is fast enough to keep up with real-time in most application domains. \n\n4.4 EXPLANATION OF OUTPUTS \n\nThere is strict separation of target classification evidence in the nodes of this network. In addition, the evidence is maintained so that positive and negative correlation data are separate and easily accessible. This enables improved credit (error) assignment, which leads to more effective classifications and the potential for making available to the operator real-time explanations of program behavior. \n\n4.5 FUTURE DIRECTIONS \n\nPrevious versions of the algorithm dynamically created, destroyed, or re-arranged nodes and their linkages to optimize the network, minimize computations, and eliminate unnecessary inputs. This algorithm also employed a multi-level hierarchical control system. The control system, on-line and in real-time, adjusted sampling rates and queue lengths, governing when the background noise template is permitted to adapt to current noise inputs, and the rate at which it does so. Future Connection Machine versions will be able to effect the same procedures. \n\nEfforts are now underway to: \n\n1. Improve the temporal pattern matching capabilities. \n\n2. Provide better heuristics for the computation of final figures of merit from the massive amount of positive and negative correlation data resident within the hidden layer nodes. \n\n3. 
Adapt the algorithm to radar domains where time and spatial warping problems are prominent. \n\n4. Simulate more realistic and complex sonar transients, with the expectation that the algorithm will perform better on those targets. \n\n5. Apply the algorithm to information fusion tasks. \n\nReferences \n\nMalkoff, D.B., \"The Application of Artificial Intelligence to the Handling of Real-Time Sensor Based Fault Detection and Diagnosis,\" Proceedings of the Eighth Ship Control Systems Symposium, Volume 3, Ministry of Defence, The Hague, pp. 264-276. Also presented at The Hague, Netherlands, October 8, 1987. \n\nMalkoff, D.B., \"A Framework for Real-Time Fault Detection and Diagnosis Using Temporal Data,\" The International Journal for Artificial Intelligence in Engineering, Volume 2, No. 2, pp. 97-111, April 1987. \n\nMalkoff, D.B. and L. Cohen, \"A Neural Network Approach to the Detection Problem Using Joint Time-Frequency Distributions,\" Proceedings of the IEEE 1990 International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, New Mexico, April 1990 (to appear). \n", "award": [], "sourceid": 284, "authors": [{"given_name": "Donald", "family_name": "Malkoff", "institution": null}]}