Non-linear Statistical Analysis and Self-Organizing Hebbian Networks

Jonathan L. Shapiro and Adam Prügel-Bennett
Department of Computer Science
The University, Manchester
Manchester, UK, M13 9PL

Abstract

Neurons learning under an unsupervised Hebbian learning rule can perform a nonlinear generalization of principal component analysis. This relationship between nonlinear PCA and nonlinear neurons is reviewed. The stable fixed points of the neuron learning dynamics correspond to the maxima of the statistic optimized under nonlinear PCA. However, in order to predict what the neuron learns, knowledge of the basins of attraction of the neuron dynamics is required. Here the correspondence between nonlinear PCA and neural networks breaks down. This is shown for a simple model. Methods of statistical mechanics can be used to find the optima of the objective function of non-linear PCA. This determines what the neurons can learn. In order to find how the solutions are partitioned among the neurons, however, one must solve the dynamics.

1 INTRODUCTION

Linear neurons learning under an unsupervised Hebbian rule can learn to perform a linear statistical analysis of the input data. This was first shown by Oja (1982), who proposed a learning rule which finds the first principal component of the variance matrix of the input data. Based on this model, Oja (1989), Sanger (1989), and many others have devised numerous neural networks which find many components of this matrix. These networks perform principal component analysis (PCA), a well-known method of statistical analysis.

Since PCA is a form of linear analysis, and the neurons used in the PCA networks are linear - the output of these neurons is equal to the weighted sum of inputs; there is no squashing function or sigmoid - it is natural to ask whether non-linear Hebbian neurons compute some form of non-linear PCA. Is this a useful way to understand the performance of the networks? Do these networks learn to extract features of the input data which are different from those learned by linear neurons? Currently in the literature, the phrase "non-linear PCA" is used to describe what is learned by any non-linear generalization of Oja neurons or other PCA networks (see, for example, Oja, 1993, and Taylor, 1993).

In this paper, we discuss the relationship between a particular form of non-linear Hebbian neurons (Prügel-Bennett and Shapiro, 1992) and a particular generalization of non-linear PCA (Softky and Kammen, 1991). It is clear that non-linear neurons can perform very differently from linear ones. This has been shown through analysis (Prügel-Bennett and Shapiro, 1993) and in application (Karhunen and Joutsensalo, 1992). It can also be a very useful way of understanding what the neurons learn. This is because non-linear PCA is equivalent to maximizing some objective function. The features that this extracts from a data set can be studied using techniques of statistical mechanics. However, non-linear PCA is ambiguous because there are multiple solutions. What the neuron can learn is given by non-linear PCA. The likelihood of learning the different solutions is governed by the dynamics chosen to implement non-linear PCA, and may differ in different implementations of the dynamics.
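Before turning to the non-linear model, it may help to see the linear case in concrete form. The following is a minimal sketch of Oja's single-neuron rule applied to synthetic data; the data model, learning rate, and variable names are illustrative assumptions, not details taken from the papers cited above.

```python
import numpy as np

# Minimal sketch of Oja's rule for a single linear Hebbian neuron.
# The weight vector converges (up to sign) to the first principal
# component of the input covariance matrix.

rng = np.random.default_rng(0)

# Zero-mean synthetic data with one dominant variance direction
# (an assumption made purely for illustration).
n_inputs, n_samples = 10, 5000
principal = rng.normal(size=n_inputs)
principal /= np.linalg.norm(principal)
data = rng.normal(size=(n_samples, 1)) * 2.0 * principal \
       + 0.3 * rng.normal(size=(n_samples, n_inputs))

w = rng.normal(size=n_inputs)
w /= np.linalg.norm(w)
eta = 0.005                        # learning rate (illustrative choice)

for x in data:
    y = w @ x                      # linear output: weighted sum of inputs
    w += eta * y * (x - y * w)     # Oja's rule: Hebbian term with implicit weight decay

# The learned weights align with the dominant direction of the data.
print(abs(w @ principal))          # close to 1
```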
2 NON-LINEAR HEBBIAN NEURONS

Neurons with non-linear activation functions can learn to perform very different tasks from those learned by linear neurons. Nonlinear Hebbian neurons have been analyzed for general non-linearities by Oja (1991), and were applied to sinusoidal signal detection by Karhunen and Joutsensalo (1992).

Previously, we analysed a simple non-linear generalization of Oja's rule (Prügel-Bennett and Shapiro, 1993). We showed how the shape of the neuron activation function can control what a neuron learns. Whereas linear neurons learn a statistical mixture of all of the input patterns, non-linear neurons can learn to become tuned to individual patterns, or to small clusters of closely correlated patterns.

In this model each neuron has weights; W_i is the weight from the ith input, and the neuron responds to the usual sum of inputs times weights through an activation function A(y). This is assumed to be a simple power law above a threshold and zero below it, i.e.

    A(V^p) = (V^p - φ)^b  if V^p > φ,   and   A(V^p) = 0  otherwise.   (1)

Here φ is the threshold, b controls the power of the power law, x_i^p is the ith component of the pth pattern, and V^p = Σ_i x_i^p W_i. Curves of these functions are shown in figure 1a; if b = 1 the neurons are threshold-linear. For b > 1 the curves can be thought of as low-activation approximations to a sigmoid, which is shown in figure 1b. The generalization of Oja's learning rule is that the change in the weights δW_i

[Figure 1 appears here: panel a), "Neuron Activation Function", shows curves for b > 1 and b < 1; panel b), "A Sigmoid Activation Function", with PSP on the horizontal axis.]

Figure 1: a) The form of the neuron activation function, controlled by two parameters b and