{"title": "On the Effect of Analog Noise in Discrete-Time Analog Computations", "book": "Advances in Neural Information Processing Systems", "page_first": 218, "page_last": 224, "abstract": null, "full_text": "On the Effect of Analog Noise in Discrete-Time Analog Computations \n\nWolfgang Maass \n\nInstitute for Theoretical Computer Science \n\nTechnische Universität Graz* \n\nPekka Orponen \n\nDepartment of Mathematics \n\nUniversity of Jyväskylä† \n\nAbstract \n\nWe introduce a model for noise-robust analog computations with \ndiscrete time that is flexible enough to cover the most important \nconcrete cases, such as computations in noisy analog neural nets \nand networks of noisy spiking neurons. We show that the presence \nof arbitrarily small amounts of analog noise reduces the power of \nanalog computational models to that of finite automata, and we \nalso prove a new type of upper bound for the VC-dimension of \ncomputational models with analog noise. \n\n1 Introduction \n\nAnalog noise is a serious issue in practical analog computation. However, there exists \nno formal model for reliable computations by noisy analog systems which allows us \nto address this issue in an adequate manner. The investigation of noise-tolerant \ndigital computations in the presence of stochastic failures of gates or wires was \ninitiated by [von Neumann, 1956]. We refer to [Cowan, 1966] and [Pippenger, 1989] \nfor a small sample of the numerous results that have been achieved in this direction. \nIn all these articles one considers computations which produce a correct output not \nwith perfect reliability, but with probability ≥ 1/2 + ρ (for some parameter ρ ∈ (0, 1/2]). \nThe same framework (with stochastic failures of gates or wires) has been applied \nto analog neural nets in [Siegelmann, 1994]. 
\nThe abovementioned approaches are insufficient for the investigation of noise in \nanalog computations, because in analog computations one has to be concerned not \nonly with occasional total failures of gates or wires, but also with \"imprecision\", i.e. \nwith omnipresent smaller (and occasionally larger) perturbations of analog outputs \nof internal computational units. These perturbations may for example be given \nby Gaussian distributions. Therefore we introduce and investigate in this article \na notion of noise-robust computation by noisy analog systems where we assume \nthat intermediate analog values are moved according to some quite \narbitrary probability distribution. We consider, as in the traditional framework for \nnoisy digital computations, arbitrary computations whose output is correct with \nsome given probability ≥ 1/2 + ρ (for ρ ∈ (0, 1/2]). We will restrict our attention to \nanalog computation with digital output. Since we impose no restriction (such as \ncontinuity) on the type of operations that can be performed by computational units \nin an analog computational system, an output unit of such a system can convert an \nanalog value into a binary output via \"thresholding\". \n\n* Klosterwiesgasse 32/2, A-8010 Graz, Austria. E-mail: maass@igi.tu-graz.ac.at. \n† P. O. Box 35, FIN-40351 Jyväskylä, Finland. E-mail: orponen@math.jyu.fi. Part of \nthis work was done while this author was at the University of Helsinki, and during visits \nto the Technische Universität Graz and the University of Chile in Santiago. \n\nOur model and our Theorem 3.1 are somewhat related to the analysis of probabilistic \nfinite automata in [Rabin, 1963]. However, there the finiteness of the state space \nsimplifies the setup considerably. 
[Casey, 1996] addresses the special case of analog \ncomputations on recurrent neural nets (for those types of analog noise that can \nmove an internal state at most over a distance ε) whose digital output is perfectly \nreliable (i.e. ρ = 1/2 in the preceding notation).¹ \nThe restriction to perfect reliability in [Casey, 1996] has immediate consequences \nfor the types of analog noise processes that can be considered, and for the types of \nmathematical arguments that are needed for their investigation. In a computational \nmodel with perfect reliability of the output it cannot happen that an intermediate \nstate s occurs at some step t both in a computation for an input x that leads to \noutput \"0\", and at step t in a computation for the same input x that leads to \noutput \"1\". Hence an analysis of perfectly reliable computations can focus on \npartitions of intermediate states s according to the computations and the computation \nsteps where they may occur. \nApparently many important concrete cases of noisy analog computations require a \ndifferent type of analysis. Consider for example the special case of a sigmoidal neural \nnet (with thresholding at the output), where for each input the output of an internal \nnoisy sigmoidal gate is distributed according to some Gaussian distribution (perhaps \nrestricted to the range of all possible output values which this sigmoidal gate can \nactually produce). In this case an intermediate state s of the computational system \nis a vector of values which have been produced by these Gaussian distributions. \nObviously each such intermediate state s can occur at any fixed step t in any \ncomputation (in particular in computations with different network output for the \nsame network input). Hence perfect reliability of the network output is unattainable \nin this case. 
\n\n¹There are relatively few examples for nontrivial computations on common digital or \nanalog computational models that can achieve perfect reliability of the output in spite of \nnoisy internal components. Most constructions of noise-robust computational models rely \non the replication of noisy computational units (see [von Neumann, 1956], [Cowan, 1966]). \nThe idea of this method is that the average of the outputs of k identical noisy computational \nunits (with stochastically independent noise processes) is with high probability close to the \nexpected value of their output, if k is sufficiently large. However for any value of k there \nexists in general a small but nonzero probability that this average deviates strongly from \nits expected value. In addition, if one assumes that the computational unit that produces \nthe output of the computations is also noisy, one cannot expect that the reliability of the \noutput of the computation is larger than the reliability of this last computational unit. \nConsequently there exist many methods for reducing the error-probability of the output \nto a small value, but these methods cannot achieve error probability 0 at the output. \n\nFor an investigation of the actual computational power of a sigmoidal \nneural net with Gaussian noise one has to drop the requirement of perfect reliability \nof the output, and one has to analyze how probable it is that a particular network \noutput is given, and how probable it is that a certain intermediate state is assumed. \nHence one has to analyze for each network input and each step t the different \nprobability distributions over intermediate states s that are induced by computations \nof the noisy analog computational system. In fact, one may view the set of these \nprobability distributions over intermediate states s as a generalized set of \"states\" of \na noisy analog computational system. 
In general the mathematical structure of this \ngeneralized set of \"states\" is substantially more complex than that of the original \nset of intermediate states s. We have introduced in [Maass, Orponen, 1996] some \nbasic methods for analyzing this generalized set of \"states\", and the proofs of the \nmain results in this article rely on this analysis. \nThe preceding remarks may illustrate that if one drops the assumption of perfect \nreliability of the output, it becomes more difficult to prove upper bounds for the \npower of noisy analog computations. We prove such upper bounds even for the case \nof stochastic dependencies among noises for different internal units, and for the case \nof nonlinear dependencies of the noise on the current internal state. Our results also \ncover noisy computations in hybrid analog/digital computational models, such as for \nexample a neural net combined with a binary register, or a network of noisy spiking \nneurons where a neuron may temporarily assume the discrete state \"not-firing\". \nObviously it becomes quite difficult to analyze the computational effect of such \ncomplex (but practically occurring) types of noise without a rigorous mathematical \nframework. We introduce in section 2 a mathematical framework that is general \nenough to subsume all these cases. The traditional case of noisy digital computations \nis captured as a special case of our definition. \nOne goal of our investigation of the effect of analog noise is to find out which features \nof analog noise have the most detrimental effect on the computational power of an \nanalog computational system. 
This turns out to be a nontrivial question.² As a \nfirst step towards characterizing those aspects and parameters of analog noise that \nhave a strong impact on the computational power of a noisy analog system, the \nproof of Theorem 3.1 (see [Maass, Orponen, 1996]) provides an explicit bound on \nthe number of states of any finite automaton that can be implemented by an analog \ncomputational system with a given type of analog noise. It is quite surprising to \nsee on which specific parameters of the analog noise the bound depends. Similarly \nthe proofs of Theorem 3.4 and Theorem 3.5 provide explicit (although very large) \nupper bounds for the VC-dimension of noisy analog neural nets with batch input, \nwhich depend on specific parameters of the analog noise. \n\n2 Preliminaries: Definitions and Examples \n\nAn analog discrete-time computational system (briefly: computational system) M \nis defined in a general way as a 5-tuple (Ω, p^0, F, Σ, s), where Ω, the set of states, \nis a bounded subset of R^d, p^0 ∈ Ω is a distinguished initial state, F ⊆ Ω is the \nset of accepting states, Σ is the input domain, and s : Ω × Σ → Ω is the transition \nfunction. To avoid unnecessary pathologies, we impose the conditions that Ω and \nF are Borel subsets of R^d, and for each a ∈ Σ, s(p, a) is a measurable function of \np. We also assume that Σ contains a distinguished null value ⊔, which may be used \nto pad the actual input to arbitrary length. The nonnull input domain is denoted \nby Σ_0 = Σ - {⊔}. \n\n²For example, one might think that analog noise which is likely to move an internal \nstate over a large distance is more harmful than another type of analog noise which keeps \nan internal state within its neighborhood. However this intuition is deceptive. Consider \nthe extreme case of analog noise in a sigmoidal neural net which moves a gate output \nx ∈ [-1, 1] to a value in the ε-neighborhood of -x. 
This type of noise moves some values \nx over large distances, but it appears to be less harmful for noise-robust computing than \nnoise which moves x to an arbitrary value in the 10ε-neighborhood of x. \n\nThe intended noise-free dynamics of such a system M is as follows. The system \nstarts its computation in state p^0, and on each single computation step on input \nelement a ∈ Σ_0 moves from its current state p to its next state s(p, a). After \nthe actual input sequence has been exhausted, M may still continue to make pure \ncomputation steps. Each pure computation step leads it from a state p to the state \ns(p, ⊔). The system accepts its input if it enters a state in the class F at some point \nafter the input has finished. \nFor instance, the recurrent analog neural net model of [Siegelmann, Sontag, 1991] \n(also known as the \"Brain State in a Box\" model) is obtained from this general \nframework as follows. For a network N with d neurons and activation values between \n-1 and 1, the state space is Ω = [-1, 1]^d. The input domain may be chosen \nas either Σ = R or Σ = {-1, 0, 1} (for \"online\" input) or Σ = R^n (for \"batch\" \ninput). \n\nFeedforward analog neural nets may also be modeled in the same manner, except \nthat in this case one may wish to select as the state set Ω := ([-1, 1] ∪ {dormant})^d, \nwhere dormant is a distinguished value not in [-1, 1]. This special value is used \nto indicate the state of a unit whose inputs have not all yet been available at the \nbeginning of a given computation step (e.g. for units on the l-th layer of a net at \ncomputation steps t < l). \nThe completely different model of a network of m stochastic spiking neurons (see \ne.g. [Maass, 1997]) is also a special case of our general framework.³ \nLet us then consider the effect of noise in a computational system M. 
Let Z(p, B) \nbe a function that for each state p ∈ Ω and Borel set B ⊆ Ω indicates the probability \nof noise moving state p to some state in B. The function Z is called the noise process \naffecting M, and it should satisfy the mild conditions of being a stochastic kernel, \ni.e., for each p ∈ Ω, Z(p, ·) should be a probability distribution, and for each Borel \nset B, Z(·, B) should be a measurable function. \nWe assume that there is some measure μ over Ω so that Z(p, ·) is absolutely continuous \nwith respect to μ for each p ∈ Ω, i.e. μ(B) = 0 implies Z(p, B) = 0 for every \nmeasurable B ⊆ Ω. By the Radon-Nikodym theorem, Z then possesses a density \nkernel with respect to μ, i.e. there exists a function z(·, ·) such that for any state \np ∈ Ω and Borel set B ⊆ Ω, Z(p, B) = ∫_{q∈B} z(p, q) dμ. \n\nWe assume that this function z(·, ·) has values in [0, ∞) and is measurable. (Actually, \nin view of our other conditions this can be assumed without loss of generality.) \nThe dynamics of a computational system M affected by a noise process Z is now \ndefined as follows.⁴ If the system starts in a state p, the distribution of states q \nobtained after a single computation step on input a ∈ Σ is given by the density \nkernel π_a(p, q) = z(s(p, a), q). Note that as a composition of two measurable functions, \nπ_a is again a measurable function. The long-term dynamics of the system \nis given by a Markov process, where the distribution π_{xa}(p, q) of states after |xa| \ncomputation steps with input xa ∈ Σ* starting in state p is defined recursively by \n\nπ_{xa}(p, q) = ∫_{r∈Ω} π_x(p, r) · π_a(r, q) dμ. \n\n³In this case one wants to set Ω_sp := (⋃_{j=1}^{l} [0, T)^j ∪ {not-firing})^m, where T > 0 is \na sufficiently large constant so that it suffices to consider only the firing history of the \nnetwork during a preceding time interval of length T in order to determine whether a \nneuron fires (e.g. T = 30 ms for a biological neural system). If one partitions the time \naxis into discrete time windows [0, T), [T, 2T), ..., then in the noise-free case the firing \nevents during each time window are completely determined by those in the preceding one. \nA component p_i ∈ [0, T)^j of a state in this set Ω_sp indicates that the corresponding neuron \ni has fired exactly j times during the considered time interval, and it also specifies the j \nfiring times of this neuron during this interval. Due to refractory effects one can choose \nl < ∞ for biological neural systems, e.g. l = 15 for T = 30 ms. With some straightforward \nformal operations one can also write this state set Ω_sp as a bounded subset of R^d for \nd := l · m. \n\n⁴We would like to thank Peter Auer for helpful conversations on this topic. \n\nLet us denote by π_x(q) the distribution π_x(p^0, q), i.e. the distribution of states of \nM after it has processed string x, starting from the initial state p^0. Let ρ > 0 \nbe the required reliability level. In the most basic version the system M accepts \n(rejects) some input x ∈ Σ_0* if ∫_F π_x(q) dμ ≥ 1/2 + ρ (respectively ≤ 1/2 - ρ). In less \ntrivial cases the system may also perform pure computation steps after it has read \nall of the input. Thus, we define more generally that the system M recognizes a set \nL ⊆ Σ_0* with reliability ρ if for any x ∈ Σ_0*: \n\nx ∈ L ⇔ ∫_F π_{xu}(q) dμ ≥ 1/2 + ρ for some u ∈ {⊔}* \nx ∉ L ⇔ ∫_F π_{xu}(q) dμ ≤ 1/2 - ρ for all u ∈ {⊔}*. \n\nThis covers also the case of batch input, where |x| = 1 and Σ_0 is typically quite \nlarge (e.g. Σ_0 = R^n). \n\n3 Results \n\nThe proofs of Theorems 3.1, 3.4, 3.5 require a mild continuity assumption for the \ndensity functions z(r, ·), which is satisfied in all concrete cases that we have examined. 
We do not require any global continuity property over Ω for the density functions \nz(r, ·) because there are important special cases (see [Maass, Orponen, 1996]), \nwhere the state space Ω is a disjoint union of subspaces Ω_1, ..., Ω_k with different \nmeasures on each subspace. We only assume that for some arbitrary partition of \nΩ into Borel sets Ω_1, ..., Ω_k the density functions z(r, ·) are uniformly continuous \nover each Ω_j, with moduli of continuity that can be bounded independently of r. \nIn other words, we require that z(·, ·) satisfies the following condition: \nWe call a function π(·, ·) from Ω^2 into R piecewise uniformly continuous if for every \nε > 0 there is a δ > 0 such that for every r ∈ Ω, and for all p, q ∈ Ω_j, j = 1, ..., k: \n\n‖p - q‖ ≤ δ implies |π(r, p) - π(r, q)| ≤ ε. (1) \n\nIf z(·, ·) satisfies this condition, we say that the resulting noise process Z is piecewise \nuniformly continuous. \n\nTheorem 3.1 Let L ⊆ Σ_0* be a set of sequences over an arbitrary input domain \nΣ_0. Assume that some computational system M, affected by a piecewise uniformly \ncontinuous noise process Z, recognizes L with reliability ρ, for some arbitrary ρ > 0. \nThen L is regular. \n\u2022 \n\nThe proof of Theorem 3.1 relies on an analysis of the space of probability density \nfunctions over the state set Ω. An upper bound on the number of states of a deterministic \nfinite automaton that simulates M can be given in terms of the number \nk of components Ω_j of the state set Ω, the dimension and diameter of Ω, a bound \non the values of the noise density function z, and the value of δ for ε = ρ/4μ(Ω) in \ncondition (1). 
For details we refer to [Maass, Orponen, 1996].⁵ \n\n⁵A corresponding result is claimed in Corollary 3.1 of [Casey, 1996] for the special case \nof recurrent neural nets with bounded noise and ρ = 1/2, i.e. for certain computations \nwith perfect reliability. This case may not require the consideration of probability density \nfunctions. However it turns out that the proof for this special case in [Casey, 1996] is wrong. \nThe proof of Corollary 3.1 in [Casey, 1996] relies on the argument that a compact set \"can \ncontain only a finite number of disjoint sets with nonempty interior\". This argument is \nwrong, as the counterexample of the intervals [1/(2i + 1), 1/(2i)] for i = 1, 2, ... shows. \nThese infinitely many disjoint intervals are all contained in the compact set [0, 1]. In \naddition, there is an independent problem with the structure of the proof of Corollary 3.1 \nin [Casey, 1996]. It is derived as a consequence of the proof of Theorem 3.1 in [Casey, 1996]. \nHowever that proof relies on the assumption that the recurrent neural net accepts a regular \nlanguage. Hence the proof via probability density functions in [Maass, Orponen, 1996] \nprovides the first valid proof for the claim of Corollary 3.1 in [Casey, 1996]. \n\nRemark 3.2 In stark contrast to the results of [Siegelmann, Sontag, 1991] and \n[Maass, 1996] for the noise-free case, the preceding Theorem implies that both recurrent \nanalog neural nets and recurrent networks of spiking neurons with online \ninput from Σ_0 can only recognize regular languages in the presence of any reasonable \ntype of analog noise, even if their computation time is unlimited and if they employ \narbitrary real-valued parameters. \n\nLet us say that a noise process Z defined on a set Ω ⊆ R^d is bounded by η if it can \nmove a state p only to other states q that have a distance ≤ η from p in the L_1-norm \nover R^d, i.e. if its density kernel z has the property that for any p = (p_1, ..., p_d) \nand q = (q_1, ..., q_d) ∈ Ω, z(p, q) > 0 implies that |q_i - p_i| ≤ η for i = 1, ..., d. \nObviously η-bounded noise processes are a very special class. However they provide \nan example which shows that the general upper bound of Theorem 3.1 is in a sense \noptimal: \n\nTheorem 3.3 For every regular language L ⊆ {-1, 1}* there is a constant η > 0 \nsuch that L can be recognized with perfect reliability (i.e. ρ = 1/2) by a recurrent \nanalog neural net in spite of any noise process Z bounded by η. \n\u2022 \n\nWe now consider the effect of analog noise on discrete-time analog computations \nwith batch input. The proofs of Theorems 3.4 and 3.5 are quite complex (see \n[Maass, Orponen, 1996]). \n\nTheorem 3.4 There exists a finite upper bound for the VC-dimension of layered \nfeedforward sigmoidal neural nets and feedforward networks of spiking neurons \nwith piecewise uniformly continuous analog noise (for arbitrary real-valued inputs, \nBoolean output computed with some arbitrary reliability ρ > 0, and arbitrary real-valued \n\"programmable parameters\") which does not depend on the size or structure \nof the network beyond its first hidden layer. \n\u2022 \n\nTheorem 3.5 There exists a finite upper bound for the VC-dimension of recurrent \nsigmoidal neural nets and networks of spiking neurons with piecewise uniformly continuous \nanalog noise (for arbitrary real-valued inputs, Boolean output computed with \nsome arbitrary reliability ρ > 0, and arbitrary real-valued \"programmable parameters\") \nwhich does not depend on the computation time of the network, even if the \ncomputation time is allowed to vary for different inputs. \n\u2022 \n\n4 Conclusions \n\nWe have introduced a new framework for the analysis of analog noise in discrete-time \nanalog computations that is better suited for \"real-world\" applications and \nmore flexible than previous models. In contrast to preceding models it also covers \nimportant concrete cases such as analog neural nets with a Gaussian distribution of \nnoise on analog gate outputs, noisy computations with less than perfect reliability, \nand computations in networks of noisy spiking neurons. \nFurthermore we have introduced adequate mathematical tools for analyzing the \neffect of analog noise in this new framework. These tools differ quite strongly from \nthose that have previously been used for the investigation of noisy computations. \nWe show that they provide new bounds for the computational power and VC-dimension \nof analog neural nets and networks of spiking neurons in the presence of \nanalog noise. \nFinally we would like to point out that our model for noisy analog computations \ncan also be applied to completely different types of models for discrete-time analog \ncomputation than neural nets, such as arithmetical circuits, the random access \nmachine (RAM) with analog inputs, the parallel random access machine (PRAM) \nwith analog inputs, various computational discrete-time dynamical systems and \n(with some minor adjustments) also the BSS model [Blum, Shub, Smale, 1989]. Our \nframework provides for each of these models an adequate definition of noise-robust \ncomputation in the presence of analog noise, and our results provide upper bounds \nfor their computational power and VC-dimension in terms of characteristics of their \nanalog noise. \n\nReferences \n[Blum, Shub, Smale, 1989] L. Blum, M. Shub, S. 
Smale, On a theory of computation \nover the real numbers: NP-completeness, recursive functions and universal machines. \nBulletin of the Amer. Math. Soc. 21 (1989), 1-46. \n\n[Casey, 1996] M. Casey, The dynamics of discrete-time computation, with application to \nrecurrent neural networks and finite state machine extraction. Neural Computation 8 \n(1996), 1135-1178. \n\n[Cowan, 1966] J. D. Cowan, Synthesis of reliable automata from unreliable components. \nAutomata Theory (E. R. Caianiello, ed.), 131-145. Academic Press, New York, 1966. \n\n[Maass, 1996] W. Maass, Lower bounds for the computational power of networks of spiking \nneurons. Neural Computation 8 (1996), 1-40. \n\n[Maass, 1997] W. Maass, Fast sigmoidal networks via spiking neurons, to appear in \nNeural Computation 9, 1997. FTP-host: archive.cis.ohio-state.edu, FTP-filename: \n/pub/neuroprose/maass.sigmoidal-spiking.ps.Z. \n\n[Maass, Orponen, 1996] W. Maass, P. Orponen, On the effect of analog noise in \ndiscrete-time analog computations (journal version), submitted for publication; see \nhttp://www.math.jyu.fi/ ... orponen/papers/noisyac.ps. \n\n[Pippenger, 1989] N. Pippenger, Invariance of complexity measures for networks with unreliable \ngates. J. Assoc. Comput. Mach. 36 (1989), 531-539. \n\n[Rabin, 1963] M. Rabin, Probabilistic automata. Information and Control 6 (1963), 230-245. \n\n[Siegelmann, 1994] H. T. Siegelmann, On the computational power of probabilistic and \nfaulty networks. Proc. 21st International Colloquium on Automata, Languages, and \nProgramming, 23-34. Lecture Notes in Computer Science 820, Springer-Verlag, Berlin, \n1994. \n\n[Siegelmann, Sontag, 1991] H. T. Siegelmann, E. D. Sontag, Turing computability with \nneural nets. Appl. Math. Letters 4(6) (1991), 77-80. \n\n[von Neumann, 1956] J. von Neumann, Probabilistic logics and the synthesis of reliable \norganisms from unreliable components. Automata Studies (C. E. Shannon, J. E. 
McCarthy, eds.), 329-378. Annals of Mathematics Studies 34, Princeton University Press, \nPrinceton, NJ, 1956. \n\n", "award": [], "sourceid": 1213, "authors": [{"given_name": "Wolfgang", "family_name": "Maass", "institution": null}, {"given_name": "Pekka", "family_name": "Orponen", "institution": null}]}