Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)
Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes, then in the limit as d approaches infinity the capacities of a hyper-ridge and a perceptron are identical. However, we show that for p ≫ d, which is the usual case in practice, the ratio of hyper-ridge to perceptron dichotomies approaches p/2(d + 1).
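To make the comparison concrete, the following is a minimal sketch and not the paper's own construction. It assumes the ridge activation can be modeled as a hard band, firing when |w·x + b| is at most a hypothetical `half_width` parameter, and it uses Cover's classical counting result for the number of dichotomies a perceptron with bias can realize on p points in general position in d dimensions. The function names are illustrative only.

```python
from math import comb


def perceptron_dichotomies(p: int, d: int) -> int:
    """Cover's count of dichotomies of p points in general position in
    d dimensions that a hyperplane with a bias term can realize."""
    return 2 * sum(comb(p - 1, k) for k in range(d + 1))


def net_input(w, b, x):
    """Weighted sum w.x + b shared by both unit types."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def perceptron_output(w, b, x):
    """Threshold unit: fires on one side of the hyperplane."""
    return 1 if net_input(w, b, x) >= 0 else 0


def hyper_ridge_output(w, b, x, half_width=1.0):
    """Assumed hard-band ridge unit: fires only inside the slab
    |w.x + b| <= half_width around the hyperplane."""
    return 1 if abs(net_input(w, b, x)) <= half_width else 0


# XOR: one of the two dichotomies of these four planar points that
# no single perceptron can realize.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]

# A single ridge unit centered on the line x1 + x2 = 1 separates XOR.
ridge_labels = [
    hyper_ridge_output((1, 1), -1, x, half_width=0.5) for x in xor_points
]
```

Under these assumptions, `perceptron_dichotomies(4, 2)` counts 14 of the 16 possible labelings of four planar points, while `ridge_labels` reproduces the XOR labeling that a perceptron misses, which illustrates why a ridge unit can realize more dichotomies when p exceeds d.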