on the image dataset. (As shown in Coughlan and Yuille (1999), there is a convex set of distributions, of which the true distribution P(I) is a member, which share the same mean statistics ⟨φ⟩.) This second kind of ambiguity stems from the fact that the mean statistics convey only a fraction of the information that is contained in the true distribution P(I). To resolve this second ambiguity it is necessary to extract more information from the image data set. The simplest way to achieve this is to use a larger (or more informative) set of filters to lower the entropy of P_M(I) (this topic is discussed in more detail in Zhu, Wu and Mumford (1997, 1998) and Coughlan and Yuille (1999)). Alternatively, one can extend Minimax to include second-order statistics, i.e. the covariance of φ in addition to its mean d. This is an important topic for future research.

3 The Minutemax Approximations

We now illustrate the phase space approach by showing that suitable approximations of the phase space factor g(φ) make it easy to estimate the potential λ given the empirical mean d. The resulting fast approximations to Minimax Learning are called "Minutemax" algorithms.

3.1 The Gaussian Approximation of g(φ)

If the phase space factor g(φ) may be approximated as a multi-variate Gaussian (see Coughlan and Yuille (1999) for a justification of this approximation), then the probability distribution P_M(φ) = g(φ) e^(λ·φ) / Z(λ) reduces to another multi-variate Gaussian. (Note that we are making the Gaussian approximation in φ space, the space of all possible image-statistics histograms, and not in filter-response (feature) space.) As we will see, this result greatly simplifies the problem of estimating the potential λ.

Recall that the mean and covariance of g(φ) are denoted by c and C, respectively. The null space of C has dimension n and is spanned by vectors u(1), u(2), …, u(n).
As discussed in Theorem 1, for all feasible values of φ the Gaussian approximation gives ⟨φ⟩_gauss = d, and so we can write a linear equation relating λ and d:

d = c + Cλ.

It can be shown (Zhu, private communication) that solving this equation is equivalent to one step of Newton-Raphson for minimization of an appropriate cost function. This will fail to be a good approximation if the cost function is highly non-quadratic. As explained in Coughlan and Yuille (1999), the Gaussian approximation is also equivalent to a second-order perturbation expansion of the partition function Z(λ); higher-order corrections can be made by computing higher-order moments of g(φ).

3.2 Experimental Results

We tested the Gaussian Minutemax procedure on two sets of filters: a single (fine-scale) image gradient filter ∂I/∂x, and a set of multi-scale image gradient filters defined at three scales, similar to those used by Zhu and Mumford (1997). In both sets, the fine-scale gradient filter is linear with kernel (1, -1), representing a discretization of ∂/∂x. In the second set, the medium-scale filter kernel is (U_2, -U_2)/4 and the coarse-scale kernel is (U_4, -U_4)/16, where U_n denotes the n × n matrix of all ones. The responses of the medium and coarse filters were rounded (i.e. quantized) to the nearest integer, thus adding a non-linearity to these filters. Finally, d was measured on a data set of over 100 natural images; the fine-scale components of d are shown in the first panel of Figure (1) and were empirically very similar to the medium- and coarse-scale components.

A λ that solves d = c + Cλ is shown in the third panel of Figure (1) for the first filter (along with c in the second panel) and in the three panels of Figure (2) for the multi-scale filter set. The form of λ is qualitatively similar to that obtained by Zhu and Mumford (1997) (bearing in mind that Zhu disregarded any filter responses with magnitude above Q/2, i.e. his filter response range is half of ours).
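As a rough illustration of this linear step, the sketch below solves d = c + Cλ numerically. It is not the paper's implementation: the image size, bin count, and the use of uniform-noise images as samples for estimating the combinatoric mean c and covariance C are assumptions made purely for demonstration. Because C has a non-trivial null space (spanned by u(1), …, u(n) above), a direct `np.linalg.solve` would fail; the pseudoinverse instead returns the minimum-norm least-squares solution, leaving λ's components along the null space of C at zero.

```python
import numpy as np

def gradient_histogram(image, n_bins, lo, hi):
    """Normalized histogram phi of the fine-scale gradient filter,
    whose kernel (1, -1) is a discretization of d/dx."""
    responses = image[:, 1:] - image[:, :-1]          # convolve rows with (1, -1)
    hist, _ = np.histogram(responses, bins=n_bins, range=(lo, hi))
    return hist / hist.sum()

def gaussian_minutemax(d, c, C):
    """One Gaussian-Minutemax step: solve d = c + C @ lam for the potential lam.
    C is singular, so the pseudoinverse picks the solution orthogonal
    to the null space of C."""
    return np.linalg.pinv(C) @ (d - c)

# Toy usage with synthetic data (illustration only, not the paper's data set):
rng = np.random.default_rng(0)
# Estimate c and C from histograms of uniform-noise images,
# treated here as samples from the combinatoric factor g(phi).
samples = np.stack([gradient_histogram(rng.uniform(0, 255, (64, 64)), 15, -255, 255)
                    for _ in range(200)])
c = samples.mean(axis=0)
C = np.cov(samples, rowvar=False)
# "Observed" mean statistics d from a smoother synthetic image.
d = gradient_histogram(rng.normal(128.0, 5.0, (64, 64)), 15, -255, 255)
lam = gaussian_minutemax(d, c, C)
```

As noted above, this single linear solve plays the role of one Newton-Raphson step; when the underlying cost function is far from quadratic, λ would need further correction.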
In addition, the eigenvectors of C with small eigenvalues have most of their weight away from the origin, so one should not trust the values of the potentials there (obtained by any algorithm).

Zhu and Mumford (1997) report interactions between filters applied at different scales. They infer this because the resulting potentials appear different from the potential at the fine scale even though the histograms appear similar at all scales. We argue, however, that some of this "interaction" is due to the different phase factors at different scales. In other words, the potentials would look different at different scales even if the empirical histograms were identical, because of the differing phase factors.

766    J. M. Coughlan and A. L. Yuille

Figure 2: From left to right: the fine, medium and coarse components of -λ as computed by the Gaussian Minutemax approximation.

Figure 3: Left to right: d, c, and -λ as given by the multinomial approximation for the ∂/∂x filter at fine scale.

3.3 The Multinomial Approximation of g(φ)

Many learning theories simply construct probability distributions on feature space. How do they differ from Minimax Entropy Learning, which works on image space? By examining the phase factor we will show that the two approaches are not identical in general. Feature-space learning ignores the coupling between the filters which arises from how the statistics are obtained. More precisely, the probability distribution obtained on feature space, P_F, is equivalent to the Minimax distribution P_M if, and only if, the phase factor is multinomial.

We begin the analysis by considering a single filter. As before we define the combinatoric mean c = Σ_φ g(φ)φ. The multinomial approximation of g(φ) is equivalent to assuming that the combinatoric frequencies of filter responses are independent from pixel to pixel.
Since the combinatoric frequency of filter response j ∈ {1, 2, …, f_max} is c_j and there are N