{"title": "Coupled Markov Random Fields and Mean Field Theory", "book": "Advances in Neural Information Processing Systems", "page_first": 660, "page_last": 667, "abstract": null, "full_text": "660 \n\nGeiger and Girosi \n\nCoupled Markov Random Fields and \n\nMean Field Theory \n\nDavi Geigerl \nArtificial Intelligence \nLaboratory, MIT \n545 Tech. Sq. # 792 \nCambridge, MA 02139 \n\nand \n\nABSTRACT \n\nFederico Girosi \nArtificial Intelligence \nLaboratory, MIT \n545 Tech. Sq. # 788 \nCambridge, MA 02139 \n\nIn recent years many researchers have investigated the use of Markov \nRandom Fields (MRFs) for computer vision. They can be applied \nfor example to reconstruct surfaces from sparse and noisy depth \ndata coming from the output of a visual process, or to integrate \nearly vision processes to label physical discontinuities. In this pa(cid:173)\nper we show that by applying mean field theory to those MRFs \nmodels a class of neural networks is obtained. Those networks can \nspeed up the solution for the MRFs models. The method is not \nrestricted to computer vision. \n\nIntroduction \n\n1 \nIn recent years many researchers (Geman and Geman, 1984) (Marroquin et. al. \n1987) (Gamble et. al. 1989) have investigated the use of Markov Random Fields \n(MRFs) for early vision. Coupled MRFs models can be used for the reconstruction \nof a function starting from a set of noisy sparse data, such as intensity, stereo, or \nmotion data. They have also been used to integrate early vision processes to label \nphysical discontinuities. Two fields are usually required in the MRFs formulation \nof a problem: one represents the function that has to be reconstructed, and the \nother is associated to its discontinuities. 
The reconstructed function, say f, has a continuous range, and the discontinuity field, say l, is a binary field (1 if there is a discontinuity and 0 otherwise; see figure 1). \n\n1 New address: Siemens Corporate Research, 755 College Road East, Princeton NJ 08540 \n\nFigure 1: The square lattice with the line process l and the field f defined at some pixels. \n\nThe essence of the MRF model is that the probability distribution of the configuration of the fields, for a given set of data, has a Gibbs distribution for some cost functional dependent upon a small neighborhood. Since the fields have a discrete range, finding the solution becomes a combinatorial optimization problem, which can be solved by Monte Carlo methods (simulated annealing (Kirkpatrick et al., 1983), for example). However these have a main drawback: the amount of computer time needed for the implementation. \n\nWe propose to approximate the solution of the problem formulated in the MRF frame with its \"average solution.\" The mean field theory (MFT) allows us to find deterministic equations for MRFs whose solution approximates the solution of the statistical problem. A class of neural networks can naturally solve these equations (Hopfield, 1984) (Koch et al., 1985) (Geiger and Yuille, 1989). An advantage of such an approach is that the solution of the networks is faster than the Monte Carlo techniques commonly used to deal with MRFs. \n\nA main novelty in this work, and a quite general one, is to show that the binary field representing the discontinuities can be averaged out to yield an effective theory independent of the binary field. The possibility of writing a set of equations describing the network is also useful for a better understanding of the nature of the solution and of the parameters of the model. 
We show the network performance in an example of image reconstruction from sparse data. \n\n2 MRFs and Bayes approach \n\nOne of the main attractions of MRF models in vision is that they can deal directly with discontinuities. We consider coupled MRFs depending upon two fields, f and l. For the problem of image reconstruction the field f represents the field to be smoothed and l represents the discontinuities. In this case l is a binary field, assuming the value 1 if there is a discontinuity and 0 otherwise. The Markov property asserts that the probability of a certain value of the field at any given site in the lattice depends only upon neighboring sites. According to the Clifford-Hammersley theorem, the prior probability of a state of the fields f and l has the Gibbs form: \n\nP(f, l) = (1/Z) e^{-βU(f, l)}   (2.1) \n\nwhere f and l are the fields, e.g. the surface-field and its discontinuities, Z is the normalization constant, also known as the partition function, U(f, l) = Σ_i U_i(f, l) is an energy function that can be computed as the sum of local contributions from each lattice site i, and β is a parameter called the inverse of the natural temperature of the field. If a sparse observation g of a given surface-field f and a model of the noise are available, then one knows the conditional probability P(g|f, l). Bayes' theorem then allows us to write the posterior distribution: \n\nP(f, l|g) = P(g|f, l) P(f, l) / P(g) = (1/Z) e^{-βV(f, l|g)}.   (2.2) \n\nFor the case of a sparse image corrupted by white Gaussian noise \n\nV(f, l|g) = Σ_i [λ_i (f_i - g_i)^2 + U_i(f, l)]   (2.3) \n\nwhere λ_i = 1 or 0 depending on whether data are available or not. V(f, l|g) is sometimes called the visual cost function. The solution of the problem is then given by some estimate of the fields. 
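The Monte Carlo approach to sampling the posterior (2.2) can be sketched in a few lines. The following is a minimal Metropolis sampler for a 1-D signal; the quadratic smoothness energy standing in for U_i, and all parameter values, are illustrative assumptions rather than the paper's model (the coupled line process is omitted here for brevity).

```python
import numpy as np

# Minimal Metropolis sketch for sampling exp(-beta * V(f | g)) in 1-D.
# The quadratic smoothness term standing in for U_i is an assumption.
rng = np.random.default_rng(0)

def visual_cost(f, g, lam, alpha):
    # data term: lambda_i * (f_i - g_i)^2, lambda_i = 1 where data exist
    data = np.sum(lam * (f - g) ** 2)
    # stand-in smoothness energy for U_i(f)
    smooth = alpha * np.sum(np.diff(f) ** 2)
    return data + smooth

def metropolis(g, lam, alpha=1.0, beta=2.0, steps=5000, scale=0.5):
    # single-site random-walk proposals, Metropolis accept/reject
    f = g.copy()
    v = visual_cost(f, g, lam, alpha)
    for _ in range(steps):
        i = rng.integers(len(f))
        f_new = f.copy()
        f_new[i] += rng.normal(0.0, scale)
        v_new = visual_cost(f_new, g, lam, alpha)
        # accept with probability min(1, e^{-beta * (v_new - v)})
        if rng.random() < np.exp(-beta * (v_new - v)):
            f, v = f_new, v_new
    return f

g = np.array([0.0, 0.2, 0.1, 1.0, 0.9, 1.1])  # noisy step-like data
lam = np.ones_like(g)                          # all sites observed
f_sample = metropolis(g, lam)
```

Each sweep touches one site at a time, which is precisely why the computer-time drawback discussed in the text arises on large lattices.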
The maximum of the posterior distribution or other related estimates of the \"true\" data-field value cannot be computed analytically, but sample distributions of the field with the probability distribution (2.2) can be obtained using Monte Carlo techniques such as the Metropolis algorithm. These algorithms sample the space of possible values of the fields according to the probability distribution P(f, l|g). \n\nA drawback of coupled MRFs has been the amount of computer time used by the Metropolis algorithm or by simulated annealing (Kirkpatrick et al., 1983). \n\nA justification for using the mean field (MF) as an estimate of the fields, f for example, resides in the fact that it represents the minimum variance Bayes estimator. More precisely, the average variance of the field f is given by \n\nVar_f̄ = Σ_{f,l} (f - f̄)^2 P(f, l|g) \n\nwhere f̄ is a given estimate of the field, Σ_{f,l} represents the sum over all the possible configurations of f and l, and Var_f̄ is the variance. Minimizing Var_f̄ with respect to all possible values of f̄ we obtain \n\nf̄ = Σ_{f,l} f P(f, l|g). \n\nThis equation for f̄ defines the deterministic MF equations. \n\n2.1 MFT and Neural Networks \n\nTo connect MRFs to neural networks, we use mean field theory (MFT) to obtain deterministic equations from MRFs that represent a class of neural networks. The mean field values f̄_i and l̄_i at site i are given by \n\nf̄_i = Σ_{f,l} f_i P(f, l|g) and l̄_i = Σ_{f,l} l_i P(f, l|g).   (2.4) \n\nThe sum over the binary process l_i = 0, 1 gives for (2.3), using the mean field approximation, \n\nl̄_i = (1/Z_i) Σ_f e^{-β[λ_i(f_i - g_i)^2 + U_i(f, f̄_{j≠i}, l_i = 1)]} \n\nwhere the partition function Z has been factorized as Π_i Z_i. In this case \n\nZ_i = Σ_f e^{-βλ_i(f_i - g_i)^2} (e^{-βU_i(f, f̄_{j≠i}, l_i = 0)} + e^{-βU_i(f, f̄_{j≠i}, l_i = 1)}). \n\nAnother way to write the equation for f̄ is \n\nf̄_i = (1/Z_i) Σ_f f_i e^{-βV_i^{effective}}   (2.5) 
\n\nwhere \n\nV_i^{effective} = λ_i(f_i - g_i)^2 - (1/β) ln(e^{-βU_i(f, f̄_{j≠i}, l_i = 0)} + e^{-βU_i(f, f̄_{j≠i}, l_i = 1)})   (2.6) \n\nand the total effective cost is V^{effective} = Σ_i V_i^{effective}.   (2.7) \n\nThe important result obtained here is that the effective potential does not depend on the binary field l_i. The line process field has been eliminated to yield a temperature dependent effective potential (also called visual cost function). The interaction of the field f with itself has changed after the line process has been averaged out. We interpret this result as the effect of the interaction of the line processes with the field f to yield a new temperature dependent potential. \n\nThe computation of the sum over all the configurations of the field f is hard, and we use the saddle point approximation. In this case it is equivalent to minimizing V^{effective}(f). A dynamical equation to find the minimum of V^{effective} is given by introducing a damping force ∂f/∂t that brings the system to equilibrium. Therefore the mean field equation under the mean field and saddle point approximations becomes \n\n∂f_i/∂t = -∂V^{effective}(f, l̄)/∂f_i.   (2.8) \n\nEquation (2.8) represents a class of unsupervised neural networks coupled to (2.5). The mean field solution is given by the fixed point of (2.8) and (2.5), and it is attained by running (2.8) and (2.5) as t → ∞. This network is better understood with an example of image reconstruction. \n\n3 Example: Image reconstruction \n\nTo reconstruct images from sparse data and to detect discontinuities we use the weak membrane model, where the energy in two dimensions is given by \n\nU(f, h, v) = Σ_{i,j} { α [(f_{i,j} - f_{i-1,j})^2 (1 - h_{i,j}) + (f_{i,j} - f_{i,j-1})^2 (1 - v_{i,j})] + γ (h_{i,j} + v_{i,j}) }   (3.1) \n\nwhere α and γ are positive parameters. 
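The elimination of the binary field can be checked directly on a single link of (3.1): summing l over {0, 1} with Boltzmann weights gives a logistic function of the squared gradient. A small numerical check of this identity, with illustrative values for α, γ, β (not the paper's settings):

```python
import math

# One link of (3.1): for squared gradient d2 the local energy is
# alpha * d2 * (1 - l) + gamma * l, with l in {0, 1}.
# Parameter values are illustrative assumptions.
alpha, gamma, beta = 4.0, 1.0, 2.0

def l_mean_direct(d2):
    w0 = math.exp(-beta * alpha * d2)  # Boltzmann weight of l = 0
    w1 = math.exp(-beta * gamma)       # Boltzmann weight of l = 1
    return w1 / (w0 + w1)              # average of l over {0, 1}

def l_mean_logistic(d2):
    # the same average in closed form: a logistic function of d2
    return 1.0 / (1.0 + math.exp(beta * (gamma - alpha * d2)))

# the two expressions agree, and the mean is analog in (0, 1)
for d2 in (0.0, 0.25, 1.0, 4.0):
    assert abs(l_mean_direct(d2) - l_mean_logistic(d2)) < 1e-12
```

The logistic form is exactly what produces the temperature dependent effective potential: the sharper β is, the closer the averaged line process is to a binary decision.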
\n\nThe first term contains the interaction between the field and the line processes: if the horizontal or vertical gradient is very high at site (i, j), the corresponding line process will very likely be active (h_{i,j} = 1 or v_{i,j} = 1) to make the visual cost function decrease and signal a discontinuity. The second term takes into account the price we pay each time we create a discontinuity and is necessary to prevent the creation of discontinuities everywhere. The effective cost function (2.7) then becomes \n\nFigure 2: The network is represented for the one-dimensional case. The lines are the connections. \n\nV^{effective} = Σ_{i,j} [ λ_{i,j}(f_{i,j} - g_{i,j})^2 + α(Δ^h_{i,j})^2 + α(Δ^v_{i,j})^2 - (1/β) ln[(1 + e^{-β(γ - α(Δ^h_{i,j})^2)})(1 + e^{-β(γ - α(Δ^v_{i,j})^2)})] ]   (3.2) \n\nwhere Δ^h_{i,j} = f_{i,j} - f_{i-1,j} and Δ^v_{i,j} = f_{i,j} - f_{i,j-1}, and (2.5) is then given by \n\nh̄_{i,j} = 1 / (1 + e^{β(γ - α(f_{i,j} - f_{i-1,j})^2)}) and v̄_{i,j} = 1 / (1 + e^{β(γ - α(f_{i,j} - f_{i,j-1})^2)}).   (3.3) \n\nWe point out here that while the line process field is a binary field, its mean value is a continuous (analog) function in the range between 0 and 1. \n\nDiscretizing (2.8) in time and applying it to (3.2), we obtain \n\nf^{n+1}_{i,j} = f^n_{i,j} - ω [ λ_{i,j}(f^n_{i,j} - g_{i,j}) + α(f^n_{i,j} - f^n_{i-1,j})(1 - h^n_{i,j}) - α(f^n_{i+1,j} - f^n_{i,j})(1 - h^n_{i+1,j}) + α(f^n_{i,j} - f^n_{i,j-1})(1 - v^n_{i,j}) - α(f^n_{i,j+1} - f^n_{i,j})(1 - v^n_{i,j+1}) ]   (3.4) \n\nwhere h^n_{i,j} and v^n_{i,j} are given by the network (3.3) and n is the time step of the algorithm. We notice that (3.4) is coupled with (3.3): the field f is updated by (3.4) at step n, then (3.3) updates the fields h and v before (3.4) updates the field f again at step n + 1. \n\nThis is a simple unsupervised neural network where the input is the field f and the output is the line process field h or v. 
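The alternation of (3.3) and (3.4) can be sketched as a short deterministic loop. In the sketch below, the grid size, the parameter values (α, γ, β, ω) and the 50% sparse mask are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np

# Sketch of the coupled network: (3.3) updates the line fields h, v from f,
# then one gradient step of (3.4) updates f. Parameters are illustrative.
alpha, gamma, beta, omega = 1.0, 0.5, 5.0, 0.1

def line_process(f):
    # mean line fields from the current f, equation (3.3)
    dh = np.zeros_like(f)
    dv = np.zeros_like(f)
    dh[1:, :] = f[1:, :] - f[:-1, :]   # Delta^h
    dv[:, 1:] = f[:, 1:] - f[:, :-1]   # Delta^v
    h = 1.0 / (1.0 + np.exp(beta * (gamma - alpha * dh ** 2)))
    v = 1.0 / (1.0 + np.exp(beta * (gamma - alpha * dv ** 2)))
    return h, v

def update_f(f, g, lam, h, v):
    # one step of (3.4) with the line fields held fixed
    grad = lam * (f - g)
    grad[1:, :] += alpha * (f[1:, :] - f[:-1, :]) * (1 - h[1:, :])
    grad[:-1, :] -= alpha * (f[1:, :] - f[:-1, :]) * (1 - h[1:, :])
    grad[:, 1:] += alpha * (f[:, 1:] - f[:, :-1]) * (1 - v[:, 1:])
    grad[:, :-1] -= alpha * (f[:, 1:] - f[:, :-1]) * (1 - v[:, 1:])
    return f - omega * grad

rng = np.random.default_rng(1)
g = np.zeros((16, 16))
g[:, 8:] = 1.0                                   # a step edge
g += 0.1 * rng.normal(size=g.shape)              # additive noise
lam = (rng.random(g.shape) < 0.5).astype(float)  # 50% sparse observations
f = g * lam
for _ in range(50):
    h, v = line_process(f)
    f = update_f(f, g, lam, h, v)
```

The loop fills in the unobserved sites by smoothing, while the analog line fields gate the smoothness forces across the step edge so it is not blurred away.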
This network is coupled to the network (2.8) to solve for the field f, and together they constitute the global network for this problem (see figure 2). It has been shown by many authors, e.g. (Geiger and Yuille, 1989), that this class of networks is equivalent to Hopfield networks (Hopfield, 1984) (Koch et al., 1985). \n\nFigure 3: a. The still life image, 128 x 128 pixels. The image smoothed with γ = 1400 and α = 4 for 9 iterations. The line process field (needs thinning). b. A face image of 128 x 128 pixels. Randomly chosen 50% of the original image (for display the other 50% are filled with white dots). c. The network described above is applied to smooth and fill in, using the same parameters, for 10 iterations. For comparison we show the results of simply blurring the sparse data (no line process field). \n\nAn important connection we make is to show (Geiger and Girosi, 1989) (Geiger, 1989) that the work of Blake and Zisserman (Blake and Zisserman, 1987) can be seen as an approximation of these results. \n\nIn the zero temperature limit (β → ∞) (3.3) becomes the Heaviside function (1 or 0) and the interpretation is simple: when the horizontal or vertical gradient is larger than a threshold (√(γ/α)), a vertical or horizontal discontinuity is created. \n\n4 Results \n\nWe applied the network to a real still life image and the result was an enhancement of specular edges, shadow edges and some other contours while smoothing out the noise (see Figure 3a). This result is consistent with all the images we have used. From one face image we produced sparse data by randomly suppressing 50% of the data (see Figure 3b). We then applied the neural network to reconstruct the image. \n\nAcknowledgements \n\nWe are grateful to Tomaso Poggio for his guidance and support. \n\nReferences \n\nA. Blake and A. Zisserman. (1987) Visual Reconstruction. 
Cambridge, Mass: MIT Press. \n\nE. Gamble, D. Geiger, T. Poggio and D. Weinshall. (1989) Integration of vision modules and labeling of surface discontinuities. Invited paper, IEEE Trans. Systems, Man & Cybernetics, December. \n\nD. Geiger and F. Girosi. (1989) Parallel and deterministic algorithms for MRFs: surface reconstruction and integration. A.I. Memo No. 1114, Artificial Intelligence Laboratory, MIT. \n\nD. Geiger. (1989) Visual models with statistical field theory. Ph.D. thesis, MIT, Physics Department and Artificial Intelligence Laboratory. \n\nD. Geiger and A. Yuille. (1989) A common framework for image segmentation and surface reconstruction. Harvard Robotics Laboratory Technical Report 89-7, Harvard, August. \n\nS. Geman and D. Geman. (1984) Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-6:721-741. \n\nJ. J. Hopfield. (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci., 81:3088-3092. \n\nS. Kirkpatrick, C. D. Gelatt and M. P. Vecchi. (1983) Optimization by Simulated Annealing. Science, 220:671-680. \n\nC. Koch, J. Marroquin and A. Yuille. (1985) Analog 'Neuronal' Networks in Early Vision. Proc. Natl. Acad. Sci., 83:4263-4267. \n\nJ. L. Marroquin, S. Mitter and T. Poggio. (1987) Probabilistic Solution of Ill-Posed Problems in Computational Vision. J. Amer. Stat. Assoc., 82:76-89. \n", "award": [], "sourceid": 270, "authors": [{"given_name": "Davi", "family_name": "Geiger", "institution": null}, {"given_name": "Federico", "family_name": "Girosi", "institution": null}]}