{"title": "Markov Random Fields Can Bridge Levels of Abstraction", "book": "Advances in Neural Information Processing Systems", "page_first": 396, "page_last": 403, "abstract": null, "full_text": "Markov Random Fields Can Bridge Levels of \n\nAbstraction \n\nPaul R. Cooper \n\nPeter N. Prokopowicz \n\nInstitute for the Learning Sciences \n\nInstitute for the Learning Sciences \n\nNorthwestern University \n\nEvanston, IL \n\ncooper@ils.nwu.edu \n\nNorthwestern U ni versity \n\nEvanston, IL \n\nprokopowicz@ils.nwu.edu \n\nAbstract \n\nNetwork vision systems must make inferences from evidential informa(cid:173)\ntion across levels of representational abstraction, from low level invariants, \nthrough intermediate scene segments, to high level behaviorally relevant \nobject descriptions. This paper shows that such networks can be realized \nas Markov Random Fields (MRFs). We show first how to construct an \nMRF functionally equivalent to a Hough transform parameter network, \nthus establishing a principled probabilistic basis for visual networks. Sec(cid:173)\nond, we show that these MRF parameter networks are more capable and \nflexible than traditional methods. In particular, they have a well-defined \nprobabilistic interpretation, intrinsically incorporate feedback, and offer \nricher representations and decision capabilities. \n\n1 \n\nINTRODUCTION \n\nThe nature of the vision problem dictates that neural networks for vision must make \ninferences from evidential information across levels of representational abstraction. \nFor example, local image evidence about edges might be used to determine the \noccluding boundary of an object in a scene. This paper demonstrates that parameter \nnetworks [Ballard, 1984], which use voting to bridge levels of abstraction, can be \nrealized with Markov Random Fields (MRFs). \n\nWe show two main results. First, an MRF is constructed with functionality formally \nequivalent to that of a parameter net based on the Hough transform. 
Establishing this equivalence provides a sound probabilistic foundation for neural networks for vision. This is particularly important given the fundamentally evidential nature of the vision problem. \n\nSecond, we show that parameter networks constructed from MRFs offer a more flexible and capable framework for intermediate vision than traditional feedforward parameter networks with threshold decision making. In particular, MRF parameter nets offer a richer representational framework, the potential for more complex decision surfaces, an integral treatment of feedback, and probabilistically justified decision and training procedures. Implementation experiments demonstrate these features. \n\nTogether, these results establish a basis for the construction of integrated network vision systems with a single well-defined representation and control structure that intrinsically incorporates feedback. \n\n2 BACKGROUND \n\n2.1 HOUGH TRANSFORM AND PARAMETER NETS \n\nOne approach to bridging levels of abstraction in vision is to combine local, highly variable evidence into segments which can be described compactly by their parameters. The Hough transform offers one method for obtaining these high-level parameters. Parameter networks implement the Hough transform in a parallel feedforward network. The central idea is voting: local low-level evidence casts votes via the network for compatible higher-level parameterized hypotheses. The classic Hough example finds lines from edges. Here local evidence about the direction and magnitude of image contrast is combined to extract the parameters of lines (e.g. slope-intercept), which are more useful scene segments. The Hough transform is widely used in computer vision (e.g. [Bolle et al., 1988]) to bridge levels of abstraction. 
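The voting idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the (theta, rho) line parameterization, the accumulator resolution, and the threshold value are arbitrary choices for the sketch.

```python
import math
from collections import Counter

def hough_lines(edges, n_theta=180, rho_step=1.0, threshold=3):
    """Tally votes from local edge evidence for line hypotheses.

    edges: iterable of (x, y) pixel coordinates where local evidence
    indicates an edge.  Each edge casts a vote for every discretized
    (theta, rho) line passing through it; cells whose tally reaches
    `threshold` are reported as detected lines.
    """
    votes = Counter()
    for x, y in edges:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return [cell for cell, tally in votes.items() if tally >= threshold]

# Five collinear edge pixels on the vertical line x = 2 all vote for
# the accumulator cell theta = 0, rho = 2, so that line is detected.
lines = hough_lines([(2, y) for y in range(5)], threshold=5)
```

A parameter network implements exactly this accumulation in parallel: one unit per (theta, rho) cell, with weighted connections from the edge units that vote for it.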
\n\n2.2 MARKOV RANDOM FIELDS \n\nMarkov Random Fields offer a formal foundation for networks [Geman and Geman, 1984] similar to that of the Boltzmann machine. MRFs define a prior joint probability distribution over a set X of discrete random variables. The possible values for the variables can be interpreted as possible local features or hypotheses. Each variable is associated with a node s in an undirected graph (or network), and can be written X_s. An assignment of values to all the variables in the field is called a configuration, and is denoted w; an assignment of a single variable is denoted w_s. Each fully-connected neighborhood C in a configuration of the field has a weight, or clique potential, V_C. \n\nWe are interested in the probability distributions P over the random field X. Markov Random Fields have a locality property: \n\nP(X_s = w_s | X_r = w_r, r ∈ S, r ≠ s) = P(X_s = w_s | X_r = w_r, r ∈ N_s)    (1) \n\nwhich says roughly that the state of a site depends only upon the states of its neighbors N_s. MRFs can also be characterized in terms of an energy function U with a Gibbs distribution: \n\nP(w) = e^{-U(w)/T} / Z    (2) \n\nwhere T is the temperature and Z is a normalizing constant. \n\nIf we are interested only in the prior distribution P(w), the energy function U is defined as: \n\nU(w) = Σ_{c ∈ C} V_c(w)    (3) \n\nwhere C is the set of cliques defined by the neighborhood graph, and the V_c are the clique potentials. Specifying the clique potentials thus provides a convenient way to specify the global joint prior probability distribution P, i.e. to encode prior domain knowledge about plausible structures. \n\nSuppose we are instead interested in the distribution P(w|o) on the field after an observation o, where an observation constitutes a combination of spatially distinct observations at each local site. 
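Equations (2) and (3) can be made concrete with a toy field. The sketch below assumes a hypothetical three-site chain s0-s1-s2 with binary labels and a smoothness clique potential; the potential values and temperature are illustrative only, not values from the text.

```python
import itertools
import math

LABELS = (0, 1)
CLIQUES = [(0, 1), (1, 2)]   # neighborhood graph: chain s0 - s1 - s2

def V(c, w):
    """Clique potential: reward neighboring sites whose labels agree."""
    return -1.0 if w[c[0]] == w[c[1]] else 1.0

def U(w):
    """Prior energy, equation (3): the sum of clique potentials."""
    return sum(V(c, w) for c in CLIQUES)

def prior(T=1.0):
    """Gibbs distribution, equation (2): P(w) = exp(-U(w)/T) / Z."""
    configs = list(itertools.product(LABELS, repeat=3))
    Z = sum(math.exp(-U(w) / T) for w in configs)
    return {w: math.exp(-U(w) / T) / Z for w in configs}

P = prior()
# The uniform configurations (0,0,0) and (1,1,1) have minimal energy -2
# and hence maximal prior probability; (0,1,0) has energy +2 and is rare.
```

The two clique potentials here are the entire specification of the prior: changing V changes which global configurations are plausible, which is exactly how domain knowledge is encoded.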
The evidence from an observation at a site is denoted P(o_s | w_s) and is called a likelihood. Assuming likelihoods are local and spatially distinct, it is reasonable to assume that they are conditionally independent. Then, with Bayes' Rule, we can derive: \n\nP(w|o) = e^{-U(w|o)/T} / Z_o, where U(w|o) = Σ_{c ∈ C} V_c(w) - T Σ_{s ∈ S} log P(o_s|w_s)    (4) \n\nThe MRF definition, together with evidence from the current problem, leaves a probability distribution over all possible configurations. An algorithm is then used to find a solution, normally the configuration of maximal probability, or equivalently, minimal energy as expressed in equation 4. The problem of minimizing non-convex energy functions, especially those with many local minima, has been the subject of intense scrutiny recently (e.g. [Kirkpatrick et al., 1983; Hopfield and Tank, 1985]). In this paper we focus on developing MRF representations wherein the minimum energy configuration defines a desirable goal, not on methods of finding the minimum. In our experiments we have used the deterministic Highest Confidence First (HCF) algorithm [Chou and Brown, 1990]. \n\nMRFs have been widely used in computer vision applications, including image restoration, segmentation, and depth reconstruction [Geman and Geman, 1984; Marroquin, 1985; Chellapa and Jain, 1991]. All these applications involve flat representations at a single level of abstraction. A novel aspect of our work is the hierarchical framework which explicitly represents visual entities at different levels of abstraction, so that these higher-order entities can serve as an interpretation of the data as well as play a role in further constraint satisfaction at even higher levels. \n\n3 CONSTRUCTING MRFS EQUIVALENT TO PARAMETER NETWORKS \n\nHere we define a Markov Random Field that computes a Hough transform; i.e. it detects higher-order features by tallying weighted votes from low-level image components and thresholding the sum. 
The MRF has one discrete variable for each low-level feature and one for each high-level parameterized feature, each taking the labels exists or ¬exists. \n\nFigure 1: A parameter network: a parameterized segment is decided by a linear sum and threshold of low-level evidence. \n\nLemma 1: U(w_E | o) < U(w_¬E | o) if and only if the weighted sum of the evidence exceeds the threshold. \n\nLemma 2: (∀w)(w = E…¬e_k… or w = ¬E…e_k…) ⇒ U(w | o) > min(U(w_E | o), U(w_¬E | o)), where w_E (all labels exists) and w_¬E (all labels ¬exists) are the two pure configurations. \n\nProof (of lemma 2): For a mixed configuration w = E…¬e_k…, changing the label ¬e_k to e_k adds energy because of the evidence associated with e_k. This is at most W_max. It also removes energy because of the potential of the clique Ee_k, which is -W_max. Because the clique potential K2 from E¬e_k is also removed, if K2 > 0, then changing this label always reduces the energy. \nFor a mixed configuration w = ¬E…e_k…, changing the low-level label e_k to ¬e_k cannot add to the energy contributed by evidence, since ¬e_k has no evidence associated with it. There is no binary clique potential for ¬E¬e_k, but the potential K1 for the clique ¬Ee_k is removed. Therefore, again, choosing any K1 > 0 reduces the energy and ensures that compatible labels are preferred. □ \n\nFrom lemma 2, there are two configurations that could possibly have minimal posterior energy. From lemma 1, the configuration which represents the existence of the higher-order feature is preferred if and only if the weighted sum of the evidence exceeds the threshold, as in the Hough transform. \n\nOften it is desirable to find the mode in a high-level parameter space rather than those elements which surpass a fixed threshold. Finding a single mode is easy to do in a Hough-like MRF; add lateral connections between the exists labels of the high-level features to form a winner-take-all network. If the potentials for these cliques are large enough, it is not possible for more than one variable corresponding to a high-level feature to be labeled exists. 
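The threshold behavior can be checked numerically. The sketch below builds a tiny Hough-like MRF with one high-level variable E and four low-level edge variables, then finds the minimum-posterior-energy configuration by brute force (standing in for an algorithm such as HCF). The particular values of w, k1, k2, and the threshold bias theta are illustrative assumptions, not the exact constants of the construction above.

```python
import itertools

def map_config(loglik, w=1.0, k1=0.5, k2=0.5, theta=1.5):
    """Exhaustively find the minimum-posterior-energy configuration.

    loglik[k] is the evidence log-likelihood ratio at low-level site k.
    Pairwise clique potentials: -w for the compatible pair (E, e_k),
    k1 for the incompatible pair (not-E, e_k), k2 for (E, not-e_k).
    The singleton potential theta on E plays the role of the threshold.
    """
    n = len(loglik)
    best, best_u = None, float("inf")
    for E in (False, True):
        for es in itertools.product((False, True), repeat=n):
            u = theta if E else 0.0            # singleton potential on E
            for k, ek in enumerate(es):
                if ek:
                    u -= loglik[k]             # evidence energy at site k
                    u += -w if E else k1       # binary clique potential
                elif E:
                    u += k2                    # E asserted but edge absent
            if u < best_u:
                best, best_u = (E, es), u
    return best

# Strong positive edge evidence: the line wins and pulls in its edges.
E_on, edges_on = map_config([2.0, 2.0, 2.0, 2.0])
# Strong negative evidence: the pure "nothing exists" configuration wins.
E_off, edges_off = map_config([-2.0, -2.0, -2.0, -2.0])
```

With these particular constants the correspondence to the lemmas is only qualitative; the construction in the text chooses the (E, e_k) potential to exactly offset the maximal evidence term, which is what guarantees that only the two pure configurations can be minimal.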
\n\n4 BEYOND HOUGH TRANSFORMS: MRF PARAMETER NETS \n\nThe essentials of a parameter network are a set of variables representing low-order features, a set of variables representing high-order features, and the appropriate weighted connections between them. This section explores the characteristics of more \"natural\" MRF parameter networks, still based on the same variables and connections, but not limited to binary label sets and sum/threshold decision procedures. \n\nFigure 2: Noisy image data \n\nFigure 3: Three parameter-net MRF experiments: white dots in the lower images indicate the decision that a horizontal or vertical local edge is present. Upper images show the horizontal and vertical lines found. The left net is a feedforward Hough transform; the middle net uses positive feedback from lines to edges; the right net uses negative feedback, from non-existing lines to non-existing edges. \n\n4.1 EXPERIMENTS WITH FEEDBACK \n\nThe Hough transform and its parameter net instantiation are inherently feedforward. In contrast, all MRFs intrinsically incorporate feedback. We experimented with a network designed to find lines from edges. Horizontal and vertical edge inputs are represented at the low level, and horizontal and vertical lines which span the image at the high level. The input data look like Figure 2. Probabilistic evidence for the low-level edges is generated from pixel data using a model of edge-image formation [Sher, 1987]. The edges vote for compatible lines. In Figure 3, the decision of the feedforward Hough transform MRF is shown at the left: edges exist where the local evidence is sufficient; lines exist where enough votes are received. 
\n\nKeeping the same topology, inputs, and representations in the MRF, we added top-down feedback by changing binary clique potentials so that the existence of a line at the high level is more strongly compatible with the existence of its edges. Missing edges are filled in (middle). By making non-existent lines strongly incompatible with the existence of edges, noisy edges are substantially removed (right). Other MRFs for segmentation [Chou and Brown, 1990; Marroquin, 1985] find collinear edges, but cannot reason about lines and therefore cannot exploit top-down feedback. \n\n4.2 REPRESENTATION AND DECISION MAKING \n\nBoth parameter nets and MRFs represent confidence in local hypotheses, but here the MRF framework has intrinsic advantages. MRFs can simultaneously represent independent beliefs for and against the same hypotheses. In an active vision system, which must reason about gathering as well as interpreting evidence, one could extend this to include the label don't know, allowing explicit reasoning about the condition in which the local evidence insufficiently supports any decision. MRFs can also express higher-order constraints, beyond sets of pairwise relations; the exploitation of appropriate 3-cliques, for example, has been shown to be very useful [Cooper, 1990]. \n\nSince the potentials in an MRF are related to local conditional probabilities, there is a principled way to obtain them. Observations can be used to estimate local joint probabilities, which can be converted to the clique potentials defining the prior distribution on the field [Pearl, 1988; Swain, 1990]. \n\nMost evidence integration schemes require, in addition to the network topology and parameters, the definition of a decision-making process (e.g. thresholding) and a theory of parameter acquisition for that process, which is often ad hoc. 
To estimate the maximum posterior probability of an MRF, on the other hand, is intrinsically to make a decision among the possibilities embedded in the chosen variables and labels. \n\nThe space of possible decisions (interpretations of problem input) is also much richer for MRFs than for parameter networks. For both nets, the nodes for which evidence is available define an n-dimensional problem input space. The weights divide this space into regions defined by the one best interpretation (configuration) for all problems in that region. With parameter nets, these regions are separated by planes, since only the sum of the inputs matters. In MRFs, the energy depends on the log-product of the evidence and the sum of the potentials, allowing more general decision surfaces. Non-linear decisions such as AND or XOR are easy to encode, whereas they are impossible for the linear Hough transform. \n\n5 CONCLUSION \n\nThis paper has shown that parameter networks can be constructed with Markov Random Fields. MRFs can thus bridge representational levels of abstraction in network vision systems. Furthermore, it has been demonstrated that MRFs offer the potential for a significantly more powerful implementation of parameter nets, even if their topological architecture is identical to traditional Hough networks. In short, at least one method is now available for constructing intermediate vision solutions with Markov Random Fields. \n\nIt may thus be possible to build entire integrated vision systems within a single well-justified formal framework: Markov Random Fields. Such systems would have a unified representational scheme, constraints and evidence with well-defined semantics, and a single control structure. Furthermore, feedback and feedforward flow of information, crucial in any complete vision system, is intrinsic to MRFs. 
\n\nOf course, the task still remains to build a functioning vision system for some domain. In this paper we have said nothing about the definition of specific \"features\" and the constraints between them that would constitute a useful system. But providing essential tools implemented in a well-defined formal framework is an important step toward building robust, functioning systems. \n\nAcknowledgements \n\nSupport for this research was provided by NSF grant #IRI-9110492 and by Andersen Consulting, through their founding grant to the Institute for the Learning Sciences. Patrick Yuen wrote the MRF simulator that was used in the experiments. \n\nReferences \n\n[Ballard, 1984] D.H. Ballard, \"Parameter Networks,\" Artificial Intelligence, 22(3):235-267, 1984. \n\n[Bolle et al., 1988] Ruud M. Bolle, Andrea Califano, Rick Kjeldsen, and R.W. Taylor, \"Visual Recognition Using Concurrent and Layered Parameter Networks,\" Technical Report RC-14249, IBM Research Division, T.J. Watson Research Center, Dec 1988. \n\n[Chellapa and Jain, 1991] Rama Chellapa and Anil Jain, editors, Markov Random Fields: Theory and Application, Academic Press, 1991. \n\n[Chou and Brown, 1990] Paul B. Chou and Christopher M. Brown, \"The Theory and Practice of Bayesian Image Labeling,\" International Journal of Computer Vision, 4:185-210, 1990. \n\n[Cooper, 1990] Paul R. Cooper, \"Parallel Structure Recognition with Uncertainty: Coupled Segmentation and Matching,\" In Proceedings of the Third International Conference on Computer Vision ICCV '90, Osaka, Japan, December 1990. \n\n[Geman and Geman, 1984] Stuart Geman and Donald Geman, \"Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,\" PAMI, 6(6):721-741, November 1984. \n\n[Hopfield and Tank, 1985] J. J. Hopfield and D. W. 
Tank, \"\"Neural\" Computation \nof Decisions in Optimization Problems,\" Biological Cybernetics, 52:141-152, 1985. \n\n[Kirkpatrick et al., 1983] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, \"Optimiza(cid:173)\n\ntion by Simulated Annealing,\" Science, 220:671-680, 1983. \n\n[Marroquin, 1985] Jose Luis Marroquin, \"Probabilistic Solution of Inverse Prob(cid:173)\n\nlems,\" Technical report, MIT Artificial Intelligence Laboratory, September, 1985. \n[Pearl, 1988] Judea Pearl, Probabalistic Reasoning in Intelligent Systems, Morgan \n\nKaufman, 1988. \n\n[Sher, 1987] David B. Sher, \"A Probabilistic Approach to Low-Level Vision,\" Tech(cid:173)\n\nnical Report 232, Department of Computer Science, University of Rochester, \nOctober 1987. \n\n[Swain, 1990] Michael J. Swain, \"Parameter Learning for Markov Random Fields \n\nwith Highest Confidence First Estimation,\" Technical Report 350, Dept. of Com(cid:173)\nputer Science, University of Rochester, August 1990. \n\n\f", "award": [], "sourceid": 505, "authors": [{"given_name": "Paul", "family_name": "Cooper", "institution": null}, {"given_name": "Peter", "family_name": "Prokopowicz", "institution": null}]}