{"title": "Learning to Segment Images Using Dynamic Feature Binding", "book": "Advances in Neural Information Processing Systems", "page_first": 436, "page_last": 443, "abstract": null, "full_text": "Learning to Segment Images \n\nUsing Dynamic Feature Binding \n\nMichael C. Moser \n\nDept. of Compo Science & \nInst. of Cognitive Science \nUniversity of Colorado \nBoulder, CO 80309-0430 \n\nRichard S. Zemel \n\nDept. of Compo Science \nUniversity of Toronto \n\nToronto, Ontario \nCanada M5S lA4 \n\nMarlene Behrmann \nDept. of Psychology & \n\nFaculty of Medicine \nUniversity of Toronto \n\nToronto, Ontario \nCanada M5S lAl \n\nAbstract \n\nDespite the fact that complex visual scenes contain multiple, overlapping \nobjects, people perform object recognition with ease and accuracy. One \noperation that facilitates recognition is an early segmentation process in \nwhich features of objects are grouped and labeled according to which ob(cid:173)\nject they belong. Current computational systems that perform this oper(cid:173)\nation are based on predefined grouping heuristics. We describe a system \ncalled MAGIC that learn. how to group features based on a set of pre(cid:173)\nsegmented examples. In many cases, MAGIC discovers grouping heuristics \nsimilar to those previously proposed, but it also has the capability of find(cid:173)\ning nonintuitive structural regularities in images. Grouping is performed \nby a relaxation network that aUempts to dynamically bind related fea(cid:173)\ntures. Features transmit a complex-valued signal (amplitude and phase) \nto one another; binding can thus be represented by phase locking related \nfeatures. MAGIC'S training procedure is a generalization of recurrent back \npropagation to complex-valued units. \n\nWhen a visual image contains multiple, overlapping objects, recognition is difficult \nbecause features in the image are not grouped according to which object they belong. \nWithout the capability to form such groupings, it would be necessary to undergo a \nmassive search through all subsets of image features. For this reason, most machine \nvision recognition systems include a component that performs feature grouping or \nimage .egmentation (e.g., Guzman, 1968; Lowe, 1985; Marr, 1982). \n\n436 \n\n\fLearning to Segment Images Using Dynamic Feature Binding \n\n437 \n\nA multitude of heuristics have been proposed for segmenting images. Gestalt psy(cid:173)\nchologists have explored how people group elements of a display and have suggested \na range of grouping principles that govern human perception (Rock &z: Palmer, 1990). \nComputer vision researchers have studied the problem from a more computation(cid:173)\nal perspective. They have investigated methods of grouping elements of an image \nbased on nonaccidental regularitie..-feature combinations that are unlikely to occur \nby chance when several objects are juxtaposed, and are thus indicative of a single \nobject (Kanade, 1981; Lowe &z: Binford, 1982). \nIn these earlier approaches, the researchers have hypothesized a set of grouping \nheuristics and then tested their psychological validity or computational utility. In \nour work, we have taken an adaptive approach to the problem of image segmenta(cid:173)\ntion in which a system learns how to group features based on a set of examples. \nWe call the system MAGIC, an acronym for multiple-object !daptive grouping of \nimage ~omponents. 
MAGIC is trained on a set of presegmented images containing multiple objects. By "presegmented," we mean that each image feature is labeled with the object to which it belongs. MAGIC learns to detect configurations of the image features that have a consistent labeling in relation to one another across the training examples. Identifying these configurations allows MAGIC to then label features in novel, unsegmented images in a manner consistent with the training examples.

1 REPRESENTING FEATURE LABELINGS

Before describing MAGIC, we must first discuss a representation that allows for the labeling of features. Von der Malsburg (1981), von der Malsburg & Schneider (1986), Gray et al. (1989), and Eckhorn et al. (1988), among others, have suggested a biologically plausible mechanism of labeling through temporal correlations among neural signals, either the relative timing of neuronal spikes or the synchronization of oscillatory activities in the nervous system. The key idea here is that each processing unit conveys not just an activation value (average firing frequency in neural terms) but also a second, independent value which represents the relative phase of firing. The dynamic grouping or binding of a set of features is accomplished by aligning the phases of the features. Recent work (Goebel, 1991; Hummel & Biederman, in press) has used this notion of dynamic binding for grouping image features, but has been based on relatively simple, predetermined grouping heuristics.

2 THE DOMAIN

Our initial work has been conducted in the domain of two-dimensional geometric contours, including rectangles, diamonds, crosses, triangles, hexagons, and octagons. The contours are constructed from four primitive feature types (oriented line segments at 0°, 45°, 90°, and 135°) and are laid out on a 15 × 20 grid. At each location on the grid are units, called feature units, that detect each of the four primitive feature types. In our present experiments, images contain two contours. Contours are not permitted to overlap in their activation of the same feature unit.
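To make the domain concrete, here is a minimal sketch of how an image and its presegmentation might be encoded, assuming one binary plane per primitive feature type over the 15 × 20 grid. The array layout and the `add_segment` helper are our own assumptions, not code from the paper.

```python
import numpy as np

# Sketch of the input encoding described above (our own construction): one
# binary plane per primitive feature type, laid over the 15 x 20 grid. A
# presegmented training image additionally labels each active feature unit
# with the object it belongs to.

ORIENTATIONS = {0: 0, 45: 1, 90: 2, 135: 3}   # degrees -> plane index
GRID_ROWS, GRID_COLS = 15, 20

features = np.zeros((4, GRID_ROWS, GRID_COLS), dtype=bool)
labels = np.zeros((4, GRID_ROWS, GRID_COLS), dtype=np.int8)  # 0 = no feature

def add_segment(row, col, orientation_deg, object_id):
    """Activate the feature unit for an oriented segment at a grid location."""
    plane = ORIENTATIONS[orientation_deg]
    features[plane, row, col] = True
    labels[plane, row, col] = object_id  # which object the feature belongs to

# Fragment of a rectangle (object 1): horizontal top edge and vertical side.
for c in range(3, 7):
    add_segment(2, c, 0, object_id=1)
for r in range(3, 6):
    add_segment(r, 3, 90, object_id=1)

# Fragment of a diamond (object 2): two diagonal segments.
add_segment(8, 10, 45, object_id=2)
add_segment(9, 11, 135, object_id=2)
```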
Figure 1: The architecture of MAGIC. The lower layer contains the feature units; the upper layer contains the hidden units. Each layer is arranged in a spatiotopic array with a number of different feature types at each position in the array. Each plane in the feature layer corresponds to a different feature type. The grayed hidden units are reciprocally connected to all features in the corresponding grayed region of the feature layer. The lines between layers represent projections in both directions.

3 THE ARCHITECTURE

The input to MAGIC is a pattern of activity over the feature units indicating which features are present in an image. The initial phases of the units are random. MAGIC's task is to assign appropriate phase values to the units. Thus, the network performs a type of pattern completion.

The network architecture consists of two layers of units, as shown in Figure 1. The lower (input) layer contains the feature units, arranged in spatiotopic arrays with one array per feature type. The upper layer contains hidden units that help to align the phases of the feature units; their response properties are determined by training. Each hidden unit is reciprocally connected to the units in a local spatial region of all feature arrays. We refer to this region as a patch; in our current simulations, the patch has dimensions 4 × 4. For each patch there is a corresponding fixed-size pool of hidden units. To achieve uniformity of response across the image, the pools are arranged in a spatiotopic array in which neighboring pools respond to neighboring patches and the weights of all pools are constrained to be the same.

The feature units activate the hidden units, which in turn feed back to the feature units. Through a relaxation process, the system settles on an assignment of phases to the features.

4 NETWORK DYNAMICS

Formally, the response of each feature unit i, x_i, is a complex value in polar form, (a_i, p_i), where a_i is the amplitude and p_i is the phase.
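The excerpt ends just as the update equations begin, so the following is only a generic sketch of the kind of relaxation the text describes: complex-valued feature responses drive hidden units, and hidden-unit feedback nudges feature phases toward alignment. The weight matrix, step size, and update rule here are illustrative assumptions, not MAGIC's trained dynamics.

```python
import numpy as np

# Generic sketch of one feature -> hidden -> feature relaxation step with
# complex-valued units. Weights and update rule are illustrative assumptions;
# the paper's own dynamics are specified after this excerpt.

rng = np.random.default_rng(1)

n_features, n_hidden = 12, 4
W = rng.normal(size=(n_hidden, n_features))  # assumed real-valued weights

# Feature responses x_i = a_i * exp(i * p_i): unit amplitudes, random initial
# phases, matching the random phase initialization described in Section 3.
amplitude = np.ones(n_features)
phase = rng.uniform(0, 2 * np.pi, size=n_features)

for _ in range(50):
    x = amplitude * np.exp(1j * phase)
    net_hidden = W @ x                        # complex net input to hidden units
    # Hidden units adopt the phase of their net input (amplitude fixed at 1
    # here purely for simplicity of the sketch).
    hidden = net_hidden / np.maximum(np.abs(net_hidden), 1e-12)
    feedback = W.T @ hidden                   # project hidden phases back down
    # Nudge each feature's phase toward the phase of its feedback signal.
    phase += 0.2 * np.angle(feedback * np.exp(-1j * phase))

# After settling, features that share strongly weighted hidden units tend to
# have locked phases, i.e., to be bound into the same group.
```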