{"title": "Part-based Probabilistic Point Matching using Equivalence Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 969, "page_last": 976, "abstract": null, "full_text": "Part-based Probabilistic Point Matching using Equivalence Constraints\n\nGraham McNeill, Sethu Vijayakumar Institute of Perception, Action and Behavior School of Informatics, University of Edinburgh, Edinburgh, UK. EH9 3JZ [graham.mcneill, sethu.vijayakumar]@ed.ac.uk\n\nAbstract\nCorrespondence algorithms typically struggle with shapes that display part-based variation. We present a probabilistic approach that matches shapes using independent part transformations, where the parts themselves are learnt during matching. Ideas from semi-supervised learning are used to bias the algorithm towards finding `perceptually valid' part structures. Shapes are represented by unlabeled point sets of arbitrary size and a background component is used to handle occlusion, local dissimilarity and clutter. Thus, unlike many shape matching techniques, our approach can be applied to shapes extracted from real images. Model parameters are estimated using an EM algorithm that alternates between finding a soft correspondence and computing the optimal part transformations using Procrustes analysis.\n\n1 Introduction\nShape-based object recognition is a key problem in machine vision and content-based image retrieval (CBIR). Over the last decade, numerous shape matching algorithms have been proposed that perform well on benchmark shape retrieval tests. However, many of these techniques share the same limitations: Firstly, they operate on contiguous shape boundaries (i.e. the ordering of the boundary points matters) and assume that every point on one boundary has a counterpart on the boundary it is being matched to (c.f. Fig. 1c). Secondly, they have no principled mechanism for handling occlusion, non-boundary points and clutter. 
Finally, they struggle to handle shapes that display significant part-based variation. The first two limitations mean that many algorithms are unsuitable for matching shapes extracted from real images; the third is important since many common objects (natural and man-made) display part-based variation. Techniques that match unordered point sets (e.g. [1]) are appealing since they do not require ordered boundary information and can work with non-boundary points. The methods described in [2, 3, 4] can handle outliers, occlusions and clutter, but are not designed to handle shapes whose parts are independently transformed. In this paper, we introduce a probabilistic model that retains the desirable properties of these techniques but handles parts explicitly by learning the most likely part structure and correspondence simultaneously. In this framework, a part is defined as a set of points that undergo a common transformation. Learning these variation-based parts from scratch is an underconstrained problem. To address this, we incorporate prior knowledge about valid part assignments using two different mechanisms. Firstly, the distributions of our hierarchical mixture model are chosen so that the learnt parts are spatially localized. Secondly, ideas from semi-supervised learning [5] are used to encourage a perceptually meaningful part decomposition. The algorithm is introduced in Sec. 2 and described in detail in Sec. 3. Examples are given in Sec. 4 and a sequential approach for tackling model selection (the number of parts) and parameter initialization is introduced in Sec. 5.\n\nFigure 1 (panels: a. Occlusion; b. Irregular sampling; c. Localized dissimilarity): Examples of probabilistic point matching (PPM) using the technique described in [4]. 
In each case, the initial alignment and the final match are shown.\n\n2 Part-based Point Matching (PBPM): Motivation and Overview\nThe PBPM algorithm combines three key ideas:\n\nProbabilistic point matching (PPM): Probabilistic methods that find a soft correspondence between unlabeled point sets [2, 3, 4] are well suited to problems involving occlusion, absent features and clutter (Fig. 1).\n\nNatural Part Decomposition (NPD): Most shapes have a natural part decomposition (NPD) (Fig. 2) and there are several algorithms available for finding NPDs (e.g. [6]). We note that in tasks such as object recognition and CBIR, the query image is frequently a template shape (e.g. a binary image or line drawing) or a high quality image with no occlusion or clutter. In such cases, one can apply an NPD algorithm prior to matching. Throughout this paper, it is assumed that we have obtained a sensible NPD for the query shape only (see footnote 1); it is not reasonable to assume that an NPD can be computed for each database shape/image.\n\nVariation-based Part Decomposition (VPD): A different notion of parts has been used in computer vision [7], where a part is defined as a set of pixels that undergo the same transformations across images. We refer to this type of part decomposition (PD) as a variation-based part decomposition (VPD).\n\nGiven two shapes (i.e. point sets), PBPM matches them by applying a different transformation to each variation-based part of the generating shape. These variation-based parts are learnt during matching, where the known NPD of the data shape is used to bias the algorithm towards choosing a `perceptually valid' VPD. This is achieved using the following equivalence constraint.\n\nConstraint 1 (C1): Points that belong to the same natural part should belong to the same variation-based part.\n\nAs we shall see in Sec. 3, this influences the learnt VPD by changing the generative model from one that generates individual data points to one that generates natural parts (subsets of data points). 
To further increase the perceptual validity of the learnt VPD, we assume that variation-based parts are composed of spatially localized points of the generating shape. PBPM aims to find the correct correspondence at the level of individual points, i.e. each point of the generating shape should be mapped to the correct position on the data shape despite the lack of an exact pointwise correspondence (e.g. Fig. 1b). Soft correspondence techniques that achieve this using a single nonlinear transformation [2, 3] perform well on some challenging problems. However, the smoothness constraints used to control the nonlinearity of the transformation will prevent these techniques from selecting the discontinuous transformations associated with part-based movements. PBPM learns an independent linear transformation for each part and hence can find the correct global match. In relation to the point matching literature, PBPM is motivated by the success of the techniques described in [8, 2, 3, 4] on non-part-based problems. It is perhaps most similar to the work of Hancock and colleagues (e.g. [8]) in that we use `structural information' about the point sets to constrain the matching problem. In addition to learning multiple parts and transformations, our work differs in the type of structural information used (the NPD rather than the Delaunay triangulation) and the way in which this information is incorporated. With respect to the shape-matching literature, PBPM can be seen as a novel correspondence technique for use with established NPD algorithms.\n\nFootnote 1: The NPDs used in the examples were constructed manually.\n\nFigure 2: The natural part decomposition (NPD) (b-d) for different representations of a shape (a).\n\nDespite the large number of NPD algorithms, there are relatively few NPD-based correspondence techniques. 
Siddiqi and Kimia show that the parts used in their NPD algorithm [6] correspond to specific types of shocks when shock graph representations are used. Consequently, shock graphs implicitly capture ideas about natural parts. The Inner-Distance method of Ling and Jacobs [9] handles part articulation without explicitly identifying the parts.\n\n3 Part-based Point Matching (PBPM): Algorithm\n3.1 Shape Representation\nShapes are represented by point sets of arbitrary size. The points need not belong to the shape boundary and the ordering of the points is irrelevant. Given a generating shape $X = (x_1, x_2, \\ldots, x_M)^T \\in \\mathbb{R}^{M \\times 2}$ and a data shape $Y = (y_1, y_2, \\ldots, y_N)^T \\in \\mathbb{R}^{N \\times 2}$ (generally $M \\neq N$), our task is to compute the correspondence between X and Y. We assume that an NPD of Y is available, expressed as a partition of Y into subsets (parts): $Y = \\bigcup_{l=1}^{L} Y_l$.\n\n3.2 The Probabilistic Model\nWe assume that a data point y is generated by the mixture model\n$$p(y) = \\sum_{v=0}^{V} p(y|v)\\,\\pi_v, \\qquad (1)$$\nwhere v indexes the variation-based parts. A uniform background component, $y|(v{=}0) \\sim \\text{Uniform}$, ensures that all data points are explained to some extent and hence robustifies the model against outliers. The distribution of y given a foreground component v is itself a mixture model:\n$$p(y|v) = \\sum_{m=1}^{M} p(y|m, v)\\,p(m|v), \\qquad v = 1, 2, \\ldots, V, \\qquad (2)$$\nwith\n$$y|(m, v) \\sim \\mathcal{N}(T_v x_m, \\sigma^2 I). \\qquad (3)$$\nHere, $T_v$ is the transformation used to match points of part v on X to points of part v on Y. Finally, we define $p(m|v)$ in such a way that the variation-based parts v are forced to be spatially coherent:\n$$p(m|v) = \\frac{\\exp\\{-(x_m - \\mu_v)^T \\Sigma_v^{-1} (x_m - \\mu_v)/2\\}}{\\sum_{m'=1}^{M} \\exp\\{-(x_{m'} - \\mu_v)^T \\Sigma_v^{-1} (x_{m'} - \\mu_v)/2\\}}, \\qquad (4)$$\nwhere $\\mu_v \\in \\mathbb{R}^2$ is a mean vector and $\\Sigma_v \\in \\mathbb{R}^{2 \\times 2}$ is a covariance matrix. In words, we identify $m \\in \\{1, \\ldots, M\\}$ with the point $x_m$ that it indexes and assume that the $x_m$ follow a bivariate Gaussian distribution. Since m must take a value in {1, . . . 
, M }, the distribution is normalized using the points $x_1, \\ldots, x_M$ only. This assumption means that the $x_m$ themselves are essentially generated by a GMM with V components. However, this GMM is embedded in the larger model and maximizing the data likelihood will balance this GMM's desire for coherent parts against the need for the parts and transformations to explain the actual data (the $y_n$). Having defined all the distributions, the next step is to estimate the parameters whilst making use of the known NPD of Y.\n\n3.3 Parameter Estimation\nWith respect to the model defined in the previous section, C1 states that all $y_n$ that belong to the same subset $Y_l$ were generated by the same mixture component v. This requirement can be enforced using the technique introduced by Shental et al. [5] for incorporating equivalence constraints between data points in mixture models. The basic idea is to estimate the model parameters using the EM algorithm. However, when taking the expectation (of the complete log-likelihood) we now only sum over assignments of data points to components which are valid with respect to the constraints. Assuming that subsets and points within subsets are sampled i.i.d., it can be shown that the expectation is given by:\n$$E = \\sum_{v=0}^{V} \\sum_{l=1}^{L} p(v|Y_l) \\log \\pi_v + \\sum_{v=0}^{V} \\sum_{l=1}^{L} \\sum_{y_n \\in Y_l} p(v|Y_l) \\log p(y_n|v). \\qquad (5)$$\nNote that eq.(5) involves $p(v|Y_l)$, the responsibility of a component v for a subset $Y_l$, rather than the term $p(v|y_n)$ that would be present in an unconstrained mixture model. Using the expression for $p(y_n|v)$ in eq.(2) and rearranging slightly, we have\n$$E = \\sum_{v=0}^{V} \\sum_{l=1}^{L} p(v|Y_l) \\log \\pi_v + \\sum_{l=1}^{L} p(v{=}0|Y_l) \\log\\{u^{|Y_l|}\\} + \\sum_{v=1}^{V} \\sum_{l=1}^{L} \\sum_{y_n \\in Y_l} p(v|Y_l) \\log\\Big\\{\\sum_{m=1}^{M} p(y_n|m, v)\\,p(m|v)\\Big\\}, \\qquad (6)$$\nwhere u is the constant associated with the uniform distribution $p(y_n|v{=}0)$. The parameters to be estimated are $\\pi_v$ (eq.(1)), $\\mu_v$, $\\Sigma_v$ (eq.(4)) and the transformations $T_v$ (eq.(3)). 
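As an illustration of the subset responsibilities p(v|Y_l) that appear in eq.(5), the following is a minimal NumPy sketch, not the authors' implementation. It assumes that, under the i.i.d. assumption stated above, the likelihood of a subset is the product of the likelihoods of its points, so that p(v|Y_l) is proportional to the mixing weight of component v times that product. The function name, argument names and array layout are hypothetical.

```python
import numpy as np

def subset_responsibilities(log_lik, log_pi, labels, n_subsets):
    # log_lik[n, v] : log p(y_n | v) for data point n and component v (assumed precomputed)
    # log_pi[v]     : log of the mixing weight of component v
    # labels[n]     : index l of the natural part Y_l that contains y_n
    n_components = log_lik.shape[1]
    joint = np.zeros((n_subsets, n_components))
    # Sum the per-point log-likelihoods within each subset (a product in probability space).
    np.add.at(joint, labels, log_lik)
    joint = joint + log_pi
    # Normalize over components in a numerically stable way.
    joint = joint - joint.max(axis=1, keepdims=True)
    resp = np.exp(joint)
    return resp / resp.sum(axis=1, keepdims=True)
```

Each row of the returned array sums to one and plays the role that the per-point responsibility p(v|y_n) would play in an unconstrained mixture model; every point in a subset shares the same row.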
With the exception of $\\pi_v$, these parameters are found by maximizing the final term in eq.(6). For a fixed v, this term is the log-likelihood of data points $y_1, \\ldots, y_N$ under a mixture model, with the modification that there is a weight, $p(v|Y_l)$, associated with each data point. Thus, we can treat this subproblem as a standard maximum likelihood problem and derive the EM updates as usual. The resulting EM algorithm is given below.\n\nE-step. Compute the responsibilities using the current parameters:\n$$p(m|y_n, v) = \\frac{p(y_n|m, v)\\,p(m|v)}{\\sum_{m'} p(y_n|m', v)\\,p(m'|v)}, \\qquad v = 1, 2, \\ldots, V, \\qquad (7)$$\n$$p(v|Y_l) = \\frac{\\pi_v \\prod_{y_n \\in Y_l} p(y_n|v)}{\\sum_{v'} \\pi_{v'} \\prod_{y_n \\in Y_l} p(y_n|v')}. \\qquad (8)$$\n\nM-step. Update the parameters using the responsibilities:\n$$\\pi_v = \\frac{1}{L} \\sum_{l=1}^{L} p(v|Y_l), \\qquad (9)$$\n$$\\mu_v = \\frac{\\sum_{n,m} p(v|Y_{l,n})\\,p(m|y_n, v)\\,x_m}{\\sum_{n,m} p(v|Y_{l,n})\\,p(m|y_n, v)}, \\qquad (10)$$\n$$\\Sigma_v = \\frac{\\sum_{n,m} p(v|Y_{l,n})\\,p(m|y_n, v)\\,(x_m - \\mu_v)(x_m - \\mu_v)^T}{\\sum_{n,m} p(v|Y_{l,n})\\,p(m|y_n, v)}, \\qquad (11)$$\n$$T_v = \\arg\\min_T \\sum_{n,m} p(v|Y_{l,n})\\,p(m|y_n, v)\\,\\|y_n - T x_m\\|^2, \\qquad (12)$$\nwhere $Y_{l,n}$ is the subset $Y_l$ containing $y_n$. Here, we define $T_v x \\equiv s_v \\Gamma_v x + c_v$, where $s_v$ is a scale parameter, $c_v \\in \\mathbb{R}^2$ is a translation vector and $\\Gamma_v$ is a 2D rotation matrix. Thus, eq.(12) becomes a weighted Procrustes matching problem between two point sets, each of size $N \\times M$; the extent to which $x_m$ corresponds to $y_n$ in the context of part v is given by $p(v|Y_{l,n})\\,p(m|y_n, v)$. This least squares problem for the optimal transformation parameters $s_v$, $\\Gamma_v$ and $c_v$ can be solved analytically [8]. The weights associated with the updates in eqs.(10)-(12) are similar to $p(v|y_n)\\,p(m|y_n, v) = p(m, v|y_n)$, the responsibility of the hidden variables (m, v) for the observed data, $y_n$. The difference is that $p(v|y_n)$ is replaced by $p(v|Y_{l,n})$, and hence the impact of the equivalence constraints is propagated throughout the model. The same fixed variance $\\sigma^2$ (eq.(3)) is used in all experiments. For the examples in Sec. 4, we initialize $\\pi_v$, $\\mu_v$ and $\\Sigma_v$ by fitting a standard GMM to the $x_m$. In Sec. 
5, we describe a sequential algorithm that can be used to select the number of parts V as well as provide initial estimates for all parameters.\n\nFigure 3 (panels: Input: X and initial Gaussians for p(m|v), Y, NPD of Y, Initial alignment; Output: VPD of X with final Gaussians for p(m|v), Transformed X, VPD of Y, NPD of X, Final match): An example of applying PBPM with V=3.\n\nFigure 4 (columns: PPM; PBPM with 2, 4, 5 and 6 parts; rows: VPD of X, VPD of Y, Final match): Results for the problem in Fig. 3 using PPM [4] and PBPM with V = 2, 4, 5 and 6.\n\n4 Examples\nAs discussed in Secs. 1 and 2, unsupervised matching of shapes with moving parts is a relatively unexplored area, particularly for shapes not composed of single closed boundaries. This makes it difficult to quantitatively assess the performance of our algorithm. Here, we provide illustrative examples which demonstrate the various properties of PBPM and then consider more challenging problems involving shapes extracted from real images. The number of parts, V, is fixed prior to matching in these examples; a technique for estimating V is described in Sec. 5. To visualize the matches found by PBPM, each point $y_n$ is assigned to a part v using $\\arg\\max_v p(v|y_n)$. Points assigned to v=0 are removed from the figure. For each $y_n$ assigned to some $v \\in \\{1, \\ldots, V\\}$, we find $m_n \\equiv \\arg\\max_m p(m|y_n, v)$ and assign $x_{m_n}$ to v. Those $x_m$ not assigned to any parts are removed from the figure. The means and the ellipses of constant probability density associated with the distributions $\\mathcal{N}(\\mu_v, \\Sigma_v)$ are plotted on the original shape X. We also assign the $x_m$ to natural parts using the known natural part label of the $y_n$ that they are assigned to. Fig. 3 shows an example of matching two human body shapes using PBPM with V=3. The learnt VPD is intuitive and the match is better than that found using PPM (Fig. 4). The results obtained using different values of V are shown in Fig. 4. 
Predictably, the match improves as V increases, but the improvement is negligible beyond V=4. When V=5, one of the parts is effectively repeated, suggesting that four parts are sufficient to cover all the interesting variation. However, when V=6 all parts are used and the VPD looks very similar to the NPD; only the lower leg and foot on each side are grouped together. In Fig. 5, there are two genuine variation-based parts and X contains additional features. PBPM effectively ignores the extra points of X and finds the correct parts and matches. In Fig. 6, the left leg is correctly identified and rotated, whereas the right leg of Y is `deleted'. We find that deletion from the generating shape tends to be very precise (e.g. Fig. 5), whereas PBPM is less inclined to delete points from the data shape when it involves breaking up natural parts (e.g. Fig. 6). This is largely due to the equivalence constraints trying to keep natural parts intact, though the value of the uniform density, u, and the way in which points are assigned to parts are also important.\n\nFigure 5 (panels as in Fig. 3): Some features of X are not present on Y; the main building of X is smaller and the tower is more central.\n\nFigure 6 (panels as in Fig. 3): The left legs do not match and most of the right leg of X is missing.\n\nIn Figs. 7 and 8, a template shape is matched to the edge detector output from two real images. We have not focused on optimizing the parameters of the edge detector since the aim is to demonstrate the ability of PBPM to handle suboptimal shape representations. 
The correct correspondence and PDs are estimated in all cases, though the results are less precise for these difficult problems. Six parts are used in Fig. 8, but two of these are initially assigned to clutter and end up playing no role in the final match. The object of interest in X is well matched to the template using the other four parts. Note that the left shoulder is not assigned to the same variation-based part as the other points of the torso, i.e. the soft equivalence constraint has been broken in the interests of finding the best match.\n\nFigure 7 (panels as in Fig. 3, with X = edge detector output): Matching a template shape to an object in a cluttered scene.\n\nFigure 8 (panels as in Fig. 3, with X = edge detector output): Matching a template shape to a real image.\n\nWe have not yet considered the choice of V. Figs. 4 (with V=5) and 8 indicate that it may be possible to start with more parts than are required and either allow extraneous parts to go unused or perhaps prune parts during matching. Alternatively, one could run PBPM for a range of V and use a model selection technique based on a penalized log-likelihood function (e.g. BIC) to select a V. Finally, one could attempt to learn the parts in a sequential fashion. This is the approach considered in the next section.\n\n5 Sequential Algorithm for Initialization\nWhen part variation is present, one would expect PBPM with V=1 to find the most significant part and allow the background to explain the remaining parts. This suggests a sequential approach whereby a single part is learnt and removed from further consideration at each stage. Each new part/component should focus on data points that are currently explained by the background. This is achieved by modifying the technique described in [7] for fitting mixture models sequentially. Specifically, assume that the first part (v=1) has been learnt and now learn the second part using the weighted log-likelihood\n$$J_2 = \\sum_{l=1}^{L} z_l^1 \\log\\{p(Y_l|v{=}2)\\,\\pi_2 + u^{|Y_l|}(1 - \\pi_1 - \\pi_2)\\}. \\qquad (13)$$\nHere, $\\pi_1$ is known and\n$$z_l^1 \\equiv \\frac{u^{|Y_l|}(1 - \\pi_1)}{p(Y_l|v{=}1)\\,\\pi_1 + u^{|Y_l|}(1 - \\pi_1)} \\qquad (14)$$\nis the responsibility of the background component for the subset $Y_l$ after learning the first part; the superscript of z indicates the number of components that have already been learnt. Using the modified log-likelihood in eq.(13) has the desired effect of forcing the new component (v=2) to explain the data currently explained by the uniform component. Note that we use the responsibilities for the subsets $Y_l$ rather than the individual $y_n$ [7], in line with the assumption that complete subsets belong to the same part. Also, note that eq.(13) is a weighted sum of log-likelihoods over the subsets; it cannot be written as a sum over data points since these are not sampled i.i.d. due to the equivalence constraints. Maximizing eq.(13) leads to similar EM updates to those given in eqs.(7)-(12). Having learnt the second part, additional components v = 3, 4, . . . are learnt in the same way except for minor adjustments to eqs.(13) and (14) to incorporate all previously learnt components. The sequential algorithm terminates when the uniform component is not significantly responsible for any data or the most recently learnt component is not significantly responsible for any data. As discussed in [7], the sequential algorithm is expected to have fewer problems with local minima since the objective function will be smoother (a single component competes against a uniform component at each stage) and the search space smaller (fewer parameters are learnt at each stage). 
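The background weights of eq.(14) and the objective of eq.(13) can be sketched numerically as follows. This is an illustrative NumPy sketch under stated assumptions rather than the authors' code: it assumes that the subset log-likelihoods log p(Y_l|v) have already been computed (for example as a sum of per-point log-likelihoods), and all function and argument names are hypothetical. Working in log space is a design choice made here to avoid underflow, since u raised to the power |Y_l| becomes tiny for large subsets.

```python
import numpy as np

def background_weights(log_p_subset_v1, subset_sizes, pi1, u):
    # Eq.(14): responsibility of the uniform background for each subset Y_l
    # after the first part (v=1) has been learnt.
    sizes = np.asarray(subset_sizes, dtype=float)
    log_bg = sizes * np.log(u) + np.log1p(-pi1)   # log of u^|Y_l| * (1 - pi1)
    log_fg = np.asarray(log_p_subset_v1) + np.log(pi1)  # log of p(Y_l|v=1) * pi1
    return np.exp(log_bg - np.logaddexp(log_fg, log_bg))

def weighted_objective(z, log_p_subset_v2, subset_sizes, pi1, pi2, u):
    # Eq.(13): weighted log-likelihood used to learn the second part (v=2).
    sizes = np.asarray(subset_sizes, dtype=float)
    log_new = np.asarray(log_p_subset_v2) + np.log(pi2)
    log_bg = sizes * np.log(u) + np.log(1.0 - pi1 - pi2)
    return float(np.sum(z * np.logaddexp(log_new, log_bg)))
```

Subsets that the first part already explains well receive a weight near zero and so contribute little to the objective for the second part, which is the intended focusing effect.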
Preliminary experiments suggest that the sequential algorithm is capable of solving the model selection problem (choosing the number of parts) and providing good initial parameter values for the full model described in Sec. 3. Some examples are given in Figs. 9 and 10; the initial transformations for each part are not shown. The outcome of the sequential algorithm is highly dependent on the value of the uniform density, u. We are currently investigating how the model can be made more robust to this value and also how the used $x_m$ should be subtracted (in a probabilistic sense) at each step.\n\nFigure 9 (panels as in Fig. 3): Results for PBPM; V and initial parameters were found using the sequential approach.\n\nFigure 10 (panels as in Fig. 3): Results for PBPM; V and initial parameters were found using the sequential approach.\n\n6 Summary and Discussion\nDespite the prevalence of part-based objects/shapes, there has been relatively little work on the associated correspondence problem. In the absence of class models and training data (i.e. the unsupervised case), this is a particularly difficult task. In this paper, we have presented a probabilistic correspondence algorithm that handles part-based variation by learning the parts and correspondence simultaneously. Ideas from semi-supervised learning are used to bias the algorithm towards finding a `perceptually valid' part decomposition. Future work will focus on robustifying the sequential approach described in Sec. 5.\n\nReferences\n[1] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24:509-522, 2002.\n[2] H. Chui and A. 
Rangarajan. A new point matching algorithm for non-rigid registration. Comp. Vis. and Image Understanding, 89:114-141, 2003.\n[3] Z. Tu and A.L. Yuille. Shape matching and recognition using generative models and informative features. In ECCV, 2004.\n[4] G. McNeill and S. Vijayakumar. A probabilistic approach to robust shape matching. In ICIP, 2006.\n[5] N. Shental, A. Bar-Hillel, T. Hertz, and D. Weinshall. Computing Gaussian mixture models with EM using equivalence constraints. In NIPS, 2004.\n[6] K. Siddiqi and B.B. Kimia. Parts of visual form: Computational aspects. PAMI, 17(3):239-251, 1995.\n[7] M. Titsias. Unsupervised Learning of Multiple Objects in Images. PhD thesis, Univ. of Edinburgh, 2005.\n[8] B. Luo and E.R. Hancock. A unified framework for alignment and correspondence. Comp. Vis. and Image Understanding, 92:26-55, 2003.\n[9] H. Ling and D.W. Jacobs. Using the inner-distance for classification of articulated shapes. In CVPR, 2005.\n", "award": [], "sourceid": 3146, "authors": [{"given_name": "Graham", "family_name": "Mcneill", "institution": null}, {"given_name": "Sethu", "family_name": "Vijayakumar", "institution": null}]}