{"title": "A Topographic Support Vector Machine: Classification Using Local Label Configurations", "book": "Advances in Neural Information Processing Systems", "page_first": 929, "page_last": 936, "abstract": null, "full_text": "A Topographic Support Vector Machine: Classification Using Local Label Configurations\n\nJohannes Mohr\nClinic for Psychiatry and Psychotherapy\nCharité Medical School\nand\nBernstein Center for Computational Neuroscience Berlin\n10117 Berlin, Germany\n\nKlaus Obermayer\nDepartment of Electrical Engineering and Computer Science\nBerlin University of Technology\nand\nBernstein Center for Computational Neuroscience Berlin\n10587 Berlin, Germany\n\njohann@cs.tu-berlin.de, oby@cs.tu-berlin.de\n\nAbstract\n\nThe standard approach to the classification of objects is to consider the examples as independent and identically distributed (iid). In many real-world settings, however, this assumption is not valid, because a topographical relationship exists between the objects. In this contribution we consider the special case of image segmentation, where the objects are pixels and where the underlying topography is a 2D regular rectangular grid. We introduce a classification method which uses not only measured vectorial feature information but also the label configuration within a topographic neighborhood. Due to the resulting dependence between the labels of neighboring pixels, a collective classification of a set of pixels becomes necessary. We propose a new method called 'Topographic Support Vector Machine' (TSVM), which is based on a topographic kernel and a self-consistent solution to the label assignment shown to be equivalent to a recurrent neural network.
The performance of the algorithm is compared to a conventional SVM on a cell image segmentation task.\n\n1 Introduction\n\nThe segmentation of natural images into semantically meaningful subdivisions can be considered as one or more binary pixel classification problems, where two classes of pixels are characterized by some measurement data (features). For each binary problem the task is to assign a set of new pixels to one of the two classes using a classifier trained on a set of labeled pixels (training data).\n\nConventional classification approaches usually assume iid examples, so the classification result is determined solely by the measurement data. Natural images, however, possess a topographic structure in which the labels of topographic neighbors depend on each other, making the data non-iid. Therefore, the classification of a pixel can use not only the measurement data but also the labels of its topographic neighbors. It has been shown for a number of problems that exploiting dependencies between instances can improve model accuracy. A conditional random field approach was used by [1] for labeling text sequences. Combining this idea with local discriminative models, [2] used a discriminative random field to model the dependencies between the labels of image blocks in a probabilistic framework. A collective classification relational dependency network was used in [3] for predicting movie box-office receipts and classifying paper topics. Maximization of the per-label margin of pairwise Markov networks was applied in [4] to handwritten character recognition and collective hypertext classification.
There, the number of variables and constraints of the quadratic programming problem was polynomial in the number of labels.\n\nIn this work, we propose a method which is also based on margin maximization and allows the collective assignment of a large number of binary labels which have a regular grid topography. In contrast to [4], the number of constraints and variables does not depend on the number of labels. The method, called topographic support vector machine (TSVM), is based on the assumption that knowledge about the local label configuration can improve the classification of a single data point. Consider as an example the segmentation of a collection of images depicting physical objects of similar shape but high variability in gray level and texture. In this case, the measurements are dissimilar, while the local label configurations show high similarity.\n\nHere, we apply the TSVM to the supervised bottom-up segmentation of microscopic images of Papanicolaou-stained cervical cell nuclei from the CSSIP pap smear dataset¹. Segmentation of these images is important for the detection of cervical cancer or precancerous cells. The final goal is to use so-called malignancy associated changes (MACs), e.g. a slight shift of the distribution of nuclear size not yet visible to the human observer, in order to detect cancer at an early stage [5]. A previously used bottom-up segmentation approach for this data, based on morphological watersheds, was reported to have difficulties with weak gradients and with the presence of other large gradients adjacent to the target [5].
Top-down methods like active contour models have been used successfully [6] but require heuristic initialization and error correction procedures.\n\n¹ Centre for Sensor Signal and Information Processing, University of Queensland\n\n2 Classification using a Topographic Support Vector Machine\n\nLet $O = \\{o_1, \\ldots, o_n\\}$ be a set of $n$ sites on a 2D pixel grid and $\\mathcal{G} = \\{\\mathcal{G}_o, o \\in O\\}$ be a neighborhood system for $O$, where $\\mathcal{G}_o$ is the set of neighbors of $o$. The neighborhood relation is defined by $o \\notin \\mathcal{G}_o$ and $o \\in \\mathcal{G}_p \\Leftrightarrow p \\in \\mathcal{G}_o$. For each pixel site $o_i$ from the set $O$, a binary label $y_i \\in \\{-1, +1\\}$ giving the class assignment is assumed to be known. To simplify the notation, in the following we use multi-indices written in the form of vectors, referring to pairs of indices on a two-dimensional grid. We define the neighborhood system of order $c$ as $\\mathcal{G}^c = \\{\\mathcal{G}_i, i \\in O\\}$ with $\\mathcal{G}_i = \\{k \\in O : 0 < (k-i)^2 \\le c\\}$. This way, $\\mathcal{G}^1$ describes the first-order neighborhood system (4 neighbors), $\\mathcal{G}^2$ the second-order system (8 neighbors), and so on. Each pixel site is characterized by some measurement vector. This could, for example, be the vector of gray value intensities at a pixel site, the gray value patch around a central pixel location, or the responses to a bank of linear or nonlinear filters (e.g. Gabor coefficients). We are given a training set composed of (possibly several) sets of pixel sites. Each set is accompanied by a set of measurement vectors $X = \\{x_i, i \\in [1..n]\\}$ and a set of labels $Y = \\{y_i, i \\in [1..n]\\}$ (e.g. a manually labeled image). Using this training set, the task of classification is to assign class labels to a set of pixel sites $U = \\{u_1, u_2, \\ldots\\}$ of an unlabeled image, for which a set of measurements $\\tilde{X} = \\{\\tilde{x}_i\\}$, one for each site in $U$, is available.
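The neighborhood system of order c can be enumerated directly from its definition. The Python sketch below is an illustration only, not the authors' code. It lists the grid offsets s = k - i with 0 < |s|^2 <= c and concatenates the labels of a site's neighbors in one fixed order. The function names neighborhood_offsets and neighbor_labels are hypothetical. The border convention is also an assumption: the text does not say how image borders are handled, so sites outside the image receive a padding label (here -1).

```python
import math
from typing import List, Sequence, Tuple

Offset = Tuple[int, int]


def neighborhood_offsets(c: int) -> List[Offset]:
    # Offsets s = k - i with 0 < |s|^2 <= c, i.e. the order-c
    # neighborhood of a site expressed relative to that site.
    r = math.isqrt(c)  # no component of s can exceed sqrt(c)
    offsets = [(dr, dc)
               for dr in range(-r, r + 1)
               for dc in range(-r, r + 1)
               if 0 < dr * dr + dc * dc <= c]
    # Sorting gives an arbitrary but fixed concatenation order.
    return sorted(offsets)


def neighbor_labels(labels: Sequence[Sequence[int]], site: Offset,
                    offsets: List[Offset], pad: int = -1) -> List[int]:
    # Concatenate the labels of the neighbors of site in the order
    # given by offsets; positions outside the grid get pad
    # (a hypothetical border convention, not taken from the paper).
    rows, cols = len(labels), len(labels[0])
    r0, c0 = site
    vec: List[int] = []
    for dr, dc in offsets:
        r, cc = r0 + dr, c0 + dc
        inside = 0 <= r < rows and 0 <= cc < cols
        vec.append(labels[r][cc] if inside else pad)
    return vec


if __name__ == '__main__':
    print(len(neighborhood_offsets(1)), len(neighborhood_offsets(2)))  # 4 8
    y = [[-1, -1, -1],
         [-1, 1, 1],
         [-1, 1, 1]]
    print(neighbor_labels(y, (1, 1), neighborhood_offsets(1)))  # [-1, -1, 1, 1]
```

With c = 1 and c = 2 the sketch returns 4 and 8 offsets, matching the first- and second-order systems described above. Sorting the offsets is just one way to obtain an arbitrary but fixed concatenation order.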
For the classification we will use a support vector machine.\n\n2.1 Support Vector Classification\n\nIn support vector classification (SVC) methods [7], a kernel is used to solve a complex classification task via a separating hyperplane in a (usually high-dimensional) feature space. Results from statistical learning theory [8] show that maximizing the margin (the distance of the closest data point to the hyperplane) leads to improved generalization. In practice, the optimal margin hyperplane can be obtained by solving a quadratic programming problem. Several schemes have been introduced to deal with noisy measurements via the introduction of slack variables. In the following we briefly review one such scheme, the C-SVM, which is also used later in the experiments. Consider a canonical separating hyperplane $(w, b)$ in a higher-dimensional feature space $\\mathcal{H}$, to which the $n$ measurement vectors $x_i$ are mapped by $\\Phi(x)$, together with $n$ slack variables $\\xi_i$. The primal objective function of a C-SVM can then be formulated as\n\n$$\\min_{w \\in \\mathcal{H},\\, \\xi \\in \\mathbb{R}^n} \\; \\frac{1}{2}\\|w\\|^2 + \\frac{C}{n} \\sum_{i=1}^{n} \\xi_i \\qquad (1)$$\n\nsubject to $y_i(w^{T}\\Phi(x_i) + b) \\ge 1 - \\xi_i$, $\\xi_i \\ge 0$, $C > 0$, $i = 1, \\ldots, n$.\n\nIn order to classify a new object $h$ with unknown label, the following decision rule is evaluated:\n\n$$f(x_h) = \\operatorname{sgn}\\Big(\\sum_{i=1}^{m} \\alpha_i y_i K(x_h, x_i) + b\\Big), \\qquad (2)$$\n\nwhere the sum runs over all $m$ support vectors and the $\\alpha_i$ are the corresponding Lagrange multipliers of the dual problem.\n\n2.2 Topographic Kernel\n\nWe now assume that the label of each pixel is determined by both its measurement and the set of labels of its topographic neighbors. We define a vector $y_{\\mathcal{G}_h}$ in which the labels of the $q$ topographic neighbors of the pixel $h$ are concatenated in an arbitrary but fixed order. We propose a support vector classifier using an extended kernel which, in addition to the measurement vector $x_h$, also includes the vector $y_{\\mathcal{G}_h}$:\n\n$$K(x_h, x_j, y_{\\mathcal{G}_h}, y_{\\mathcal{G}_j}) = K_1(x_h, x_j) + \\lambda K_2(y_{\\mathcal{G}_h}, y_{\\mathcal{G}_j}), \\qquad (3)$$\n\nwhere $\\lambda$ is a hyper-parameter. Kernel $K_1$ can be an arbitrary kernel working on the measurements.
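To make eq. (3) concrete, here is a minimal Python sketch of the topographic kernel plugged into the decision rule (2). It assumes a Gaussian RBF kernel for K_1, since the text leaves K_1 open. It takes K_2 to be the dot product of the two neighbor-label vectors divided by q, and lam stands for the hyper-parameter that weights K_2. All function names and numbers are hypothetical. The neighbor labels of the pixel being classified are simply passed in. Obtaining them for an unlabeled image requires the collective, self-consistent label assignment mentioned in the abstract, which is not reproduced here.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    # K_1 (assumed): Gaussian RBF on the measurement vectors.
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * d2)


def label_kernel(yg: Sequence[int], zg: Sequence[int]) -> float:
    # K_2: dot product of two neighbor-label vectors (same fixed order)
    # divided by their dimension q; the result lies in [-1, 1].
    q = len(yg)
    return sum(a * b for a, b in zip(yg, zg)) / q


def topographic_kernel(x, z, yg, zg, lam: float, gamma: float) -> float:
    # Eq. (3): measurement kernel plus lam times the label kernel.
    return rbf_kernel(x, z, gamma) + lam * label_kernel(yg, zg)


def decide(x_h, yg_h, sv_x, sv_yg, sv_y, alpha, b, lam, gamma) -> int:
    # Eq. (2) with the topographic kernel: sign of the alpha- and
    # label-weighted kernel sum over the support vectors plus the bias.
    # A zero score is mapped to +1 (an arbitrary tie convention).
    s = sum(a * y * topographic_kernel(x_h, xi, yg_h, ygi, lam, gamma)
            for a, y, xi, ygi in zip(alpha, sv_y, sv_x, sv_yg))
    return 1 if s + b >= 0 else -1


if __name__ == '__main__':
    # Two hypothetical support vectors with made-up multipliers.
    sv_x = [[0.0], [1.0]]
    sv_yg = [[1, 1, 1, 1], [-1, -1, -1, -1]]
    sv_y = [1, -1]
    alpha = [1.0, 1.0]
    print(decide([0.5], [1, 1, 1, -1], sv_x, sv_yg, sv_y, alpha, 0.0, 1.0, 1.0))  # 1
    print(decide([0.5], [-1, -1, -1, 1], sv_x, sv_yg, sv_y, alpha, 0.0, 1.0, 1.0))  # -1
```

In this example the query measurement is equidistant from both support vectors, so K_1 contributes equally to both terms. The sign is therefore decided by which support vector's neighbor-label configuration agrees more with that of the query pixel.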
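For kernel $K_2$ an arbitrary dot-product kernel might be used. In the following we restrict ourselves to a linear kernel,\n\n$$K_2(y_{\\mathcal{G}_h}, y_{\\mathcal{G}_j}) = \\frac{1}{q} \\langle y_{\\mathcal{G}_h} | y_{\\mathcal{G}_j} \\rangle, \\qquad (4)$$\n\nwhere $\\langle \\cdot | \\cdot \\rangle$ denotes a scalar product. The kernel $K_2$ defined in eq. (4) thus consists of a dot product between the two label vectors divided by their dimension $q$. It corresponds to the normalized Hamming distance $d_H/q$ between the local label configurations: for labels in $\\{-1, +1\\}$, $\\frac{1}{q}\\langle a | b \\rangle = 1 - 2\\, d_H(a, b)/q$. As a result, $K_2$ is largest for identical and smallest for opposite configurations. For a neighborhood $\\mathcal{G}^c_h$ of order $c$ we obtain\n\n$$K_2(y_{\\mathcal{G}_h}, y_{\\mathcal{G}_j}) = \\frac{1}{q} \\sum_{s:\\, h+s \\in \\mathcal{G}^c_h} y_{h+s}\\, y_{j+s}, \\qquad (5)$$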