{"title": "Clustering via Concave Minimization", "book": "Advances in Neural Information Processing Systems", "page_first": 368, "page_last": 374, "abstract": null, "full_text": "Clustering via Concave Minimization \n\nP. S. Bradley and O. L. Mangasarian \n\nComputer Sciences Department \n\nUniversity of Wisconsin \n1210 West Dayton Street \n\nMadison, WI 53706 \n\nW. N. Street \n\nComputer Science Department \n\nOklahoma State University \n205 Mathematical Sciences \n\nStillwater, OK 74078 \n\nemail: paulb@cs.wisc.edu, olvi@cs.wisc.edu \n\nemail: nstreet@cs.okstate.edu \n\nAbstract \n\nThe problem of assigning m points in the n-dimensional real space \n$R^n$ to k clusters is formulated as that of determining k centers in \n$R^n$ such that the sum of distances of each point to the nearest \ncenter is minimized. If a polyhedral distance is used, the problem \ncan be formulated as that of minimizing a piecewise-linear concave \nfunction on a polyhedral set, which is shown to be equivalent to \na bilinear program: minimizing a bilinear function on a polyhedral \nset. A fast finite k-Median Algorithm consisting of solving \na few linear programs in closed form leads to a stationary point of \nthe bilinear program. Computational testing on a number of real-world \ndatabases was carried out. On the Wisconsin Diagnostic \nBreast Cancer (WDBC) database, k-Median training set correctness \nwas comparable to that of the k-Mean Algorithm; however, its \ntesting set correctness was better. Additionally, on the Wisconsin \nPrognostic Breast Cancer (WPBC) database, distinct and clinically \nimportant survival curves were extracted by the k-Median \nAlgorithm, whereas the k-Mean Algorithm failed to obtain such \ndistinct survival curves for the same database. \n\n1 Introduction \n\nThe unsupervised assignment of elements of a given set to groups or clusters of \nlike points is the objective of cluster analysis. 
There are many approaches to this \nproblem, including statistical [9], machine learning [7], and integer and mathematical \nprogramming [18, 1] approaches. In this paper we concentrate on a simple concave minimization \nformulation of the problem that leads to a finite and fast algorithm. Our point of \ndeparture is the following explicit description of the problem: given m points in the \nn-dimensional real space $R^n$, and a fixed number k of clusters, determine k centers in \n$R^n$ such that the sum of \"distances\" of each point to the nearest center is minimized. \nIf the 1-norm is used, the problem can be formulated as the minimization of a \npiecewise-linear concave function on a polyhedral set. This is a hard problem to \nsolve because a local minimum is not necessarily a global minimum. However, by \nconverting this problem to a bilinear program, a fast successive-linearization \nk-Median Algorithm terminates after a few linear programs (each explicitly solvable \nin closed form) at a point satisfying the minimum principle necessary optimality \ncondition for the problem. Although there is no guarantee that such a point is a \nglobal solution to our original problem, numerical tests on five real-world databases \nindicate that the k-Median Algorithm is comparable to or better than the k-Mean \nAlgorithm [18, 9, 8]. This may be due to the fact that outliers have less influence \non the k-Median Algorithm, which utilizes the 1-norm distance. In contrast, the \nk-Mean Algorithm uses squares of 2-norm distances to generate cluster centers, which \nmay be inaccurate if outliers are present. We also note that clustering algorithms \nbased on statistical assumptions that minimize some function of scatter matrices \ndo not appear to have convergence proofs [8, pp. 508-515]; however, convergence to \na partial optimal solution is given in [18] for k-Mean type algorithms. \n\nWe outline now the contents of the paper. 
In Section 2, we formulate the clustering \nproblem for a fixed number of clusters as that of minimizing the sum of the 1-norm \ndistances of each point to the nearest cluster center. This piecewise-linear concave \nfunction minimization on a polyhedral set turns out to be equivalent to a bilinear \nprogram [3]. We use an effective linearization of the bilinear program proposed in \n[3, Algorithm 2.1] to solve our problem by solving a few linear programs. Because \nof the simple structure, these linear programs can be explicitly solved in closed \nform, thus leading to the finite k-Median Algorithm 2.3 below. In Section 3 we give \ncomputational results on five real-world databases. Section 4 concludes the paper. \n\nA word about our notation now. All vectors are column vectors unless otherwise \nspecified. For a vector $x \\in R^n$, $x_i$, $i = 1, \\ldots, n$, will denote its components. The \nnorm $\\|\\cdot\\|_p$ will denote the p-norm, $1 \\le p \\le \\infty$, while $A \\in R^{m \\times n}$ will signify a real \n$m \\times n$ matrix. For such a matrix, $A^T$ will denote the transpose, and $A_i$ will denote \nrow i. A vector of ones in a real space of arbitrary dimension will be denoted by e. \n\n2 Clustering as Bilinear Programming \n\nGiven a set $\\mathcal{A}$ of m points in $R^n$ represented by the matrix $A \\in R^{m \\times n}$ and a number \nk of desired clusters, we formulate the clustering problem as follows. Find cluster \ncenters $C_\\ell$, $\\ell = 1, \\ldots, k$, in $R^n$ such that the sum of the minima over $\\ell \\in \\{1, \\ldots, k\\}$ \nof the 1-norm distance between each point $A_i$, $i = 1, \\ldots, m$, and the cluster centers \n$C_\\ell$, $\\ell = 1, \\ldots, k$, is minimized. More specifically, we need to solve the following \nmathematical program: \n\n$$\\min_{C,D} \\quad \\sum_{i=1}^{m} \\min_{\\ell = 1, \\ldots, k} \\{ e^T D_{i\\ell} \\} \\quad \\text{subject to} \\quad -D_{i\\ell} \\le A_i^T - C_\\ell \\le D_{i\\ell}, \\quad i = 1, \\ldots, m, \\quad \\ell = 1, \\ldots, k \\qquad (1)$$ \n\nHere $D_{i\\ell} \\in R^n$ is a dummy variable that bounds the components of the difference \n$A_i^T - C_\\ell$ between point $A_i^T$ and center $C_\\ell$, and e is a vector of ones in $R^n$. Hence \n$e^T D_{i\\ell}$ bounds the 1-norm distance between $A_i$ and $C_\\ell$. We note immediately that \nsince the objective function of (1) is the sum of minima of k linear (and hence \nconcave) functions, it is a piecewise-linear concave function [13, Corollary 4.1.14]. \nIf the 2-norm or p-norm, $p \\ne 1, \\infty$, is used, the objective function will be neither \nconcave nor convex. Nevertheless, minimizing a piecewise-linear concave function \non a polyhedral set is NP-hard, because the general linear complementarity problem, \nwhich is NP-complete [4], can be reduced to such a problem [11, Lemma 1]. \nGiven this fact, we look for effective methods for processing this problem. We \npropose a reformulation of problem (1) as a bilinear program. Such reformulations \nhave been very effective in computationally solving NP-complete linear complementarity \nproblems [14] as well as other difficult machine learning [12] and optimization \nproblems with equilibrium constraints [12]. In order to carry out this reformulation \nwe need the following simple lemma. \n\nLemma 2.1 Let $a \\in R^k$. Then \n\n$$\\min \\{ a_\\ell \\} = \\min \\left\\{ \\sum_{\\ell=1}^{k} a_\\ell t_\\ell \\;\\middle|\\; \\sum_{\\ell=1}^{k} t_\\ell = 1, \\quad t_\\ell \\ge 0, \\quad \\ell = 1, \\ldots, k \\right\\} \\qquad (2)$$ \n\n1