{"title": "Going Metric: Denoising Pairwise Data", "book": "Advances in Neural Information Processing Systems", "page_first": 841, "page_last": 848, "abstract": null, "full_text": "Going Metric: Denoising Pairwise Data \n\nVolker Roth \nInformatik III, University of Bonn \nRoemerstr. 164, 53117 Bonn, Germany \nroth@cs.uni-bonn.de \n\nJulian Laub \nFraunhofer FIRST.IDA \nKekulestr. 7, 12489 Berlin, Germany \njlaub@first.fhg.de \n\nJoachim M. Buhmann \nInformatik III, University of Bonn \nRoemerstr. 164, 53117 Bonn, Germany \njb@cs.uni-bonn.de \n\nKlaus-Robert M\u00fcller \nFraunhofer FIRST.IDA, 12489 Berlin, Germany \nUniversity of Potsdam, 14482 Potsdam, Germany \nklaus@first.fhg.de \n\nAbstract \n\nPairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence databases, visibly improving performance upon existing automatic methods. 
\n\n1 Introduction \n\nUnsupervised grouping or clustering aims at extracting hidden structure from data (see e.g. [5]). However, for several major applications, e.g. bioinformatics or imaging, the data is solely available as scores of pairwise comparisons. Pairwise data is in no natural way related to the common viewpoint of objects lying in some \"well behaved\" space like a vector space. In particular, pairwise data may violate the triangle inequality. Two cases should be distinguished: (i) the triangle inequality might not be satisfied as a result of noisy measurements (for instance when using string alignment algorithms in DNA analysis); (ii) the violation might be an intrinsic feature of the data. The latter case, for instance, applies to datasets based upon some human judgment: \"X likes Y and Y likes Z\" does not imply \"X likes Z\". \n\nSuch violations preclude the use of well established machine learning methods, which typically have been formulated for metric data only. This paper proposes an algorithm to metricize and subsequently denoise pairwise data. It uses the so-called constant shift embedding (cf. [14]) for metrization, then constructs a positive semi-definite matrix which can subsequently be used for denoising and clustering purposes. Regarding data-mining or clustering purposes, the most outstanding difference to classical MDS is the following: for the class of pairwise clustering cost functions sharing the shift-invariance property^1 the metrization step is loss-free in the sense that the optimal assignments of objects to clusters remain unchanged. \n\nThe next section introduces techniques for metrization, denoising and clustering of pairwise data. This is followed by a section illustrating our methods on real-world data such as bacterial GyrB amino acid sequences and sequences from the ProDom database, and a brief discussion. 
\n\n2 Proximity-based clustering and denoising \n\nOne of the most popular methods for grouping vectorial data is k-means clustering (see e.g. [1][5]). It derives a set of k prototype vectors which quantize the data set with minimal quantization error. \n\nPartitioning proximity data is considered a much harder problem, since the inherent structure of n samples is hidden in n^2 pairwise relations. The pairwise proximities can violate the requirements of a distance measure, i.e. they may be non-symmetric and negative, and the triangle inequality does not necessarily hold. Thus, a loss-free embedding into a vector space is not possible, so that grouping problems of this kind cannot be directly transformed into a vectorial representation by means of classical embedding strategies such as multi-dimensional scaling (MDS [4]). Moreover, clustering the MDS-embedded data vectors in general yields partitionings different from those obtained by directly solving the pairwise problem, since embedding constraints might be in conflict with the clustering goal. \n\nLet us start from a pairwise clustering loss function (see [12]) that combines the properties of additivity, scale- and shift-invariance, and statistical robustness: \n\nH^{pc} = \u03a3_{v=1}^{k} ( \u03a3_{i=1}^{n} \u03a3_{j=1}^{n} M_{iv} M_{jv} D_{ij} ) / ( \u03a3_{l=1}^{n} M_{lv} ),   (1) \n\nwhere the data are characterized by the matrix of pairwise dissimilarities D_{ij}. The assignments of objects to clusters are encoded in the binary stochastic matrix M \u2208 {0,1}^{n\u00d7k} with \u03a3_{v=1}^{k} M_{iv} = 1. For such cost functions it can be shown [14] that there always exists a set of vectorial data representations (the constant shift embeddings) such that the grouping problem can be equivalently restated in terms of Euclidean distances between these vectors. In order to handle non-symmetric dissimilarities, it should be noticed that H^{pc} is also invariant under symmetrizing transformations: D_{ij} \u2190 1/2 (D_{ij} + D_{ji}). 
In the following we will thus restrict ourselves to the case of symmetric dissimilarity matrices. \n\nTheorem 2.1. [14] Given an arbitrary (possibly non-metric) (n \u00d7 n) dissimilarity matrix D with zero self-dissimilarities, there exists a transformed matrix D~ such that \n(i) the matrix D~ can be interpreted as a matrix of squared Euclidean distances between a set of vectors {x_i}_{i=1}^{n}; D~ is derived from D by both symmetrizing and applying the constant shift embedding trick; \n(ii) the original pairwise clustering problem is equivalent to a k-means problem in this vector space, in the sense that the optimal assignments of objects to clusters {M_{iv}} are identical in both problems. \n\n^1 The term shift-invariance means that the optimal assignments of objects to clusters are not influenced by constant additive shifts of the pairwise dissimilarities (excluding the self-dissimilarities, which are assumed to be zero). \n\nA re-formulation of pairwise clustering as a k-means problem is clearly advantageous: (i) the availability of prototype vectors defines a generic rule for using the learned partitioning in a predictive sense, (ii) we can apply standard noise- and dimensionality-reduction methods in order to both stabilize the estimation procedure and to speed up the grouping itself. \n\nConstant shift embedding. Let D = (D_{ij}) \u2208 R^{n\u00d7n} be the matrix of pairwise squared dissimilarities between n objects. For a generic noisy dataset, \u221a(D_{ij}) need not be bounded by \u221a(D_{ik}) + \u221a(D_{kj}), so that \u221aD is non-metric. Since \u221a\u00b7 is monotonically increasing, there exists a constant D_0 such that \u221a(D_{ij} + D_0) \u2264 \u221a(D_{ik} + D_0) + \u221a(D_{kj} + D_0) for all i, j, k = 1, 2, ..., n. Let \n\nD~ = D + D_0 (ee^T \u2212 I_n),   (2) \n\nwhere e = (1, 1, ..., 1)^T is an n-dimensional column vector and I_n the identity matrix. This corresponds to a constant additive shift D~_{ij} = D_{ij} + D_0 for all i \u2260 j. We look for the minimal constant shift D_0 such that D~ satisfies the triangle inequality. 
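The off-diagonal shift of Eq. (2) leaves the (zero) diagonal untouched and raises every other entry by D_0. A minimal numpy sketch of this step (the function name `constant_shift` is our own illustration, not from the paper):

```python
import numpy as np

def constant_shift(D, d0):
    """Off-diagonal shift of Eq. (2): D~ = D + d0 * (e e^T - I_n).

    D is assumed symmetric with zero self-dissimilarities."""
    n = D.shape[0]
    e = np.ones((n, 1))
    return D + d0 * (e @ e.T - np.eye(n))

# A small symmetric dissimilarity matrix with zero diagonal.
D = np.array([[0., 1., 9.],
              [1., 0., 1.],
              [9., 1., 0.]])
D_shift = constant_shift(D, 2.0)  # off-diagonal entries grow by 2, diagonal stays 0
```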
\nIn order to make the main result clear, we first need to introduce the notion of a centralized matrix. Let P be an arbitrary matrix and let Q = I \u2212 (1/n) ee^T. Q is the projection matrix on the orthogonal complement of e. Define the centralized P by: \n\nP^c = QPQ.   (3) \n\nLet D be fixed and let us decompose D as follows: \n\nD_{ij} = S_{ii} + S_{jj} \u2212 2 S_{ij}.   (4) \n\nThis decomposition is motivated by the fact that if D is a squared Euclidean distance between the vectorial data x_i, then D_{ij} = ||x_i \u2212 x_j||^2 = ||x_i||^2 + ||x_j||^2 \u2212 2 x_i^T x_j. It follows from equation (4) that a constant off-diagonal shift on D corresponds to a constant shift on the diagonal of S. S is not fixed by the choice of D, since we may always change its diagonal elements, yet recover the same D. That is, any matrix of the form (S_{ij} + 1/2 \u0394S_i + 1/2 \u0394S_j) gives the same distance D as S, for arbitrary \u0394S_i's. By simple algebra it can be shown that S^c = \u22121/2 D^c, i.e. S^c is unique. Furthermore, D derives from a squared Euclidean distance if and only if S^c is positive semi-definite [14]. Let S~^c = S^c \u2212 \u03bb_n(S^c) I_n, where \u03bb_n(\u00b7) is the minimal eigenvalue of its argument. Then S~^c is positive semi-definite [14]. These are the main ingredients for proving the following: \n\nTheorem 2.2 (Minimal D_0). [14] D_0 = \u22122 \u03bb_n(S^c) is the minimal constant such that D~ = D + D_0 (ee^T \u2212 I_n) derives from a squared Euclidean distance. \n\nAll proofs can be found in [14]. We have thus shown that applying large enough additive shifts to the off-diagonal elements of D results in a matrix S~^c that is positive semi-definite, and can thus be interpreted as a Gram matrix. This means that in some (n \u2212 1)-dimensional Euclidean space there exists a vector representation of the objects, summarized in the \"design\" matrix X (the rows of X are the feature vectors), such that S~^c = X X^T. 
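Theorem 2.2 translates directly into a few lines of linear algebra. The sketch below (our own helper, assuming a symmetric D with zero diagonal) computes S^c = -1/2 D^c, the minimal shift D_0 = -2 lambda_n(S^c), and the positive semi-definite Gram matrix S~^c. As a consistency check, the squared distances induced by S~^c via Eq. (4) reproduce the shifted dissimilarities D_ij + D_0 off the diagonal:

```python
import numpy as np

def constant_shift_embedding(D):
    """Return (D0, Sc_tilde) for a symmetric dissimilarity matrix D
    with zero self-dissimilarities, following Theorem 2.2."""
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n   # centering projection, Eq. (3)
    Sc = -0.5 * Q @ D @ Q                 # S^c = -1/2 D^c
    lam_min = np.linalg.eigvalsh(Sc)[0]   # minimal eigenvalue lambda_n(S^c)
    D0 = -2.0 * lam_min                   # minimal constant shift
    Sc_tilde = Sc - lam_min * np.eye(n)   # positive semi-definite Gram matrix
    return D0, Sc_tilde

# Non-metric example: sqrt(D[0,2]) = 3 > sqrt(D[0,1]) + sqrt(D[1,2]) = 2.
D = np.array([[0., 1., 9.],
              [1., 0., 1.],
              [9., 1., 0.]])
D0, Sc_tilde = constant_shift_embedding(D)  # D0 > 0 since D is non-metric
```

Recovering distances from Sc_tilde as g_i + g_j - 2*Sc_tilde[i,j] (with g the diagonal) gives exactly D + D0 off the diagonal, which is the loss-free metrization the theorem promises.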
\nFor the pairwise clustering cost function the optimal assignments of objects to clusters are invariant under the constant-shift embedding procedure, according to Theorem 2.1. Hence, the grouping problem can be re-formulated as optimizing the classical k-means criterion in the embedding space. \n\nIn many applications, however, it is advantageous not to cluster in the full space but to insert a dimension-reduction step that serves the purpose of increasing efficiency and of noise reduction. While it is unclear how to denoise the original pairwise object representations while respecting the additivity, scale- and shift-invariance, and statistical robustness properties of the clustering criterion, we can easily apply kernel PCA [16] to S~^c after the constant-shift embedding. \n\nDenoising of pairwise data by constant shift embedding. For denoising we construct D~ which derives from \"real\" points in a vector space, i.e. S~^c is positive semi-definite. In a first step, we briefly describe how these real points can be recovered by loss-free kernel PCA [16]: \n(i) Calculate the centralized kernel matrix S~^c = \u22121/2 Q D~ Q. \n(ii) Decompose S~^c = V \u039b V^T, where V = (v_1, ..., v_n) with eigenvectors v_i and \u039b = diag(\u03bb_1, ..., \u03bb_n) with eigenvalues \u03bb_1 \u2265 ... \u2265 \u03bb_p > \u03bb_{p+1} = 0 \u2265 \u03bb_{p+2} \u2265 ... \u2265 \u03bb_n. \n(iii) Calculate the n \u00d7 (n \u2212 2) mapping matrix X*_{n\u22122} = V*_{n\u22122} (\u039b*_{n\u22122})^{1/2}, where V*_{n\u22122} = (v_1, ..., v_p, v_{p+2}, ..., v_{n\u22121}) and \u039b*_{n\u22122} = diag(\u03bb_1 \u2212 \u03bb_n, ..., \u03bb_p \u2212 \u03bb_n, \u03bb_{p+2} \u2212 \u03bb_n, ..., \u03bb_{n\u22121} \u2212 \u03bb_n) (these are the constantly shifted eigenvalues). \nThe rows of X*_{n\u22122} contain the vectors {x_i} (i = 1, 2, ..., n) in (n \u2212 2)-dimensional space, whose mutual distances are given by D~. When focusing on noise reduction, however, we are rather interested in some approximate reconstruction of the \"real\" vectors. In the PCA framework, one usually discards the directions which correspond to small eigenvalues as noise (cf. [9]). We can thus obtain a representation in a space of reduced dimension (with the well-defined error of PCA reconstruction) by choosing t < n \u2212 2 in step (iii) of the above algorithm: \n\nX*_t = V*_t (\u039b*_t)^{1/2}, \n\nwhere V*_t consists of the first t column vectors of V*_{n\u22122} and \u039b*_t is the top t \u00d7 t submatrix of \u039b*_{n\u22122}. The vectors in R^t then differ the least from the vectors in R^{n\u22122} in the sense of a quadratic error. \n\nThe advantages of this method in comparison to directly applying classical scaling via MDS are: (i) t can be larger than the number p of positive eigenvalues, (ii) the embedded vectors are the best least-squares-error approximation to the optimal vectors which preserve the grouping structure. \n\nIt should be noticed, however, that given the exactly reconstructed vectors in R^{n\u22122} found by loss-free kernel PCA, we could have also applied any other standard methods for dimensionality reduction or visualization, such as projection pursuit [6], locally linear embedding (LLE) [15], Isomap [17] or self-organizing maps [8]. \n\n3 Application to protein sequences \n\n3.1 Bacterial GyrB amino acid sequences \n\nWe first illustrate our denoising technique on the gyrase subunit B. The dataset consists of 84 amino acid sequences from five genera in Actinobacteria: 1: Corynebacterium, 2: Mycobacterium, 3: Gordonia, 4: Nocardia and 5: Rhodococcus. A detailed description can be found in [7]. This dataset was used in [18] for an illustration of marginalized kernels. The authors hinted at the possibility of computing the distance matrix by using BLAST scores [2], noting, however, that these scores could not be converted into positive semidefinite kernels. \n\nIn our experiment, the sequences have been aligned by the Smith-Waterman algorithm [11], which yields pairwise alignment scores. 
Using constant shift embedding, a positive semidefinite kernel is obtained, leaving the cluster assignments unchanged for shift-invariant cost functions. \n\nThe important step is the denoising. Several projections to lower dimensions have been tested, and t = 5 turned out to be a good choice, eliminating the bulk of the noise while retaining the essential cluster structure. \n\nFigure 1 shows the striking improvement of the distance matrix after denoising. On the left hand side the ideal distance matrix is depicted, consisting solely of 0's (black) and 1's (white), reflecting the true cluster membership. In the middle and on the right the original and the denoised distance matrix are shown, respectively. Denoising visibly accentuates the cluster structure in the pairwise data. \n\nFigure 1: Distance matrix: on the left, the ideal distance matrix reflects the true cluster structure. In the middle and on the right: distance matrix before and after denoising. \n\nSince we have access to the true labels, we can quantitatively assess the improvement by denoising. We performed usual k-means clustering, followed by a majority voting to match the cluster labeling. For the denoised data we obtained 3 misclassifications (3.61%), whereas we got 17 (20.48%) for the original data. This simple experiment corroborates the usefulness of our embedding and denoising strategy for pairwise data. \n\nIn order to fulfill the spirit of the theory of constant-shift embedding, the cost function of the data-mining algorithm subsequent to the embedding needs to be shift-invariant. We may by the same token go a step further and apply algorithms for which this condition does not hold. 
In doing so, however, we give up the mathematical traceability of the error. \n\nTo illustrate that denoised pairwise data can act as standalone quality data independent of the framework of algorithms based on shift-invariant cost functions (and in order to compare to the results obtained in [18]), a linear SVM is trained on 25% of the total data to mutually classify the genera pairs 3-4, 3-5, 4-5. Genera 1 and 2 separate without error and have therefore been omitted. Model selection over the regularization parameter C has been performed by choosing the optimal value out of 10 equally spaced values from [10^{-4}, 10^2]. The results have been averaged over a 1000-fold sampling (cf. Table 1). The best values are printed in bold. \n\nFor the classification of genera 3-5 and 4-5 we obtain a substantial improvement by denoising. Interestingly, this is not the case for genera 3-4, which may be due to the elimination of discriminative features by the denoising procedure. The error is still significantly smaller than the error obtained by MCK2 and FK, which is in agreement with the superiority of a structure-preserving embedding of Smith-Waterman scores even when left undenoised: FK and MCK are kernels derived from a generative model, whereas the alignment scores are obtained from a matching algorithm specifically tuned for protein sequences, reflecting much better the underlying structure of protein data. \n\nGenera | FK   | MCK2 | Undenoised | Denoised \n3-4    | 10.4 | 8.48 | 5.06       | 5.43 \n3-5    | 10.9 | 5.71 | 5.72       | 3.83 \n4-5    | 23.1 | 11.6 | 7.55       | 3.17 \n\nTable 1: Comparison of mean test error of supervised classification of genera by a linear SVM, with a training sample of 25% of the total sample. The results for MCK2 (Marginalized Count Kernel) and FK (Fisher Kernel) are obtained by kernel Fisher discriminant analysis, which compares favorably to the SVM in several benchmarks [18]. 
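The pipeline of Section 2, as we read it, condenses to a short numpy sketch: symmetrize, center, eigendecompose, keep the top t shifted eigenvalues, and hand the resulting coordinates to any vectorial method such as k-means. All names below are our own illustration; the hypothetical `scores_to_dissimilarities` helper mirrors the score-to-dissimilarity transform used for the alignment scores:

```python
import numpy as np

def scores_to_dissimilarities(S):
    """Turn a similarity/score matrix into dissimilarities:
    D_ij = S_ii + S_jj - 2 S_ij (cf. Eq. (4))."""
    d = np.diag(S)
    return d[:, None] + d[None, :] - 2.0 * S

def denoised_embedding(D, t):
    """Constant-shift embedding truncated to t dimensions
    (steps (i)-(iii) of Sec. 2, keeping the top-t shifted eigenvalues)."""
    n = D.shape[0]
    D = 0.5 * (D + D.T)                     # symmetrize
    Q = np.eye(n) - np.ones((n, n)) / n     # centering projection
    Sc = -0.5 * Q @ D @ Q
    lam, V = np.linalg.eigh(Sc)             # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]          # sort descending
    shifted = lam - lam[-1]                 # lambda_i - lambda_n >= 0
    return V[:, :t] * np.sqrt(shifted[:t])  # rows are the embedded vectors

# Sanity check: for a D that already derives from squared Euclidean
# distances, lambda_n(S^c) ~ 0 and t = 2 reproduces the configuration
# (up to rotation), hence all pairwise distances.
pts = np.array([[0., 0.], [1., 0.], [0., 2.], [3., 1.]])
diff = pts[:, None, :] - pts[None, :, :]
D = (diff ** 2).sum(axis=-1)
X = denoised_embedding(D, t=2)   # 2-d coordinates, ready for k-means
```

For noisy non-metric input, choosing a small t (t = 5 in the experiment above) is exactly the denoising step: directions with small shifted eigenvalues are discarded as noise.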
\n\n3.2 Clustering of ProDom sequences \n\nThe analysis described in this section aims at finding a partition of domain sequences from the ProDom database [3] that is meaningful w.r.t. structural similarity. In order to measure the quality of the grouping solution, we use the computed solution in a predictive way to assign group labels to SCOP sequences, which have been labeled by experts according to their structure [10]. The predicted labels are then compared with the \"true\" SCOP labels. \n\nFor demonstration purposes, we select the following subset of sequences from prodom2001.2.srs: among all sequences we choose those which are highly similar to at least one sequence contained in the first four folds of the SCOP database.^2 Between these sequences, we compute pairwise (length-corrected and standardized) Smith-Waterman alignment scores, summarized in the matrix (S_{ij}). These similarities are transformed into dissimilarities by setting D_{ij} := S_{ii} + S_{jj} \u2212 2 S_{ij}. The centralized score matrix S^c = \u22121/2 D^c possesses some highly negative eigenvalues, indicating that metric properties are violated. Applying the constant-shift embedding method, a valid Mercer kernel is derived, with an eigenvalue spectrum that shows only a few dominating components over a broad \"noise\" spectrum (see Figure 2). Extracting the first 16 leading principal components^3 leads to a vector representation of the sequences as points in R^{16}. These points are then clustered by minimizing the k-means cost function within a deterministic annealing framework. The model order was selected by applying a re-sampling based stability analysis, which has been demonstrated to be a suitable model order selection criterion for unsupervised grouping problems in [13]. \n\nIn order to measure the quality of the grouping solution, all 1158 SCOP sequences from the first four folds are embedded into the 16-dimensional space. 
The predicted group structure on this test set is then compared with the true SCOP fold labels. Figure 3 shows both the predicted group membership of these sequences and their true SCOP fold label in the form of a bar diagram: the sequences are ordered by increasing group label (the lower horizontal bar), and compared with the true fold classification (upper bar). In order to quantify the results, the inferred clusters are re-labeled (\"re-colored\") according to the maximum number of correctly identifiable fold labels. This procedure allows us to correctly identify the fold label of roughly 94% of the SCOP sequences. \n\n^2 \"Highly similar\" here means that the highest alignment score exceeds a predefined threshold. The result is a subset of roughly 2700 ProDom domain sequences. \n\n^3 Subsampling techniques or deflation can be used to reduce the computational load for large-scale problems. We only used a subset of 800 randomly chosen proteins for estimating the 16 leading eigenvectors. \n\nFigure 2: (Partial) eigenvalue spectrum of the shifted score matrix. The data are projected onto the first 16 leading eigenvectors, whereas the remaining principal components are considered to be dominated by noise. \n\nFigure 3: Visualization of cluster membership of the 1158 SCOP sequences contained in folds 1-4 (upper bar: SCOP fold label; lower bar: prediction, re-labeled by majority voting). \n\nDespite this surprisingly high percentage, it is necessary to analyze the biological relevance of the inferred grouping solution more deeply. 
In order to check to what extent the above \"over-all\" result is influenced by artefacts due to highly related (or even almost identical) SCOP sequences, we repeated the analysis based on the subset of 128 SCOP sequences with less than 50% sequence identity (PDB-50). Predicting the group membership of these 128 sequences and using the same re-labeling approach, we can correctly identify 86% of the fold labels. This result demonstrates that we have not only found trivial groups of almost identical proteins, but that we have indeed extracted relevant structural information. \n\n4 Discussion and Conclusion \n\nThis paper provides two main contributions that are highly useful when analyzing pairwise data. First, we employ the concept of constant shift embedding to provide a metric representation of the data. For a certain class of grouping principles sharing a shift-invariance property, this embedding is distortion-less in the sense that it does not influence the optimal assignments of objects to groups. Given the metricized data we can now use common signal (pre-)processing and denoising techniques that are typically only defined for vectorial data. \n\nWhen we investigate the clustering of protein sequences from databases like GyrB and ProDom, we are given non-metric pairwise proximity information that is strongly deteriorated by the shortcomings of the available alignment procedures. Thus, it is important to apply denoising techniques to the data as a second step before running the actual clustering procedure. We find that the combination of these two processing steps is successful in unraveling protein structure, greatly improving over existing methods (as exemplified for GyrB and ProDom). \n\nFuture research will be dedicated to further evaluation of the proposed algorithm. We will also explore the perspectives it opens in any field handling pairwise data. 
\n\nAcknowledgments. The GyrB amino acid sequences were offered by courtesy of the Identification and Classification of Bacteria (ICB) databank team [19]. The authors are partially supported by DFG grants # MU 987/1-1 and # BU 914/4-1. \n\nReferences \n\n[1] A.K. Jain, M.N. Murty, and P.J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264-323, 1999. \n\n[2] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403-410, 1990. \n\n[3] F. Corpet, F. Servant, J. Gouzy, and D. Kahn. ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res., 28:267-269, 2000. \n\n[4] T. F. Cox and M. A. A. Cox. Multidimensional Scaling. Chapman & Hall, London, 2001. \n\n[5] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John Wiley & Sons, second edition, 2001. \n\n[6] P. J. Huber. Projection pursuit. The Annals of Statistics, pages 435-475, 1985. \n\n[7] H. Kasai, A. Bairoch, K. Watanabe, K. Isono, and S. Harayama. Construction of the gyrB database for the identification and classification of bacteria. Genome Informatics, pages 13-21, 1998. \n\n[8] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, 1995. \n\n[9] S. Mika, B. Sch\u00f6lkopf, A.J. Smola, K.-R. M\u00fcller, M. Scholz, and G. R\u00e4tsch. Kernel PCA and de-noising in feature spaces. In M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 536-542. MIT Press, 1999. \n\n[10] A.G. Murzin, S.E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247:536-540, 1995. \n\n[11] W. R. Pearson and D. J. Lipman. Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci., 85:2444-2448, 1988. \n\n[12] J. Puzicha, T. Hofmann, and J. Buhmann. A theory of proximity based clustering: structure detection by optimization. Pattern Recognition, 33(4):617-634, 1999. \n\n[13] V. Roth, M. Braun, T. Lange, and J. Buhmann. A resampling approach to cluster validation. In Computational Statistics (COMPSTAT'02), 2002. To appear. \n\n[14] V. Roth, J. Laub, M. Kawanabe, and J.M. Buhmann. Optimal cluster preserving embedding of non-metric proximity data. Technical Report IAI-TR-2002-5, University of Bonn, 2002. \n\n[15] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000. \n\n[16] B. Sch\u00f6lkopf, A. Smola, and K.-R. M\u00fcller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998. \n\n[17] J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000. \n\n[18] K. Tsuda, T. Kin, and K. Asai. Marginalized kernels for biological sequences. Proc. ISMB, 2002, to appear. http://www.cbrc.jp/ tsuda/. \n\n[19] K. Watanabe, J. Nelson, S. Harayama, and H. Kasai. ICB database: the gyrB database for identification and classification of bacteria. Nucleic Acids Res., 29:344-345, 2001. \n", "award": [], "sourceid": 2215, "authors": [{"given_name": "Volker", "family_name": "Roth", "institution": null}, {"given_name": "Julian", "family_name": "Laub", "institution": null}, {"given_name": "Klaus-Robert", "family_name": "M\u00fcller", "institution": null}, {"given_name": "Joachim", "family_name": "Buhmann", "institution": null}]}