{"title": "ICA-based Clustering of Genes from Microarray Expression Data", "book": "Advances in Neural Information Processing Systems", "page_first": 675, "page_last": 682, "abstract": "", "full_text": "ICA-Based Clustering of Genes from Microarray Expression Data\n\nSu-In Lee* and Serafim Batzoglou\u00a7\n*Department of Electrical Engineering\n\u00a7Department of Computer Science\nStanford University, Stanford, CA 94305\nsilee@stanford.edu, serafim@cs.stanford.edu\n\nAbstract\n\nWe propose an unsupervised methodology using independent component analysis (ICA) to cluster genes from DNA microarray data. Based on an ICA mixture model of genomic expression patterns, linear and nonlinear ICA find components that are specific to certain biological processes. Genes that exhibit significant up-regulation or down-regulation within each component are grouped into clusters. We test the statistical significance of the enrichment of gene annotations within each cluster. ICA-based clustering outperformed other leading methods in constructing functionally coherent clusters on various datasets. This result supports our model of genomic expression data as the composite effect of independent biological processes. A comparison of clustering performance among various ICA algorithms, including a kernel-based nonlinear ICA algorithm, shows that nonlinear ICA performed best on the small datasets and that natural-gradient maximum-likelihood estimation worked well on all the datasets.\n\n1 Introduction\n\nMicroarray technology has enabled genome-wide expression profiling, promising to provide insight into the underlying biological mechanisms involved in gene regulation. To aid such discoveries, mathematical tools that are versatile enough to capture the underlying biology and simple enough to be applied efficiently to large datasets are needed. 
Analysis tools based on novel data mining techniques have been proposed [1]-[6]. When applying mathematical models and tools to microarray analysis, clustering genes that have similar biological properties is an important step, for three reasons: reduction of data complexity, prediction of gene function, and evaluation of the analysis approach by measuring the statistical significance of the biological coherence of gene clusters.\n\nIndependent component analysis (ICA) linearly decomposes each of N vectors into M common component vectors (N\u2265M) so that each component is statistically as independent from the others as possible. One of the main applications of ICA is blind source separation (BSS), which aims to separate source signals from their mixtures. There have been a few attempts to apply ICA to microarray expression data to extract meaningful signals, each corresponding to an independent biological process [5]-[6]. In this paper, we provide the first evidence that ICA is a superior mathematical model and clustering tool for microarray analysis, compared to the most widely used methods, namely PCA and k-means clustering. We also introduce the application of nonlinear ICA to microarray analysis, and show that it outperforms linear ICA on some datasets.\n\nWe apply ICA to microarray data to decompose the input data into statistically independent components. Then, genes are clustered in an unsupervised fashion into non-mutually exclusive clusters. Each independent component is assigned a putative biological meaning based on the functional annotations of genes that are predominant within the component. 
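As a concrete illustration of recovering independent components from a linear mixture, the sketch below separates two synthetic super-Gaussian sources using a symmetric fixed-point (FastICA-style) update with a tanh nonlinearity, one of the linear variants compared in Section 5; the sources, mixing matrix, and dimensions are invented for the example, and the paper's main experiments use NMLE rather than this variant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical super-Gaussian "biological processes" (Laplacian sources):
# one row per process, one column per gene; K = 2000 invented genes.
K = 2000
S = rng.laplace(size=(2, K))

# Invented 2 x 2 mixing matrix A: each "experiment" mixes the processes.
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
X = A @ S                                   # observed expression matrix, as in X = AS

# Center and whiten the observations.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# Symmetric fixed-point update with tanh nonlinearity (FastICA-style).
W = rng.standard_normal((2, 2))
for _ in range(100):
    G = np.tanh(W @ Z)
    W = (G @ Z.T) / K - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt                              # symmetric decorrelation: (W W^T)^(-1/2) W

Y = W @ Z                                   # recovered components

# Each recovered component should match one source up to sign and permutation.
C = np.corrcoef(np.vstack([Y, S]))[:2, 2:]
assert np.abs(C).max(axis=1).min() > 0.9
```

The sign and order of the recovered components are arbitrary, which is why cluster interpretation in this paper relies on annotation enrichment rather than on the raw component indices.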
We systematically evaluate the clustering performance of several ICA algorithms on four expression datasets and show that ICA-based clustering is superior to other leading methods that have been applied to analyze the same datasets. We also propose a kernel-based nonlinear ICA algorithm to handle a more realistic, nonlinear mixture model. Among the seven algorithms we tested, six linear and one nonlinear, the natural-gradient maximum-likelihood estimation method (NMLE) [7]-[8] performs well on all the datasets, while the kernel-based nonlinear ICA method works better on the three small datasets.\n\n2 Mathematical model of genome-wide expression\n\nSeveral distinct biological processes take place simultaneously inside a cell; each biological process has its own expression program to up-regulate or down-regulate the level of expression of specific sets of genes. We model a genome-wide expression pattern in a given condition (measured by a microarray assay) as a mixture of signals generated by statistically independent biological processes with different activation levels. We design two kinds of models for genomic expression patterns: a linear and a nonlinear mixture model.\n\nSuppose that a cell is governed by M independent biological processes S = (s_1, ..., s_M)^T, each of which is a vector of K gene expression levels, and that we measure the levels of expression of all genes in N conditions, resulting in a microarray expression matrix X = (x_1, ..., x_N)^T. The expression level at each condition j can be expressed as a linear combination of the M biological processes: x_j = a_j1 s_1 + ... + a_jM s_M. We can express this idea concisely in matrix notation as follows. 
\n\n    X = AS, i.e., (x_1, ..., x_N)^T = A (s_1, ..., s_M)^T, where A = (a_ij) is the N \u00d7 M mixing matrix.                            (1)\n\nMore generally, we can express X = (x_1, ..., x_N)^T as a post-nonlinear mixture of the underlying independent processes as follows, where f(.) is a nonlinear mapping from N-dimensional to N-dimensional space:\n\n    X = f(AS), i.e., (x_1, ..., x_N)^T = f(A (s_1, ..., s_M)^T).                            (2)\n\n3 Independent component analysis\n\nIn the models described above, since we assume that the underlying biological processes are independent, the vectors S = (s_1, ..., s_M) are statistically independent, and so ICA can recover S from the observed microarray data X. For linear ICA, we apply the natural-gradient maximum-likelihood estimation (NMLE) method, which was proposed in [7] and made more efficient by the natural-gradient method in [8]. We also apply nonlinear ICA using reproducing kernel Hilbert spaces (RKHS), based on [9], as follows:\n1. We map the N-dimensional input data x_i to \u03a6(x_i) in the feature space by using the kernel trick. The feature space is defined by the relationship \u03a6(x_i)^T \u03a6(x_j) = k(x_i, x_j). 
\nThat is, the inner product of the mapped data is determined by a kernel function k(.,.) in the input space; we used a Gaussian radial basis function (RBF) kernel (k(x,y) = exp(-|x-y|^2)) and a polynomial kernel of degree 2 (k(x,y) = (x^T y + 1)^2). To perform the mapping, we found orthonormal bases of the feature space by randomly sampling L input data points v = {v_1, ..., v_L} 1000 times and choosing the set minimizing the condition number of \u03a6_v = (\u03a6(v_1), ..., \u03a6(v_L)). Then a set of orthonormal bases of the feature space is determined by the selected L images of input data in v as \u039e = \u03a6_v (\u03a6_v^T \u03a6_v)^{-1/2}. We map all input data x_1, ..., x_K, each corresponding to a gene, to \u03a8(x_1), ..., \u03a8(x_K) in the feature space with basis \u039e, as follows:\n\n    \u03a8(x_i) = (\u03a6_v^T \u03a6_v)^{-1/2} \u03a6_v^T \u03a6(x_i) = [k(v_m, v_n)]_{m,n=1,...,L}^{-1/2} (k(v_1, x_i), ..., k(v_L, x_i))^T \u2208 R^L  (1 \u2264 i \u2264 K)   (3)\n\n2. We linearly decompose the mapped data \u03a8 = [\u03a8(x_1), ..., \u03a8(x_K)] \u2208 R^{L\u00d7K} into statistically independent components using NMLE.\n\n4 Proposed approach\n\nThe microarray dataset we are given is in matrix form, where each element x_ij corresponds to the level of expression of the jth gene in the ith experimental condition. Missing values are imputed by KNNImpute [10], an algorithm based on k nearest neighbors that is widely used in microarray analysis. Given the expression matrix X of N experiments by K genes, we perform the following steps.\n1. 
Apply ICA to decompose X into independent components y_1, ..., y_M as in Equations (1) and (2). Prior to applying ICA, remove any rows that make the expression matrix X singular. After ICA, each component y_i is a vector of K loads, i.e., gene expression levels: y_i = (y_i1, ..., y_iK). We chose to maximize the number of components M, setting it equal to the number of microarray experiments N, because the maximum N in our datasets was 250, which is smaller than the number of biological processes we hypothesize to act within a cell.\n2. For each component, cluster genes according to their relative loads y_ij / mean(y_i). Based on our ICA model, each component is a putative genomic expression program of an independent biological process. Thus, our hypothesis is that genes showing relatively high or low expression levels within a component are the most important for the corresponding process. We create two clusters for each component: one cluster containing genes with expression level higher than a threshold, and one cluster containing genes with expression level lower than a threshold:\n\n    Cluster_{i,1} = {gene j | y_ij > mean(y_i) + c \u00d7 std(y_i)}\n    Cluster_{i,2} = {gene j | y_ij < mean(y_i) - c \u00d7 std(y_i)}                 (4)\n\nHere, mean(y_i) is the average and std(y_i) the standard deviation of y_i, and c is an adjustable coefficient. The value of c was varied from 1.0 to 2.0, and the results for c = 1.25 are presented in this paper. The results for other values of c are similar, and are presented on the website www.stanford.edu/~silee/ICA/.\n3. For each cluster, measure the enrichment of the cluster with genes of known functional annotations. 
Using the Gene Ontology (GO) [11] and KEGG [12] gene annotation databases, we calculate a p-value for each cluster with every gene annotation: the probability that the cluster contains the observed number of genes with the annotation by chance, assuming a hypergeometric distribution (details in [4]). For each gene annotation, the minimum p-value smaller than 10^-7 obtained from any cluster was collected. If no p-value smaller than 10^-7 is found, we consider the gene annotation not to be detected by the approach. As a result, we can assign a biological meaning to each cluster and its corresponding independent component, and we can evaluate clustering performance by comparing the collected minimum p-value for each gene annotation with that from other clustering approaches.\n\n5 Performance evaluation\n\nWe applied ICA-based clustering to the four expression datasets (D1-D4) described in Table 1.\n\nTable 1: The four datasets used in our analysis\nD1: Spotted array. Budding yeast during cell cycle and CLB2/CLN3-overactive strain [13]. K = 4579 genes, N = 22 experiments.\nD2: Oligonucleotide array. Budding yeast during cell cycle [14]. K = 6616 genes, N = 17 experiments.\nD3: Spotted array. C. elegans in various conditions [3]. K = 17817 genes, N = 553 experiments.\nD4: Oligonucleotide array. Normal human tissue including 19 kinds of tissues [15]. K = 7070 genes, N = 59 experiments.\n\nFor D1 and D4, we compared the biological coherence of ICA components with that of PCA applied to the same datasets in [1] and [2], respectively. For D2 and D3, we compared with k-means clustering and the topomap method, applied to the same datasets in [4] and [3], respectively. We applied nonlinear ICA to D1, D2 and D4. Dataset D3 is very large and makes the nonlinear algorithm unstable. 
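The annotation-enrichment test described in Section 4 can be sketched as follows; the gene counts and cluster size are invented for illustration, and the tail probability is computed directly from the hypergeometric distribution rather than via any particular statistics library.

```python
from math import comb

def enrichment_p_value(K, n_annotated, cluster_size, k_observed):
    """P-value that a cluster of `cluster_size` genes, drawn at random from
    K genes of which `n_annotated` carry the annotation, contains at least
    `k_observed` annotated genes (hypergeometric upper tail)."""
    total = comb(K, cluster_size)
    p = 0.0
    for k in range(k_observed, min(n_annotated, cluster_size) + 1):
        p += comb(n_annotated, k) * comb(K - n_annotated, cluster_size - k) / total
    return p

# Hypothetical numbers: 6000 genes, 50 annotated with some GO category,
# and a 100-gene cluster that contains 20 of them (expected by chance: ~0.8).
p = enrichment_p_value(6000, 50, 100, 20)
assert p < 1e-7   # such a cluster would count as "detected" under the 10^-7 cutoff
```

In practice one would also correct for testing many annotation categories; the paper's fixed 10^-7 cutoff plays that role here.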
\nD1 was preprocessed to contain log-ratios x_ij = log2(R_ij/G_ij) between red and green intensities. In [1], principal components, referred to as eigenarrays, were hypothesized to be genomic expression programs of distinct biological processes. We compared the biological coherence of independent components with that of the principal components found by [1]. The comparison was done in two ways: (1) For each component, we grouped genes within the top x% of significant up-regulation and down-regulation (as measured by the load of the gene in the component) into two clusters, with x varied from 5% to 45%. For each value of x, statistical significance was measured for clusters from independent components and compared with that from principal components, based on the minimum p-value for each gene annotation, as described in Section 4. We made a scatter plot comparing the negative log of the collected best p-values for each gene annotation when x is fixed at 15%, shown in Figure 1(a). (2) Same as before, except that we did not fix the value of x; instead, we collected the minimum p-value from each method for each GO and KEGG gene annotation category and compared the collected p-values (Figure 1(b)). In both cases, for the majority of the gene annotation categories ICA produced significantly lower p-values than PCA did, especially for gene annotations for which both ICA and PCA showed high significance.\n\nFigure 1: Comparison of linear ICA (NMLE) to PCA on dataset D1 (a) when x is fixed at 15%; (b) when x is not fixed. (c) Three independent components of dataset D4. Each gene is mapped to a point based on the value assigned to the gene in three independent components, which are enriched with liver- (red), muscle- (orange) and vulva-specific (green) genes, respectively. 
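The top-x% grouping used in comparison (1) above can be sketched as below; the component loads are synthetic, and the 15% cutoff follows Figure 1(a).

```python
import numpy as np

rng = np.random.default_rng(1)
loads = rng.standard_normal(1000)   # hypothetical loads of 1000 genes in one component

x = 0.15                            # top 15%, as in Figure 1(a)
n = max(1, int(round(x * loads.size)))

order = np.argsort(loads)           # ascending by load
down_cluster = order[:n]            # most down-regulated genes in this component
up_cluster = order[-n:]             # most up-regulated genes in this component

# The two clusters are disjoint by construction for x < 50%.
assert len(set(down_cluster) & set(up_cluster)) == 0
```

This is the rank-based counterpart of the mean + c*std thresholding in Equation (4); it fixes cluster size directly instead of fixing a deviation cutoff.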
\n\nThe expression levels of genes in D4 were normalized across the 59 experiments, and the logarithms of the resulting values were taken. Experiments 57, 58, and 59 were removed because they made the expression matrix nearly singular. In [2], a clustering approach based on PCA and subsequent visual inspection was applied to an earlier version of this dataset, containing 50 of the 59 samples. After we performed ICA, the most significant independent components were enriched for liver-specific, muscle-specific and vulva-specific genes, with p-values of 10^-133, 10^-124 and 10^-117, respectively. In the ICA liver cluster, 198 genes were liver-specific (out of a total of 244), as compared with the 23 liver-specific genes identified in [2] using PCA. The ICA muscle cluster of 235 genes contains 199 muscle-specific genes, compared to 19 muscle-specific genes identified in [2]. We generated a 3-dimensional scatter plot of the loads of all genes annotated in [15] on these significant ICA components in Figure 1(c). We can see that the liver-specific, muscle-specific and vulva-specific genes are strongly biased to lie on the x-, y-, and z-axes, respectively. We applied nonlinear ICA to this dataset, and the four most significant clusters from nonlinear ICA with the Gaussian RBF kernel were muscle-specific, liver-specific, vulva-specific and brain-specific, with p-values of 10^-158, 10^-127, 10^-112 and 10^-70, respectively, showing considerable improvement over the linear ICA clusters.\n\nFor D2, variance-normalization was applied to the 3000 most variant genes, as in [4]. The 17th experiment, which made the expression matrix close to singular, was removed. 
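The removal of experiments that make the expression matrix nearly singular (done above for both D4 and D2) can be sketched with a condition-number check; the greedy strategy, threshold, and data below are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def drop_near_singular_rows(X, cond_threshold=1e8):
    """Greedily drop rows of X while its condition number (ratio of largest
    to smallest singular value) exceeds the threshold; returns kept indices."""
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[0]))
    while len(keep) > 1 and np.linalg.cond(X[keep]) > cond_threshold:
        # Remove the row whose deletion lowers the condition number the most.
        worst = min(keep, key=lambda r: np.linalg.cond(X[[i for i in keep if i != r]]))
        keep.remove(worst)
    return keep

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 50))
X[4] = X[3] + 1e-12 * rng.standard_normal(50)   # experiment 5 nearly duplicates experiment 4
kept = drop_near_singular_rows(X)
assert len(kept) == 4
```

A near-duplicate experiment contributes an almost-zero singular value, which destabilizes the ICA unmixing estimate; dropping one copy restores a well-conditioned matrix.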
We measured the statistical significance of clusters as described in Section 4, and compared the smallest p-value of each gene annotation from our approach to that from k-means clustering applied to the same dataset [4]. We made a scatter plot comparing the negative log of the smallest p-value (y-axis) from ICA clusters with that from k-means clustering (x-axis). The coefficient c was varied from 1.0 to 2.0, and the superiority of ICA-based clustering over k-means clustering does not change. In many practical settings, estimation of the best c is not needed: unless our focus is to blindly determine cluster sizes, we can simply adjust c to obtain a desired cluster size. Figure 2(a), (b) and (c) show, for c = 1.25, a comparison of the performance of linear ICA (NMLE), nonlinear ICA with the Gaussian RBF kernel (NICA gauss), and k-means clustering (k-means).\n\nFor D3, we first removed experiments that contained more than 7000 missing values, because ICA does not perform properly when the dataset contains many missing values. The 250 remaining experiments were used, containing expression levels for 17817 genes preprocessed to be log-ratios x_ij = log2(R_ij/G_ij) between red and green intensities. We compared the biological coherence of clusters from our approach with that of the topomap-based approach applied to the same dataset in [3]. The results for c = 1.25 are plotted in Figure 2(d). We observe that the two methods perform very similarly, with most categories having roughly the same p-value in the ICA and topomap clusters. The topomap clustering approach performs slightly better in a larger fraction of the categories. 
Still, we consider this performance a confirmation that ICA is a widely applicable method that requires minimal training: in this case, the missing values and high diversity of the data make clustering especially challenging, while the topomap approach was specifically designed and manually trained for this dataset, as described in [3].\n\nFinally, we compared different ICA algorithms in terms of clustering performance. We tested six linear ICA methods: natural-gradient maximum-likelihood estimation (NMLE) [7][8], Joint Approximate Diagonalization of Eigenmatrices [16], fast fixed-point ICA with three different measures of non-Gaussianity [17], and Extended Information Maximization (Infomax) [18]. We also tested two kernels for nonlinear ICA: the Gaussian RBF kernel (NICA gauss) and the polynomial kernel (NICA poly). For each dataset, we compared the biological coherence of clusters generated by each method. Among the six linear ICA algorithms, NMLE was the best on all datasets. Among both linear and nonlinear methods, the Gaussian-kernel nonlinear ICA method was the best on datasets D1, D2 and D4, the polynomial-kernel nonlinear ICA method was best on dataset D4, and NMLE was best on the large datasets (D3 and D4). In Figure 3, we compare the NMLE method with three other ICA methods on dataset D2. Overall, the NMLE algorithm consistently performed well on all datasets. The nonlinear ICA algorithms performed best on the small datasets, but were unstable on the two largest datasets. More comparison results are presented on the website www.stanford.edu/~silee/ICA/.\n\nFigure 2: Comparison of (a) linear ICA (NMLE) with k-means clustering, (b) nonlinear ICA with Gaussian RBF kernel to linear ICA (NMLE), and (c) nonlinear ICA with Gaussian RBF kernel to k-means clustering on dataset D2. (d) Comparison of linear ICA (NMLE) to the topomap-based approach on dataset D3. 
\n\nFigure 3: Comparison of linear ICA (NMLE) to (a) the Extended Infomax ICA algorithm, (b) fast ICA with symmetric orthogonalization and tanh nonlinearity, and (c) nonlinear ICA with a polynomial kernel of degree 2, on dataset D2.\n\n6 Discussion\n\nICA is a powerful statistical method for separating mixed independent signals. We proposed applying ICA to decompose microarray data into independent gene expression patterns of underlying biological processes, and to group genes into mutually non-exclusive clusters with statistically significant functional coherence. Our clustering method outperformed several leading methods on a variety of datasets, with the added advantage that it requires setting only one parameter, namely the coefficient c of standard deviations beyond which a gene is considered to be associated with a component's cluster. We observed that performance was not very sensitive to this parameter, suggesting that ICA is robust enough to be used for clustering with little human intervention.\n\nThe empirical performance of ICA in our tests supports the hypothesis that statistical independence is a good criterion for separating mixed biological signals in microarray data. The Extended Infomax ICA algorithm proposed in [18] can automatically determine whether the distribution of each source signal is super-Gaussian or sub-Gaussian. Interestingly, the application of Extended Infomax ICA to all the expression datasets uncovered no source signal with a sub-Gaussian distribution. A likely explanation is that global gene expression profiles are mixtures of super-Gaussian sources rather than of sub-Gaussian sources. 
This finding is consistent with the following intuition: underlying biological processes are super-Gaussian because they sharply affect the relevant genes, typically a small fraction of all genes, and leave the majority of genes relatively unaffected.\n\nAcknowledgments\n\nWe thank Te-Won Lee for helpful feedback. We thank Relly Brandman, Chuong Do, and Yueyi Liu for edits to the manuscript.\n\nReferences\n[1] Alter O, Brown PO, Botstein D. Proc. Natl. Acad. Sci. USA 97(18):10101-10106, 2000.\n[2] Misra J, Schmitt W, et al. Genome Research 12:1112-1120, 2002.\n[3] Kim SK, Lund J, et al. Science 293:2087-2092, 2001.\n[4] Tavazoie S, Hughes JD, et al. Nature Genetics 22(3):281-285, 1999.\n[5] Hori G, Inoue M, et al. Proc. 3rd Int. Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, pp. 151-155, 2000.\n[6] Liebermeister W. Bioinformatics 18(1):51-60, 2002.\n[7] Bell AJ, Sejnowski TJ. Neural Computation 7:1129-1159, 1995.\n[8] Amari S, Cichocki A, et al. In Advances in Neural Information Processing Systems 8, pp. 757-763. Cambridge, MA: MIT Press, 1996.\n[9] Harmeling S, Ziehe A, et al. In Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press.\n[10] Troyanskaya O, Cantor M, et al. Bioinformatics 17:520-525, 2001.\n[11] The Gene Ontology Consortium. Genome Research 11:1425-1433, 2001.\n[12] Kanehisa M, Goto S. In Current Topics in Computational Molecular Biology, pp. 301-315. Cambridge, MA: MIT Press, 2002.\n[13] Spellman PT, Sherlock G, et al. Mol. Biol. Cell 9:3273-3297, 1998.\n[14] Cho RJ, Campbell MJ, et al. Molecular Cell 2:65-73, 1998.\n[15] Hsiao L, Dangond F, et al. Physiol. Genomics 7:97-104, 2001.\n[16] Cardoso JF. Neural Computation 11(1):157-192, 1999.\n[17] Hyvarinen A. IEEE Transactions on Neural Networks 10(3):626-634, 1999.\n[18] Lee TW, Girolami M, et al. 
Neural Computation 11:417\u2013441, 1999.\n", "award": [], "sourceid": 2396, "authors": [{"given_name": "Su-in", "family_name": "Lee", "institution": null}, {"given_name": "Serafim", "family_name": "Batzoglou", "institution": null}]}