{"title": "Self-Organizing and Adaptive Algorithms for Generalized Eigen-Decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 396, "page_last": 402, "abstract": null, "full_text": "Self-Organizing and Adaptive Algorithms for \n\nGeneralized Eigen-Decomposition \n\nChanchal Chatterjee \nNewport Corporation \n\n1791  Deere Avenue, Irvine, CA 92606 \n\nVwani P. Roychowdhury \nElectrical Engineering Department \nUCLA, Los Angeles, CA 90095 \n\nABSTRACT \n\nThe paper is developed in  two parts where we discuss a new approach \nto self-organization in a single-layer linear feed-forward network.  First, \ntwo novel algorithms for self-organization are derived from  a two-layer \nlinear hetero-associative network performing a one-of-m classification, \nand trained with the constrained least-mean-squared classification error \ncriterion.  Second, two adaptive algorithms are derived from  these self(cid:173)\norganizing  procedures \nthe  principal  generalized \neigenvectors  of  two  correlation  matrices  from  two  sequences  of \nrandom  vectors. These novel  adaptive  algorithms can  be  implemented \nin  a  single-layer  linear  feed-forward  network.  We  give  a  rigorous \nconvergence  analysis  of the  adaptive  algorithms  by  using  stochastic \napproximation theory. As an example, we consider a problem of online \nsignal detection in digital mobile communications. \n\nto  compute \n\n1.  INTRODUCTION \n\nWe  study  the  problems  of  hetero-associative  trammg,  linear  discriminant  analysis, \ngeneralized eigen-decomposition and their theoretical connections.  The paper is  divided \ninto two parts. In the first part, we study the relations between hetero-associative training \nwith  a  linear  feed-forward  network,  and  feature  extraction  by  the  linear  discriminant \nanalysis  (LOA)  criterion.  Here  we  derive  two  novel  algorithms  that  unify  the  two \nproblems.  In  the  second  part,  we  generalize  the  self-organizing  algorithm  for  LOA  to \nobtain adaptive algorithms for  generalized eigen-decomposition, for which we provide a \nrigorous proof of convergence by using stochastic approximation theory. \n\n1.1  HETERO-ASSOCIATION AND LINEAR DISCRIMINANT ANALYSIS \n\nIn  this  discussion,  we  consider  a  special  case  of hetero-association  that  deals  with  the \nclassification problems. Here the inputs belong to a finite m-set of pattern classes, and the \n\n\fSelf-Organizing and Adaptive Generalized Eigen-Decomposition \n\n397 \n\noutputs  indicate  the  classes  to  which  the  inputs  belong.  Usually,  the  ith  standard  basis \nvector ei  is chosen to indicate that a particular input vector x belongs to class i. \nThe  LDA  problem,  on  the  other hand,  aims  at  projecting  a  multi-class  data  in  a  lower \ndimensional  subspace  such  that  it  is  grouped  into  well-separated  clusters  for  the  m \nclasses.  The  method  is  based  upon  a  set  of scatter  matrices  commonly  known  as  the \nmixture  scatter  Sm  and  between  class  scatter  Sb  (Fukunaga,  1990).  These  matrices  are \nused  to  formulate  criteria  such  as  tr(Sm-ISb)  and  det(Sb)1  det(Sm)  which  yield  a  linear \ntransform <1>  that satisfy the generalized eigenvector problem Sb<1>=Sm<1>A,  where A is  the \ngeneralized eigenvalue matrix.  If Sm  is positive definite, we obtain a <1>  such that <1>TSm<1> \n=1  and  <1>TSb<1>=A.  
A relation between hetero-association and LDA was demonstrated by Gallinari et al. (1991). Their work made explicit that a linear multi-layer perceptron performing a one-from-m classification that minimizes the total mean square error (MSE) at the network output also maximizes the criterion $\det(S_b)/\det(S_m)$ for LDA at the final hidden layer. This study was generalized by Webb and Lowe (1990) by using a nonlinear transform from the input data to the final hidden units, and a linear transform in the final layer. This has been further generalized by Chatterjee and Roychowdhury (1996) by including the Bayes cost for misclassification in the criterion $\mathrm{tr}(S_m^{-1} S_b)$.

Although the above studies offer useful insights into the relations between hetero-association and LDA, they do not suggest an algorithm to extract the optimal LDA transform $\Phi$. Since the criteria for class separability are insensitive to multiplication by nonsingular matrices, the above studies suggest that any training procedure that minimizes the MSE at the network output will yield a nonsingular transformation of $\Phi$; i.e., we obtain $Q\Phi$ where $Q$ is a nonsingular matrix. Since $Q\Phi$ does not satisfy the generalized eigenvector problem $S_b \Phi = S_m \Phi \Lambda$ for an arbitrary nonsingular matrix $Q$, we need to determine an algorithm that will yield $Q = I$.

In order to obtain the optimum linear transform $\Phi$, we constrain the training of a two-layer linear feed-forward network such that, at convergence, the weights of the first layer simultaneously diagonalize $S_m$ and $S_b$. Thus, the hetero-associative network is trained by minimizing a constrained MSE at the network output. This training procedure yields two novel algorithms for LDA.

1.2 LDA AND GENERALIZED EIGEN-DECOMPOSITION

Since the LDA problem is a generalized eigen-decomposition problem for the symmetric-definite case, the self-organizing algorithms derived from the hetero-associative networks lead us to construct adaptive algorithms for generalized eigen-decomposition. Such adaptive algorithms are required in several applications of image and signal processing. As an example, we consider the problem of online interference cancellation in digital mobile communications.

Similar to the LDA problem $S_b \Phi = S_m \Phi \Lambda$, the generalized eigen-decomposition problem $A \Phi = B \Phi \Lambda$ involves the matrix pencil $(A, B)$, where $A$ and $B$ are assumed to be real, symmetric and positive definite. Although a solution to the problem can be obtained by a conventional method, there are several applications in image and signal processing where an online solution of generalized eigen-decomposition is desired. In these real-time situations, the matrices $A$ and $B$ are themselves unknown. Instead, there are available two sequences of random vectors $\{x_k\}$ and $\{y_k\}$ with $\lim_{k \to \infty} E[x_k x_k^T] = A$ and $\lim_{k \to \infty} E[y_k y_k^T] = B$, where $x_k$ and $y_k$ represent the online observations of the application. For every sample $(x_k, y_k)$, we need to obtain the current estimates $\Phi_k$ and $\Lambda_k$ of $\Phi$ and $\Lambda$ respectively, such that $\Phi_k$ and $\Lambda_k$ converge strongly to their true values.

The conventional approach for evaluating $\Phi$ and $\Lambda$ requires the computation of $(A, B)$ after collecting all of the samples, and then the application of a numerical procedure; i.e., the approach works in a batch fashion. There are two problems with this approach. Firstly, the dimension of the samples may be so large that, even if all of the samples are available, performing the generalized eigen-decomposition may take a prohibitively large amount of computational time. Secondly, the conventional schemes cannot adapt to slow or small changes in the data. So the approach is not suitable for real-time applications where the samples arrive in an online fashion.
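For simulation purposes, such sequences are easy to synthesize (a minimal sketch under our own modeling assumptions, not a construction given in the paper): drawing $x_k = L_A g_k$ with $g_k \sim N(0, I)$ and $L_A L_A^T = A$ gives $E[x_k x_k^T] = A$, and similarly for $y_k$:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5
    # Hypothetical targets: any symmetric positive definite A and B will do.
    FA = rng.standard_normal((n, n)); A = FA @ FA.T + n * np.eye(n)
    FB = rng.standard_normal((n, n)); B = FB @ FB.T + n * np.eye(n)
    LA, LB = np.linalg.cholesky(A), np.linalg.cholesky(B)

    def sample_pair():
        # E[x x^T] = LA E[g g^T] LA^T = A, and likewise E[y y^T] = B.
        return LA @ rng.standard_normal(n), LB @ rng.standard_normal(n)

Gaussian draws are unbounded, so a truncated distribution would be needed to satisfy the boundedness assumption (A1) of Section 4 exactly; we ignore this refinement in the sketch.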
Although the adaptive generalized eigen-decomposition algorithms are natural generalizations of the self-organizing algorithms for LDA, their derivations do not constitute a proof of convergence. We therefore give a rigorous proof of convergence by stochastic approximation theory, which shows that the estimates obtained from our adaptive algorithms converge with probability one to the generalized eigenvectors.

In summary, the study offers the following contributions: (1) we present two novel algorithms that unify the problems of hetero-associative training and LDA feature extraction; and (2) we discuss two single-stage adaptive algorithms for generalized eigen-decomposition from two sequences of random vectors.

In our experiments, we consider an example of online interference cancellation in digital mobile communications. In this problem, the signal from a desired user at a far distance from the receiver is corrupted by another user very near to the base. The optimum linear transform $w$ for weighting the signal is the first principal generalized eigenvector of the signal correlation matrix with respect to the interference correlation matrix. Experiments with our algorithm suggest a rapid convergence within four bits of transmitted signal, which provides a significant advantage over many current methods.

2. HETERO-ASSOCIATIVE TRAINING AND LDA

We consider a two-layer linear network performing a one-from-m classification. Let $x \in \Re^n$ be an input to the network to be classified into one out of m classes $\omega_1, \ldots, \omega_m$. If $x \in \omega_i$, then the desired output is $d = e_i$ (the ith standard basis vector). Without loss of generality, we assume the inputs to be a zero-mean stationary process with a nonsingular covariance matrix.

2.1 EXTRACTING THE PRINCIPAL LDA COMPONENTS

In the two-layer linear hetero-associative network, let there be p neurons in the hidden layer and m output units. The aim is to develop an algorithm so that the individual weight vectors for the first layer converge to the first $p \le m$ generalized eigenvectors corresponding to the p significant generalized eigenvalues arranged in decreasing order. Let $w_i \in \Re^n$ ($i = 1, \ldots, p$) be the weight vectors for the input layer, and $v_i \in \Re^m$ ($i = 1, \ldots, p$) be the weight vectors for the output layer.

The neurons are trained sequentially; i.e., the training of the jth neuron is started only after the weight vector of the (j-1)th neuron has converged. Assume that all the j-1 previous neurons have already been trained and their weights have converged to the optimal weight vectors $w_i$ for $i \in [1, j-1]$. To extract the jth generalized eigenvector at the output of the jth neuron, the updating model for this neuron is constructed by subtracting the contributions of all previously computed j-1 generalized eigenvectors from the desired output $d_j$ as below:

$\tilde{d}_j = d_j - \sum_{i=1}^{j-1} v_i w_i^T x$.  (1)

This process is equivalent to the deflation of the desired output.
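In code, the deflation (1) is a single loop over the already-converged neurons (a minimal numpy sketch; the argument names are ours):

    import numpy as np

    def deflate(d, x, ws, vs):
        # Eq. (1): subtract v_i (w_i^T x) for each trained neuron i = 1, ..., j-1.
        d_tilde = np.array(d, dtype=float)
        for w_i, v_i in zip(ws, vs):
            d_tilde -= v_i * (w_i @ x)
        return d_tilde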
\n\nthe  first \n\nThe  neurons  are  trained  sequentially;  i.e.,  the  training  of the jlh  neuron  is  started  only \nafter  the  weight  vector  of the  (j_I)fh  neuron  has  converged.  Assume  that  all  the  j-I \nprevious  neurons  have  already  been  trained  and  their  weights  have  converged  to  the \n\n\fSelf-Organizing and Adaptive Generalized Eigen-Decomposition \n\n399 \n\noptimal weight vectors wi  for  i E (1 J-l]. To  extract the J'h  generalized eigenvector in  the \noutput  of the /h  neuron,  the  updating  model  for  this  neuron  should  be  constructed  by \nsubtracting  the  results  from  all  previously  computed j-I generalized  eigenvectors  from \nthe desired output dj  as below \n\n-\nT \nd j  = d j  - L  v i W i  x. \n\nj-I \n\ni=1 \n\n(1) \n\nThis process is equivalent to the deflation of the desired output. \nThe  scatter  matrices  Sm  and  Sb  can  be  obtained  from  x  and  d  as  Sm=E[xxT] and  Sb= \nMMT,  where  M=E[xd1).  We  need  to  extract the j1h  LOA  transform  Wj  that  satisfies  the \ngeneralized eigenvector equation  SbWj=AlmWj  such that  Aj  is  the J'h  largest  generalized \neigenvalue. The constrained MSE criterion at the network output is \n\nJh,Vj )=,lldj <~:v;wT x-vjWJxr]+ p{wJSmw j  -I). \n\nUsing (2), we obtain the update equation for Wj as \n\nw(J)  = w(J) + {Mv(J) - S  w(J)(w(J)T Mv(J\u00bb)- S  j~1 w(J)v(i)T v(J\u00bb) \n. \nhI \n\nm  k \n\nk \n\nk \n\nk \n\nk \n\nk \n\nk \n\nk \n\nm L.. \n;=1 \n\nDifferentiating  (2)  with  respect  to  vi'  and  equating  it  to  zero,  we  obtain  the  optimum \nvalue ofvj as MTWj. Substituting this Vj  in (3) we obtain \n\nw(J)  = w(J) + {s w(J) - S  w(J)(w(J)T S  w(J\u00bb) - S  j~1 wU)w(i)TS  w(J\u00bb) \n\nk \n\nk \n\nb  k \n\n. \n\nm  k \n\nb  k \n\nk \n\nb  k \n\n(4) \n\nk+1 \n\nk \n\nm L.. \ni=1 \n\nLet Wk be the matrix whose ith  column is w~). Then (4) can be written in matrix form  as \n\nWk+1  = Wk + r{SbWk -SmWkU~W[SbWk p, \n\n(2) \n\n(3) \n\n(5) \n\n(6) \n\nwhere UT[\u00b7]  sets all elements below the  diagonal of its matrix argument to  zero,  thereby \nmaking it upper triangular. \n\n2.2  ANOTHER SELF-ORGANIZING ALGORITHM FOR LDA \n\nIn  the  previous  analysis  for  a  two-layer  linear hetero-associative  network,  we  observed \nthat the optimum value for  V=WTM,  where the  jlh  column of Wand row of V are formed \nby  Wi  and  Vi  respectively.  It is,  therefore,  worthwhile  to  explore  the  gradient  descent \nprocedure on the error function below instead of (2) \n\nJ(W) = E[lld- MTWWTxI12} \n\nBy  differentiating  this  error  function  with  respect  to  W,  and  including  the  deflation \nprocess, we obtain the following update procedure for  W instead of (5) \n\nWk+1  = Wk + ~2SbWk - Sm WkUT[ W[ SbWk ] - SbWkUT[ W[ SmWk]). \n\n(7) \n\n3.  LDA AND GENERALIZED EIGEN-DECOMPOSITION \n\nSince LOA  consists of solving the  generalized eigenvector problem Sb<P=Sm<PA,  we  can \nnaturally  generalize  algorithms  (5)  and  (7)  to  obtain  adaptive  algorithms  for  the \ngeneralized eigen-decomposition  problem A<P=B<PA,  where A  and  B are  assumed  to  be \nsymmetric  and  positive  definite.  Here,  we  do  not  have  the  matrices  A  and  B.  Instead, \n\n\f400 \n\nC.  Chatterjee and V.  P.  Roychowdhury \n\nthere are available two sequences of random  vectors  {xk}  and  {Yk}  with limk~ooE[xp/] \n=A  and limk~~[Yky/]=B, where xk and Yk represent the online observations. 
\n\nFrom  (5),  we  obtain \ndecomposition \n\nthe  following  adaptive  algorithm  for  generalized  eigen(cid:173)\n\n(8) \n\nHere {17k}  is a sequence of scalar gains, whose properties are described in  Section 4.  The \nsequences  {Ak}  and  {Bk} are  instantaneous  values  of the  matrices A  and B respectively. \nAlthough  the  Ak  and  Bk  values  can  be  obtained  from  xk  and  Yk  as  xp/ and  YkY/ \nrespectively, our algorithm requires that at least one of the {Ak}  or {Bk} sequences have a \ndominated  convergence  property.  Thus,  the  {Ak}  and  {Bk}  sequences  may  be  obtained \nfrom  xp/ and YkY/ from the following algorithms \n\nAk  = Ak_1 +Yk(XkXk -Ak- I )  and  Bk  = Bk- I  +Yk(YkYk -Bk-d, \n\n(9) \n\nwhere Ao and Bo are symmetric, and  {Yk}  is a scalar gain sequence. \nAs  done  before,  we  can  generalize  (7)  to  obtain  the  following  adaptive  algorithm  for \ngeneralized eigen-decomposition from  a sequence of samples {Ak}  and {Bk} \n\nWk+1  = Wk + l7k(2Ak Wk - BkWkUT[ W[ AkWk ] - AkWkUT[ W[ BkWk ]). \n\n(10) \n\nAlthough  algorithms  (8)  and (10) were  derived  from  the  network  MSE  by  the  gradient \ndescent approach, this derivation does not guarantee their convergence.  In order to prove \ntheir  convergence,  we  use  stochastic  approximation  theory.  We  give  the  convergence \nresults only for algorithm (l0). \n\n4.  STOCHASTIC APPROX. CONVG. PROOF FOR ALG. (10) \n\nIn order to prove the con vergence of (10), we use stochastic approximation theory due to \nLjung (1977).  In stochastic approximation theory,  we  study the  asymptotic properties of \n(10) in terms of the ordinary differential equation (ODE) \n\n~ W(t)= 1!!! E[2AkW - BkWUT[ W T AkW]- AkWUT[ W T BkW]], \n\nwhere W(t)  is the continuous time counterpart of Wk with t denoting continuous time.  The \nmethod  of proof requires  the  following  steps:  (1)  establishing  a  set  of conditions  to  be \nimposed on A,  B,  A\",  B\",  and  17\",  (2) finding the stable stationary points of the ODE;  and \n(3) demonstrating that  Wk  visits a compact subset of the domain of attraction of a stable \nstationary point infinitely often. \n\nWe use Theorem  1 of Ljung (1977) for the convergence proof. The following  is a general \nset of assumptions for the convergence proof of (10): \nAssumption (AI).  Each  xk  and Yk  is  bounded with probability one,  and  limk~ooE[xp/] \n= A and limk~ooE[y kY k 1)  =  B, where A and B are positive definite. \nAssumption (A2).  {l7kE9t+}  satisfies  l7kJ..O,  Lk=Ol7k  =OO,Lk=Ol7k  <00  for some r>1  and \nlimk~oo sup(l7i l  -l7i~l) <00. \nAssumption (A3). The p  largest generalized eigenvalues of A  with respect to  B are each \nof unit mUltiplicity. \nLemma 1. Let Al and A2 hold.  Let w*  be a locally asymptotically stable (in  the sense of \nLiapunov) solution to the ordinary differential equation (ODE): \n\n\fSelf-Organizing and Adaptive Generalized Eigen-Decomposition \n\n~ W(t) = 2AW(t) - BW(t)U4W(t/ AW(t)] - AW(t)U4W(t/ BW(t)], \n\n401 \n\n(11) \n\nwith domain of attraction D(W).  Then if there is a  compact subset S of D(W) such that \nWk  E  S infinitely often,  then we have Wk ~ W with probability one as k ~ 00. \nWe  denote  A\\  >  ~ >  ...  >  Ap  ~ ...  ~ An  >  0  as  the  generalized  eigenvalues  of A  with \nrespect to  B,  and 4>;  as the generalized eigenvector corresponding to A;  such that 4>\\, ... ,4>n \nare  orthonormal  with  respect  to  B.  Let  <l>=[4>\\ ... 4>nl  and  A=diag(A\\, ... 
4. STOCHASTIC APPROXIMATION CONVERGENCE PROOF FOR ALGORITHM (10)

In order to prove the convergence of (10), we use stochastic approximation theory due to Ljung (1977). In stochastic approximation theory, we study the asymptotic properties of (10) in terms of the ordinary differential equation (ODE)

$\frac{d}{dt} W(t) = \lim_{k \to \infty} E\left[ 2 A_k W - B_k W\, UT[W^T A_k W] - A_k W\, UT[W^T B_k W] \right]$,

where $W(t)$ is the continuous-time counterpart of $W_k$, with $t$ denoting continuous time. The method of proof requires the following steps: (1) establishing a set of conditions to be imposed on $A$, $B$, $A_k$, $B_k$, and $\eta_k$; (2) finding the stable stationary points of the ODE; and (3) demonstrating that $W_k$ visits a compact subset of the domain of attraction of a stable stationary point infinitely often.

We use Theorem 1 of Ljung (1977) for the convergence proof. The following is a general set of assumptions for the convergence proof of (10):

Assumption (A1). Each $x_k$ and $y_k$ is bounded with probability one, and $\lim_{k \to \infty} E[x_k x_k^T] = A$ and $\lim_{k \to \infty} E[y_k y_k^T] = B$, where $A$ and $B$ are positive definite.

Assumption (A2). $\{\eta_k \in \Re^+\}$ satisfies $\eta_k \downarrow 0$, $\sum_{k=0}^{\infty} \eta_k = \infty$, $\sum_{k=0}^{\infty} \eta_k^r < \infty$ for some $r > 1$, and $\lim_{k \to \infty} \sup (\eta_k^{-1} - \eta_{k-1}^{-1}) < \infty$.

Assumption (A3). The p largest generalized eigenvalues of $A$ with respect to $B$ are each of unit multiplicity.

Lemma 1. Let A1 and A2 hold. Let $W^*$ be a locally asymptotically stable (in the sense of Liapunov) solution to the ordinary differential equation (ODE)

$\frac{d}{dt} W(t) = 2 A W(t) - B W(t)\, UT[W(t)^T A W(t)] - A W(t)\, UT[W(t)^T B W(t)]$,  (11)

with domain of attraction $D(W^*)$. Then, if there is a compact subset $S$ of $D(W^*)$ such that $W_k \in S$ infinitely often, we have $W_k \to W^*$ with probability one as $k \to \infty$. •

We denote $\lambda_1 > \lambda_2 > \cdots > \lambda_p \ge \cdots \ge \lambda_n > 0$ as the generalized eigenvalues of $A$ with respect to $B$, and $\phi_i$ as the generalized eigenvector corresponding to $\lambda_i$, such that $\phi_1, \ldots, \phi_n$ are orthonormal with respect to $B$. Let $\Phi = [\phi_1 \cdots \phi_n]$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ denote the matrix of generalized eigenvectors and eigenvalues of $A$ with respect to $B$. Note that if $\phi_i$ is a generalized eigenvector, then $d_i \phi_i$ ($|d_i| = 1$) is also a generalized eigenvector.

In the next two lemmas, we first prove that all the possible equilibrium points of the ODE (11) are, up to an arbitrary permutation, the p generalized eigenvectors of $A$ with respect to $B$ corresponding to the p largest generalized eigenvalues. We next prove that all these equilibrium points of the ODE (11) are unstable equilibrium points, except for $[d_1 \phi_1 \cdots d_p \phi_p]$, where $|d_i| = 1$ for $i = 1, \ldots, p$.

Lemma 2. For the ordinary differential equation (11), let A1 and A3 hold. Then $W = \Phi D P$ are equilibrium points of (11), where $D = [D_1 | 0]^T$ is an $n \times p$ matrix with $D_1$ being a $p \times p$ diagonal matrix with diagonal elements $d_i$ such that $|d_i| = 1$ or $d_i = 0$, and $P$ is an arbitrary permutation matrix. •

Lemma 3. Let A1 and A3 hold. Then $W = \Phi D$ (where $D = [D_1 | 0]^T$, $D_1 = \mathrm{diag}(d_1, \ldots, d_p)$, $|d_i| = 1$) are stable equilibrium points of the ODE (11). In addition, $W = \Phi D P$ ($d_i = 0$ for some $i \le p$, or $P \ne I$) are unstable equilibrium points of the ODE (11). •

Lemma 4. For the ordinary differential equation (11), let A1 and A3 hold. Then the points $W = \Phi D$ (where $D = [D_1 | 0]^T$, $D_1 = \mathrm{diag}(d_1, \ldots, d_p)$, $|d_i| = 1$ for $i = 1, \ldots, p$) are asymptotically stable. •

Lemma 5. Let A1-A3 hold. Then there exists a uniform upper bound for $\eta_k$ such that $W_k$ is uniformly bounded w.p. 1. •

The convergence of algorithm (10) can now be established by referring to Theorem 1 of Ljung (1977).

Theorem 1. Let A1-A3 hold. Assume that with probability one the process $\{W_k\}$ visits infinitely often a compact subset of the domain of attraction of one of the asymptotically stable points $\Phi D$. Then, with probability one,

$\lim_{k \to \infty} W_k = \Phi D$.

Proof. By Lemma 4, $\Phi D$ ($|d_i| = 1$) are asymptotically stable points of the ODE (11). Since we assume that $\{W_k\}$ visits a compact subset of the domain of attraction of $\Phi D$ infinitely often, Lemma 1 then implies the theorem. •
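The accuracy measure used in the experiments below, the direction cosine between the estimated and the numerically computed principal generalized eigenvector, can be sketched as follows (our helper function; the absolute value absorbs the sign ambiguity $d_i \phi_i$ noted above, and a conjugate transpose should be used for complex data):

    import numpy as np

    def direction_cosine(w_est, w_true):
        # Equals 1.0 when the two vectors are aligned up to sign.
        return abs(w_est @ w_true) / (np.linalg.norm(w_est) * np.linalg.norm(w_true))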
5. EXPERIMENTAL RESULTS

We describe the performance of algorithms (8) and (10) with an example of online interference cancellation in a high-dimensional signal, in a digital mobile communication problem. The problem occurs when the desired user transmits a signal from a far distance to the receiver, while another user simultaneously transmits very near to the base. For common receivers, the quality of the received signal from the desired user is dominated by interference from the user close to the base. Due to the high rate and large dimension of the data, the system demands an accurate detection method from just a few data samples.

If we use conventional (numerical analysis) methods, signal detection will require a significant part of the time slot allotted to a receiver, accordingly reducing the effective communication rate. Adaptive generalized eigen-decomposition algorithms, on the other hand, allow the tracking of slow changes, and directly perform signal detection.

The details of the data model can be found in Zoltowski et al. (1996). In this application, the duration of each transmitted code is 127 µs, within which we have 10 µs of signal and 117 µs of interference. We take 10 frequency samples equi-spaced between -0.4 MHz and +0.4 MHz. Using 6 antennas, the signal and interference correlation matrices are of dimension 60×60 in the complex domain.

We use both algorithms (8) and (10) for the cancellation of the interference. Figure 1 shows the convergence of the principal generalized eigenvector and eigenvalue. The closed-form solution is obtained after collecting all of the signal and interference samples. In order to measure the accuracy of the algorithms, we compute the direction cosine between the estimated principal generalized eigenvector and the generalized eigenvector computed by the conventional method. The optimum value is one. We also show the estimated principal generalized eigenvalue in Figure 1b. The results show that both algorithms converge after the 4th bit of signal.

[Figure 1 appears here: two panels plotting algorithms (8) and (10) against the closed-form solution versus the number of samples.]

Figure 1. (a) Direction Cosine of Estimated First Principal Generalized Eigenvector, and (b) Estimated First Principal Generalized Eigenvalue.

References

C. Chatterjee and V. Roychowdhury (1996), \"Statistical Risk Analysis for Classification and Feature Extraction by Multilayer Perceptrons\", Proceedings IEEE Int'l Conference on Neural Networks, Washington D.C.
K. Fukunaga (1990), Introduction to Statistical Pattern Recognition, 2nd Edition, New York: Academic Press.
P. Gallinari, S. Thiria, F. Badran, F. Fogelman-Soulie (1991), \"On the Relations Between Discriminant Analysis and Multilayer Perceptrons\", Neural Networks, Vol. 4, pp. 349-360.
L. Ljung (1977), \"Analysis of Recursive Stochastic Algorithms\", IEEE Transactions on Automatic Control, Vol. AC-22, No. 4, pp. 551-575.
A. R. Webb and D. Lowe (1990), \"The Optimised Internal Representation of Multilayer Classifier Networks Performs Nonlinear Discriminant Analysis\", Neural Networks, Vol. 3, pp. 367-375.
M. D. Zoltowski, C. Chatterjee, V. Roychowdhury and J. Ramos (1996), \"Blind Adaptive 2D RAKE Receiver for CDMA Based on Space-Time MVDR Processing\", submitted to IEEE Transactions on Signal Processing.
", "award": [], "sourceid": 1210, "authors": [{"given_name": "Chanchal", "family_name": "Chatterjee", "institution": null}, {"given_name": "Vwani", "family_name": "Roychowdhury", "institution": null}]}