{"title": "Subspace-Based Face Recognition in Analog VLSI", "book": "Advances in Neural Information Processing Systems", "page_first": 225, "page_last": 232, "abstract": null, "full_text": "Subspace-Based Face Recognition in Analog\n\nVLSI\n\nGonzalo Carvajal, Waldo Valenzuela and Miguel Figueroa\nDepartment of Electrical Engineering, Universidad de Concepci\u00f3n\n\nCasilla 160-C, Correo 3, Concepci\u00f3n, Chile\n\n{gcarvaja, waldovalenzuela, miguel.\ufb01gueroa}@udec.cl\n\nAbstract\n\nWe describe an analog-VLSI neural network for face recognition based on\nsubspace methods. The system uses a dimensionality-reduction network\nwhose coe\ufb03cients can be either programmed or learned on-chip to per-\nform PCA, or programmed to perform LDA. A second network with user-\nprogrammed coe\ufb03cients performs classi\ufb01cation with Manhattan distances.\nThe system uses on-chip compensation techniques to reduce the e\ufb00ects of\ndevice mismatch. Using the ORL database with 12x12-pixel images, our\ncircuit achieves up to 85% classi\ufb01cation performance (98% of an equivalent\nsoftware implementation).\n\n1\n\nIntroduction\n\nSubspace-based techniques for face recognition, such as Eigenfaces [1] and Fisherfaces [2],\ntake advantage of the large redundancy present in most images to compute a lower-\ndimensional representation of their input data and stored patterns, and perform classi\ufb01ca-\ntion in the reduced subspace. Doing so substantially lowers the storage and computational\nrequirements of the face-recognition task.\nHowever, most techniques for dimensionality reduction require a high computational\nthroughput to transform images from the large input data space to the feature subspace.\nTherefore, software [3] even dedicated digital hardware implementations [4, 5] are too large\nand power-hungry to be used in highly portable systems. 
Analog VLSI circuits can compute using orders of magnitude less power and die area than their digital counterparts, but their performance is limited by signal offsets, parameter mismatch, charge leakage and nonlinear behavior, particularly in large-scale systems. Traditional circuit-design techniques can reduce these effects, but they increase power and area, rendering analog solutions less attractive.
In this paper, we present a neural network for face recognition which implements Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA) for dimensionality reduction, and Manhattan distances and a loser-take-all (LTA) circuit for classification. We can download the network weights in a chip-in-the-loop configuration, or use on-chip learning to compute PCA coefficients. We use local adaptation to achieve good classification performance in the presence of device mismatch. The circuit die area is 2.2mm² in a 0.35µm CMOS process, with an estimated power dissipation of 18mW. Using PCA reduction and a hard classifier, our network achieves up to 83% accuracy on the Olivetti Research Labs (ORL) face database [6] using 12x12-pixel images, which corresponds to 99% of the accuracy of a software implementation of the algorithm. Using LDA projections and a software Radial Basis Function (RBF) network on the hardware-computed distances yields 85% accuracy (98% of the software performance).

2 Eigenspace-based face recognition methods

The problem of face recognition consists of assigning an identity to an unknown face by comparing it to a database of labeled faces. However, the dimensionality of the input images is usually so high that performing the classification on the original data becomes prohibitively expensive.
Fortunately, human faces exhibit relatively regular statistics; therefore, their intrinsic dimensionality is much lower than that of their images. 
Subspace methods transform the input images to reduce their dimensionality, and perform the classification task on this lower-dimensional feature space. In particular, the Eigenfaces [1] method performs dimensionality reduction using PCA, and classification by choosing the stored face with the lowest distance to the input data.
Principal Components Analysis uses a linear transformation from the input space to the feature space, which preserves most of the information (in the mean-square error sense) present in the original vector. Consider a column vector x of dimension n, formed by the concatenated columns of the input image. Let the matrix X_{n×N} = {x_1, x_2, . . . , x_N} represent a set of N images, such as the image database available for a face recognition task. PCA computes a new matrix Y_{m×N}, with m < n:

Y = W^{*T} X   (1)

The columns of Y are the lower-dimensional projections of the original images in the feature space. The columns of the orthogonal transformation matrix W* are the eigenvectors associated with the m largest eigenvalues of the covariance matrix of the original image space.
Upon presentation of a new face image, the Eigenfaces method first transforms this image into the feature space using the transformation matrix W*, and then computes the distance between the reduced image and each image class in the reference database. The image is classified with the identity of the closest reference pattern.
Fisherfaces [2] performs dimensionality reduction using Linear Discriminant Analysis (LDA). LDA takes advantage of labeled data to maximize the distance between classes in the projected subspace. Considering X_i, i = 1, . . . , C, as subsets of X containing the N_i images of the same subject, LDA defines two matrices:

S_W = \sum_{i=1}^{C} \sum_{x_k \in X_i} (x_k − m_i)(x_k − m_i)^T, with m_i = (1/N_i) \sum_{k=1}^{N_i} x_k   (2)

S_B = \sum_{i=1}^{C} N_i (m_i − m)(m_i − m)^T   (3)

where S_W represents the scatter (variance) within classes, S_B is the scatter between different classes, and m is the mean over the entire image set. To perform the dimensionality reduction of Eqn. (1), LDA constructs W* such that its columns are the eigenvectors associated with the m largest eigenvalues of S_W^{-1} S_B. This requires S_W to be non-singular, which is often not the case; therefore, LDA frequently uses a PCA preprocessing stage [2].
Fisherfaces can perform classification using a hard classifier on the computed distances between the test data and stored patterns in the LDA subspace, as in Eigenfaces, or it can use a Radial Basis Function (RBF) network. RBF uses a hidden layer of neurons with Gaussian activation functions to detect clusters in the projected subspace.
Traditionally, subspace methods use Euclidean distances. However, our experiments show that, as long as the dimensionality reduction preserves enough distance between classes, less computationally expensive distance metrics such as the Manhattan distance are equally effective for classification. The Manhattan distance between two vectors x = [x_1 . . . x_n] and y = [y_1 . . . y_n] is given by:

d = \sum_{i=1}^{n} |x_i − y_i|   (4)

(a) Architecture

(b) Projection network

(c) Distance computation

Figure 1: Face-recognition hardware. (a) Architecture. A dimensionality-reduction network projects an n-dimensional image onto m dimensions, and a loser-take-all (LTA) circuit labels the image by choosing the nearest stored face in the reduced space. (b) The dimensionality reduction network is an array of linear combiners with weights that have been pre-computed or learned on chip. 
(c) The distance circuit computes the Manhattan distance between the m projections of the test image and the stored face database. In our current implementation, n = 144, m = 39, and k = 40.

3 Hardware Implementation

Fig. 1(a) shows the architecture of our face-recognition network. It follows the signal flow described in Section 2: the n-dimensional test image x is first projected onto the m-dimensional feature space (test data y) using an array of m n-input analog linear combiners, shown in Fig. 1(b). The constant input c is a bias used to compensate for the offset introduced by the analog multipliers. The network also stores the m projections of the database face set (the training set) in an array of analog memories. A distance-computation block, shown in Fig. 1(c), computes the Manhattan distance between each labeled element in the stored training set and the reduced test data y. A loser-take-all (LTA) circuit, currently implemented in software, selects the smallest distance and labels the test image with the selected class.
The linear combiners are based on the synapse shown in Fig. 2(a). An analog Gilbert multiplier computes the product of each pixel of the input image, represented as a differential voltage, and the local synaptic weight. An accurate transformation requires a multiplier response that is linear in the pixel value, so we designed the multipliers to maximize the linearity of that input. Device mismatch introduces offsets and gain variance across different multipliers in the network; we describe the calibration techniques used to compensate for these effects in Section 4. The multipliers provide a differential current output, so we can sum them across a single neuron by connecting them to common wires.
Each synaptic weight is stored in an analog nonvolatile memory cell [7] based on floating-gate transistors, also shown in Fig. 2(a). 
The cell features linear weight updates based on digital pulses applied to the terminals inc and dec. Using local calibration, also based on floating gates, we independently tune each synapse to achieve symmetric updates in the presence of device mismatch, and to make the update rates uniform across the entire chip. As a result, the resolution of the memory cell exceeds 12 bits in a 0.35µm CMOS process.

(a) Hardware synapse

(b) Distance circuit

Figure 2: (a) The synapse comprises a Gilbert multiplier and a nonvolatile analog memory cell with local calibration. The output currents are summed across each neuron. (b) Each component of the Manhattan distance is computed as the subtraction of the corresponding principal components and an optional inversion based on the sign of the result. The output currents are summed across all components.

Fig. 2(b) depicts the circuit used to compute the Manhattan distance between the test data and the stored patterns. Each projection of the training set is stored as a current in an analog memory cell, simpler and smaller than the cell used in the dimensionality-reduction network, and written using a self-limiting write process. The difference between each projection of the pattern and the test input is computed by inverting the polarity of one of the signals and adding the currents. To compute the absolute value, a current comparator based on a simple transconductance amplifier determines the sign of the result and uses a 2×2 crossbar switch to invert the polarity of the outputs if needed.
As stated in Section 5, our current implementation considers 12×12-pixel images (n = 144 in Fig. 1). 
We compute 39 projections using PCA and LDA, and perform the classification using 40 Manhattan-distance units on the 39-dimensional projections. The next section analyzes the effects of device mismatch on the dimensionality-reduction network.

4 Analog implementation of dimensionality reduction networks

The arithmetic distortions introduced by the nonlinear transfer function of the analog multipliers, coupled with the effects of device mismatch (offsets and gains), affect the accuracy of the operations performed by the reduction network and become the limiting factor in the classification performance. In order to achieve good performance, we must calibrate the network to compensate for these limitations.
In this section, we analyze and design solutions for two different cases. First, we consider the case when a computer performs PCA or LDA to determine W* off-line, and downloads the weights onto the chip. Second, we analyze the performance of adaptive on-chip computation of PCA using a Hebbian-learning algorithm. In both cases, we design mechanisms that use local on-chip adaptation to compensate for the offsets and gain variances introduced by device mismatch, thus improving classification performance. In the following analysis we assume that the inputs have zero mean and have been normalized. Also, for simplicity, we assume that the inputs and weights are operating within the linear range of the multipliers. We remove these assumptions when presenting experimental results. Thus, our analysis uses a simplified model of the analog multipliers given by:

o = (a_x x + γ_x)(a_w w + γ_w)   (5)

where o is the multiplier output, x and w are the inputs, γ_x and γ_w represent the input offsets, and a_x and a_w are the multiplier gains associated with each input. 
These parameters vary across different multipliers due to device mismatch; they are unknown at design time and difficult to determine even after circuit fabrication.

4.1 Dimensionality reduction with precomputed weights

Let us consider an analog linear combiner such as the one depicted in Fig. 1(b), which computes the first projection y of x using the first column w* of the software-precomputed optimal transformation W* of Eqn. (1). Using the simplified linear multiplier model of Eqn. (5), the linear combiner computes the first projection as:

y = x^T(A_x A_w w* + A_x γ_w) + γ_x^T(A_w w* + γ_w)   (6)

where A_x = diag([a_{x1} . . . a_{xn}]), A_w = diag([a_{w1} . . . a_{wn}]), γ_x = [γ_{x1} . . . γ_{xn}]^T, and γ_w = [γ_{w1} . . . γ_{wn}]^T represent the gains and offsets of each multiplier. Eqn. (6) shows that device mismatch has two effects on the output: the first term modifies the effective weight value of the network, and the second term represents an offset added to the output (w* is a constant).
Replacing w* with an adaptive version w_k, the structure becomes a classic adaptive linear combiner which, using the optimal weights to generate a reference output signal, can be trained using the well-known Least-Mean-Squares (LMS) algorithm. Adding a bias synapse b with constant input c and training the network with LMS, the weights converge to [7]:

ŵ = (A_x A_w)^{-1}(w* − A_x γ_w)   (7)

b̂ = −(γ_x^T(A_w w* + γ_w) + c γ_b)(c a_b)^{-1}   (8)

where a_b and γ_b are the gain and offset of the analog multiplier associated with the bias input c. 
These weight values fully compensate for the effects of gain mismatch and offsets.
In our hardware implementation, we use m adaptive linear combiners to compute every projection in the feature space, and calibrate these circuits using on-chip LMS local adaptation to compute and store the optimal weight values of Eqns. (7) and (8), achieving a good approximation of the optimal output Y. Fig. 3(a) shows our analog-VLSI implementation of LMS. We train the weight values in the memory cells by providing inputs and a reference output to each linear combiner, and use an on-chip pulse-based compact implementation of the LMS learning rule. In order to improve the convergence of the algorithm, we draw the inputs from a zero-mean random Gaussian distribution. Thus, the performance of the dimensionality-reduction network is ultimately limited by the resolution of the memory cells, the reference noise, the learning rate of the LMS training stage, and the linearity of the multipliers. This last effect can be controlled by restricting the dynamic range of the input to the linear range of the multipliers.
To measure the accuracy of our implementation, we computed (in software) the first 10 principal components of one half of the Olivetti Research Labs (ORL) face database, reduced to 12x12 pixels, and used our on-chip implementation of LMS to train the hardware network to learn the coefficients. We then measured the output of the circuit on the other half of the database. Fig. 3(b) plots the RMS value of the error between the circuit output and the software results, normalized to the RMS value of each principal component. The figure also shows the error when we wrote the coefficients onto the circuit in open-loop, without using LMS. In this case, offset and gain mismatch completely obscure the information present in the signal. 
LMS training compensates for these effects, and reduces the error energy to between 0.25% and 1% of the energy of the signal. A different experiment (not shown) computing LDA coefficients yields equivalent results.

4.2 On-chip PCA computation

In some cases, such as when the face-recognition network is integrated with a camera on a single chip, it may be necessary to train the face database on-chip. It is not practical for the chip to include the hardware resources to compute the optimal weights from the eigenvalue analysis of the training set's covariance matrix; therefore, we compute them on chip using the standard Generalized Hebbian Algorithm (GHA). The computation of the first principal component and the learning rule to update the weights at time k are:

y_k = x_k^T w_k   (9)

Δw_k = μ y_k (x_k − x'_k)   (10)

x'_k = y_k w_k   (11)

where μ is the learning rate of the algorithm and x'_k is the reconstruction of the input x_k from the first principal component.

(a) LMS computation

(b) Output error of PCA network

Figure 3: Training the PCA network with LMS. (a) Block diagram of our LMS implementation. We present random inputs to each linear combiner, and provide a reference output. A pulse-based implementation of the LMS learning rule updates the memory cells. (b) RMS value of the error for the first 10 principal components, normalized to the RMS value of each PC.

The distortion introduced to the output by gain mismatch and offsets in Eqn. (9) is identical to Eqn. (6). Similarly to LMS, it is easy to show that a bias input c connected to a synapse b with an anti-Hebbian learning rule Δb_k = μ_b c y_k removes the constant offset added to the output. Therefore, we can eliminate the second term of Eqn. (6) and express the output as:

ȳ_k = x_k^T (A_x A_w w_k + A_x γ_w) = x_k^T w̄_k   (12)

Using analog multipliers to compute x'_k, we obtain:

x'_k = y_k (A_y A'_w w_k + A_y γ'_w) + γ_y (A'_w w_k + γ'_w)   (13)

where A_y, A'_w, γ_y, and γ'_w are the gains and offsets associated with the multipliers used to compute y_k w_k. Replacing Eqns. (12) and (13) in Eqn. (10), and neglecting the residual term scaled by γ_y, we determine the effective learning rule modified by device mismatch:

Δw_k = μ y_k (x_k − y_k (A_y A'_w w_k + A_y γ'_w)) = μ y_k (x_k − y_k w̄'_k)   (14)

If we use the same analog multipliers to compute y_k and x'_k, then A_x = A_y, A_w = A'_w, and γ_w = γ'_w, and the learning rule becomes:

Δw_k = μ ȳ_k (x_k − ȳ_k w̄_k)   (15)

where ȳ_k and w̄_k are the modified output and weight defined in Eqn. (12). Eqn. (15) is equivalent to the original learning rule in Eqn. (10), but with a new weight vector modified by device mismatch. The analysis extends naturally to the higher-order principal components.
A convergence analysis for Eqn. (15) is complicated, but by analogy to LMS we can show that the weights indeed converge to the same values given in Eqns. (7) and (8), which compensate for the effects of gain mismatch and offset. Simulation results verify this assumption. Note that this will only be the case if we use the same hardware multipliers to compute y_k and x'_k.
Fig. 4(a) shows our implementation of the GHA learning rule. The multiplexer shares the analog multipliers between the computation of y_k and x'_k, and is controlled by a digital signal that alternates its value during the computation and adaptation phases of the algorithm. Unlike LMS, GHA trains the algorithm using the images from the training set. Fig. 4(b) shows the normalized RMS value of the output error for the first 10 principal components. Comparing it to Fig. 
3(b), the error is significantly higher than LMS, moving between 4% and 35% of the energy of the output. This higher error is due in part to the nonlinear multiplication in the computation of x'_k, and to a strong dependency between the learning rates used to update the bias synapse and the other weights in the network. However, as Section 5 shows, this error does not translate into a large degradation in the face-classification performance.

(a) GHA computation

(b) Output error of PCA network

Figure 4: Training the PCA network with GHA. (a) We reuse the multiplier to compute x'_k and use a pulse-based implementation of the GHA rule. (b) RMS value of the error for the first 10 principal components, normalized to the RMS value of each PC.

5 Classification Results

We designed and fabricated arithmetic circuits for the building blocks described in the previous sections using a 0.35µm CMOS process, including analog memory cells, multipliers, and weight-update rules for LMS and GHA. We characterized these circuits in the lab and built a software emulator that allows us to test the static performance of different network configurations with less than 0.5% error. We simulated the LTA circuit in software. Using the emulator, we tested the performance of the face-recognition network on the Olivetti Research Labs (ORL) database, consisting of 10 photos of each of 40 subjects. We used 5 random photos of each subject for the training set and 5 for testing. Limitations in our circuit emulator forced us to reduce the images to 12 × 12 pixels. 
The estimated power consumption of the circuit with these 144 inputs and 39 projections is 18mW (540nJ per classification with 30µs settling time), and the layout area is 2.2mm². These numbers represent a 2-5x reduction in area and more than 100x reduction in power compared to standard cell-based digital implementations [4, 5].
Fig. 5(a) shows the classification performance of the network using PCA for dimensionality reduction, versus the number of principal components in the subspace. The figure shows the performance of a software implementation of PCA with Euclidean distances, hardware PCA trained with LMS on software-computed weights, and hardware PCA trained with on-chip GHA. Both hardware implementations use Manhattan distances and a software LTA. The plots show the mean of the classification accuracy computed for each of the 40 individuals in the database. The error bars show one standard deviation above and below the mean. The software implementation peaks at 84% classification accuracy, while the hardware LMS and GHA implementations peak at 83% and 79%, respectively. Note that GHA performs only slightly worse than LMS, mainly because we compute and store the principal components of the training set in the face database using the same PCA network used to reduce the dimensionality of the test images, which helps to preserve the distance between classes in the feature space. The standard deviations are similar in all cases. Using an uncalibrated network brings the performance below 5%, mainly due to the offsets in the multipliers, which change the PCA projection and take the signals outside of their nominal operating range.
Fig. 5(b) shows the classification results using LDA in the dimensionality-reduction network. The results are slightly better than PCA, and the error bars also show a lower variance. 
The performance of the software implementation of LDA and a hard classifier based on Euclidean distances is 83%. The LMS-trained hardware network with Manhattan distances and a software LTA yields 82%. Replacing the LTA with a software RBF classifier, the chip achieves 85% classification performance, while the software implementation (not shown) peaks at 87%. Using 40x40-pixel images and 39 projections, the software LDA network with RBF achieves more than 98% classification accuracy. Therefore, our current results are limited by the resolution of the input images.

(a) Classification performance for PCA

(b) Classification performance for LDA

Figure 5: Classification performance for a 12×12-pixel version of the ORL database versus number of projections, using PCA and LDA for dimensionality reduction. Computing coefficients off-chip and writing them on the chip using LMS yields 83% and 85% classification performance for PCA and LDA, respectively. This represents 98%-99% of the performance of a software implementation.

6 Conclusions

We presented an analog-VLSI network for face recognition using subspace methods. We analyzed the effects of device mismatch on the performance of the dimensionality-reduction network and tested two techniques based on local adaptation which compensate for gain mismatch and offsets. We showed that using LMS to train the network on precomputed coefficients to perform PCA or LDA performs better than using GHA to learn PCA coefficients on chip. 
Ultimately, both techniques perform similarly in the face-classification task with the ORL database, achieving a classification performance of 83%-85% (98%-99% of a software implementation of the algorithms). Simulation results show that the performance is currently limited by the resolution of the input images. We are currently working on the integration of LTA and RBF classifiers on chip, and on support for higher-dimensional inputs.

Acknowledgments

This work was funded by the Chilean government through FONDECYT grant No. 1070485. The authors would like to thank Dr. Seth Bridges for his valuable contribution to this work.

References

[1] M. Turk and A. Pentland. Face Recognition Using Eigenfaces. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pages 586-591, 1991.

[2] Peter Belhumeur, Joao Hespanha, and David J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997.

[3] A. U. Batur, B. E. Flinchbaugh, and M. H. Hayes III. A DSP-Based Approach for the Implementation of Face Recognition Algorithms. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), volume 2, pages 253-256, 2003.

[4] N. Shams, I. Hosseini, M. Sadri, and E. Azarnasab. Low Cost FPGA-Based Highly Accurate Face Recognition System Using Combined Wavelets With Subspace Methods. In IEEE International Conference on Image Processing, pages 2077-2080, 2006.

[5] C. S. S. Prasanna, N. Sudha, and V. Kamakoti. A Principal Component Neural Network-Based Face Recognition System and Its ASIC Implementation. In VLSI Design, pages 795-798, 2005.

[6] Ferdinando Samaria and Andy Harter. 
Parameterisation of a Stochastic Model for Human Face Identification. In IEEE Workshop on Applications of Computer Vision, Sarasota (Florida), December 1994.

[7] Miguel Figueroa, Esteban Matamala, Gonzalo Carvajal, and Seth Bridges. Adaptive Signal Processing in Mixed-Signal VLSI with Anti-Hebbian Learning. In IEEE Computer Society Annual Symposium on VLSI, pages 133-138, Karlsruhe, Germany, 2006.
", "award": [], "sourceid": 805, "authors": [{"given_name": "Gonzalo", "family_name": "Carvajal", "institution": null}, {"given_name": "Waldo", "family_name": "Valenzuela", "institution": null}, {"given_name": "Miguel", "family_name": "Figueroa", "institution": null}]}