{"title": "Illumination-Invariant Face Recognition with a Contrast Sensitive Silicon Retina", "book": "Advances in Neural Information Processing Systems", "page_first": 769, "page_last": 776, "abstract": "", "full_text": "Illumination-Invariant Face Recognition with a \n\nContrast Sensitive Silicon Retina \n\nJoachim M. Buhmann \n\nRheinische Friedrich-Wilhelms-Universität \nInstitut für Informatik II, Römerstraße 164 \n\nD-53117 Bonn, Germany \n\nMartin Lades \n\nRuhr-Universität Bochum \nInstitut für Neuroinformatik \nD-44780 Bochum, Germany \n\nFrank Eeckman \n\nLawrence Livermore National Laboratory \n\nISCR, P.O. Box 808, L-426 \n\nLivermore, CA 94551 \n\nAbstract \n\nChanges in lighting conditions strongly affect the performance and reliability \nof computer vision systems. We report face recognition results \nunder drastically changing lighting conditions for a computer vision system \nwhich concurrently uses a contrast sensitive silicon retina and a conventional, \ngain controlled CCD camera. For both input devices the face \nrecognition system employs an elastic matching algorithm with wavelet \nbased features to classify unknown faces. To assess the effect of analog \non-chip preprocessing by the silicon retina the CCD images have been \n\"digitally preprocessed\" with a bandpass filter to adjust the power spectrum. \nThe silicon retina with its ability to adjust sensitivity increases \nthe recognition rate by up to 50 percent. These comparative experiments \ndemonstrate that preprocessing with an analog VLSI silicon retina generates \nimage data enriched with object-constant features. \n\n1 Introduction \n\nNeural computation as an information processing paradigm promises to enhance artificial \npattern recognition systems with the learning capabilities of the cerebral cortex and with the \nadaptivity of biological sensors. 
Rebuilding sensory organs in silicon seems to be particularly \npromising since their neurophysiology and neuroanatomy, including the connections \nto cortex, are known in great detail. This knowledge might serve as a blueprint for the design \nof artificial sensors which mimic biological perception. Analog VLSI retinas and cochleas, \nas designed by Carver Mead (Mead, 1989; Mahowald, Mead, 1991) and his collaborators \nin a seminal research program, will ultimately be integrated in vision and communication \nsystems for autonomous robots and other intelligent information processing systems. \n\nThe study reported here explores the influence of analog retinal preprocessing on the \nrecognition performance of a face recognition system. Face recognition is a challenging \nclassification task where object inherent distortions, like facial expressions and perspective \nchanges, have to be separated from other image variations like changing lighting conditions. \nPreprocessing with a silicon retina is expected to yield an increased recognition rate since the \nfirst layers of the retina adjust their local contrast sensitivity and thereby achieve invariance \nto variations in lighting conditions. \n\nOur face recognizer is equipped with a silicon retina as an adaptive camera. For comparison \npurposes all images are registered simultaneously by a conventional CCD camera with \nautomatic gain control. Galleries with images of 109 different test persons each are taken \nunder three different lighting conditions and two different viewing directions (see Fig. 1). \nThese different galleries provide separate statistics to measure the sensitivity of the system \nto variations in light levels or contrast and image changes due to perspective distortions. \n\nNaturally, the performance of an object recognition system depends critically on the classification \nstrategy pursued to identify unknown objects in an image with the models stored in a \ndatabase. 
The matching algorithm selected to measure the performance enhancing effect of \nretinal preprocessing deforms prototype faces in an elastic fashion (Buhmann et al., 1989; \nBuhmann et al., 1990; Lades et al., 1993). Elastic matching has been shown to perform \nwell on the face classification task recognizing up to 80 different faces reliably (Lades et al., \n1993) and in a translation, size and rotation invariant fashion (Buhmann et al., 1990). The \nface recognition algorithm was initially suggested as a simplified version of the Dynamic \nLink Architecture (von der Malsburg, 1981), an innovative neural classification strategy with \nfast changes in the neural connectivity during the recognition stage. Our recognition results \nand conclusions are expected to be qualitatively typical for a whole range of face/object \nrecognition systems (Turk, Pentland, 1991; Yuille, 1991; Brunelli, Poggio, 1993), since any \nimage preprocessing with emphasis on object constant features facilitates the search for the \ncorrect prototype. \n\n2 The Silicon Retina \n\nThe silicon retina used in the recognition experiments models the interactions between \nreceptors and horizontal cells taking place in the outer plexiform layer of the vertebrate \nretina. All cells and their interconnections are explicitly represented in the chip so that \nthe following description simultaneously refers to both biological wetware and silicon \nhardware. Receptors and horizontal cells are electrically coupled to their neighbors. The \nweak electrical coupling between the receptors smoothes the image and reduces the influence \nof voltage offsets between adjacent receptors. The horizontal cells have a strong \nlateral electrical coupling and compute a local background average. There are reciprocal \nexcitatory-inhibitory synapses between the receptors and the horizontal cells. 
The horizontal \ncells use shunting inhibition to adjust the membrane conductance of the receptors and \nthereby adjust their sensitivity locally. This feedback interaction produces an antagonistic \ncenter/surround organization of receptive fields at the output. The center is represented \nby the weakly coupled excitatory receptors and the surround by the more strongly coupled \ninhibitory horizontal cells. The center/surround organization removes the average intensity \nand expands the dynamic range without response compression. Furthermore, it enhances \nedges. \n\nIn contrast to this architecture, a conventional CCD camera can be viewed as a very primitive \nretina with only one layer of non-interacting detectors. There is no DC background removal, \ncausing potential over- and underexposure in parts of the image which reduces the useful \ndynamic range. A mechanical iris has to be provided to adjust the mean luminance level \nto the appropriate setting. Since cameras are designed for faithful image registration rather \nthan vision, on-chip pixel processing, if provided at all, is used to improve the camera \nresolution and signal-to-noise ratio. \n\nThree adjustable parameters allow us to fine tune the retina chip for an object recognition \nexperiment: (i) the diffusivity of the cones, (ii) the diffusivity of the horizontal cells, and (iii) the \nleak in the horizontal cell membrane. Changes in the diffusivities affect the shape of the \nreceptive fields, e.g., a large diffusivity between cones smoothes out edges and produces a \nblurred image. The other extreme of large diffusivity between horizontal cells accentuates \nedges and enhances the contrast gain. The retina chip has a resolution of 90 x 92 pixels; \nit was designed by Boahen and Andreou (1992) and fabricated in 2 µm n-well technology by \nMOSIS. 
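
The center/surround behavior described in this section can be illustrated with a small software sketch. It is not a model of the chip circuit: the feedback between receptors and horizontal cells is replaced here by a feedforward division of a weakly smoothed image (center) by a strongly smoothed image (surround), and the names diffuse, retina_like, d_cone, d_horizontal and offset are hypothetical. The constant offset only prevents division by zero and does not correspond to a chip parameter.

```python
import numpy as np


def diffuse(img, diffusivity, steps=50):
    # Explicit nearest-neighbor diffusion with no-flux borders.
    # Stable for diffusivity <= 0.25; larger values spread activity further.
    out = img.astype(float).copy()
    for _ in range(steps):
        padded = np.pad(out, 1, mode='edge')
        neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                     + padded[1:-1, :-2] + padded[1:-1, 2:])
        out = out + diffusivity * (neighbors - 4.0 * out)
    return out


def retina_like(img, d_cone=0.05, d_horizontal=0.2, offset=1e-3):
    # Assumed simplification: divisive (shunting-like) normalization of a
    # weakly coupled center by a strongly coupled surround average.
    center = diffuse(img, d_cone)
    surround = diffuse(img, d_horizontal)
    return center / (surround + offset)


# Usage: the same scene under ten times brighter illumination gives
# almost the same output, because center and surround scale together.
rng = np.random.default_rng(1)
scene = 0.5 + rng.random((32, 32))
dim_output = retina_like(scene)
bright_output = retina_like(10.0 * scene)
print(np.max(np.abs(dim_output - bright_output)))
```

In this sketch a larger d_cone blurs the output, while a larger d_horizontal widens the surround, which loosely mirrors the roles the text assigns to the two diffusivities.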
\n\n3 Elastic Matching Algorithm for Face Recognition \n\nElastic matching is a pattern classification strategy which explicitly accounts for local \ndistortions. A prototype template is elastically deformed to measure local deviations from a \nnew, unknown pattern. The amount of deformation and the similarity of local image features \nprovide us with a decision criterion for pattern classification. The rubbersheet-like behavior \nof the prototype transformation makes elastic matching a particularly attractive method for \nface recognition where ubiquitous local distortions are caused for example by perspective \nchanges and different facial expressions. Originally, the technique was developed for \nhandwritten character recognition (Burr, 1981). The version of elastic matching employed \nfor our face recognition experiments is based on attributed graph matching. A detailed \ndescription with a plausible interpretation in neural networks terms is published in (Lades \net al., 1993). Each prototype face is encoded as a planar graph with feature vectors attached \nto the vertices of the graph and metric information attached to the edges. The feature vectors \nextract local image information at pixel x_i in a multiscale fashion, i.e., they are functions \nof wavelet coefficients. Each feature vector establishes a correspondence between a vertex \ni of a prototype graph and a pixel x_i in the image. The components of a feature vector are \ndefined as the magnitudes of the convolution of an image with a set of two-dimensional, \nDC free Gaussian kernels centered at pixel x_i. The kernels with the form \n\n\\psi_{\\vec k}(\\vec x) = \\frac{k^2}{\\sigma^2} \\exp\\left(-\\frac{k^2 x^2}{2\\sigma^2}\\right) \\left[\\exp(i \\vec k \\cdot \\vec x) - \\exp(-\\sigma^2/2)\\right] \\qquad (1) \n\nare parameterized by the wave vector k defining their orientations and their sizes. 
To \nconstruct a self-similar set of filter functions we select eight different orientations and five \ndifferent scales according to \n\n\\vec k(\\nu, \\mu) = \\frac{\\pi}{2} \\, 2^{-\\nu/2} \\left(\\cos(\\frac{\\pi}{8}\\mu), \\sin(\\frac{\\pi}{8}\\mu)\\right) \\qquad (2) \n\nwith ν ∈ {0, ..., 4} and μ ∈ {0, ..., 7}. The multi-resolution data format represents local \ndistortions in a robust way, i.e., only feature vectors in the vicinity x of an image distortion \nare altered by the changes. The edge labels encode metric information; in particular, we \nchoose the difference vectors Δx_ij ≡ x_i - x_j as edge labels. \n\nTo generate a new prototype graph for the database, the center of a new face is determined \nby matching a generic face template to it. A 7 x 10 rectangular grid with 10 pixel spacing \nbetween vertices and edges between adjacent vertices is then centered at that point. The \nsaliency of image points is taken into account by deforming that generic grid so that each \nvertex is moved to the nearest pixel with a local maximum in feature vector length. \n\nThe classification of an unknown face as one of the models in the database or its rejection \nas an unclassified object is achieved by computing matching costs and distortion costs. \nThe matching costs are designed to maximize the similarity between feature vector J_i^M of \nvertex i in the model graph (M) and feature vector J^I(x_i) associated with pixel x_i in the \nnew image (I). The cosine of the angle between both feature vectors \n\nS(\\vec J^I(\\vec x_i), \\vec J_i^M) = \\frac{\\vec J^I(\\vec x_i) \\cdot \\vec J_i^M}{\\|\\vec J^I(\\vec x_i)\\| \\, \\|\\vec J_i^M\\|} \\qquad (3) \n\nis suited as a similarity function for elastic matching since global contrast changes in images \nonly scale feature vectors but do not rotate them. Besides maximizing the similarity between \nfeature vectors the elastic matching algorithm penalizes large distortions. The distortion \ncost term is weighted by a factor λ which can be interpreted as a prior for expected \ndistortions. 
The combined matching cost function which is used in the face recognition \nsystem compromises between feature similarity and distortion, i.e., it minimizes the cost \nfunction \n\n(4) \n\nfor the model M in the face database with respect to the correspondence points {x_i^I}. The notation (i, j) \nin Eq. (4) denotes that index j runs over the neighborhood of vertex i and index i runs \nover all vertices. By minimizing Eq. (4) the algorithm assigns pixel x_i^I in the new image \nI to vertex i in the prototype graph M. Numerous classification experiments revealed that \na steepest descent algorithm is sufficient to minimize cost function (4) although it is non-convex \nand local minima may cause non-optimal correspondences with reduced recognition \nrates. \n\nDuring a recognition experiment all prototype graphs in the database are matched to the \nnew image. A new face is classified as prototype A if H_A is minimal and if the significance \ncriterion \n\n(5) \n\nis fulfilled. The average cost ⟨H⟩ and its standard deviation Σ_H are calculated excluding \nmatch A. This heuristic is based on the assumption that a new face image strongly \ncorrelates with the correct prototype but the matching costs to all the other prototype faces \nare approximately Gaussian distributed with mean ⟨H⟩ and standard deviation Σ_H. The \nthreshold parameter Θ is used to limit the rate of false positive matches, i.e., to exclude \nsignificant matches to wrong prototypes. \n\nFigure 1: Laboratory setup of the face recognition experiments. 
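
The ingredients of the elastic matching procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The kernel width SIGMA = 2 pi is assumed because the text gives no value, the kernels are truncated to a finite window, and all function names are hypothetical. The printed forms of Eqs. (4) and (5) are not available in this text, so matching_cost assumes a quadratic penalty on differences of edge vectors weighted by lam, and classify assumes that the gap between the mean cost and the best cost, measured in standard deviations, is compared with the threshold theta.

```python
import numpy as np

SIGMA = 2.0 * np.pi  # assumed kernel width, not stated in the text


def wave_vector(nu, mu):
    # Eq. (2): length (pi/2) * 2**(-nu/2), orientation mu * pi/8
    length = 0.5 * np.pi * 2.0 ** (-nu / 2.0)
    angle = mu * np.pi / 8.0
    return length * np.array([np.cos(angle), np.sin(angle)])


def kernel(k, half_size):
    # Eq. (1) sampled on a square window (truncation is a simplification)
    ys, xs = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    k2 = k @ k
    envelope = (k2 / SIGMA**2) * np.exp(-k2 * (xs**2 + ys**2) / (2 * SIGMA**2))
    wave = np.exp(1j * (k[0] * xs + k[1] * ys)) - np.exp(-SIGMA**2 / 2)
    return envelope * wave


def jet(image, row, col, half_size=24):
    # Feature vector: magnitudes of the 5 x 8 filter responses at one pixel.
    # For a real image, correlation and convolution give the same magnitude.
    patch = image[row - half_size:row + half_size + 1,
                  col - half_size:col + half_size + 1]
    return np.array([abs(np.sum(patch * kernel(wave_vector(nu, mu), half_size)))
                     for nu in range(5) for mu in range(8)])


def similarity(j_image, j_model):
    # Eq. (3): cosine of the angle between two feature vectors
    norms = np.linalg.norm(j_image) * np.linalg.norm(j_model)
    return float(j_image @ j_model / norms)


def matching_cost(image, positions, model_jets, model_positions, edges, lam):
    # Assumed form of the cost: lam * distortion - summed feature similarity
    distortion = sum(
        np.sum(((positions[i] - positions[j])
                - (model_positions[i] - model_positions[j])) ** 2)
        for i, j in edges)
    feature = sum(similarity(jet(image, r, c), model_jets[n])
                  for n, (r, c) in enumerate(positions))
    return float(lam * distortion - feature)


def classify(costs, theta):
    # Assumed significance test: accept the best match only if it lies more
    # than theta standard deviations below the mean of the other costs.
    costs = np.asarray(costs, dtype=float)
    best = int(np.argmin(costs))
    others = np.delete(costs, best)
    significance = (others.mean() - costs[best]) / others.std()
    return best if significance > theta else None
```

A steepest descent over the vertex positions, as mentioned in the text, would repeatedly move single vertices and keep moves that lower matching_cost. Because similarity normalizes both feature vectors, multiplying the image by a constant contrast factor leaves the feature term unchanged.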
\n\n4 Face Recognition Results \n\nTo measure the recognition rate of the face recognition system using a silicon retina or a \nCCD camera as input devices, pictures of 109 different persons are taken under 3 different \nlighting conditions and 2 different viewing directions. This setup allows us to quantify the \ninfluence of changes in lighting conditions on the recognition performance separately from \nthe influence of perspective distortions. Figure 2 shows face images of one person taken \nunder two different lighting setups. The images in Figs. 2a,c with both lights on are used \nas the prototype images for the respective input devices. To test the influence of changing \nlighting conditions the left light is switched off. The faces are now strongly illuminated \nfrom the right side. The CCD camera images (Figs. 2a,b) document the drastic changes of \nthe light settings. The corresponding responses of the silicon retina shown in Figs. 2c,d \nclearly demonstrate that the local adaptivity of the silicon retina enables the recognition \nsystem to extract object structure from the bright and the dark side of the face. For control \npurposes all recognition experiments have been repeated with filtered CCD camera images. \nThe filter was adjusted such that the power spectra of the retina chip images and the filtered \nCCD images are identical. The images (e,f) are filtered versions of the images (a,b). It \nis evident that information in the dark part of image (b) has been erased due to saturation \neffects of the CCD camera and cannot be recovered by any local filtering procedure. \n\nFigure 2: Conventional CCD camera images (a,b) and silicon retina images (c,d) under \ndifferent lighting conditions. The images (e,f) are filtered CCD camera images with a \npower spectrum adjusted to the images in (c,d). The images (a,c) are used to generate the \n\nTable 1: Face recognition rates in percent (a) in a well illuminated environment and (b) in an \nenvironment with drastic changes in lighting conditions. \n\n     f. p. rate   silicon retina   conv. CCD   filt. CCD \na    100%         83.5             86.2        85.3 \n     10%          81.7             83.5        84.4 \n     5%           76.2             82.6        80.7 \n     1%           71.6             79.8        75.2 \nb    100%         96.3             80.7        78.0 \n     10%          96.3             76.2        75.2 \n     5%           96.3             72.5        72.5 \n     1%           93.6             64.2        62.4 \n\nWe first measure the performance of the silicon retina under uniform lighting conditions, \ni.e., both lamps are on and the person looks 20-30 degrees to the right. The recognition \nsystem has to deal with perspective distortions only. A gallery of 109 faces is matched \nto a face database of the same 109 persons. Table 1a shows that the recognition rate \nreaches values between 80 and 90 percent if we accept the best match without checking \nits significance. Such a decision criterion is impractical for many applications since it \ncorresponds to a false positive rate (f. p. rate) of 100 percent. If we increase the threshold Θ \nto limit false positive matches to less than 1 percent the face recognizer is able to identify \nthree out of four unknown faces. Filtering the CCD imagery does not hurt the recognition \nperformance as the third column in Table 1a demonstrates. All necessary information for \nrecognition is preserved in the filtered CCD images. \n\nThe situation changes dramatically when we switch off the lamp on the left side of the \ntest person. We compare a test gallery of persons looking straight ahead, but illuminated \nonly from the right side, to our model gallery. Table 1b summarizes the recognition results \nfor different false positive rates. 
The advantage of using a silicon retina is a 20 to 45 \npercent higher recognition rate than for a system with a CCD camera. For a false positive \nrate below one percent a silicon retina based recognition system identifies two thirds more \npersons than a conventional system. Filtering does not improve the recognition rate of a \nsystem that uses a CCD camera as can be seen in the third column. \n\nOur comparative face recognition experiment clearly demonstrates that a face recognizer \nwith a retina chip performs substantially better than conventional CCD camera based \nsystems in environments with uncontrolled, substantially changing lighting conditions. \nRetina-like preprocessing yields increased recognition rates and increased significance \nlevels. We expect even larger discrepancies in recognition rates if objects without a bilateral \nsymmetry have to be classified. In this sense the face recognition task does not optimally \nexploit the potential of adaptive preprocessing by a silicon retina. Imagine an object \nrecognition task where the most significant features for discrimination are hardly visible \nor highly ambiguous due to poor illumination. High error rates and very low significance \nlevels are an inevitable consequence of such lighting conditions. \n\nThe limited resolution and poor signal-to-noise ratio of silicon retina chips are expected to \nbe improved by a new generation of chips fabricated in 0.7 µm CMOS technology with a \npotential resolution of 256 x 256 pixels. Lighting conditions as simulated in our recognition \nexperiment are ubiquitous in natural environments. Autonomous robots and vehicles or \nsurveillance systems are expected to benefit from the silicon retina technology by gaining \nrobustness and reliability. Silicon retinas and more elaborate analog VLSI chips for low \nlevel vision are expected to be an important component of an Adaptive Vision System. 
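
The relative gains quoted in this section can be recomputed from Table 1b with a few lines of arithmetic. The sketch below assumes that the first numeric column of the table belongs to the silicon retina and the second to the conventional CCD camera; the variable and function names are hypothetical.

```python
# Recognition rates in percent copied from Table 1b (left lamp switched off),
# ordered by false positive rate: 100%, 10%, 5%, 1%.
# Assumption: first table column = silicon retina, second = conventional CCD.
silicon_retina = [96.3, 96.3, 96.3, 93.6]
conventional_ccd = [80.7, 76.2, 72.5, 64.2]


def relative_gain(new_rate, reference_rate):
    # Relative improvement of new_rate over reference_rate, in percent.
    return 100.0 * (new_rate / reference_rate - 1.0)


gains = [relative_gain(r, c) for r, c in zip(silicon_retina, conventional_ccd)]
for fp_rate, gain in zip([100, 10, 5, 1], gains):
    print(f'f. p. rate {fp_rate:>3}%: retina gain {gain:.1f}%')
```

Under this reading of the table the gain grows as the false positive rate is tightened, from roughly 19 percent at the loosest setting to roughly 46 percent at the strictest one.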
\n\nAcknowledgement: It is a pleasure to thank K. A. Boahen for providing us with the \nretina chips. We acknowledge stimulating discussions with C. von der Malsburg and C. \nMead. This work was supported by the German Ministry of Science and Technology \n(ITR-8800-H1) and by the Lawrence Livermore National Laboratory (W-7405-Eng-48). \n\nReferences \n\nBoahen, K., Andreou, A. (1992). A Contrast Sensitive Silicon Retina with Reciprocal \nSynapses. Pages 764-772 of: NIPS91 Proceedings. IEEE. \n\nBrunelli, R., Poggio, T. (1993). Face Recognition: Features versus Templates. IEEE Trans. \non Pattern Analysis Machine Intelligence, 15, 1042-1052. \n\nBuhmann, J., Lange, J., von der Malsburg, C. (1989). Distortion Invariant Object Recognition \nby Matching Hierarchically Labeled Graphs. Pages I 155-159 of: Proc. IJCNN, \nWashington. IEEE. \n\nBuhmann, J., Lades, M., von der Malsburg, C. (1990). Size and Distortion Invariant Object \nRecognition by Hierarchical Graph Matching. Pages II 411-416 of: Proc. IJCNN, \nSan Diego. IEEE. \n\nBurr, D. J. (1981). Elastic Matching of Line Drawings. IEEE Trans. on Pat. An. Mach. \nIntel., 3, 708-713. \n\nLades, M., Vorbrüggen, J.C., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R.P., \nKonen, W. (1993). Distortion Invariant Object Recognition in the Dynamic Link \nArchitecture. IEEE Transactions on Computers, 42, 300-311. \n\nMahowald, M., Mead, C. (1991). The Silicon Retina. Scientific American, 264(5), 76. \n\nMead, C. (1989). Analog VLSI and Neural Systems. New York: Addison Wesley. \n\nTurk, M., Pentland, A. (1991). Eigenfaces for Recognition. J. Cog. Neurosci., 3, 71-86. \n\nvon der Malsburg, C. (1981). The Correlation Theory of Brain Function. Internal \nReport. Max-Planck-Institut, Biophys. Chem., Göttingen, Germany. \n\nYuille, A. (1991). Deformable Templates for Face Recognition. J. Cog. Neurosci., 3, 60-70. 
\n\n\f", "award": [], "sourceid": 876, "authors": [{"given_name": "Joachim", "family_name": "Buhmann", "institution": null}, {"given_name": "Martin", "family_name": "Lades", "institution": null}, {"given_name": "Frank", "family_name": "Eeckman", "institution": null}]}