{"title": "A Network Mechanism for the Determination of Shape-From-Texture", "book": "Advances in Neural Information Processing Systems", "page_first": 953, "page_last": 960, "abstract": null, "full_text": "A Network Mechanism for the Determination of Shape-From-Texture\n\nKo Sakai and Leif H. Finkel\n\nDepartment of Bioengineering and\nInstitute of Neurological Sciences\n\nUniversity of Pennsylvania\n\n220 South 33rd Street, Philadelphia, PA 19104-6392\n\nko@ganymede.seas.upenn.edu, leif@ganymede.seas.upenn.edu\n\nAbstract\n\nWe propose a computational model for how the cortex discriminates shape and depth from texture. The model consists of four stages: (1) extraction of local spatial frequency, (2) frequency characterization, (3) detection of texture compression by normalization, and (4) integration of the normalized frequency over space. The model accounts for a number of psychophysical observations, including experiments based on novel random textures. These textures are generated from white noise and manipulated in the Fourier domain to produce specific frequency spectra. Simulations with a range of stimuli, including real images, show qualitative and quantitative agreement with human perception.\n\n1 INTRODUCTION\nThere are several physical cues to shape and depth that arise from changes in projection as a surface curves away from view or recedes in perspective. One major cue is the orderly change in the spatial frequency distribution of texture along the surface. In machine vision, techniques such as the Fourier transform or wavelet decomposition are used to determine spatial frequency spectra across a surface. Determining the transformation that relates these spectra is a difficult problem, and several techniques have been proposed, such as an affine transformation (Super and Bovik 1992) or a momentum method (Krumm and Shafer 1992). 
We address the question of how a biological system, which has access only to limited spatial frequency information and has constrained computational capabilities, can nonetheless accurately determine shape and depth from texture. For example, the visual system might avoid directly comparing the frequency spectra themselves. It could instead rely on a simpler characterization of the spectra, such as the mean frequency, the peak frequency, or the gradient of a frequency component (Sakai and Finkel 1993; Turner, Gerstein, and Bajcsy 1991). To study what frequency information humans actually use, we created novel random texture patterns and carried out psychophysical experiments with these stimuli. The patterns are generated by manipulating the frequency components of white noise in the Fourier domain so as to produce stimuli with exactly specified frequency spectra. Based on these experiments, we propose a network mechanism for the perception of shape-from-texture that takes into account physiological and anatomical constraints as well as computational considerations.\n\n2 MODEL FOR SHAPE FROM TEXTURE\nThe model consists of four major processes:\n(1) extraction of the local spatial frequency at each orientation,\n(2) frequency characterization,\n(3) determination of texture compression by frequency normalization, and\n(4) integration of the normalized frequency over space.\nA schematic illustration of the model is shown in Figure 1. Our psychophysical experiments suggest that the visual system may use the spatially averaged peak frequency to characterize the frequency distribution. The change in surface orientation is determined from the locally aligned compression of texture. This compression is detected by frequency normalization followed by lateral inhibition among different orientations. Depth is then computed by integrating the normalized frequency over space. 
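The random textures described above are built in the Fourier domain, and that construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' stimulus code. It assumes an isotropic Gaussian annulus as the Fourier-domain filter. The helper names bandpass_noise and peak_frequency, together with their parameters, are hypothetical. Filtering white noise fixes where the power spectrum peaks. The second helper recovers that peak from the radially averaged power spectrum. This gives the kind of single-number characterization (peak frequency) that the model uses instead of the full spectrum.

```python
import numpy as np


def bandpass_noise(size, peak_freq, bandwidth, seed=0):
    # Hypothetical helper (assumed filter shape): white noise filtered in the
    # Fourier domain by an isotropic Gaussian annulus centered on peak_freq,
    # measured in cycles per image.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size))
    f = np.fft.fftfreq(size) * size  # integer frequencies, cycles per image
    radius = np.hypot(f[None, :], f[:, None])
    gain = np.exp(-0.5 * ((radius - peak_freq) / bandwidth) ** 2)
    # The gain is the same at f and -f, so the filtered spectrum keeps the
    # conjugate symmetry of real input and the inverse FFT is real up to
    # rounding error; np.real drops that negligible imaginary part.
    texture = np.real(np.fft.ifft2(np.fft.fft2(noise) * gain))
    return texture / texture.std()


def peak_frequency(texture):
    # Hypothetical helper: peak of the radially averaged power spectrum,
    # ignoring the DC term.
    size = texture.shape[0]
    power = np.abs(np.fft.fft2(texture)) ** 2
    f = np.fft.fftfreq(size) * size
    radius = np.rint(np.hypot(f[None, :], f[:, None])).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    profile = sums / np.maximum(counts, 1)
    profile[0] = 0.0
    return int(np.argmax(profile))


texture = bandpass_noise(size=128, peak_freq=16, bandwidth=3)
print(peak_frequency(texture))  # close to 16; the exact bin depends on the noise
```

A spatially varying stimulus, such as the cylinder of Figure 2, would need this kind of filtering applied locally rather than to the whole image. The sketch covers only the spatially uniform case.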
\nThe model is implemented as feed-forward distributed networks and simulated using the NEXUS neural network simulator (Sajda, Sakai, Yen, and Finkel 1993).\n\n3 MOTIVATION FOR EACH STAGE OF THE MODEL\nFrequency extraction is carried out by units modeling complex cells in area V1. These units have subunits with On- and Off-center difference-of-Gaussians (DOG) masks tuned to specific frequencies and orientations, and each unit takes the local maximum over its subunits. As in energy-based models (Bergen and Adelson 1988; Malik and Perona 1990), these units capture some major aspects of complex-cell function in the space domain. These include invariance to the direction of contrast and to spatial phase.\n\nThe second stage of the model extracts the spatially averaged peak frequency. To examine what frequency information humans actually use, we created random texture patterns with specific frequency spectra by manipulating the frequency components of a white noise pattern in the Fourier domain. Figure 2 shows a vertical cylinder and a tilted perspective plane constructed from white noise by this technique. The three-dimensional shape of the cylinder in (1) is clearly visible. The stimuli were constructed by making each frequency component undergo a step change at some position along the cylinder; higher frequencies undergo the change at positions closer to the cylinder's edges. The spatial gradient of each frequency component is therefore everywhere either zero or infinite, yet the cylinder is still perceived. This suggests that gradients of individual frequency components over space do not serve as a dominant cue for three-dimensional shape perception. Similar experiments have been conducted using various stimuli with controlled frequency spectra. Their results suggest that the averaged peak frequency is a strong cue for the human perception of three-dimensional shape and depth.\n\n[Figure 1 panels: Early Vision Stage; Frequency Characterization; Frequency Normalization and Lateral Inhibition; Integration]\n\nFigure 1. A schematic illustration of the shape-from-texture model, consisting of four major stages. The early vision stage models the major spatial properties of complex cells in order to decompose local spatial frequency. The second stage characterizes the frequency by the spatially averaged peak frequency. The third stage detects locally aligned texture compression by normalizing frequency and applying lateral inhibition among orientation channels. The last stage determines 3D depth by integrating the amount of texture compression, which corresponds to the local surface slant. Indices \"f\" and \"o\" denote frequency and orientation channels, respectively. max, min, ave, and LI stand for maximum, minimum, average, and lateral inhibition. A vertical bar indicates that the function is computed independently within each of the denoted channels.\n\nThe third stage of the model normalizes local frequencies by the global lowest frequency on the surface. We assume that the region containing the global lowest frequency is a frontal plane, facing the viewer directly. One justification for this assumption is illustrated by the simple artificial images in Figure 3. In both (1) and (2), the bottom region looks vertical to us and the planes above it look slanted. This holds even though the pattern of the center region of (1) is identical to that of the lower region of (2). From a computational point of view, frequency normalization corresponds to an approximation of the relation between local slant and spatial frequency. Depth, Z, as a function of x (see Figure 4) is given by:\n\n$$Z(x) = \\int_{x_0}^{x} \\tan\\left\\{\\cos^{-1}\\left(\\frac{F_0}{F(x')}\\right)\\right\\} dx' = \\int_{x_0}^{x} \\frac{\\sqrt{F(x')^2 - F_0^2}}{F_0}\\, dx' \\qquad (1)$$\n\nwhere $F_0$ is the global lowest frequency and $x_0$ is the starting position of the integration. 
With the boundary condition $Z(x) = 0$ where $F(x) = F_0$, the integrand can be reasonably approximated by $(F(x) - F_0)/F_0$. The frequency-normalization stage of the model computes this value, and a later stage carries out the integration.\n\nFigure 2. Random texture patterns generated by manipulating the frequency components of white noise in the Fourier domain: a horizontal cylinder embedded in white noise (1) and a tilted plane (2).\n\nThe second half of this stage detects the local alignment of texture compression by lateral inhibition of the normalized frequencies among different orientations. Recent psychophysical experiments (Todd and Akerstrom 1987; Cumming, Johnston, and Parker 1993) show that compression of texture in a single orientation is a cue for the perception of shape-from-texture. Figure 5 confirms this result. The three images at the top have compression in a single orientation, whereas those at the bottom do not. Smooth three-dimensional ellipsoids are clearly seen in the top images but not in the bottom ones.\n\nThe last stage of the model computes the integral of the normalized frequency in order to obtain depth. This integration begins from the region with the lowest spatial frequency and follows the path of local steepest descent in spatial frequency.
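The last two stages can be sketched along a single scan line, where the path of integration is simply the line itself. This is a minimal illustration, not the NEXUS implementation. Several parts are assumptions: the input format, the function names, and taking F0 separately for each orientation channel. The lateral-inhibition rule (subtracting the least-compressed orientation at each position) is also assumed, because the text does not specify the inhibition function. The integrand is the approximation (F(x) - F0)/F0 from the text. The function exact_integrand evaluates the exact integrand of Equation (1) for comparison.

```python
import math


def exact_integrand(f, f0):
    # Integrand of Equation (1): tan(arccos(F0 / F)) = sqrt(F^2 - F0^2) / F0.
    return math.sqrt(f * f - f0 * f0) / f0


def normalized_compression(freqs):
    # freqs[o][i]: spatially averaged peak frequency of orientation channel o
    # at sample i along a 1-D scan line (hypothetical input format).
    # Each channel is divided by its own lowest frequency F0 (an assumption),
    # giving the approximate integrand (F - F0) / F0.
    out = []
    for channel in freqs:
        f0 = min(channel)
        out.append([(f - f0) / f0 for f in channel])
    return out


def lateral_inhibition(norm):
    # Assumed inhibition rule: subtract the least-compressed orientation at
    # each sample, so compression shared by all orientations cancels out.
    return [max(col) - min(col) for col in zip(*norm)]


def integrate_depth(slant, start, dx=1.0):
    # Trapezoidal integral of the slant signal, with depth 0 at index start
    # (the lowest-frequency sample), accumulated outwards in both directions.
    depth = [0.0] * len(slant)
    for i in range(start + 1, len(slant)):
        depth[i] = depth[i - 1] + 0.5 * (slant[i - 1] + slant[i]) * dx
    for i in range(start - 1, -1, -1):
        depth[i] = depth[i + 1] + 0.5 * (slant[i] + slant[i + 1]) * dx
    return depth


# Compression along one orientation only (channel 0) versus equal
# compression in both channels (pan-orientational).
aligned = [[10, 11, 12, 13, 14], [10, 10, 10, 10, 10]]
uniform = [[10, 11, 12, 13, 14], [10, 11, 12, 13, 14]]
for freqs in (aligned, uniform):
    slant = lateral_inhibition(normalized_compression(freqs))
    print(integrate_depth(slant, start=0))
print(exact_integrand(20, 10), (20 - 10) / 10)  # about 1.732 vs 1.0
```

In this sketch, compression confined to one orientation yields depth that grows away from the lowest-frequency sample. Equal compression in both channels, by contrast, is cancelled by the inhibition step and yields zero depth. This qualitatively matches the contrast between the top and bottom rows of Figure 5. Both integrands vanish at F = F0 and increase monotonically with F, so the approximation preserves the ordering of local slants. At F = 2 F0, however, it gives 1.0 where the exact integrand gives about 1.732.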
\n\nFigure 3. Objects consisting of three planes (1, left) and two planes (2, right). In both stimuli, the bottom regions look vertical to us and the planes above them look slanted. This holds even though the pattern of the center region of (1) is identical to that of the lower region of (2).\n\n[Figure 4: axes Depth Z(x) and position x, with x_0 marked]\n\nFigure 4. The coordinate system for Equation (1). Depth, Z, is given as a function of position, x.\n\n4 SIMULATIONS\nA quantitative test of the model was carried out using ellipsoids of different eccentricities and the texture patterns shown in Figure 5. The results are plotted in Figure 6. For the regular ellipsoids, there is a linear relation between the real depth and the depth determined by the model. This linear relation agrees with psychophysical experiments (Todd and Akerstrom 1987; B\u00fclthoff 1991) showing similar human performance for such stimuli. 
\nAll of the irregular texture patterns produced little perception of depth, in agreement with human performance.\n\nMany artificial and real images have been tested with the model and show good agreement with human perception. For example, Figure 7 shows a real image of part of a cantaloupe and its computed depth. Real images were obtained with a CCD camera and input to NEXUS via an Imaging Technology S151 image processor.\n\nFigure 5. (Top) Regular ellipsoids with eccentricities of 1, 2, and 4. (Bottom) Irregular texture patterns: (left) no compression with regular density change, (middle) randomly oriented regular compression, (right) pan-orientational regular compression.\n\n[Figure 6 plot: depth computed by the model versus eccentricity; series: regular ellipsoids, no compression, randomly oriented compression, pan-orientational compression]\n\nFigure 6. Depth perceived by the model as a function of actual eccentricity. The simulated depth of regular ellipsoids shows a linear relation to the actual depth. Irregular patterns produced little depth, in agreement with human perception.\n\nFigure 7. An example of the model's response to a real image: part of a cantaloupe (left) and its depth computed by the model (right).\n\n5 CONCLUSIONS\n(1) We propose a biologically based network model of shape-from-texture based on the determination of change in spatial frequency.\n\n(2) Preliminary psychophysical evidence suggests that the spatially averaged peak frequency is used to characterize the spatial frequency distribution, rather than the full frequency spectrum or individual frequency components.\n\n(3) This characterization is validated by psychophysical experiments using novel random textures with specified frequency spectra. The patterns are generated from white noise and manipulated in the Fourier domain to realize specific frequency characteristics.\n\n(4) The model has been tested with a number of artificial stimuli and real images taken by video camera. Its responses show qualitative and quantitative agreement with human perception.\n\nAcknowledgments\nThis work is supported by grants from The Office of Naval Research (N00014-90-J-1864, N00014-93-1-0681), The Whitaker Foundation, and The McDonnell-Pew Program in Cognitive Neuroscience.\n\nReferences\nSuper, B.J. and Bovik, A.C. (1992), Shape-from-texture by wavelet-based measurement of local spectral moments. Proc. IEEE CVPR 1992, pp. 296-300.\n\nKrumm, J. and Shafer, S.A. (1992), Shape from periodic texture using the spectrogram. Proc. IEEE CVPR 1992, pp. 284-289.\n\nSakai, K. and Finkel, L.H. (1994), A cortical mechanism underlying the perception of shape-from-texture. 
In F. Eeckman et al. (eds.), Computation and Neural Systems 1993. Norwell, MA: Kluwer Academic Publishers [in press].\n\nSajda, P., Sakai, K., Yen, S-c., and Finkel, L.H. (1993), In Skrzypek, J. (ed.), Neural Network Simulation Environments. Norwell, MA: Kluwer Academic Publishers [in press].\n\nBergen, J.R. and Adelson, E.H. (1988), Visual texture segmentation and early vision. Nature, 333, pp. 363-364.\n\nMalik, J. and Perona, P. (1990), Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Am. A, Vol. 7, No. 5, pp. 923-932.\n\nCumming, B.G., Johnston, E.B., and Parker, A.J. (1993), Effects of different texture cues on curved surfaces viewed stereoscopically. Vision Res., Vol. 33, No. 5, pp. 827-838.\n\nTodd, J.T. and Akerstrom, R.A. (1987), Perception of three-dimensional form from patterns of optical texture. Journal of Experimental Psychology, Vol. 13, No. 2, pp. 242-255.\n\nTurner, M.R., Gerstein, G.L., and Bajcsy, R. (1991), Underestimation of visual texture slant by human observers: a model. Biol. Cybern., 65, pp. 215-226.\n\nB\u00fclthoff, H.H. (1991), Shape from X: Psychophysics and computation. In Landy, M.S., et al. (eds.), Computational Models of Visual Processing. Cambridge, MA: MIT Press, pp. 305-330.", "award": [], "sourceid": 842, "authors": [{"given_name": "K\u00f4", "family_name": "Sakai", "institution": null}, {"given_name": "Leif", "family_name": "Finkel", "institution": null}]}