{"title": "Spatiotemporal Coupling and Scaling of Natural Images and Human Visual Sensitivities", "book": "Advances in Neural Information Processing Systems", "page_first": 859, "page_last": 865, "abstract": null, "full_text": "Spatiotemporal Coupling and Scaling of Natural Images and Human Visual Sensitivities \n\nDawei W. Dong \n\nCalifornia Institute of Technology \n\nMail Code 139-74 \n\nPasadena, CA 91125 \n\ndawei@hope.caltech.edu \n\nAbstract \n\nWe study the spatiotemporal correlation in natural time-varying images and explore the hypothesis that the visual system is concerned with the optimal coding of visual representation through spatiotemporal decorrelation of the input signal. Based on the measured spatiotemporal power spectrum, the transform needed to decorrelate the input signal is derived analytically and then compared with the actual processing observed in psychophysical experiments. \n\n1 Introduction \n\nThe visual system is concerned with the perception of objects in a dynamic world. A significant fact about natural time-varying images is that they do not change randomly over space-time; instead, image intensities at different times and/or spatial positions are highly correlated. We measured the spatiotemporal correlation function (equivalently, the power spectrum) of natural images and find that it is non-separable, i.e., coupled in space and time, and exhibits a very interesting scaling behaviour. When expressed as a function of an appropriately scaled frequency variable, the spatiotemporal power spectrum is given by a simple power law. We point out that the same kind of spatiotemporal coupling and scaling exists in human visual sensitivity measured in psychophysical experiments. This poses the intriguing question of whether there is a quantitative relationship between the power spectrum of natural images and visual sensitivity. 
We answer this question by showing that the latter can be predicted from measurements of the power spectrum. \n\n2 Spatiotemporal Coupling and Scaling \n\nInterest in properties of time-varying images dates back to the early days of the development of television [1]. But systematic studies have not been possible previously, primarily due to technical obstacles, and our knowledge of the regularities of time-varying images has so far been very limited. \n\nFigure 1: Natural time-varying images are highly correlated in space and time. Shown on the top are two frames of a motion scene separated by thirty-three milliseconds. These two frames are highly repetitive; in fact, the light intensities of most corresponding pixels are similar. Shown on the bottom are the light increase (on the left) and light decrease (on the right) between the above two snapshots, indicated by the greyscale of pixels (white means no change). One can immediately see that only a small portion of the image changes significantly over this time scale. Our methods have been described previously [3]. To summarize, more than one thousand segments of video on 8mm video tape (NTSC format RGB) are digitized to 8 bits greyscale using a Silicon Graphics Video board with default factory settings. Two types of segments are analyzed. The first are segments from movies on video tapes (e.g. \"Raiders of the Lost Ark\", \"Uncommon Valor\"). The second type are videos made by the authors. The scene of the moving egret shown here was taken at Central Park in New York City. \n\nWe have systematically measured the two-point correlation matrix, or covariance matrix, of 10\u00b0 x 10\u00b0 x 2 s (horizontal x vertical x temporal, digitized to 64 x 64 x 64) segments of natural time-varying images by averaging over 1049 movie segments. 
An example of two consecutive frames from a typical segment is given in Figure 1. The Fourier transform of the correlation matrix, or the power spectrum, turns out to be a non-separable function of spatial and temporal frequencies and exhibits an interesting scaling behaviour. From our measurements (see Figure 2) we find \n\nR(f, w) = R(f_w) \n\nwhere f_w is a scaled frequency, which is simply the spatial frequency f scaled by G(w/f), a function of the ratio of temporal and spatial frequencies, i.e., f_w = G(w/f) f. This behaviour is revealed most clearly by plotting the power spectrum as a function of f for fixed w/f ratio: the curves for different w/f ratios are just horizontally shifted copies of each other. \n\n[Figure 2, panels A and B. Horizontal axis: spatial frequency f (cycle/degree).] \n\nFigure 2: Spatiotemporal power spectra of natural time-varying images. (A) plotted as a function of spatial frequency for three temporal frequencies (0.9, 3, 10) Hz; (B) plotted for three velocities, i.e., ratios of temporal and spatial frequencies, (0.8, 2.3, 7) degree/second. There are some important conclusions that can be drawn from this measurement. First, it is obvious that the power spectrum cannot be separated into pure spatial and pure temporal parts; space and time are coupled in a non-trivial way. The power spectrum at low temporal frequency decreases more rapidly with increasing spatial frequency. Second, underlying this data is an interesting scaling behaviour which can be easily seen from the curves for constant w/f ratios: each curve is simply shifted horizontally from the others in the log-log plot. 
Thus curves for constant w/f ratio overlap with each other when shifted by an amount G(w/f), i.e., when plotted against a scaled frequency f_w = G(w/f) f. The similar spatiotemporal coupling and scaling for human visual sensitivity is shown in Figure 3. \n\nInterestingly, the human visual system seems to be designed to take advantage of such regularity in natural images. The human spatiotemporal contrast sensitivity K(f, w), i.e., the visual response to a sinewave grating of spatial frequency f modulated at temporal frequency w, exhibits the same kind of spatiotemporal coupling and scaling (see Figure 3), \n\nK(f, w) = K(f_w). \n\nAgain, when the contrast sensitivity curves are plotted as a function of f for fixed w/f ratios, the curves have the same shape and are only shifted from each other [2]. \n\n[Figure 3, panels A and B. Horizontal axis: spatial frequency f (cycle/degree).] \n\nFigure 3: Spatiotemporal contrast sensitivities of human vision. (A) plotted as a function of spatial frequency for two temporal frequencies (2, 13) Hz; (B) plotted for two w/f ratios (0.15, 3) degree/second. The solid lines in both A and B are the empirical fits. The experimental data points and empirical fitting curves are from reference [2]. First, it can be seen that the human visual sensitivity curve is a band-pass filter at low temporal frequency and approaches a low-pass filter at higher temporal frequency. Space and time are coupled. Second, it is clear that the curves for different w/f ratios have the same shape and are only shifted horizontally from each other in the log-log plot. Again, curves for constant w/f ratio overlap with each other when shifted by an amount G(w/f), i.e., when plotted against a scaled frequency f_w = G(w/f) f. 
The similar behaviour of spatiotemporal coupling and scaling for the power spectra of natural images is shown in Figure 2. \n\n3 Relative Motion of Visual Scene \n\nWhy does the human visual sensitivity have the same spatiotemporal coupling and scaling as natural images? \n\nThe intuition underlying the spatiotemporal coupling and scaling of natural images is that when viewing a real visual scene the natural eye and/or body movements translate the entire scene across the retina and every spatial Fourier component of the scene moves at the same velocity. Thus it is reasonable to assume that for constant velocity, i.e., constant w/f ratio, the power spectrum shows the same universal behaviour. This assumption is tested quantitatively in the following. \n\nOur measurements reveal that the spatiotemporal power spectrum has a simple form \n\nR(f_w) ~ f_w^{-3} \n\nwhich is shown in Figure 6A. This behaviour can be accounted for if the dominant component in the temporal signal comes from motion of objects with static power spectra of R_s(f) ~ f^{-2}. The static power spectrum for the same collection of images is measured by treating frames as snapshots (Figure 4A); the measurement confirms the above assumption and is in agreement with earlier works on the statistical properties of static natural images [5, 6, 7]. \n\nIt is easy to derive that for a rotationally symmetric static spectrum R_s(f) = K/f^2 (K is a constant), the spatiotemporal power spectrum of moving images is \n\nR(f, w) = (K / f^3) P(w/f)    (1) \n\nwhere P(v) is the velocity distribution, which is shown as the solid curve in Figure 4B (measured independently from the optical flows between frames). \n\n[Figure 4, panels A and B. Panel A horizontal axis: spatial frequency f (cycle/degree).] 
\n\nFigure 4: Spatial power spectrum and velocity distribution. (A) the measured spatial power spectrum of snapshot images, which shows that R_s(f) ~ K/f^2 is a good approximation to the spectrum; (B) the measured velocity distribution P(v) (solid curve), in which the data of Figure 2 for the power spectrum were replotted as a function of w/f (degree/second) after multiplication by f^3: all the data points fall on the P(v) curve. \n\nIn summary, the measured spatiotemporal power spectrum is dominated by images of spatial power spectrum ~ 1/f^2 moving with a velocity distribution P(v) ~ 1/(v + v_0)^2 (a similar velocity distribution has been proposed earlier [8, 3]). Thus R(f, w) = K / (f^3 (w/f + v_0)^2) and G(w/f) ~ (w/f + v_0)^{2/3}. \n\nBased on the assumption that the visual system is optimized to transmit information from natural scenes, we have derived and pointed out in references [3, 4] that the spatiotemporal contrast sensitivity K is a function of the power spectrum R, and thus the spatiotemporal coupling and scaling of R of natural images translates directly to the spatiotemporal coupling and scaling of K of visual sensitivity, i.e., since R is a function of f_w only, so is K. \n\n4 Spatiotemporal Decorrelation \n\nThe theory of spatiotemporal decorrelation is based on ideas of optimal coding from information theory: decorrelation of inputs to make statistically independent representations where the signal is strong, and smoothing where noise is significant. The end result is that by choosing the correct degree of decorrelation the signal is compressed by elimination of what is irrelevant without significant loss of information. 
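As a rough numerical illustration of this trade-off (a sketch under stated assumptions, not the procedure used for the figures), the Python snippet below evaluates the filter K = R^{-1/2} (1 + N/R)^{-3/2} quoted in the caption of Figure 6 for the measured form R ~ f_w^{-3}. The proportionality constant of R is set to 1, and the function names, the frequency grid and the defaults N = 0.01 and v_0 = 1 degree/second are chosen here for illustration only.

```python
import numpy as np


def scaled_frequency(f, w, v0=1.0):
    # f_w = G(w/f) * f with G(w/f) = (w/f + v0)^(2/3); v0 in degree/second.
    return (w / f + v0) ** (2.0 / 3.0) * f


def power_spectrum(f_w):
    # Measured power-law form R ~ f_w^-3.
    # Assumption: the proportionality constant is set to 1.
    return f_w ** -3.0


def whitening_filter(f_w):
    # Pure decorrelation filter R^(-1/2), the low-noise limit.
    return power_spectrum(f_w) ** -0.5


def predicted_sensitivity(f_w, noise_power=0.01):
    # K = R^(-1/2) * (1 + N/R)^(-3/2) for white noise of power N.
    r = power_spectrum(f_w)
    return r ** -0.5 * (1.0 + noise_power / r) ** -1.5


if __name__ == '__main__':
    # Illustrative grid of scaled frequencies (hypothetical range).
    grid = np.logspace(-1, 1, 2001)
    k = predicted_sensitivity(grid)
    print('peak of K on the grid at f_w =', grid[np.argmax(k)])
    # Ratio K / R^(-1/2): close to 1 where the signal dominates,
    # much smaller than 1 where the noise dominates.
    print('ratio at f_w = 0.1:', k[0] / whitening_filter(grid[0]))
    print('ratio at f_w = 10:', k[-1] / whitening_filter(grid[-1]))
```

Under these assumptions the filter is band-pass in f_w: setting the derivative of f_w / (1 + N f_w^3) to zero places the peak at f_w = (2N)^{-1/3}, about 3.7 for N = 0.01. Below the peak the filter follows the whitening filter R^{-1/2}, and above it the filter falls off as R / N^{3/2}.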
\n\nThe following relationship can be derived for the visual sensitivity K and the power spectrum R in the presence of noise power N: \n\nK = R^{-1/2} (1 + N/R)^{-3/2} \n\nThe figure below illustrates the predicted filter for the case of white noise (constant N). \n\nFigure 5: Predicted optimal filter (curve I): in the low noise regime, it is given by the whitening filter R^{-1/2} (curve II), which achieves spatiotemporal decorrelation; while in the high noise regime it asymptotes to the low-pass filter (curve III), which suppresses noise. \n\nAs shown in Figure 6, the relation between the contrast sensitivity and the power spectrum predicts \n\nK(f_w) ~ (f_w / (1 + N f_w^3))^{3/2} \n\nin which N is the power of the white noise. This prediction is compared with psychophysical data in Figure 6B, where we have used the scaling function G(w/f) = (w/f + v_0)^{2/3}, which has the same asymptotic behaviour as we have shown for the natural time-varying images [3]. We find that for v_0 = 1 degree/second, the human contrast sensitivity curves for w/f from 0.1 to 4 degree/second, measured in reference [2], overlap very well with the theoretical prediction from the power spectrum of our measurements. \n\n[Figure 6, panels A and B. Horizontal axis: scaled frequency f_w.] \n\nFigure 6: Relation between the power spectrum of natural images and the human visual sensitivities. (A) the measured spatiotemporal power spectrum (Figure 2B) replotted as a function of the scaled frequency can be fit very well by R ~ f_w^{-3} (solid line); (B) the spatiotemporal contrast sensitivities of human vision (Figure 3B) replotted as a function of the scaled frequency can be fit very well by our theoretical prediction (solid line). 
Our theory on the relation between the visual sensitivity K and the power spectrum of natural time-varying images R in the presence of noise power N has been described in detail in reference [4]. To summarize, the visual sensitivity in Fourier space is simply K = R^{-1/2} (1 + N/R)^{-3/2}. In a linear system, this is proportional to the visual response to a sinewave of spatial frequency f modulated at temporal frequency w, i.e., the contrast sensitivity curves shown in Figure 3. In the case of white noise, i.e., N independent of f and w, K depends on f and w through the power spectrum R. Since R is a function of the scaled frequency f_w only, so is K. From our measurement R ~ f_w^{-3}, thus K ~ f_w^{3/2} (1 + N f_w^3)^{-3/2}. This curve is plotted in the figure as the solid line with N = 0.01. The agreement is very good. \n\n5 Conclusions and Discussions \n\nA simple relationship is revealed between the statistical structure of natural time-varying images and the spatiotemporal sensitivity of human vision. The existence of this relationship supports the hypothesis that visual processing is optimized to compress as much information as possible about the outside world into the limited dynamic range of the visual channels. \n\nWe should point out that this scaling behaviour is expected to break down for very high temporal and spatial frequency, where the effect of the temporal and spatial modulation function of the eye [9, 10] cannot be ignored. \n\nFinally, while our predictions show that, in general, the human visual sensitivity is strongly space-time coupled, we do predict a regime where decoupling is a good approximation. This is based on the fact that in the regime of relatively high temporal frequency and relatively low spatial frequency we find that the power spectrum of natural images is separable into spatial and temporal parts [3]. 
In a previous work we have used this decoupling to model response properties of cat LGN cells, where we have shown that these can be accounted for by the theoretical prediction based on the power spectrum in that regime [4]. \n\nAcknowledgements \n\nThe author gratefully acknowledges the discussions with Dr. Joseph Atick. \n\nReferences \n\n[1] Kretzmer ER, 1952. Statistics of television signals. The Bell System Technical Journal. 751-763. \n\n[2] Kelly DH, 1979. Motion and vision. II. Stabilized spatio-temporal threshold surface. J. Opt. Soc. Am. 69: 1340-1349. \n\n[3] Dong DW, Atick JJ, 1995. Statistics of natural time-varying images. Network: Computation in Neural Systems 6: 345-358. \n\n[4] Dong DW, Atick JJ, 1995. Temporal decorrelation: a theory of lagged and nonlagged responses in the lateral geniculate nucleus. Network: Computation in Neural Systems 6: 159-178. \n\n[5] Burton GJ, Moorhead IR, 1987. Color and spatial structure in natural scenes. Applied Optics 26(1): 157-170. \n\n[6] Field DJ, 1987. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4: 2379-2394. \n\n[7] Ruderman DL, Bialek W, 1994. Statistics of natural images: scaling in the woods. Phys. Rev. Lett. 73(6): 814-817. \n\n[8] van Hateren JH, 1993. Spatiotemporal contrast sensitivity of early vision. Vision Res. 33(2): 257-267. \n\n[9] Campbell FW, Gubisch RW, 1966. Optical quality of the human eye. J. Physiol. 186: 558-578. \n\n[10] Schnapf JL, Baylor DA, 1987. How photoreceptor cells respond to light. Scientific American 256(4): 40-47. \n", "award": [], "sourceid": 1188, "authors": [{"given_name": "Dawei", "family_name": "Dong", "institution": null}]}