{"title": "Spatiotemporal Coupling and Scaling of Natural Images and Human Visual Sensitivities", "book": "Advances in Neural Information Processing Systems", "page_first": 859, "page_last": 865, "abstract": null, "full_text": "Spatiotemporal Coupling and Scaling of Natural Images and Human Visual Sensitivities\n\nDawei W. Dong\n\nCalifornia Institute of Technology\nMail Code 139-74\nPasadena, CA 91125\ndawei@hope.caltech.edu\n\nAbstract\n\nWe study the spatiotemporal correlation in natural time-varying images and explore the hypothesis that the visual system is concerned with the optimal coding of visual representation through spatiotemporal decorrelation of the input signal. Based on the measured spatiotemporal power spectrum, the transform needed to decorrelate the input signal is derived analytically and then compared with the actual processing observed in psychophysical experiments.\n\n1 Introduction\n\nThe visual system is concerned with the perception of objects in a dynamic world. A significant fact about natural time-varying images is that they do not change randomly over space-time; instead image intensities at different times and/or spatial positions are highly correlated. We measured the spatiotemporal correlation function (equivalently, the power spectrum) of natural images and we find that it is non-separable, i.e., coupled in space and time, and exhibits a very interesting scaling behaviour. When expressed as a function of an appropriately scaled frequency variable, the spatiotemporal power spectrum is given by a simple power-law. We point out that the same kind of spatiotemporal coupling and scaling exists in human visual sensitivity measured in psychophysical experiments. 
This poses the intriguing question of whether there is a quantitative relationship between the power spectrum of natural images and visual sensitivity. We answer this question by showing that the latter can be predicted from measurements of the power spectrum.\n\n\f860\n\nD. W. Dong\n\n2 Spatiotemporal Coupling and Scaling\n\nInterest in properties of time-varying images dates back to the early days of development of the television [1]. But systematic studies have not been possible previously, primarily due to technical obstacles, and our knowledge of the regularities of time-varying images has so far been very limited.\n\nFigure 1: Natural time-varying images are highly correlated in space and time. Shown on the top are two frames of a motion scene separated by thirty-three milliseconds. These two frames are highly repetitive; in fact the light intensities of most corresponding pixels are similar. Shown on the bottom are light increase (on the left) and light decrease (on the right) between the above two snapshots, indicated by greyscale of pixels (white means no change). One can immediately see that only a small portion of the image changes significantly over this time scale. Our methods have been described previously [3]. To summarize, more than one thousand segments of videos on 8mm video tape (NTSC format RGB) are digitized to 8 bits greyscale using a Silicon Graphics Video board with default factory settings. Two types of segments are analyzed. The first are segments from movies on video tapes (e.g. \"Raiders of the Lost Ark\", \"Uncommon Valor\"). The second type of segments that we analyzed are videos made by the authors. The scene of the moving egret shown here was taken in Central Park in New York City. 
\n\nWe have systematically measured the two-point correlation matrix, or covariance matrix, of 10\u00b0 \u00d7 10\u00b0 \u00d7 2 s (horizontal \u00d7 vertical \u00d7 temporal, digitized to 64 \u00d7 64 \u00d7 64) segments of natural time-varying images by averaging over 1049 movie segments. An example of two consecutive frames from a typical segment is given in Figure 1. The Fourier transform of the correlation matrix, or the power spectrum, turns out to be a non-separable function of spatial and temporal frequencies and exhibits an interesting scaling behaviour. From our measurements (see Figure 2) we find\n\nR(f, w) = R(f_w)\n\nwhere f_w is a scaled frequency, which is simply the spatial frequency f scaled by G(w/f), a function of the ratio of temporal and spatial frequencies, i.e., f_w = G(w/f) f. This behaviour is revealed most clearly by plotting the power spectrum as a function of f for fixed w/f ratio: the curves for different w/f ratios are just horizontally shifted from each other.\n\n\fSpatiotemporal Coupling/Scaling of Natural Images & Visual Sensitivity\n\n861\n\nFigure 2: Spatiotemporal power spectra of natural time-varying images. (A) plotted as a function of spatial frequency f (cycle/degree) for three temporal frequencies (0.9, 3, 10) Hz; (B) plotted for three velocities, i.e., ratios of temporal and spatial frequencies w/f, (0.8, 2.3, 7) degree/second. There are some important conclusions that can be drawn from this measurement. First, it is obvious that the power spectrum cannot be separated into pure spatial and pure temporal parts; space and time are coupled in a non-trivial way. 
The power spectrum at low temporal frequency decreases more rapidly with increasing spatial frequency. Second, underlying this data is an interesting scaling behaviour which can be easily seen from the curves for constant w/f ratios: each curve is simply shifted horizontally from the others in the log-log plot. Thus curves for constant w/f ratio overlap with each other when shifted by an amount of G(w/f), i.e., when plotted against a scaled frequency f_w = G(w/f) f. The similar spatiotemporal coupling and scaling for human visual sensitivity is shown in Figure 3.\n\nInterestingly, the human visual system seems to be designed to take advantage of such regularity in natural images. The spatiotemporal contrast sensitivity of human vision, K(f, w), i.e., the visual response to a sinewave grating of spatial frequency f modulated at temporal frequency w, exhibits the same kind of spatiotemporal coupling and scaling (see Figure 3),\n\nK(f, w) = K(f_w).\n\nAgain, when the contrast sensitivity curves are plotted as a function of f for fixed w/f ratios, the curves have the same shape and are only shifted from each other [2].\n\nFigure 3: Spatiotemporal contrast sensitivities of human vision. (A) plotted as a function of spatial frequency f (cycle/degree) for two temporal frequencies (2, 13) Hz; (B) plotted for two w/f ratios (0.15, 3) degree/second. The solid lines in both A and B are the empirical fits. The experimental data points and empirical fitting curves are from reference [2]. 
First, it can be seen that the human visual sensitivity curve is a band-pass filter at low temporal frequency and approaches a low-pass filter at higher temporal frequency. Space and time are coupled. Second, it is clear that the curves for different w/f ratios have the same shape and are only shifted horizontally from each other in the log-log plot. Again, curves for constant w/f ratio overlap with each other when shifted by an amount of G(w/f), i.e., when plotted against a scaled frequency f_w = G(w/f) f. The similar behaviour of spatiotemporal coupling and scaling for the power spectra of natural images is shown in Figure 2.\n\n3 Relative Motion of Visual Scene\n\nWhy does the human visual sensitivity have the same spatiotemporal coupling and scaling as natural images?\n\nThe intuition underlying the spatiotemporal coupling and scaling of natural images is that when viewing a real visual scene the natural eye and/or body movements translate the entire scene across the retina and every spatial Fourier component of the scene moves at the same velocity. Thus it is reasonable to assume that for constant velocity, i.e., constant w/f ratio, the power spectrum shows the same universal behaviour. This assumption is tested quantitatively in the following.\n\nOur measurements reveal that the spatiotemporal power spectrum has a simple form\n\nR(f_w) ~ f_w^{-3}\n\nwhich is shown in Figure 6A. 
This behaviour can be accounted for if the dominant component in the temporal signal comes from motion of objects with static power spectra of R_s(f) ~ f^{-2}. The static power spectrum for the same collection of images is measured by treating frames as snapshots (Figure 4A); the measurement confirmed the above assumption and is in agreement with earlier works on the statistical properties of static natural images [5, 6, 7].\n\nIt is easy to derive that for a rotationally symmetric static spectrum R_s(f) = K/f^2 (K is a constant), the spatiotemporal power spectrum of moving images is\n\nR(f, w) = (K/f^3) P(w/f)    (1)\n\nwhere P(v) is the function of velocity distribution, which is shown as the solid curve in Figure 4B (measured independently from the optical flows between frames).\n\nFigure 4: Spatial power spectrum and velocity distribution. (A) the measured spatial power spectrum of snapshot images as a function of spatial frequency f (cycle/degree), which shows that R_s(f) ~ K/f^2 is a good approximation to the spectrum; (B) the measured velocity distribution P(v) (solid curve) as a function of v, w/f (degree/second), in which the data of Figure 2 for the power spectrum were replotted as a function of w/f after multiplication by f^3; all the data points fall on the P(v) curve.\n\nIn summary, the measured spatiotemporal power spectrum is dominated by images of spatial power spectrum ~ 1/f^2 moving with a velocity distribution P(v) ~ 1/(v + v0)^2 (a similar velocity distribution has been proposed earlier [8, 3]). Thus R(f, w) = K / [f^3 (w/f + v0)^2] and G(w/f) ~ (w/f + v0)^{2/3}. 
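
The scaling collapse stated in this summary can be checked numerically. The following is only a minimal sketch in Python, not part of the original measurements: it evaluates R(f, w) = K / [f^3 (w/f + v0)^2] for several w/f ratios and confirms that it equals K f_w^{-3} once f is rescaled by G(w/f) = (w/f + v0)^(2/3). The values K = 1 and v0 = 1 degree/second, the sample frequencies, and all function names are assumptions chosen for illustration.

```python
K = 1.0    # overall constant of the static spectrum (assumed value)
V0 = 1.0   # velocity offset v0 in degree/second (assumed value)


def power_spectrum(f, w):
    # Model spectrum R(f, w) = K / (f^3 (w/f + v0)^2)
    return K / (f ** 3 * (w / f + V0) ** 2)


def scaled_frequency(f, w):
    # Scaled frequency f_w = G(w/f) f with G(w/f) = (w/f + v0)^(2/3)
    return (w / f + V0) ** (2.0 / 3.0) * f


def collapse_error(ratios, freqs):
    # Largest relative deviation of R(f, w) from the power law K f_w^-3
    worst = 0.0
    for v in ratios:
        for f in freqs:
            w = v * f  # temporal frequency at fixed ratio w/f = v
            r = power_spectrum(f, w)
            r_scaled = K * scaled_frequency(f, w) ** -3
            worst = max(worst, abs(r - r_scaled) / r_scaled)
    return worst


# Ratios as in Figure 2B (degree/second); spatial frequencies in cycle/degree
print(collapse_error([0.8, 2.3, 7.0], [0.1, 0.2, 0.5, 1.0]))
```

The printed deviation is at the level of floating-point rounding, which is just the statement that curves for different w/f ratios fall on one power law when plotted against f_w. With measured spectra in place of the model function, the same comparison would quantify how well the data collapse.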
\n\nBased on the assumption that the visual system is optimized to transmit information from natural scenes, we have derived and pointed out in references [3, 4] that the spatiotemporal contrast sensitivity K is a function of the power spectrum R, and thus the spatiotemporal coupling and scaling of R of natural images translates directly to the spatiotemporal coupling and scaling of K of visual sensitivity, i.e., R is a function of f_w only, and so is K.\n\n4 Spatiotemporal Decorrelation\n\nThe theory of spatiotemporal decorrelation is based on ideas of optimal coding from information theory: decorrelation of inputs to make statistically independent representations when the signal is strong and smoothing where noise is significant. The end result is that by choosing the correct degree of decorrelation the signal is compressed by elimination of what is irrelevant without significant loss of information.\n\nThe following relationship can be derived for the visual sensitivity K and the power spectrum R in the presence of noise power N:\n\nK = R^{-1/2} (1 + N/R)^{-3/2}\n\nThe figure below illustrates the predicted filter for the case of white noise (constant N).\n\nFigure 5: Predicted optimal filter (curve I): in the low noise regime, it is given by the whitening filter R^{-1/2} (curve II), which achieves spatiotemporal decorrelation; while in the high noise regime it asymptotes to the low-pass filter (curve III), which suppresses noise.\n\nAs shown in Figure 6, the relation between the contrast sensitivity and the power spectrum predicts\n\nK(f_w) ~ [f_w / (1 + N f_w^3)]^{3/2}\n\nin which N is the power of the white noise. 
This prediction is compared with psychophysical data in Figure 6B, where we have used the scaling function G(w/f) = (w/f + v0)^{2/3}, which has the same asymptotic behaviour as we have shown for the natural time-varying images [3]. We find that for v0 = 1 degree/second, the human contrast sensitivity curves for w/f from 0.1 to 4 degree/second, measured in reference [2], overlap very well with the theoretical prediction from the power spectrum of our measurements.\n\nFigure 6: Relation between the power spectrum of natural images and the human visual sensitivities. (A) the measured spatiotemporal power spectrum (Figure 2B) replotted as a function of the scaled frequency f_w can be fit very well by R ~ f_w^{-3} (solid line); (B) the spatiotemporal contrast sensitivities of human vision (Figure 3B) replotted as a function of the scaled frequency f_w can be fit very well by our theoretical prediction (solid line). Our theory on the relation between the visual sensitivity K and the power spectrum of natural time-varying images R in the presence of noise power N has been described in detail in reference [4]. To summarize, the visual sensitivity in Fourier space is simply K = R^{-1/2} (1 + N/R)^{-3/2}. In a linear system, this is proportional to the visual response to a sinewave of spatial frequency f modulated at temporal frequency w, i.e., the contrast sensitivity curves shown in Figure 3. In the case of white noise, i.e., N is independent of f and w, K depends on f and w through the power spectrum R. Since R is a function of the scaled frequency f_w only, so is K. 
From our measurement R ~ f_w^{-3}, thus K ~ f_w^{3/2} (1 + N f_w^3)^{-3/2}. This curve is plotted in the figure as the solid line with N = 0.01. The agreement is very good.\n\n5 Conclusions and Discussions\n\nA simple relationship is revealed between the statistical structure of natural time-varying images and the spatiotemporal sensitivity of human vision. The existence of this relationship supports the hypothesis that visual processing is optimized to compress as much information as possible about the outside world into the limited dynamic range of the visual channels.\n\nWe should point out that this scaling behaviour is expected to break down for very high temporal and spatial frequency, where the effect of the temporal and spatial modulation function of the eye [9, 10] cannot be ignored.\n\nFinally, while our predictions show that, in general, the human visual sensitivity is strongly space-time coupled, we do predict a regime where decoupling is a good approximation. This is based on the fact that in the regime of relatively high temporal frequency and relatively low spatial frequency we find that the power spectrum of natural images is separable into spatial and temporal parts [3]. In a previous work we have used this decoupling to model response properties of cat LGN cells, where we have shown that these can be accounted for by the theoretical prediction based on the power spectrum in that regime [4].\n\nAcknowledgements\nThe author gratefully acknowledges the discussions with Dr. Joseph Atick.\n\nReferences\n\n[1] Kretzmer ER, 1952. Statistics of television signals. The Bell System Technical Journal. 751-763.\n\n[2] Kelly DH, 1979. Motion and vision. II. Stabilized spatio-temporal threshold surface. J. Opt. 
Soc. Am. 69, 1340-1349.\n\n[3] Dong DW, Atick JJ, 1995. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6, 345-358.\n\n[4] Dong DW, Atick JJ, 1995. Temporal decorrelation: a theory of lagged and nonlagged responses in the lateral geniculate nucleus. Network: Computation in Neural Systems, 6, 159-178.\n\n[5] Burton GJ, Moorhead IR, 1987. Color and spatial structure in natural scenes. Applied Optics. 26(1): 157-170.\n\n[6] Field DJ, 1987. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4: 2379-2394.\n\n[7] Ruderman DL, Bialek W, 1994. Statistics of natural images: scaling in the woods. Phys. Rev. Lett. 73(6): 814-817.\n\n[8] Van Hateren JH, 1993. Spatiotemporal contrast sensitivity of early vision. Vision Res. 33(2): 257-267.\n\n[9] Campbell FW, Gubisch RW, 1966. Optical quality of the human eye. J. Physiol. 186: 558-578.\n\n[10] Schnapf JL, Baylor DA, 1987. How photoreceptor cells respond to light. Scientific American 256(4): 40-47.\n", "award": [], "sourceid": 1188, "authors": [{"given_name": "Dawei", "family_name": "Dong", "institution": null}]}