{"title": "Learning a Color Algorithm from Examples", "book": "Neural Information Processing Systems", "page_first": 622, "page_last": 631, "abstract": null, "full_text": "LEARNING A COLOR ALGORITHM FROM EXAMPLES \n\nAnya C. Hurlbert and Tomaso A. Poggio \n\nArtificial Intelligence Laboratory and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA \n\nABSTRACT \n\nA lightness algorithm that separates surface reflectance from illumination in a Mondrian world is synthesized automatically from a set of examples, pairs of input (image irradiance) and desired output (surface reflectance). The algorithm, which resembles a new lightness algorithm recently proposed by Land, is approximately equivalent to filtering the image through a center-surround receptive field in individual chromatic channels. The synthesizing technique, optimal linear estimation, requires only one assumption, that the operator that transforms input into output is linear. This assumption is true for a certain class of early vision algorithms that may therefore be synthesized in a similar way from examples. Other methods of synthesizing algorithms from examples, or \"learning\", such as backpropagation, do not yield a significantly different or better lightness algorithm in the Mondrian world. The linear estimation and backpropagation techniques both produce simultaneous brightness contrast effects. \n\nThe problems that a visual system must solve in decoding two-dimensional images into three-dimensional scenes (inverse optics problems) are difficult: the information supplied by an image is not sufficient by itself to specify a unique scene. To reduce the number of possible interpretations of images, visual systems, whether artificial or biological, must make use of natural constraints, assumptions about the physical properties of surfaces and lights. 
Computational vision scientists have derived effective solutions for some inverse optics problems (such as computing depth from binocular disparity) by determining the appropriate natural constraints and embedding them in algorithms. How might a visual system discover and exploit natural constraints on its own? We address a simpler question: Given only a set of examples of input images and desired output solutions, can a visual system synthesize, or \"learn\", the algorithm that converts input to output? We find that an algorithm for computing color in a restricted world can be constructed from examples using standard techniques of optimal linear estimation. \n\n© American Institute of Physics 1988 \n\nThe computation of color is a prime example of the difficult problems of inverse optics. We do not merely discriminate between different wavelengths of light; we assign roughly constant colors to objects even though the light signals they send to our eyes change as the illumination varies across space and chromatic spectrum. The computational goal underlying color constancy seems to be to extract the invariant surface spectral reflectance properties from the image irradiance, in which reflectance and illumination are mixed 1. \n\nLightness algorithms 2-8, pioneered by Land, assume that the color of an object can be specified by its lightness, or relative surface reflectance, in each of three independent chromatic channels, and that lightness is computed in the same way in each channel. Computing color is thereby reduced to extracting surface reflectance from the image irradiance in a single chromatic channel. 
\n\nThe image irradiance, s', is proportional to the product of the illumination intensity e' and the surface reflectance r' in that channel: \n\ns'(x, y) = r'(x, y) e'(x, y). (1) \n\nThis form of the image intensity equation is true for a Lambertian reflectance model, in which the irradiance s' has no specular components, and for appropriately chosen color channels 9. Taking the logarithm of both sides converts it to a sum: \n\ns(x, y) = r(x, y) + e(x, y), (2) \n\nwhere s = log(s'), r = log(r') and e = log(e'). \n\nGiven s(x, y) alone, the problem of solving Eq. 2 for r(x, y) is underconstrained. Lightness algorithms constrain the problem by restricting their domain to a world of Mondrians, two-dimensional surfaces covered with patches of random colors 2, and by exploiting two constraints in that world: (i) r'(x, y) is uniform within patches but has sharp discontinuities at edges between patches and (ii) e'(x, y) varies smoothly across the Mondrian. Under these constraints, lightness algorithms can recover a good approximation to r(x, y) and so can recover lightness triplets that label roughly constant colors 10. \n\nWe ask whether it is possible to synthesize from examples an algorithm that extracts reflectance from image irradiance, and whether the synthesized algorithm will resemble existing lightness algorithms derived from an explicit analysis of the constraints. We make one assumption, that the operator that transforms irradiance into reflectance is linear. Under that assumption, motivated by considerations discussed later, we use optimal linear estimation techniques to synthesize an operator from examples. The examples are pairs of images: an input image of a Mondrian under illumination that varies smoothly across space and its desired output image that displays the reflectance of the Mondrian without the illumination. 
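The log-domain identity in Eqs. (1)-(2) can be checked numerically. The following is an illustrative sketch of ours; the numeric values and variable names are arbitrary choices, not from the paper:

```python
import numpy as np

# One-pixel check of Eqs. (1)-(2): image irradiance is the product of
# reflectance and illumination, and taking logarithms turns the product
# into a sum. All values here are hypothetical illustrative choices.
r_prime = 0.6                 # surface reflectance r'(x, y)
e_prime = 2.0                 # illumination intensity e'(x, y)
s_prime = r_prime * e_prime   # Eq. (1): s' = r' e'
s = np.log(s_prime)           # s = log(s')
assert np.isclose(s, np.log(r_prime) + np.log(e_prime))  # Eq. (2): s = r + e
```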
The technique finds the linear estimator that best maps input into desired output, in the least-squares sense. \n\nFor computational convenience we use one-dimensional \"training vectors\" that represent vertical scan lines across the Mondrian images (Fig. 1). We generate many different input vectors s by adding together different random r and e vectors, according to Eq. 2. Each vector r represents a pattern of step changes across space, corresponding to one column of a reflectance image. The step changes occur at random pixels and are of random amplitude between set minimum and maximum values. Each vector e represents a smooth gradient across space with a random offset and slope, corresponding to one column of an illumination image. \n\nFig. 1. (a) The input data, a one-dimensional vector 320 pixels long. Its random Mondrian reflectance pattern is superimposed on a linear illumination gradient with a random slope and offset. (b) shows the corresponding output solution, on the left the illumination and on the right the reflectance. We used 1500 such pairs of input-output examples (each different from the others) to train the operator shown in Fig. 2. (c) shows the result obtained by the estimated operator when it acts on the input data (a), not part of the training set. On the left is the illumination and on the right the reflectance, to be compared with (b). This result is fairly typical: in some cases the prediction is even better, in others it is worse. 
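The training-vector construction described above can be sketched in a few lines. This is our own illustration: the patch count, amplitude range, and function names are assumptions, not the paper's exact parameters.

```python
import numpy as np

# Sketch (ours) of one 320-pixel training pair as in Fig. 1:
# r is a random step ("Mondrian") reflectance pattern, e is a linear
# illumination gradient with random offset and slope, and s = r + e (Eq. 2).
rng = np.random.default_rng(0)
n = 320                                   # pixels per one-dimensional scan line

def random_reflectance(rng, n, n_patches=8, lo=-1.0, hi=1.0):
    """Step pattern: one column of a log-reflectance Mondrian image."""
    edges = np.sort(rng.choice(np.arange(1, n), size=n_patches - 1, replace=False))
    widths = np.diff(np.concatenate(([0], edges, [n])))
    return np.repeat(rng.uniform(lo, hi, size=n_patches), widths)

def random_illumination(rng, n):
    """Smooth gradient: one column of a log-illumination image."""
    offset, slope = rng.uniform(-1, 1), rng.uniform(-1, 1) / n
    return offset + slope * np.arange(n)

r = random_reflectance(rng, n)
e = random_illumination(rng, n)
s = r + e    # input training vector; the pair (e, r) is the desired output
```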
We then arrange the training vectors s and r as the columns of two matrices S and R, respectively. Our goal is then to compute the optimal solution L of \n\nLS = R, (3) \n\nwhere L is a linear operator represented as a matrix. \n\nIt is well known that the solution of this equation that is optimal in the least squares sense is \n\nL = RS+, (4) \n\nwhere S+ is the Moore-Penrose pseudoinverse 11. We compute the pseudoinverse by overconstraining the problem - using many more training vectors than there are pixels in each vector - and using the straightforward formula that applies in the overconstrained case 12: S+ = S^T(SS^T)^-1. \n\nThe operator L computed in this way recovers a good approximation to the correct output vector r when given a new s, not part of the training set, as input (Fig. 1c). A second operator, estimated in the same way, recovers the illumination e. Acting on a random two-dimensional Mondrian, L also yields a satisfactory approximation to the correct output image. \n\nOur estimation scheme successfully synthesizes an algorithm that performs the lightness computation in a Mondrian world. What is the algorithm and what is its relationship to other lightness algorithms? To answer these questions we examine the structure of the matrix L. We assume that, although the operator is not a convolution operator, it should approximate one far from the boundaries of the image. That is, in its central part, the operator should be space-invariant, performing the same action on each point in the image. Each row in the central part of L should therefore be the same as the row above but displaced by one element to the right. Inspection of the matrix confirms this expectation. To find the form of L in its center, we thus average the rows there, first shifting them appropriately. The result, shown in Fig. 2, is a space-invariant filter with a narrow positive peak and a broad, shallow, negative surround. 
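The estimation step can be sketched end to end on a reduced 32-pixel version of the problem. The code below is our illustration of the pseudoinverse solution (Eq. 4) and of the row shifting-and-averaging step, not the original implementation; all names and parameters are our own choices.

```python
import numpy as np

# Sketch (ours): stack training vectors as columns of S and R, overconstrain
# the problem (many more examples than pixels), solve LS = R in the
# least-squares sense, then recover the space-invariant center of L.
rng = np.random.default_rng(1)
n, m = 32, 1500                         # pixels per vector, training examples

def make_pair(rng, n, n_patches=5):
    """One (input, desired-output) pair: step reflectance + linear gradient."""
    edges = np.sort(rng.choice(np.arange(1, n), size=n_patches - 1, replace=False))
    widths = np.diff(np.concatenate(([0], edges, [n])))
    r = np.repeat(rng.uniform(-1, 1, size=n_patches), widths)
    e = rng.uniform(-1, 1) + rng.uniform(-1, 1) / n * np.arange(n)
    return r + e, r

pairs = [make_pair(rng, n) for _ in range(m)]
S = np.stack([p[0] for p in pairs], axis=1)     # n x m input matrix
R = np.stack([p[1] for p in pairs], axis=1)     # n x m desired-output matrix
L = R @ S.T @ np.linalg.inv(S @ S.T)            # Eq. (4): L = R S^T (S S^T)^-1

# Align the diagonal elements of the central rows and average them, as in
# Fig. 2, to expose the center-surround filter.
rows = [np.roll(L[i], n // 2 - i) for i in range(n // 4, 3 * n // 4)]
kernel = np.mean(rows, axis=0)          # narrow positive peak at the center
```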
\n\nInterestingly, the filter our scheme synthesizes is very similar to Land's most recent retinex operator 5, which divides the image irradiance at each pixel by a weighted average of the irradiance at all pixels in a large surround and takes the logarithm of that result to yield lightness 13. The lightness triplets computed by the retinex operator agree well with human perception in a Mondrian world. The retinex operator and our matrix L both differ from Land's earlier retinex algorithms, which require a non-linear thresholding step to eliminate smooth gradients of illumination. \n\nThe shape of the filter in Fig. 2, particularly of its large surround, is also suggestive of the \"nonclassical\" receptive fields that have been found in V4, a cortical area implicated in mechanisms underlying color constancy 14-17. \n\nThe form of the space-invariant filter is similar to that derived in our earlier formal analysis of the lightness problem 8. It is qualitatively the same as that which results from the direct application of regularization methods exploiting the spatial constraints on reflectance and illumination described above 9,18,19. The Fourier transform of the filter of Fig. 2 is approximately a bandpass filter that cuts out low frequencies due \n\nFig. 2. The space-invariant part of the estimated operator, obtained by shifting and averaging the rows of a 160-pixel-wide central square of the matrix L, trained on a set of 1500 examples with linear illumination gradients (see Fig. 1). When logarithmic illumination gradients are used, a qualitatively similar receptive field is obtained. 
In a separate experiment we use a training set of one-dimensional Mondrians with either linear illumination gradients or slowly varying sinusoidal illumination components with random wavelength, phase and amplitude. The resulting filter is shown in the inset. The surrounds of both filters extend beyond the range we can estimate reliably, the range we show here. \n\nto slow gradients of illumination and preserves intermediate frequencies due to step changes in reflectance. In contrast, the operator that recovers the illumination, e, takes the form of a low-pass filter. We stress that the entire operator L is not a space-invariant filter. \n\nIn this context, it is clear that the shape of the estimated operator should vary with the type of illumination gradient in the training set. We synthesize a second operator using a new set of examples that contain equal numbers of vectors with random, sinusoidally varying illumination components and vectors with random, linear illumination gradients. Whereas the first operator, synthesized from examples with strictly linear illumination gradients, has a broad negative surround that remains virtually constant throughout its extent, the new operator's surround (Fig. 2, inset) has a smaller extent and decays smoothly towards zero from its peak negative value in its center. \n\nWe also apply the operator in Fig. 2 to new input vectors in which the density and amplitude of the step changes of reflectance differ greatly from those on which the operator is trained. The operator performs well, for example, on an input vector representing one column of an image of a small patch of one reflectance against a uniform background of a different reflectance, the entire image under a linear illumination gradient. 
This result is consistent with psychophysical experiments that show that color constancy of a patch holds when its Mondrian background is replaced by an equivalent grey background 20. \n\nThe operator also produces simultaneous brightness contrast, as expected from the shape and sign of its surround. The output reflectance it computes for a patch of fixed input reflectance decreases linearly with increasing average irradiance of the input test vector in which the patch appears. Similarly, to us, a dark patch appears darker against a light background than against a dark one. \n\nThis result takes one step towards explaining such illusions as the Koffka Ring 21. A uniform gray annulus against a bipartite background (Fig. 3a) appears to split into two halves of different lightnesses when the midline between the light and dark halves of the background is drawn across the annulus (Fig. 3b). The estimated operator acting on the Koffka Ring of Fig. 3b reproduces our perception by assigning a lower output reflectance to the left half of the annulus (which appears darker to us) than to the right half 22. Yet the operator gives this brightness contrast effect whether or not the midline is drawn across the annulus (Fig. 3c). Because the operator can perform only a linear transformation between the input and output images, it is not surprising that the addition of the midline in the input evokes so little change in the output. These results demonstrate that the linear operator alone cannot compute lightness in all worlds and suggest that an additional operator might be necessary to mark and guide it within bounded regions. \n\nOur estimation procedure is motivated by our previous observation 9,23,18 that standard regularization algorithms 19 in early vision define linear mappings between input and output and therefore can be estimated associatively under certain conditions. 
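The simultaneous brightness contrast described above follows from linearity and a negative surround alone, and can be reproduced with a toy center-surround filter. This sketch is ours; the difference-of-Gaussians kernel and all parameters are hypothetical stand-ins for the estimated operator.

```python
import numpy as np

# Toy demonstration (ours): a linear filter with a narrow positive center and
# a broad negative surround gives the same patch a lower output on a brighter
# background, i.e. simultaneous brightness contrast.
x = np.arange(-40, 41)
center = np.exp(-x**2 / 2.0)                     # narrow positive peak
surround = 0.02 * np.exp(-x**2 / (2 * 20.0**2))  # broad, shallow surround
kernel = center - surround

def output_at_patch(bg, patch=0.5, width=11, n=201):
    """Filter response at the center of a patch placed on a uniform background."""
    image = np.full(n, bg)
    mid = n // 2
    image[mid - width // 2 : mid + width // 2 + 1] = patch
    return float(image[mid - 40 : mid + 41] @ kernel)

resp_dark = output_at_patch(bg=0.0)    # patch on a dark background
resp_light = output_at_patch(bg=1.0)   # the same patch on a light background
assert resp_light < resp_dark          # patch "looks darker" on the light background
```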
The technique of optimal linear estimation that we use is closely related to optimal Bayesian estimation 9. If we were to assume from the start that the optimal linear operator is space-invariant, we could considerably simplify (and streamline) the computation by using standard correlation techniques 9,24. \n\nHow does our estimation technique compare with other methods of \"learning\" a lightness algorithm? We can compute the regularized pseudoinverse using gradient descent on a \"neural\" network 25 with linear units. Since the pseudoinverse is the unique best linear approximation in the L2 norm, a gradient descent method that minimizes the square error between the actual output and desired output of a fully connected linear network is guaranteed to converge, albeit slowly. Thus gradient descent in weight space converges to the same result as our first technique, the global minimum. \n\nFig. 3. (a) Koffka Ring. (b) Koffka Ring with midline drawn across annulus. (c) Horizontal scan lines across Koffka Ring. Top: Scan line starting at arrow in (b). Middle: Scan line at corresponding location in the output of the linear operator acting on (b). Bottom: Scan line at same location in the output of the operator acting on (a). \n\nWe also compare the linear estimation technique with a \"backpropagation\" network: gradient descent on a 2-layer network with sigmoid units 25 (32 inputs, 32 \"hidden units\", and 32 linear outputs), using training vectors 32 pixels long. 
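The claim that gradient descent on a fully connected linear network reaches the same global minimum as the pseudoinverse solution can be verified on a small random problem. A sketch under our own toy setup (dimensions, step size and iteration count are assumptions):

```python
import numpy as np

# Check (ours): gradient descent on a linear "network" W minimizing the
# squared error ||W S - R||^2 converges to the closed-form least-squares
# solution, the global minimum of this convex objective.
rng = np.random.default_rng(2)
n, m = 8, 200
S = rng.normal(size=(n, m))                  # inputs as columns
R = rng.normal(size=(n, m))                  # desired outputs as columns
L_opt = R @ S.T @ np.linalg.inv(S @ S.T)     # closed-form least-squares solution

W = np.zeros((n, n))                         # the network's weight matrix
step = 0.5 / np.linalg.eigvalsh(S @ S.T).max()   # small enough for stability
for _ in range(500):
    W -= step * 2 * (W @ S - R) @ S.T        # gradient of ||W S - R||^2

assert np.allclose(W, L_opt, atol=1e-6)      # same result as the pseudoinverse
```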
The network requires an order of magnitude more time to converge to a stable configuration than does the linear estimator for the same set of 32-pixel examples. The network's performance is slightly, yet consistently, better, measured as the root-mean-square error in output, averaged over sets of at least 2000 new input vectors. Interestingly, the backpropagation network and the linear estimator err in the same way on the same input vectors. It is possible that the backpropagation network may show considerable improvement over the linear estimator in a world more complex than the Mondrian one. We are presently examining its performance on images with real-world features such as shading, shadows, and highlights 26. \n\nWe do not think that our results mean that color constancy may be learned during a critical period by biological organisms. It seems more reasonable to consider them simply as a demonstration on a toy world that in the course of evolution a visual system may recover and exploit natural constraints hidden in the physics of the world. The significance of our results lies in the facts that a simple statistical technique may be used to synthesize a lightness algorithm from examples; that the technique does as well as other techniques such as backpropagation; and that a similar technique may be used for other problems in early vision. Furthermore, the synthesized operator resembles both Land's psychophysically-tested retinex operator and a neuronal nonclassical receptive field. The operator's properties suggest that simultaneous color (or brightness) contrast might be the result of the visual system's attempt to discount illumination gradients 27. \n\nREFERENCES AND NOTES \n\n1. Since we do not have perfect color constancy, our visual system must not extract reflectance exactly. The limits on color constancy might reveal limits on the underlying computation. \n\n2. E.H. Land, Am. Sci. 
52, 247 (1964). \n3. E.H. Land and J.J. McCann, J. Opt. Soc. Am. 61, 1 (1971). \n4. E.H. Land, in Central and Peripheral Mechanisms of Colour Vision, T. Ottoson and S. Zeki, Eds., (Macmillan, New York, 1985), pp. 5-17. \n5. E.H. Land, Proc. Nat. Acad. Sci. USA 83, 3078 (1986). \n6. B.K.P. Horn, Computer Graphics and Image Processing 3, 277 (1974). \n7. A. Blake, in Central and Peripheral Mechanisms of Colour Vision, T. Ottoson and S. Zeki, Eds., (Macmillan, New York, 1985), pp. 45-59. \n8. A. Hurlbert, J. Opt. Soc. Am. A 3, 1684 (1986). \n9. A. Hurlbert and T. Poggio, Artificial Intelligence Laboratory Memo 909, (M.I.T., Cambridge, MA, 1987). \n10. r'(x, y) can be recovered at best only to within a constant, since Eq. 1 is invariant under the transformation of r' into ar' and e' into a^-1 e', where a is a constant. \n11. A. Albert, Regression and the Moore-Penrose Pseudoinverse, (Academic Press, New York, 1972). \n12. The pseudoinverse, and therefore L, may also be computed by recursive techniques that improve its form as more data become available 11. \n13. Our synthesized filter is not exactly identical with Land's: the filter of Fig. 2 subtracts from the value at each point the average value of the logarithm of irradiance at all pixels, rather than the logarithm of the average values. The estimated operator is therefore linear in the logarithms, whereas Land's is not. The numerical difference between the outputs of the two filters is small in most cases (Land, personal communication), and both agree well with psychophysical results. \n14. R. Desimone, S.J. Schein, J. Moran and L.G. Ungerleider, Vision Res. 25, 441 (1985). \n15. H.M. Wild, S.R. Butler, D. Carden and J.J. Kulikowski, Nature (London) 313, 133 (1985). \n16. S.M. Zeki, Neuroscience 9, 741 (1983). \n17. S.M. Zeki, Neuroscience 9, 767 (1983). \n18. T. Poggio, et al., in Proceedings Image Understanding Workshop, L. 
Baumann, Ed., (Science Applications International Corporation, McLean, VA, 1985), pp. 25-39. \n19. T. Poggio, V. Torre and C. Koch, Nature (London) 317, 314 (1985). \n20. A. Valberg and B. Lange-Malecki, Investigative Ophthalmology and Visual Science Supplement 28, 92 (1987). \n21. K. Koffka, Principles of Gestalt Psychology, (Harcourt, Brace and Co., New York, 1935). \n22. Note that the operator achieves this effect by subtracting a non-existent illumination gradient from the input signal. \n23. T. Poggio and A. Hurlbert, Artificial Intelligence Laboratory Working Paper 264, (M.I.T., Cambridge, MA, 1984). \n24. Estimation of the operator on two-dimensional examples is possible, but computationally very expensive if done in the same way. The present computer simulations require several hours when run on standard serial computers. The two-dimensional case will need much more time (our one-dimensional estimation scheme runs orders of magnitude faster on a CM-1 Connection Machine System with 16K processors). \n25. D.E. Rumelhart, G.E. Hinton and R.J. Williams, Nature (London) 323, 533 (1986). \n26. A. Hurlbert, The Computation of Color, Ph.D. Thesis, M.I.T., Cambridge, MA, in preparation. \n27. We are grateful to E. Land, E. Hildreth, J. Little, F. Wilczek and D. Hillis for reading the draft and for useful discussions. A. Rottenberg developed the routines for matrix operations that we used on the Connection Machine. T. Breuel wrote the backpropagation simulator. ", "award": [], "sourceid": 17, "authors": [{"given_name": "Tomaso", "family_name": "Poggio", "institution": null}, {"given_name": "Anya", "family_name": "Hurlbert", "institution": null}]}