{"title": "Active Preference Learning with Discrete Choice Data", "book": "Advances in Neural Information Processing Systems", "page_first": 409, "page_last": 416, "abstract": "We propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries.", "full_text": "Active Preference Learning with Discrete Choice Data\n\nEric Brochu, Nando de Freitas and Abhijeet Ghosh\n\nDepartment of Computer Science\nUniversity of British Columbia\nVancouver, BC, Canada\n{ebrochu, nando, ghosh}@cs.ubc.ca\n\nAbstract\n\nWe propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. 
We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries.\n\n1 Introduction\n\nA computer graphics artist sits down to use a simple renderer to find appropriate surfaces for a typical reflectance model. It has a series of parameters that must be set to control the simulation: \u201cspecularity\u201d, \u201cFresnel reflectance coefficient\u201d, and other, less-comprehensible ones. The parameters interact in ways difficult to discern. The artist knows in his mind\u2019s eye what he wants, but he\u2019s not a mathematician or a physicist \u2014 no course he took during his MFA covered Fresnel reflectance models. Even if it had, would it help? He moves the specularity slider and waits for the image to be generated. The surface is too shiny. He moves the slider back a bit and runs the simulation again. Better. The surface is now appropriately dull, but too dark. He moves a slider down. Now it\u2019s the right colour, but the specularity doesn\u2019t look quite right any more. He repeatedly bumps the specularity back up, rerunning the renderer at each attempt until it looks right. Good. Now, how to make it look metallic...?\n\nProblems in simulation, animation, rendering and other areas often take such a form, where the desired end result is identifiable by the user, but parameters must be tuned in a tedious trial-and-error process. This is particularly apparent in psychoperceptual models, where continual tuning is required to make something \u201clook right\u201d. Using the animation of character walking motion as an example, for decades, animators and scientists have tried to develop objective functions based on kinematics, dynamics and motion capture data [Cooper et al., 2007]. 
However, even when expensive mocap is available, we simply have to watch an animated film to be convinced of how far we still are from solving the gait animation problem. Unfortunately, it is not at all easy to find a mapping from parameterized animation to psychoperceptual plausibility. The perceptual objective function is simply unknown. Fortunately, however, it is fairly easy to judge the quality of a walk \u2014 in fact, it is trivial and almost instantaneous. The application of this principle to animation and other psychoperceptual tools is motivated by the observation that humans often seem to be forming a mental model of the objective function. This model enables them to exploit feasible regions of the parameter space where the valuation is predicted to be high and to explore regions of high uncertainty.\n\nFigure 1: An illustrative example of the difference between models learned for regression versus optimization. The regression model fits the true function better overall, but doesn\u2019t fit at the maximum better than anywhere else in the function. The optimization model is less accurate overall, but fits the area of the maximum very well. When resources are limited, as in an active learning environment, it is far more useful to fit the area of interest well, even at the cost of overall predictive performance. Getting a good fit for the maximum would require many more samples using conventional regression.\n\nIt is our thesis that the process of tweaking parameters to find a result that looks \u201cright\u201d is akin to sampling a perceptual objective function, and that twiddling the parameters to find the best result is, in essence, optimization. Our objective function is the psycho-perceptual process underlying judgement \u2014 how well a realization fits what the user has in mind. Following the econometrics terminology, we refer to the objective as the valuation. 
In the case of a human being rating the suitability of a simulation, however, it is not possible to evaluate this function over the entire domain. In fact, it is in general impossible to even sample the function directly and get a consistent response! While it would theoretically be possible to ask the user to rate realizations with some numerical scale, such methods often have problems with validity and reliability. Patterns of use and other factors can result in a drift effect, where the scale varies over time [Siegel and Castellan, 1988]. However, human beings do excel at comparing options and expressing a preference for one over others [Kingsley, 2006]. This insight allows us to approach the optimization problem in another way. By presenting two or more realizations to a user and requiring only that they indicate preference, we can get far more robust results with much less cognitive burden on the user [Kendall, 1975]. While this means we can\u2019t get responses for the valuation function directly, we model the valuation as a latent function, inferred from the preferences, which permits an active learning approach [Cohn et al., 1996; Tong and Koller, 2000].\n\nThis motivates our second major insight \u2014 it is not necessary to accurately model the entire objective function. The problem is actually one of optimization, not regression (Figure 1). We can\u2019t directly maximize the valuation function, so we propose to use an expected improvement function (EIF) [Jones et al., 1998; Sasena, 2002]. The EIF produces an estimate of the utility of knowing the valuation at any point in the space. The result is a principled way of trading off exploration (showing the user examples unlike any they have seen) and exploitation (trying to show the user improvements on examples they have indicated preference for). 
Of course, regression-based learning can produce an accurate model of the entire valuation function, which would also allow us to find the best valuation. However, this comes at the cost of asking the user to compare many, many examples that have no practical relation to what she is looking for, as we demonstrate experimentally in Sections 3 and 4. Our method tries instead to make the most efficient possible use of the user\u2019s time and cognitive effort.\n\nOur goal is to exploit the strengths of human psychology and perception to develop a novel framework of valuation optimization that uses active preference learning to find the point in a parameter space that approximately maximizes valuation with the least effort to the human user. Our goal is to offload the cognitive burden of estimating and exploring different sets of parameters, though we can incorporate \u201cslider twiddling\u201d into the framework easily. In Section 4, we present a simple, but practical application of our model in a material design gallery that allows artists to find particular appearance rendering effects. Furthermore, the valuation function can be any psychoperceptual process that lends itself to sliders and preferences: the model can support an animator looking for a particular \u201ccartoon physics\u201d effect, an artist trying to capture a particular mood in the lighting of a scene, or an electronic musician looking for a specific sound or rhythm. Though we use animation and rendering as motivating domains, our work has a broad scope of application in music and other arts, as well as psychology, marketing and econometrics, and human-computer interfaces.\n\n1.1 Previous Work\n\nProbability models for learning from discrete choices have a long history in psychology and econometrics [Thurstone, 1927; Mosteller, 1951; Stern, 1990; McFadden, 2001]. 
They have been studied extensively for use in rating chess players, and the Elo system [\u00c9l\u0151, 1978] was adopted by the World Chess Federation FIDE to model the probability of one player defeating another. Glickman and Jensen [2005] use Bayesian optimal design for adaptively finding pairs for tournaments. These methods all differ from our work in that they are intended to predict the probability of a preference outcome over a finite set of possible pairs, whereas we work with infinite sets and are only incidentally interested in modelling outcomes.\n\nIn Section 4, we introduce a novel \u201cpreference gallery\u201d application for designing simulated materials in graphics and animation to demonstrate the practical utility of our model. In the computer graphics field, the Design Gallery [Marks et al., 1997] for animation and the gallery navigation interface for Bidirectional Reflectance Distribution Functions (BRDFs) [Ngan et al., 2006] are the artist-assistance tools most like ours. They both use non-adaptive heuristics to find the set of input parameters to be used in the generation of the display. We depart from this heuristic treatment and instead present a principled probabilistic decision making approach to model the design process.\n\nParts of our method are based on [Chu and Ghahramani, 2005b], which presents a preference learning method using probit models and Gaussian processes. They use a Thurstone-Mosteller model, but with an innovative nonparametric model of the valuation function. [Chu and Ghahramani, 2005a] adds active learning to the model, though the method presented there differs from ours in that realizations are selected from a finite pool to maximize informativeness. More importantly, though, this work, like much other work in the field [Seo et al., 2000; Guestrin et al., 2005], is concerned with learning the entire latent function. 
As our experiments show in Section 3, this is too expensive an approach for our setting, leading us to develop the new active learning criteria presented here.\n\n2 Active Preference Learning\n\nBy querying the user with a paired comparison, one can estimate statistics of the valuation function at the query point, but only at considerable expense. Thus, we wish to make sure that the samples we do draw will generate the maximum possible improvement.\n\nOur method for achieving this goal iterates the following steps:\n\n1. Present the user with a new pair and record the choice: Augment the training set of paired choices with the new user data.\n\n2. Infer the valuation function: Here we use a Thurstone-Mosteller model with Gaussian processes. See Sections 2.1 and 2.2 for details. Note that we are not interested in predicting the value of the valuation function over the entire feasible domain, but rather in predicting it well near the optimum.\n\n3. Formulate a statistical measure for exploration-exploitation: We refer to this measure as the expected improvement function (EIF). Its maximum indicates where to sample next. EI is a function of the Gaussian process predictions over the feasible domain. See Section 2.3.\n\n4. Optimize the expected improvement function to obtain the next query point: Finding the maximum of the EI corresponds to a constrained nonlinear programming problem. See Section 2.3.\n\n2.1 Preference Learning Model\n\nAssume we have shown the user M pairs of items. In each case, the user has chosen which item she likes best. The dataset therefore consists of the ranked pairs D = {r_k \u227b c_k; k = 1, . . . , M}, where the symbol \u227b indicates that the user prefers r to c. We use x_{1:N} = {x_1, x_2, . . . , x_N}, x_i \u2208 X \u2286 R^d, to denote the N elements in the training data. 
That is, r_k and c_k correspond to two elements of x_{1:N}. Our goal is to compute the item x (not necessarily in the training data) with the highest user valuation in as few comparisons as possible. We model the valuation functions u(\u00b7) for r and c as follows:\n\nu(r_k) = f(r_k) + e_{r_k}\nu(c_k) = f(c_k) + e_{c_k},   (1)\n\nwhere the noise terms are Gaussian: e_{r_k} \u223c N(0, \u03c3^2) and e_{c_k} \u223c N(0, \u03c3^2). Following [Chu and Ghahramani, 2005b], we assign a nonparametric Gaussian process prior to the unknown mean valuation: f(\u00b7) \u223c GP(0, K(\u00b7, \u00b7)). That is, at the N training points,\n\np(f) = |2\u03c0K|^{-1/2} exp(-(1/2) f^T K^{-1} f),\n\nwhere f = {f(x_1), f(x_2), . . . , f(x_N)} and the symmetric positive definite covariance K has entries (kernels) K_{ij} = k(x_i, x_j). Initially we learned these parameters via maximum likelihood, but soon realized that this was unsound due to the scarcity of data. To remedy this, we elected to use subjective priors based on simple heuristics, such as the expected dataset spread. Although we use Gaussian processes as a principled method of modelling the valuation, other techniques, such as wavelets, could also be adopted.\n\nRandom utility models such as (1) have a long and influential history in psychology and the study of individual choice behaviour in economic markets. Daniel McFadden\u2019s Nobel Prize speech [McFadden, 2001] provides a glimpse of this history. Many more comprehensive treatments appear in classical economics books on discrete choice theory.\n\nUnder our Gaussian utility models, the probability that item r is preferred to item c is given by:\n\nP(r_k \u227b c_k) = P(u(r_k) > u(c_k)) = P(e_{c_k} - e_{r_k} < f(r_k) - f(c_k)) = \u03a6( (f(r_k) - f(c_k)) / (\u221a2 \u03c3) ),\n\nwhere \u03a6(d_k) = (1/\u221a(2\u03c0)) \u222b_{-\u221e}^{d_k} exp(-a^2/2) da is the cumulative distribution function of the standard Normal distribution. This model, relating binary observations to a continuous latent function, is known as the Thurstone-Mosteller law of comparative judgement [Thurstone, 1927; Mosteller, 1951]. In statistics it goes by the name of binomial-probit regression. Note that one could also easily adopt a logistic (sigmoidal) link function \u03c6(d_k) = (1 + exp(-d_k))^{-1}. In fact, such a choice is known as the Bradley-Terry model [Stern, 1990]. If the user had more than two choices, one could adopt a multinomial-probit model. This multi-category extension would, for example, enable the user to state no preference for either of the two items being presented.\n\n2.2 Inference\n\nOur goal is to estimate the posterior distribution of the latent utility function given the discrete data. That is, we want to compute p(f|D) \u221d p(f) \u220f_{k=1}^{M} p(d_k|f), where d_k = (f(r_k) - f(c_k)) / (\u221a2 \u03c3). Although there exist sophisticated variational and Monte Carlo methods for approximating this distribution, we favor a simple strategy: the Laplace approximation. Our motivation for doing this is the simplicity and computational efficiency of this technique. Moreover, given the amount of uncertainty in user valuations, we believe the choice of approximating technique plays a small role and hence we expect the simple Laplace approximation to perform reasonably in comparison to other techniques. The application of the Laplace approximation is fairly straightforward, and we refer the reader to [Chu and Ghahramani, 2005b] for details.\n\nFinally, given an arbitrary test pair, the predicted utility f* and f are jointly Gaussian. Hence, one can obtain the conditional p(f*|f) easily. 
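The Thurstone-Mosteller likelihood of Section 2.1 is a one-liner in practice. The sketch below is our own illustration (assuming numpy and scipy), not the authors' implementation:

```python
import numpy as np
from scipy.stats import norm

def preference_likelihood(f_r, f_c, sigma=1.0):
    """P(r > c): probability that item r is preferred to item c under the
    Thurstone-Mosteller model, given latent valuations f(r), f(c) and
    additive Gaussian noise with standard deviation sigma on each item."""
    # The difference of the two noise terms has variance 2*sigma^2,
    # hence the sqrt(2)*sigma scaling inside the Normal CDF.
    return norm.cdf((f_r - f_c) / (np.sqrt(2.0) * sigma))
```

Equal valuations give probability 1/2, and the preference becomes nearly deterministic as f(r) - f(c) grows relative to the noise scale.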
Moreover, the predictive distribution p(f*|D) follows by straightforward convolution of two Gaussians: p(f*|D) = \u222b p(f*|f) p(f|D) df. One of the criticisms of Gaussian processes, the fact that they are slow with large data sets, is not a problem for us, since active learning is designed explicitly to minimize the number of training data.\n\n2.3 The Expected Improvement Function\n\nNow that we are armed with an expression for the predictive distribution, we can use it to decide what the next query should be. In loose terms, the predictive distribution will enable us to balance the tradeoff of exploiting and exploring. When exploring, we should choose points where the predicted variance is large. When exploiting, we should choose points where the predicted mean is large (high valuation).\n\nLet x* be an arbitrary new instance. Its predictive distribution p(f*(x*)|D) has sufficient statistics {\u03bc(x*) = k*^T K^{-1} f_MAP, s^2(x*) = k** - k*^T (K + C_MAP^{-1})^{-1} k*}, where, now, k*^T = [k(x*, x_1) \u00b7\u00b7\u00b7 k(x*, x_N)] and k** = k(x*, x*). Also, let \u03bc_max denote the highest estimate of the predictive distribution thus far. That is, \u03bc_max is the highest valuation for the data provided by the individual.\n\nFigure 2: The 2D test function (left), and the estimate of the function based on the results of a typical run of 12 preference queries (right). The true function has eight local maxima and one global maximum. 
The predictor identifies the region of the global maximum correctly and that of the local maxima less well, but requires far fewer queries than learning the entire function.\n\nThe probability of improvement at a point x* is simply given by a tail probability:\n\np(f*(x*) \u2265 \u03bc_max) = \u03a6( (\u03bc(x*) - \u03bc_max) / s(x*) ),\n\nwhere f*(x*) \u223c N(\u03bc(x*), s^2(x*)). This statistical measure of improvement has been widely used in the field of experimental design and goes back many decades [Kushner, 1964]. However, it is known to be sensitive to the value of \u03bc_max. To overcome this problem, [Jones et al., 1998] defined the improvement over the current best point as I(x*) = max{0, \u03bc(x*) - \u03bc_max}, which resulted in an expected improvement of\n\nEI(x*) = (\u03bc(x*) - \u03bc_max) \u03a6(d) + s(x*) \u03c6(d) if s > 0, and EI(x*) = 0 if s = 0,\n\nwhere d = (\u03bc(x*) - \u03bc_max) / s(x*).\n\nTo find the point at which to sample, we still need to maximize the constrained objective EI(x*) over x*. Unlike the original unknown cost function, EI(\u00b7) can be cheaply sampled. Furthermore, for the purposes of our application, it is not necessary to guarantee that we find the global maximum, merely that we can quickly locate a point that is likely to be as good as possible. The original EGO work used a branch-and-bound algorithm, but we found it was very difficult to get good bounds over large regions. 
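The expected improvement criterion is cheap to evaluate from the GP's predictive mean and standard deviation. A minimal sketch (our own illustration, assuming numpy and scipy):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, s, mu_max):
    """Expected improvement for maximization, in the sense of Jones et al. [1998]:
    EI(x) = E[max(0, f(x) - mu_max)] with f(x) ~ N(mu(x), s(x)^2).
    mu, s may be scalars or arrays; EI is defined as 0 where s = 0."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    s = np.atleast_1d(np.asarray(s, dtype=float))
    ei = np.zeros_like(mu)
    pos = s > 0                       # handle the s = 0 branch explicitly
    d = (mu[pos] - mu_max) / s[pos]
    # Exploitation term (mean above incumbent) plus exploration term (uncertainty).
    ei[pos] = (mu[pos] - mu_max) * norm.cdf(d) + s[pos] * norm.pdf(d)
    return ei
```

Maximizing this surface over x (the paper uses DIRECT for that step) yields the next query point; note that EI is strictly positive wherever s > 0, so uncertain regions are never entirely ignored.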
Instead we use DIRECT [Jones et al., 1993], a fast, approximate, derivative-free optimization algorithm, though we conjecture that for larger dimensional spaces, sequential quadratic programming with interior point methods might be a better alternative.\n\n3 Experiments\n\nThe goal of our algorithm is to find a good approximation of the maximum of a latent function using preference queries. In order to measure our method\u2019s effectiveness in achieving this goal, we create a function f for which the optimum is known. At each time step, a query is generated in which two points x_1 and x_2 are adaptively selected, and the preference is found, where f(x_1) > f(x_2) \u21d4 x_1 \u227b x_2. After each preference, we measure the error, defined as \u03b5 = f_max - f(argmax_x f*(x)), that is, the difference between the true maximum of f and the value of f at the point predicted to be the maximum. Note that by design, this does not penalize the algorithm for drawing samples from X that are far from the argmax, or for predicting a latent function that differs from the true function. We are not trying to learn the entire valuation function, which would take many more queries \u2013 we seek only to maximize the valuation, which involves accurate modelling only in the areas of high valuation.\n\nWe measured the performance of our method on three functions \u2013 2D, 4D and 6D. By way of demonstration, Figure 2 shows the actual 2D function and the typical prediction after several queries. The test functions are defined as:\n\nf_2d = max{0, sin(x_1) + x_1/3 + sin(12 x_1) + sin(x_2) + x_2/3 + sin(12 x_2) - 1}\n\nf_4d,6d = \u2211_{i=1}^{d} [sin(x_i) + x_i/3 + sin(12 x_i)],\n\nFigure 3: The evolution of error for the estimate of the optimum on the test functions. 
The plot shows the error evolution \u03b5 against the number of queries. The solid line is our method; the dashed is a baseline comparison in which each query point is selected randomly. The performance is averaged over 20 runs, with the error bars showing the variance of \u03b5.\n\nAll are defined over the range [0, 1]^d. We selected these equations because they seem both general and difficult enough that we can safely assume that if our method works well on them, it should work on a large class of real-world problems \u2014 they have multiple local maxima to get trapped in and varying landscapes and dimensionality. Unfortunately, there has been little work in the psychoperception literature to indicate what a good test function would be for our problem, so we have had to rely to an extent on our intuition to develop suitable test cases.\n\nThe results of the experiments are shown in Figure 3. In all cases, we simulate 50 queries using our method (here called maxEI). As a baseline, we compare against 50 queries using the maximum variance of the model (maxs), which is a common criterion in active learning for regression [Seo et al., 2000; Chu and Ghahramani, 2005a]. We repeated each experiment 20 times and measured the mean and variance of the error evolution. We find that it takes far fewer queries to find a good result using maxEI in all cases. In the 2D case, for example, after 20 queries, maxEI already has better average performance than maxs achieves after 50, and in both the 2D and 4D scenarios, maxEI steadily improves until it finds the optimum, while maxs soon reaches a plateau, improving only slightly, if at all, while it tries to improve the global fit to the latent function. In the 6D scenario, neither algorithm succeeds well in finding the optimum, though maxEI clearly comes closer. 
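The simulation setup above is straightforward to reproduce; the following sketch writes out the test functions and the simulated preference oracle in numpy (the function names are ours, for illustration only):

```python
import numpy as np

def f2d(x):
    """2D test function from Section 3, defined on [0, 1]^2; clipped at 0."""
    x = np.asarray(x, dtype=float)
    g = np.sin(x) + x / 3.0 + np.sin(12.0 * x)  # identical 1D term per coordinate
    return max(0.0, float(np.sum(g)) - 1.0)

def f_nd(x):
    """The 4D/6D test function: the same 1D terms summed over [0, 1]^d."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.sin(x) + x / 3.0 + np.sin(12.0 * x)))

def simulated_preference(x1, x2, f):
    """Oracle used in the experiments: x1 is preferred to x2 iff f(x1) > f(x2)."""
    return f(x1) > f(x2)
```

Each simulated query compares two adaptively chosen points through this oracle, and the error is the gap between the true maximum of f and f evaluated at the current predicted maximizer.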
We believe the problem is that in six dimensions, the space is too large to adequately explore with so few queries, and variance remains quite high throughout the space. We feel that requiring more than 50 user queries in a real application would be unacceptable, so we are instead currently investigating extensions that will allow the user to direct the search in higher dimensions.\n\n4 Preference Gallery for Material Design\n\nProperly modeling the appearance of a material is a necessary component of realistic image synthesis. The appearance of a material is formalized by the notion of the Bidirectional Reflectance Distribution Function (BRDF). In computer graphics, BRDFs are most often specified using various analytical models observing the physical laws of reciprocity and energy conservation while also exhibiting shadowing, masking and Fresnel reflectance phenomena. Realistic models are therefore fairly complex, with many parameters that need to be adjusted by the designer. Unfortunately these parameters can interact in non-intuitive ways, and small adjustments to certain settings may result in non-uniform changes in appearance. This can make the material design process quite difficult for the end user, who cannot be expected to be an expert in the field of appearance modeling.\n\nOur application is a solution to this problem, using a \u201cpreference gallery\u201d approach, in which users are simply required to view two or more images rendered with different material properties and indicate which ones they prefer. To maximize the valuation, we use an implementation of the model described in Section 2. In practice, the first few examples will be points of high variance, since little of the space is explored (that is, the model of user valuation is very uncertain). 
Later samples tend to be in regions of high valuation, as a model of the user\u2019s interest is learned.\n\nWe use our active preference learning model on an example gallery application for helping users find a desired BRDF. For the purposes of this example, we limit ourselves to isotropic materials and ignore wavelength dependent effects in reflection. The gallery uses the Ashikhmin-Shirley Phong model [Ashikhmin and Shirley, 2000] for the BRDFs, which was recently validated to be well suited for representing real materials [Ngan et al., 2005]. The BRDFs are rendered on a sphere under high frequency natural illumination, as this has been shown to be the desired setting for human perception of reflectance [Fleming et al., 2001]. Our gallery demonstration presents the user with two BRDF images at a time. We start with four predetermined queries to \u201cseed\u201d the parameter space, and after that use the learned model to select gallery images. The GP model is updated after each preference is indicated. We use parameters of real measured materials from the MERL database [Ngan et al., 2005] for seeding the parameter space, but can draw arbitrary parameters after that.\n\nTable 1: Results of the user study\n\nalgorithm        | trials | n (mean \u00b1 std)\nlatin hypercubes | 50     | 18.40 \u00b1 7.87\nmaxs             | 50     | 17.87 \u00b1 8.60\nmaxEI            | 50     | 8.56 \u00b1 5.23\n\n4.1 User Study\n\nTo evaluate the performance of our application, we have run a simple user study in which the generated images are restricted to a subset of 38 materials from the MERL database that we deemed to be representative of the appearance space of the measured materials. The user is given the task of finding a single randomly-selected image from that set by indicating preferences. 
Figure 4 shows a typical user run, where we ask the user to use the preference gallery to find a provided target image. At each step, the user need only indicate the image they think looks most like the target. This would, of course, be an unrealistic scenario if we were evaluating the application from an HCI stance, but here we limit our attention to the model, where we are interested in demonstrating that with human users maximizing valuation is preferable to learning the entire latent function.\n\nUsing five subjects, we compared 50 trials using the EIF to select the images for the gallery (maxEI), 50 trials using maximum variance (maxs, the same criterion as in the experiments of Section 3), and 50 trials using samples selected using a randomized Latin hypercube algorithm. In each case, one of the gallery images was the image with the highest predicted valuation and the other was selected by the algorithm. The algorithm type for each trial was randomly selected by the computer, and neither the experimenter nor the subjects knew which of the three algorithms was selecting the images. The results are shown in Table 1, where n is the number of clicks required of the user to find the target image. Clearly maxEI dominates, with a mean n less than half that of the competing algorithms. Interestingly, selecting images using maximum variance does not perform much better than random. We suspect that this is because maxs has a tendency to select images from the corners of the parameter space, which adds limited information to the other images, whereas Latin hypercubes at least guarantees that the selected images fill the space.\n\nActive learning is clearly a powerful tool for situations where human input is required for learning. With this paper, we have shown that understanding the task \u2014 and exploiting the quirks of human cognition \u2014 is also essential if we are to deploy real-world active learning applications. 
As people come to expect their machines to act intelligently and deal with more complex environments, machine learning systems that can collaborate with users and take on the tedious parts of users\u2019 cognitive burden have the potential to dramatically affect many creative fields, from business to the arts to science.\n\nFigure 4: A shorter-than-average but otherwise typical run of the preference gallery tool. At each (numbered) iteration, the user is provided with two images generated with parameter instances and indicates the one they think most resembles the target image (top-left) they are looking for. The boxed images are the user\u2019s selections at each iteration.\n\nReferences\n\n[Ashikhmin and Shirley, 2000] M. Ashikhmin and P. Shirley. An anisotropic phong BRDF model. J. Graph. Tools, 5(2):25\u201332, 2000.\n\n[Chu and Ghahramani, 2005a] W. Chu and Z. Ghahramani. Extensions of Gaussian processes for ranking: semi-supervised and active learning. In Learning to Rank workshop at NIPS-18, 2005.\n\n[Chu and Ghahramani, 2005b] W. Chu and Z. Ghahramani. Preference learning with Gaussian processes. In ICML, 2005.\n\n[Cohn et al., 1996] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129\u2013145, 1996.\n\n[Cooper et al., 2007] S. Cooper, A. Hertzmann, and Z. Popovi\u0107. Active learning for motion controllers. In SIGGRAPH, 2007.\n\n[\u00c9l\u0151, 1978] \u00c1. \u00c9l\u0151. The Rating of Chess Players: Past and Present. Arco Publishing, New York, 1978.\n\n[Fleming et al., 2001] R. Fleming, R. Dror, and E. Adelson. How do humans determine reflectance properties under unknown illumination? In CVPR Workshop on Identifying Objects Across Variations in Lighting, 2001.\n\n[Glickman and Jensen, 2005] M. E. Glickman and S. T. Jensen. 
Adaptive paired comparison design. Journal of Statistical Planning and Inference, 127:279\u2013293, 2005.\n\n[Guestrin et al., 2005] C. Guestrin, A. Krause, and A. P. Singh. Near-optimal sensor placements in Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning (ICML-05), 2005.\n\n[Jones et al., 1993] D. R. Jones, C. D. Perttunen, and B. E. Stuckman. Lipschitzian optimization without the Lipschitz constant. J. Optimization Theory and Apps, 79(1):157\u2013181, 1993.\n\n[Jones et al., 1998] D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. J. Global Optimization, 13(4):455\u2013492, 1998.\n\n[Kendall, 1975] M. Kendall. Rank Correlation Methods. Griffin Ltd, 1975.\n\n[Kingsley, 2006] D. C. Kingsley. Preference uncertainty, preference refinement and paired comparison choice experiments. Dept. of Economics, University of Colorado, 2006.\n\n[Kushner, 1964] H. J. Kushner. A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86:97\u2013106, 1964.\n\n[Marks et al., 1997] J. Marks, B. Andalman, P. A. Beardsley, W. Freeman, S. Gibson, J. Hodgins, T. Kang, B. Mirtich, H. Pfister, W. Ruml, K. Ryall, J. Seims, and S. Shieber. Design galleries: A general approach to setting parameters for computer graphics and animation. Computer Graphics, 31, 1997.\n\n[McFadden, 2001] D. McFadden. Economic choices. The American Economic Review, 91:351\u2013378, 2001.\n\n[Mosteller, 1951] F. Mosteller. Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16:3\u20139, 1951.\n\n[Ngan et al., 2005] A. Ngan, F. Durand, and W. Matusik. Experimental analysis of BRDF models. In Proceedings of the Eurographics Symposium on Rendering, pages 117\u2013226, 2005.\n\n[Ngan et al., 2006] A. Ngan, F. 
Durand, and W. Matusik. Image-driven navigation of analytical BRDF models. In T. Akenine-M\u00f6ller and W. Heidrich, editors, Eurographics Symposium on Rendering, 2006.\n\n[Sasena, 2002] M. J. Sasena. Flexibility and Efficiency Enhancement for Constrained Global Design Optimization with Kriging Approximations. PhD thesis, University of Michigan, 2002.\n\n[Seo et al., 2000] S. Seo, M. Wallat, T. Graepel, and K. Obermayer. Gaussian process regression: active data selection and test point rejection. In Proceedings of IJCNN 2000, 2000.\n\n[Siegel and Castellan, 1988] S. Siegel and N. J. Castellan. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 1988.\n\n[Stern, 1990] H. Stern. A continuum of paired comparison models. Biometrika, 77:265\u2013273, 1990.\n\n[Thurstone, 1927] L. Thurstone. A law of comparative judgement. Psychological Review, 34:273\u2013286, 1927.\n\n[Tong and Koller, 2000] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In Proc. ICML-00, 2000.\n", "award": [], "sourceid": 902, "authors": [{"given_name": "Eric", "family_name": "Brochu", "institution": null}, {"given_name": "Nando", "family_name": "de Freitas", "institution": null}, {"given_name": "Abhijeet", "family_name": "Ghosh", "institution": null}]}