{"title": "Active Learning For Identifying Function Threshold Boundaries", "book": "Advances in Neural Information Processing Systems", "page_first": 163, "page_last": 170, "abstract": "", "full_text": "Active Learning For Identifying Function Threshold Boundaries\n\nBrent Bryan\nCenter for Automated Learning and Discovery\nCarnegie Mellon University\nPittsburgh, PA 15213\nbryanba@cs.cmu.edu\n\nJeff Schneider\nRobotics Institute\nCarnegie Mellon University\nPittsburgh, PA 15213\nschneide@cs.cmu.edu\n\nRobert C. Nichol\nInstitute of Cosmology and Gravitation\nUniversity of Portsmouth\nPortsmouth, PO1 2EG, UK\nbob.nichol@port.ac.uk\n\nChristopher J. Miller\nObservatorio Cerro Tololo\nObservatorio de AURA en Chile\nLa Serena, Chile\ncmiller@noao.edu\n\nChristopher R. Genovese\nDepartment of Statistics\nCarnegie Mellon University\nPittsburgh, PA 15213\ngenovese@stat.cmu.edu\n\nLarry Wasserman\nDepartment of Statistics\nCarnegie Mellon University\nPittsburgh, PA 15213\nlarry@stat.cmu.edu\n\nAbstract\n\nWe present an efficient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment selection methods based on entropy, misclassification rate, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid 1 - α confidence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter estimation problem by an order of magnitude.\n\n1 Introduction\nIn many scientific and engineering problems where one is modeling some function over an experimental space, one is not necessarily interested in the precise value of the function over an entire region. 
Rather, one is curious about determining the set of points for which the function exceeds some particular value. Applications include determining the functional range of wireless networks [1], factory optimization analysis, and gauging the extent of environmental regions in geostatistics. In this paper, we use this idea to compute confidence intervals for a set of cosmological parameters that affect the shape of the temperature power spectrum of the Cosmic Microwave Background (CMB).\n\nIn one dimension, the threshold discovery problem is a root-finding problem where no hints as to the location or number of solutions are given; several methods exist which can be used to solve this problem (e.g. bisection, Newton-Raphson). However, one-dimensional algorithms cannot be easily extended to the multivariate case. In particular, the ideas of root bracketing and function transversal are not well defined [2]; given a particular bracket of a continuous surface, there will be an infinite number of solutions to the equation f(x) - t = 0, since the solution in multiple dimensions is a set of surfaces, rather than a set of points. Numerous active learning papers deal with similar problems in multiple dimensions. For instance, [1] presents a method for picking experiments to determine the localities of local extrema when the input space is discrete. Others have used a variety of techniques to reduce the uncertainty over the problem’s entire domain to map out the function (e.g. [3] and [4]), or to locate the optimal value (e.g. [5]).\n\nWe are interested in locating the subset of the input space wherein the function is above a given threshold. Algorithms that merely find a local optimum and search around it will not work in general, as there may be multiple disjoint regions above the threshold. 
While techniques that map out the entire surface of the underlying function will correctly identify those regions which are above a given threshold, we assert that methods can be developed that are more efficient at localizing a particular contour of the function. Intuitively, points on the function that are located far from the boundary are less interesting, regardless of their variance. In this paper, we make the following contributions to the literature:\n\n• We present a method for choosing experiments that is more efficient than global variance minimization, as well as other heuristics, when one is solely interested in localizing a function contour.\n\n• We show that this heuristic can be used in continuous-valued input spaces, without defining a priori a set of possible experiments (e.g. imposing a grid).\n\n• We use our function threshold detection method to determine 1 - α simultaneously valid confidence intervals of CMB parameters, making no assumptions about the model being fit and few assumptions about the data in general.\n\n2 Algorithm\nWe begin by formalizing the problem. Assume that we are given a bounded sample space S ⊂ R^n and a scoring function f : S → R, but possibly no data points ({s, f(s)}, s ∈ S). Given a threshold t, we want to find the set of points S′ where f is equal to or above the threshold: S′ = {s ∈ S : f(s) ≥ t}. If f is invertible, then the solution is trivial. However, it is often the case that f is not trivially invertible, such as the CMB model mentioned in §1. In these cases, we can discover S′ by modeling f given some experiments. Thus, we wish to know how to choose experiments that help us determine S′ efficiently.\n\nWe assume that the cost to compute f(s) given s is significant. 
Thus, care should be taken when choosing the next experiment, as picking optimal points may reduce the runtime of the algorithm by orders of magnitude. Therefore, it is preferable to analyze current knowledge about the underlying function and select experiments which quickly refine the estimate of the function around the threshold of interest. There are several methods one could use to create a model of the data, notably some form of parametric regression. However, we chose to approximate the unknown boundary with a Gaussian Process (GP), as many forms of regression (e.g. linear) necessarily smooth the data, ignoring subtle features of the function that may become pronounced with more data. In particular, we use ordinary kriging, a form of GP, which assumes that the semivariogram K(·, ·) is a linear function of the distance between samples [6]; this estimation procedure assumes that the sampled data are normal with mean equal to the true function and variance given by the sampling noise. The expected value of K(s_i, s_j), for s_i, s_j ∈ S, can be written as\n\nE[K(s_i, s_j)] = k [ Σ_{l=1}^{n} α_l (s_il - s_jl)² ]^{1/2} + c\n\nwhere k is a constant (known as the kriging parameter) which is an estimated limit on the first derivative of the function, α_l is a scaling factor for each dimension, and c is the variance (e.g. experimental noise) of the sampled points. 
Since the joint distribution of a finite set of sampled points for a GP is Gaussian, the predicted distribution of a query point s_q given a known set A is normal, with mean and variance given by\n\nμ_{s_q} = μ_A + Σ′_{Aq} Σ_{AA}^{-1} (y_A - μ_A)   (1)\nσ²_{s_q} = Σ′_{Aq} Σ_{AA}^{-1} Σ_{Aq}   (2)\n\nwhere Σ_{Aq} denotes the column vector with the ith entry equal to K(s_i, s_q), Σ_{AA} denotes the semivariance matrix between the elements of A (the ij element of Σ_{AA} is K(s_i, s_j)), y_A denotes the column vector with the ith entry equal to f(s_i), the true value of the function for each point in A, and μ_A is the mean of the y_A's.\n\nAs given, prediction with a GP requires O(n³) time, as an n × n linear system of equations must be solved. However, for many GPs (and ordinary kriging in particular) the correlation between two points decreases as a function of distance. Thus, the full GP model can be approximated well by a local GP, where only the k nearest neighbors of the query point are used to compute the prediction value; this reduces the computation time to O(k³ log(n)) per prediction, since O(log(n)) time is required to find the k nearest neighbors using spatial indexing structures such as balanced kd-trees.\n\nSince we have assumed that experimentation is expensive, it would be ideal to iteratively analyze the entire input space and pick the next experiment in such a manner that minimizes the total number of experiments necessary. If the size of the parameter space (|S|) is finite, such an approach may be feasible. However, if |S| is large or infinite, testing all points may be impractical. 
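The prediction step in Equations 1 and 2 can be sketched in a few lines. This is a minimal illustration, written in the standard covariance-kernel form of GP regression rather than the paper's semivariance notation; the squared-exponential kernel and all function names here are our own illustrative choices, not the authors' code:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0):
    # illustrative covariance kernel (not the paper's linear semivariogram)
    return np.exp(-np.sum((a - b) ** 2) / (2 * length_scale ** 2))

def gp_predict(A, y, s_q, kernel=sq_exp_kernel):
    # Mean and variance of f(s_q) given sampled points (A, y),
    # in the spirit of Equations 1 and 2.
    mu_A = y.mean()
    K_AA = np.array([[kernel(a, b) for b in A] for a in A])
    K_Aq = np.array([kernel(a, s_q) for a in A])
    w = np.linalg.solve(K_AA, K_Aq)      # plays the role of Σ_AA^{-1} Σ_Aq
    mu_q = mu_A + w @ (y - mu_A)         # cf. Eq. (1)
    var_q = kernel(s_q, s_q) - w @ K_Aq  # cf. Eq. (2), covariance form
    return mu_q, var_q
```

Predicting at an already-sampled point returns that sample with zero variance, and the variance grows back toward the prior as s_q moves away from the data; the local-GP speedup described above would simply replace A with the k nearest neighbors of s_q found via a kd-tree.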
Instead of imposing some arbitrary structure on the possible experimental points (such as using a grid), our algorithm chooses candidate points uniformly at random from the input space, and then selects the candidate point with the highest score according to the metrics given in §2.1. This allows the input space to be fully explored (in expectation), and ensures that interesting regions of space that would have fallen between successive grid points are not missed; in §4 we show how imposing a grid upon the input space results in just such a situation. While the algorithm is unable to consider the entire space for each sampling iteration, over multiple iterations it does consider most of the space, resulting in the function boundaries being quickly localized, as can be seen in §3.\n\n2.1 Choosing experiments from among candidates\n\nGiven a set of random input points, the algorithm evaluates each one and chooses the point with the highest score as the location for the next experiment. Below is the list of evaluation methods we considered.\nRandom: One of the candidate points is chosen uniformly at random. This method serves as a baseline for comparison.\nProbability of incorrect classification: Since we are trying to map the boundary between points above and below a threshold, we consider choosing the point from our random sample which has the largest probability of being misclassified by our model. Using the distribution defined by Equations 1 and 2, the probability, p, that the point is above the given threshold can be computed. The point is predicted to be above the threshold if p > 0.5, and thus the expected misclassification probability is min(p, 1 - p).\nEntropy: Instead of misclassification probability we can consider entropy: -p log_2(p) - (1 - p) log_2(1 - p). Entropy is a monotonic function of the misclassification rate, so these two will not choose different experiments. 
They are listed separately because they have different effects when mixed with other evaluations. Both entropy and misclassification will choose points near the boundary. Unfortunately, they have the drawback that once they find a point near the boundary they continue to choose points near that location and will not explore the rest of the parameter space.\nVariance: Both entropy and probability of incorrect classification suffer from a lack of incentive to explore the space. To rectify this problem, we consider the variance of each query point (given by Equation 2) as an evaluation metric. This metric is common in active learning methods whose goal is to map out an entire function. Since variance is related to the distance to nearest neighbors, this strategy chooses points that are far from areas currently searched, and hence will not get stuck at one boundary point. However, it is well known that such approaches tend to spend a large portion of their time on the edges of the parameter space and ultimately cover the space exhaustively [7].\nInformation gain: Information gain is a common myopic metric used in active learning. Information gain at the query point is the same as entropy in our case because all run experiments are assumed to have the same variance. Computing a full measure of information gain over the whole state space would provide an optimal 1-step experiment choice. In some discrete or linear problems this can be done, but it is intractable for continuous non-linear spaces. We believe the good performance of the evaluation metrics proposed below stems from their being heuristic proxies for global information gain or reduction in misclassification error.\nProducts of metrics: One way to rectify the problems of point policies that focus solely on points near the boundary, or on points with large variance regardless of their relevance to refining the predictive model, is to combine the two measures. 
Intuitively, doing this can mimic the idea of information gain; the entropy of a query point measures the classification uncertainty, while the variance is a good estimator of how much impact a new observation would have in this region, and thus by what fraction the uncertainty would be reduced. [1] proposed scoring points based upon the product of their entropy and variance to identify the presence of local maxima and minima, a problem closely related to boundary detection. We shall also consider scoring points based upon the product of their probability of incorrect classification and variance. Note that while entropy and probability of incorrect classification are monotonically related, entropy times variance and probability of incorrect classification times variance are not.\nStraddle: Using the same intuition as for products of heuristics, we define the straddle heuristic as\n\nstraddle(s_q) = 1.96 σ̂_{s_q} - |f̂(s_q) - t|.\n\nThe straddle algorithm scores points highest that are both unknown and near the boundary. As such, the straddle algorithm prefers points near the threshold, but far from previous examples. The straddle score for a point may be negative, which indicates that the model currently estimates the probability that the point is on a boundary is less than five percent. Since the straddle heuristic relies on the variance estimate, it is also subject to oversampling edge positions.\n\n3 Experiments\nWe now assess the accuracy with which our model reproduces a known function for the point policies just described. This is done by computing the fraction of test points on which the predictive model agrees with the true function about which side of the threshold the test points are on, after some fixed number of experiments. 
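The candidate-evaluation loop of §2.1 can be summarized in code. This is a minimal sketch with our own names; `predict` stands in for the GP prediction of Equations 1 and 2, returning a mean and standard deviation for a candidate point:

```python
import math

def heuristic_scores(mu, sigma, t):
    # p = P(f(s_q) >= t) under the normal predictive distribution
    p = 0.5 * (1.0 - math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    entropy = sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
    return {
        "misclass": min(p, 1.0 - p),
        "entropy": entropy,
        "variance": sigma ** 2,
        "entropy_x_var": entropy * sigma ** 2,
        "straddle": 1.96 * sigma - abs(mu - t),  # straddle(s_q)
    }

def pick_next_experiment(candidates, predict, t, rule="straddle"):
    # evaluate each random candidate and return the highest-scoring one
    return max(candidates, key=lambda s: heuristic_scores(*predict(s), t)[rule])
```

Under the straddle rule, a confidently classified point far from the threshold scores negative, while an uncertain point near the threshold scores close to 1.96 σ̂, matching the behavior described above.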
This process is repeated several times to account for variations due to the random sampling of the input space.\n\nThe first model we consider is a 2D sinusoidal function given by\n\nf(x, y) = sin(10x) + cos(4y) - cos(3xy),   x ∈ [0, 1], y ∈ [0, 2],\n\nwith a boundary threshold of t = 0. This function and threshold were examined for the following reasons: 1) the target threshold winds through the plot, giving ample length to test the accuracy of the approximating model, 2) the boundary is discontinuous with several small pieces, 3) there is an ambiguous region (around (0.9, 1)) where the true function is approximately equal to the threshold and the gradient is small, and 4) there are areas in the domain where the function is far from the threshold, and hence we can ensure that the algorithm is not oversampling in these regions.\n\nFigure 1: Predicted function boundary (solid), true function boundary (dashed), and experiments (dots) for the 2D sinusoid function after A) 50 experiments and B) 100 experiments using the straddle heuristic, and C) 100 experiments using the variance heuristic.\n\nTable 1: Number of experiments required to obtain 99% classification accuracy for the 2D models and 95% classification accuracy for the 4D model for various heuristics. Heuristics requiring more than 10,000 experiments to converge are labeled “did not converge”.\n\nHeuristic | 2D Sin. (1K Cand.) | 2D Sin. (31 Cand.) | 2D DeBoor | 4D Sinusoid\nRandom | 617 ± 158 | 617 ± 158 | 7727 ± 987 | 6254 ± 364\nEntropy | did not converge | did not converge | did not converge | 6121 ± 1740\nVariance | 207 ± 7 | 229 ± 9 | 4306 ± 573 | 2320 ± 57\nEntropy×Var | 117 ± 5 | 138 ± 6 | 1621 ± 201 | 1210 ± 43\nProb. Incor.×Std | 113 ± 11 | 129 ± 14 | 740 ± 117 | 1362 ± 89\nStraddle | 106 ± 5 | 123 ± 6 | 963 ± 136 | 1265 ± 94\n\nTable 1 shows the number of experiments necessary to reach a 99% and 95% accuracy for the 2D and 4D models, respectively. Note that picking points solely on entropy does not converge in many cases, while both the straddle heuristic and the probability-incorrect-times-standard-deviation heuristic result in approximations that are significantly better than the random and variance heuristics. Figures 1A-C confirm that the straddle heuristic is aiding in boundary prediction. Note that most of the 50 experiments sampled between Figures 1A and 1B are chosen near the boundary. The 100 experiments chosen to minimize the variance result in an even distribution over the input space and a worse boundary approximation, as seen in Figure 1C. These results indicate that the algorithm is correctly modeling the test function and choosing experiments that pinpoint the location of the boundary.\n\nFrom Equations 1 and 2, it is clear that the algorithm does not depend on data dimensionality directly. 
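For concreteness, the accuracy measure used in these comparisons can be sketched as follows (our own toy code, not the authors'; `model` stands for any approximation of the true function):

```python
import math
import random

def f(x, y):
    # the 2D sinusoid test function of Section 3 (threshold t = 0)
    return math.sin(10 * x) + math.cos(4 * y) - math.cos(3 * x * y)

def classification_accuracy(model, t=0.0, n_test=10000, seed=0):
    # fraction of random test points on which the model and the true
    # function agree about which side of the threshold the point lies
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_test):
        x, y = rng.uniform(0.0, 1.0), rng.uniform(0.0, 2.0)
        hits += (model(x, y) >= t) == (f(x, y) >= t)
    return hits / n_test
```

A perfect model scores 1.0; Table 1 reports how many experiments each heuristic needs before the GP model reaches the stated accuracy under this measure.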
To ensure that the heuristics are not exploiting some feature of the 2D input space, we consider the 4D sinusoidal function\n\nf(x) = sin(10x_1) + cos(4x_2) - cos(3x_1x_2) + cos(2x_3) + cos(3x_4) - sin(5x_3x_4)\n\nwhere x ∈ [(0, 0, 1, 0), (1, 2, 2, 2)] and t = 0. Comparison of the 2D and 4D results in Table 1 reveals that the relative performance of the heuristics remains unchanged, indicating that the best heuristic for picking experiments is independent of the problem dimension.\n\nTo show that the decrease in the number of candidate points relative to the input parameter space that occurs with higher-dimensional problems is not an issue, we reconsider the 2D sinusoidal problem. Now, we use only 31 candidate points instead of 1000 to simulate the point density difference between 4D and 2D. Results shown in Table 1 indicate that reducing the number of candidate points does not drastically alter the realized performance. Additional experiments were performed on a discontinuous 2D function (the DeBoor function given in [1]) with similar results, as can be seen in Table 1.\n\n4 Statistical analysis of cosmological parameters\nLet us now look at a concrete application of this work: a statistical analysis of cosmological parameters that affect the formation and evolution of our universe. One key prediction of the Big Bang model for the origin of our universe is the presence of a 2.73K cosmic microwave background (CMB) radiation. Recently, the Wilkinson Microwave Anisotropy Probe (WMAP) has completed a detailed survey of this radiation, exhibiting small CMB temperature fluctuations over the sky [8]. It is believed that the size and spatial proximity of these temperature fluctuations depict the types and rates of particle interactions in the early universe and consequently characterize the formation of large-scale structure (galaxies, clusters, walls and voids) in the current observable universe. 
It is conjectured that this radiation has permeated the universe unchanged since its formation 15 billion years ago. Therefore, the sizes and angular separations of these CMB fluctuations give a unique picture of the universe immediately after the Big Bang and have large implications for our understanding of primordial cosmology.\n\nAn important summary of the temperature fluctuations is the CMB power spectrum shown in Figure 2, which gives the temperature variance of the CMB as a function of spatial frequency (or multipole moment). It is well known that the shape of this curve is affected by at least seven cosmological parameters: optical depth (τ), dark energy mass fraction (Ω_Λ), total mass fraction (Ω_m), baryon density (ω_b), dark matter density (ω_dm), neutrino fraction (f_n), and spectral index (n_s). For instance, the height of the first peak is determined by the total energy density of the universe, while the third peak is related to the amount of dark matter. Thus, by fitting models of the CMB power spectrum for given values of the seven parameters, we can determine how the parameters influence the shape of the model spectrum. By examining those models that fit the data, we can then establish the parameter ranges that result in models consistent with the observations.\n\nPrevious work characterizing confidence intervals for cosmological parameters either used marginalization over the other parameters, or made assumptions about the values of the parameters and/or the shape of the CMB power spectrum. 
However, [9] notes that “CMB data have now become so sensitive that the key issue in cosmological parameter determination is not always the accuracy with which the CMB power spectrum features can be measured, but often what prior information is used or assumed.” In this analysis, we make no assumptions about the ranges or values of the parameters, and assume only that the data are normally distributed around the unknown CMB spectrum with covariance known up to a constant multiple. Using the method of [10], we create a nonparametric confidence ball (under a weighted squared-error loss) for the unknown spectrum that is centered on a nonparametric estimate, with a radius for each specified confidence level derived from the asymptotic distribution of a pivot statistic¹. For any candidate spectrum, membership in the confidence ball can be determined by comparing the ball’s radius to the variance-weighted sum-of-squares deviation between the candidate function and the center of the ball.\n\nOne advantage of this method is that it gives us simultaneously valid confidence intervals on all seven of our input parameters; this is not true for 1 - α confidence intervals derived from a collection of χ² distributions, where the confidence intervals often have substantially lower coverage [11]. However, there is no way to invert the modeling process to determine parameter ranges given a fixed sum of squared error. 
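The ball-membership test just described amounts to a single weighted sum-of-squares comparison. A minimal sketch with illustrative names (the actual weights and radius come from the method of [10]; nothing here is the authors' code):

```python
import numpy as np

def in_confidence_ball(candidate, center, weights, radius):
    # Variance-weighted sum of squared deviations between a candidate
    # spectrum and the ball's center, compared against the ball's radius.
    deviation = float(np.sum(weights * (candidate - center) ** 2))
    return deviation <= radius
```

The test is cheap relative to generating the candidate spectrum itself, which is why the expensive step to minimize is the number of simulator runs.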
Thus, we use the algorithm detailed in §2 to map out the confidence surface as a function of the input parameters; that is, we use the algorithm to pick a location in the seven-dimensional parameter space to perform an experiment, and then run CMBFast [12] to create a simulated power spectrum given this set of input parameters. We can then compute the sum of squares of error for this spectrum (relative to the regressed model) and easily tell if the 7D input point is inside the confidence ball. In practice, we model the sum of squared error, not the confidence level of the model. This creates a more linear output space, as the confidence level for most of the models is zero, and thus it is impossible to distinguish between poor and terrible model fits.\n\n¹See Appendix 3 in [10] for the derivation of this radius.\n\nFigure 2: WMAP data, overlaid with regressed model (solid) and an example of a model CMB spectrum that barely fits at the 95% confidence level (dashed; parameter values are ω_DM = 0.1 and ω_B = 0.028).\n\nFigure 3: 95% confidence bounds for ω_B as a function of ω_DM. Gray dots denote models which are rejected at a 95% confidence level, while the black dots denote those that are not.\n\nDue to previous efforts on this project, we were able to estimate the semivariogram of the GP from several hundred thousand random points already run through CMBFast. For this work, we chose the α_l's such that the partials in each dimension were approximately unity, resulting in k ≃ 1; c was set to a small constant to account for instabilities in the simulator. These points also gave a starting point for our algorithm². 
Subsequently, we have run several hundred thousand more CMBFast models. We find that it takes 20 seconds to pick an experiment from among a set of 2,000 random candidates. CMBFast then takes roughly 3 minutes to compute the CMB spectrum given our chosen point in parameter space.\n\nIn Figure 3, we show a plot of the baryon density (ω_B) versus the dark matter density (ω_DM) of the universe over all values of the other five parameters (τ, Ω_DE, Ω_M, f_n, n_s). Experiments that are within a 95% confidence ball given the CMB data are plotted in black, while those that are rejected at the 95% level are gray. Note how there are areas that remain unsampled, while the boundary regions (transitions between gray and black points) are heavily sampled, indicating that our algorithm is choosing reasonable points. Moreover, the results of Figure 3 agree well with results in the literature (derived using parametric models and Bayesian analysis), as well as with predictions favored by nucleosynthesis [9].\n\nWhile hard to distinguish in Figure 3, the bottom-left group of points above the 95% confidence boundary splits into two separate peaks in parameter space. The one to the left is the concordance model, while the second peak (the one to the right) is not believed to represent the correct values of the parameters (due to constraints from other data). The existence of high-probability points in this region of the parameter space has been suggested before, but computational limitations have prevented much characterization of it. Moreover, the third peak, near the top-right corner of Figure 3, was basically ignored by previous grid-based approaches. 
Comparison of the number of experiments performed by our straddle algorithm with the grid-based approach used by [9] is shown in Table 2. Even with only 10% of the experiments used in the grid approach, we sampled the concordance peak 8 times more frequently, and the second peak 3.4 times more frequently, than the grid-based approach. Moreover, it appears that the grid completely missed the third peak, while our method sampled it over 5000 times. These results dramatically illustrate the power of our adaptive method, and show how it does not suffer from the assumptions made by grid-based approaches. We are following up on the scientific ramifications of these results in a separate astrophysics paper.\n\n²While initial values are not required (as we have seen in §3), it is possible to incorporate this background knowledge into the model to help the algorithm converge more quickly.\n\nTable 2: Number of points found in the three peaks for the grid-based approach of [9] and our straddle algorithm.\n\nPeak | Peak Center ω_DM | Peak Center ω_B | # Points in Effective Radius (Grid) | # Points in Effective Radius (Straddle)\nConcordance Model | 0.116 | 0.024 | 2118 | 16055\nPeak 2 | 0.165 | 0.023 | 2825 | 9634\nPeak 3 | 0.665 | 0.122 | 0 | 5488\nTotal Points | - | - | 5613300 | 603384\n\n5 Conclusions\nWe have developed an algorithm for locating a specified contour of a function while minimizing the number of queries necessary. We described several different methods for picking the next experimental point from a group of candidates and showed how they perform on synthetic test functions. Our experiments indicate that the straddle algorithm outperforms previously published methods, and even handles functions with large discontinuities. 
Moreover, the algorithm is shown to work on multi-dimensional data, correctly classifying the boundary at a 99% level with half the points required by variance-minimizing methods. We then applied this algorithm to a seven-dimensional statistical analysis of cosmological parameters affecting the Cosmic Microwave Background. With only a few hundred thousand simulations we are able to accurately describe the interdependence of the cosmological parameters, leading to a better understanding of fundamental physical properties.\n\nReferences\n\n[1] N. Ramakrishnan, C. Bailey-Kellogg, S. Tadepalli, and V. N. Pandey. Gaussian processes for active data mining of spatial aggregates. In Proceedings of the SIAM International Conference on Data Mining, 2005.\n[2] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.\n[3] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 705–712. The MIT Press, 1995.\n[4] S. Tong and D. Koller. Active learning for parameter estimation in Bayesian networks. In NIPS, pages 647–653, 2000.\n[5] A. Moore and J. Schneider. Memory-based stochastic optimization. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Neural Information Processing Systems 8, volume 8, pages 1066–1072. MIT Press, 1996.\n[6] N. A. C. Cressie. Statistics for Spatial Data. Wiley, New York, 1991.\n[7] D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.\n[8] C. L. Bennett et al. First-Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Preliminary Maps and Basic Results. Astrophysical Journal Supplement Series, 148:1–27, September 2003.\n[9] M. Tegmark, M. Zaldarriaga, and A. J. Hamilton. 
Towards a refined cosmic concordance model: Joint 11-parameter constraints from the cosmic microwave background and large-scale structure. Physical Review D, 63(4), February 2001.\n[10] C. Genovese, C. J. Miller, R. C. Nichol, M. Arjunwadkar, and L. Wasserman. Nonparametric inference for the cosmic microwave background. Statistical Science, 19(2):308–321, 2004.\n[11] C. J. Miller, R. C. Nichol, C. Genovese, and L. Wasserman. A non-parametric analysis of the CMB power spectrum. Bulletin of the American Astronomical Society, 33:1358, December 2001.\n[12] U. Seljak and M. Zaldarriaga. A Line-of-Sight Integration Approach to Cosmic Microwave Background Anisotropies. Astrophysical Journal, 469:437, October 1996.\n", "award": [], "sourceid": 2940, "authors": [{"given_name": "Brent", "family_name": "Bryan", "institution": null}, {"given_name": "Robert", "family_name": "Nichol", "institution": null}, {"given_name": "Christopher", "family_name": "Genovese", "institution": null}, {"given_name": "Jeff", "family_name": "Schneider", "institution": null}, {"given_name": "Christopher", "family_name": "Miller", "institution": null}, {"given_name": "Larry", "family_name": "Wasserman", "institution": null}]}