{"title": "Budgeted Optimization with Concurrent Stochastic-Duration Experiments", "book": "Advances in Neural Information Processing Systems", "page_first": 1098, "page_last": 1106, "abstract": "Budgeted optimization involves optimizing an unknown function that is costly to evaluate by requesting a limited number of function evaluations at intelligently selected inputs. Typical problem formulations assume that experiments are selected one at a time with a limited total number of experiments, which fail to capture important aspects of many real-world problems. This paper defines a novel problem formulation with the following important extensions: 1) allowing for concurrent experiments; 2) allowing for stochastic experiment durations; and 3) placing constraints on both the total number of experiments and the total experimental time. We develop both offline and online algorithms for selecting concurrent experiments in this new setting and provide experimental results on a number of optimization benchmarks. The results show that our algorithms produce highly effective schedules compared to natural baselines.", "full_text": "Budgeted Optimization with Concurrent\n\nStochastic-Duration Experiments\n\nJavad Azimi, Alan Fern, Xiaoli Z. Fern\nSchool of EECS, Oregon State University\n\n{azimi, afern, xfern}@eecs.oregonstate.edu\n\nAbstract\n\nBudgeted optimization involves optimizing an unknown function that is costly to evalu-\nate by requesting a limited number of function evaluations at intelligently selected inputs.\nTypical problem formulations assume that experiments are selected one at a time with\na limited total number of experiments, which fail to capture important aspects of many\nreal-world problems. This paper de\ufb01nes a novel problem formulation with the following\nimportant extensions: 1) allowing for concurrent experiments; 2) allowing for stochastic\nexperiment durations; and 3) placing constraints on both the total number of experiments\nand the total experimental time. We develop both of\ufb02ine and online algorithms for se-\nlecting concurrent experiments in this new setting and provide experimental results on a\nnumber of optimization benchmarks. The results show that our algorithms produce highly\neffective schedules compared to natural baselines.\n\nIntroduction\n\n1\nWe study the optimization of an unknown function f by requesting n experiments, each specifying an input\nx and producing a noisy observation of f (x). In practice, the function f might be the performance of a de-\nvice parameterized by x. We consider the setting where running experiments is costly (e.g. in terms of time),\nwhich renders methods that rely on many function evaluations, such as stochastic search or empirical gra-\ndient methods, impractical. Bayesian optimization (BO) [8, 4] addresses this issue by leveraging Bayesian\nmodeling to maintain a posterior over the unknown function based on previous experiments. The posterior is\nthen used to intelligently select new experiments to trade-off exploring new parts of the experimental space\nand exploiting promising parts.\nTraditional BO follows a sequential approach where only one experiment is selected and run at a time.\nHowever, it is often desirable to select more than one experiment at a time so that multiple experiments\ncan be run simultaneously to leverage parallel facilities. Recently, Azimi et al. (2010) proposed a batch BO\nalgorithm that selects a batch of k \u2265 1 experiments at a time. While this broadens the applicability of BO, it\nis still limited to selecting a \ufb01xed number of experiments at each step. As such, prior work on BO, both batch\nand sequential, completely ignores the problem of how to schedule experiments under \ufb01xed experimental\nbudget and time constraints. Furthermore, existing work assumes that the durations of experiments are\nidentical and deterministic, whereas in practice they are often stochastic.\nConsider one of our motivating applications of optimizing the power output of nano-enhanced Microbial\nFuel Cells (MFCs). MFCs [3] use micro-organisms to generate electricity. Their performance depends\n\n1\n\n\fstrongly on the surface properties of the anode [10]. Our problem involves optimizing nano-enhanced an-\nodes, where various types of nano-structures, e.g. carbon nano-wire, are grown directly on the anode surface.\nBecause there is little understanding of how different nano-enhancements impact power output, optimizing\nanode design is largely guess work. Our original goal was to develop BO algorithms for aiding this process.\nHowever, many aspects of this domain complicate the application of BO. First, there is a \ufb01xed budget on\nthe number of experiments that can be run due to limited funds and a \ufb01xed time period for the project. Sec-\nond, we can run multiple concurrent experiments, limited by the number of experimental apparatus. Third,\nthe time required to run each experiment is variable because each experiment requires the construction of a\nnano-structure with speci\ufb01c properties. Nano-fabrication is highly unpredictable and the amount of time to\nsuccessfully produce a structure is quite variable. Clearly prior BO models fail to capture critical aspects of\nthe experimental process in this domain.\nIn this paper, we consider the following extensions. First, we have l available labs (which may correspond\nto experimental stations at one location or to physically distinct laboratories), allowing up to l concurrent\nexperiments. Second, experiments have stochastic durations, independently and identically distributed ac-\ncording to a known density function pd. Finally, we are constrained by a budget of n total experiments and a\ntime horizon h by which point we must \ufb01nish. The goal is to maximize the unknown function f by selecting\nexperiments and when to start them while satisfying the constraints.\nWe propose of\ufb02ine (Section 4) and online (Section 5) scheduling approaches for this problem, which aim\nto balance two competing factors. First, a scheduler should ensure that all n experiments complete within\nthe horizon h, which encourages high concurrency. Second, we wish to select new experiments given as\nmany previously completed experiments as possible to make more intelligent experiment selections, which\nencourages low concurrency. We introduce a novel measure of the second factor, cumulative prior experi-\nments (CPE) (Section 3), which our approaches aim to optimize. Our experimental results indicate that these\napproaches signi\ufb01cantly outperform a set of baselines across a range of benchmark optimization problems.\n\n2 Problem Setup\nLet X \u2286 (cid:60)d be a d-dimensional compact input space, where each dimension i is bounded in [ai, bi]. An\nelement of X is called an experiment. An unknown real-valued function f : X \u2192 (cid:60) represents the expected\nvalue of the dependent variable after running an experiment. For example, f (x) might be the result of a wet-\nlab experiment described by x. Conducting an experiment x produces a noisy outcome y = f (x) + \u0001, where\n\u0001 is a random noise term. Bayesian Optimization (BO) aims to \ufb01nd an experiment x \u2208 X that approximately\nmaximizes f by requesting a limited number of experiments and observing their outcomes.\nWe extend traditional BO algorithms and study the experiment scheduling problem. Assuming a known\ndensity function pd for the experiment durations, the inputs to our problem include the total number of\navailable labs l, the total number of experiments n, and the time horizon h by which we must \ufb01nish. The\ngoal is to design a policy \u03c0 for selecting when to start experiments and which ones to start to optimize f.\nSpeci\ufb01cally, the inputs to \u03c0 are the set of completed experiments and their outcomes, the set of currently\nrunning experiments with their elapsed running time, the number of free labs, and the remaining time till the\nhorizon. Given this information, \u03c0 must select a set of experiments (possibly empty) to start that is no larger\nthan the number of free labs. Any run of the policy ends when either n experiments are completed or the\ntime horizon is reached, resulting in a set X of n or fewer completed experiments. The objective is to obtain\na policy with small regret, which is the expected difference between the optimal value of f and the value of\nf for the predicted best experiment in X. In theory, the optimal policy can be found by solving a POMDP\nwith hidden state corresponding to the unknown function f. However, this POMDP is beyond the reach of\nany existing solvers. Thus, we focus on de\ufb01ning and comparing several principled policies that work well\nin practice, but without optimality guarantees. Note that this problem has not been studied in the literature\nto the best of our knowledge.\n\n2\n\n\f3 Overview of General Approach\n\n(CPE) of E as:(cid:80)\n\nA policy for our problem must make two types of decisions: 1) scheduling when to start new experiments,\nand 2) selecting the speci\ufb01c experiments to start. In this work, we factor the problem based on these decisions\nand focus on approaches for scheduling experiments. We assume a black box function SelectBatch for\nintelligently selecting the k \u2265 1 experiments based on both completed and currently running experiments.\nThe implementation of SelectBatch is described in Section 6.\nOptimal scheduling to minimize regret appears to be computationally hard for non-trivial instances of Se-\nlectBatch. Further, we desire scheduling approaches that do not depend on the details of SelectBatch, but\nwork well for any reasonable implementation. Thus, rather than directly optimizing regret for a speci\ufb01c\nSelectBatch, we consider the following surrogate criteria. First, we want to \ufb01nish all n experiments within\nthe horizon h with high probability. Second, we would like to select each experiment based on as much\ninformation as possible, measured by the number of previously completed experiments. These two goals are\nat odds, since maximizing the completion probability requires maximizing concurrency of the experiments,\nwhich minimizes the second criterion. Our of\ufb02ine and online scheduling approaches provide different ways\nfor managing this trade-off.\nTo quantify the second criterion, consider a complete execu-\ntion E of a scheduler. For any experiment e in E, let priorE(e)\ndenote the number of experiments in E that completed be-\nfore starting e. We de\ufb01ne the cumulative prior experiments\ne\u2208E priorE(e). Intuitively, a scheduler with\na high expected CPE is desirable, since CPE measures the total\namount of information SelectBatch uses to make its decisions.\nCPE agrees with intuition when considering extreme policies.\nA poor scheduler that starts all n experiments at the same time\n(assuming enough labs) will have a minimum CPE of zero.\nFurther, CPE is maximized by a scheduler that sequentially executes all experiments (assuming enough\ntime). However, in between these extremes, CPE fails to capture certain intuitive properties. For example,\nCPE increases linearly in the number of prior experiments, while one might expect diminishing returns as\nthe number of prior experiments becomes large. Similarly, as the number of experiments started together\n(the batch size) increases, we might also expect diminishing returns since SelectBatch must choose the\nexperiments based on the same prior experiments. Unfortunately, quantifying these intuitions in a general\nway is still an open problem. Despite its potential shortcomings, we have found CPE to be a robust measure\nin practice.\nTo empirically examine the utility of CPE, we conducted experiments on a number of BO benchmarks. For\neach domain, we used 30 manually designed diverse schedulers, some started more experiments early on\nthan later, and vice-versa, while others included random and uniform schedules. We measured the average\nregret achieved for each scheduler given the same inputs and the expected CPE of the executions. Figure 1\nshows the results for two of the domains (other results are highly similar), where each point corresponds to\nthe average regret and CPE of a particular scheduler. We observe a clear and non-trivial correlation between\nregret and CPE, which provides empirical evidence that CPE is a useful measure to optimize. Further, as we\nwill see in our experiments, the performance of our methods is also highly correlated with CPE.\n\nFigure 1: The correlation between CPE and\nregret for 30 different schedulers on two BO\nbenchmarks.\n\n4 Of\ufb02ine Scheduling\n\nWe now consider of\ufb02ine schedules, which assign start times to all n experiments before the experimental\nprocess begins. Note that while the schedules are of\ufb02ine, the overall BO policy has online characteristics,\nsince the exact experiments to run are only speci\ufb01ed when they need to be started by SelectBatch, based\n\n3\n\n0204060801001200.180.20.220.240.260.280.30.32CPERegretCosines0204060801001200.030.040.050.060.070.080.090.1CPERegretHydrogen\fon the most recent information. This of\ufb02ine scheduling approach is often convenient in real experimental\ndomains where it is useful to plan out a static equipment/personnel schedule for the duration of a project.\nBelow we \ufb01rst consider a restricted class of schedules, called staged schedules, for which we present a\nsolution that optimizes CPE. Next, we describe an approach for a more general class of schedules.\n\n4.1 Staged Schedules\n\ni=1, where 0 < ni \u2264 l,(cid:80)\n\ni di \u2264 h, and(cid:80)\n\n(cid:80)i\u22121\n\ni=2 ni\n\nThe CPE of any safe execution of S (slightly abusing notation) is: CPE(S) =(cid:80)N\n\nA staged schedule de\ufb01nes a consecutive sequence of N experimental stages, denoted by a sequence of\ntuples (cid:104)(ni, di)(cid:105)N\ni ni \u2264 n. Stage i begins by starting up ni new\nexperiments selected by SelectBatch using the most recent information, and ends after a duration of di, upon\nwhich stage i + 1 starts. In some applications, staged schedules are preferable as they allow project planning\nto focus on a relatively small number of time points (the beginning of each stage). While our approach tries\nto ensure that experiments \ufb01nish within their stage, experiments are never terminated and hence might run\nlonger than their speci\ufb01ed duration. If, because of this, at the beginning of stage i there are not ni free labs,\nthe experiments will wait till labs free up.\nWe say that an execution E of a staged schedule S is safe if each experiment is completed within its speci\ufb01ed\nduration in S. We say that a staged schedule S is p-safe if with probability at least p an execution of S is safe\nwhich provides a probabilistic guarantee that all n experiments complete within the horizon h. Further, it\nensures with probability p that the maximum number of concurrent experiments when executing S is maxi ni\n(since experiments from two stages will not overlap with probability p). As such, we are interested in \ufb01nding\nstaged schedules that are p-safe for a user speci\ufb01ed p, e.g. 95%. Meanwhile, we want to maximize CPE.\nj=1 nj. Typical\napplications will use relative high values of p, since otherwise experimental resources would be wasted, and\nthus with high probability we expect the CPE of an execution of S to equal CPE(S).\nOur goal is thus to maximize CPE(S) while ensuring p-safeness. It turns out that for any \ufb01xed number of\nstages N, the schedules that maximize CPE(S) must be uniform. A staged schedule is de\ufb01ned to be uniform\nif \u2200i, j, |ni \u2212 nj| \u2264 1, i.e., the batch sizes across stages may differ by at most a single experiment.\nProposition 1. For any number of experiments n and labs l, let SN be the set of corresponding N stage\nschedules, where N \u2265 (cid:100)n/l(cid:101). For any S \u2208 SN , CPE(S) = maxS(cid:48)\u2208SN CPE(S(cid:48)) if and only if S is uniform.\nIt is easy to verify that for a given n and l, an N\nstage uniform schedule achieves a strictly higher\nCPE than any N \u2212 1 stage schedule. This im-\nplies that we should prefer uniform schedules\nwith maximum number of stages allowed by the\np-safeness restriction. This motivates us to solve\nthe following problem: Find a p-safe uniform\nschedule with maximum number of stages.\nOur approach, outlined in Algorithm 1, considers\nN stage schedules in order of increasing N, start-\ning at the minimum possible number of stages\nN = (cid:100)n/l(cid:101) for running all experiments. For each\nvalue of N, the call to MaxProbUniform com-\nputes a uniform schedule S with the highest prob-\nability of a safe execution, among all N stage uni-\nform schedules. If the resulting schedule is p-safe\nthen we consider N + 1 stages. Otherwise, there\nis no uniform N stage schedule that is p-safe and\nwe return a uniform N \u2212 1 stage schedule, which was computed in the previous iteration.\n\nAlgorithm 1 Algorithm for computing a p-safe uniform\nschedule with maximum number of stages.\nInput:number of experiments (n), number of labs (l),\nhorizon (h), safety probability (p)\nOutput:A p-safe uniform schedule with maximum\nnumber of stages\n\nS(cid:48) \u2190 MaxProbUniform(N, n, l, h)\nif S(cid:48) is not p-safe then\n\nN = (cid:100)n/l(cid:101), S \u2190 null\nloop\n\nreturn S\n\nend if\nS \u2190 S(cid:48), N \u2190 N + 1\n\nend loop\n\n4\n\n\f(cid:2)Pd(d\n\n(cid:48)\n\n)(cid:3)N(cid:48)\u00b7n(cid:48)(cid:20)\n\n(cid:18) h \u2212 d(cid:48) \u00b7 N(cid:48)\n\n(cid:19)(cid:21)(N\u2212N(cid:48))\u00b7(n(cid:48)\u22121)\n\nIt remains to describe the MaxProbUniform function, which computes a uniform N stage schedule S =\n(cid:104)(ni, di)(cid:105)N\ni=1 that maximizes the probability of a safe execution. First, any N stage uniform schedule must\nhave N(cid:48) = (n mod N ) stages with n(cid:48) = (cid:98)n/N(cid:99)+1 experiments and N\u2212N(cid:48) stages with n(cid:48)\u22121 experiments.\nFurthermore, the probability of a safe execution is invariant to the ordering of the stages, since we assume\ni.i.d. distribution on the experiment durations. The MaxProbUniform problem is now reduced to computing\nthe durations di of S that maximize the probability of safeness for each given ni. For this we will assume that\nthe distribution of the experiment duration pd is log-concave, which allows us to characterize the solution\nusing the following lemma.\nLemma 1. For any duration distribution pd that is log-concave, if an N stage schedule S = (cid:104)(ni, di)(cid:105)N\nis p-safe, then there is a p-safe N stage schedule S(cid:48) = (cid:104)(ni, d(cid:48)\nThis lemma suggests that any stages with equal ni\u2019s should have equal di\u2019s to maximize the probability of\nsafe execution. For a uniform schedule, ni is either n(cid:48) or n(cid:48) \u2212 1. Thus we only need to consider schedules\nwith two durations, d(cid:48) for stages with ni = n(cid:48) and d(cid:48)(cid:48) for stages with ni = n(cid:48) \u2212 1. Since all durations must\nsum to h, d(cid:48) and d(cid:48)(cid:48) are deterministically related by: d(cid:48)(cid:48) = h\u2212d(cid:48)\u00b7N(cid:48)\n. Based on this, for any value of d(cid:48) the\nN\u2212N(cid:48)\nprobability of the uniform schedule using durations d(cid:48) and d(cid:48)(cid:48) is as follows, where Pd is the CDF of pd.\n\ni=1 such that if ni = nj then d(cid:48)\n\ni = d(cid:48)\nj.\n\ni)(cid:105)N\n\ni=1\n\nPd\n\nN \u2212 N(cid:48)\n\nIndependent Lab Schedules\n\nexperiments mi such that(cid:80)\n\ni mi = n. Further, for each lab i a sequence of mi durations Di = (cid:104)d1\n\n(1)\nWe compute MaxProbUniform by maximizing Equation 1 with respect to d(cid:48) and using the corresponding\nduration for d(cid:48)(cid:48). Putting everything together we get the following result.\nTheorem 1. For any log-concave pd, computing MaxProbUniform by maximizing Equation 1 over d(cid:48), if a\np-safe uniform schedule exists, Algorithm 1 returns a maximum-stage p-safe uniform schedule.\n4.2\nWe now consider a more general class of of\ufb02ine schedules and a heuristic algorithm for computing them.\nThis class allows the start times of different labs to be decoupled, desirable in settings where labs are run\nby independent experimenters. Further, our online scheduling approach is based on repeatedly calling an\nof\ufb02ine scheduler, which requires the \ufb02exibility to make schedules for labs in different stages of execution.\nAn independent lab (IL) schedule S speci\ufb01es a number of labs k < l and for each lab i, a number of\n(cid:105)\ni , . . . , dmi\nis given. The execution of S runs each lab independently, by having each lab start up experiments whenever\nthey move to the next stage. Stage j of lab i ends after a duration of dj\ni , or after the experiment \ufb01nishes\nwhen it runs longer than dj\ni (i.e. we do not terminate experiments). Each experiment is selected according\nto SelectBatch, given information about all completed and running experiments across all labs.\nWe say that an execution of an IL schedule is safe if all experiments \ufb01nish within their speci\ufb01ed durations,\nwhich also yields a notion of p-safeness. We are again interested in computing p-safe schedules that max-\nimizes the CPE. Intuitively, CPE will be maximized if the amount of concurrency during an execution is\nminimized, suggesting the use of as few labs as possible. This motivates the problem of \ufb01nding a p-safe IL\nschedule that use the minimum number of labs. Below we describe our heuristic approach to this problem.\nAlgorithm Description. Starting with k = 1, we compute a k labs IL schedule with the goal of maximizing\nthe probability of safe execution. If this probability is less than p, we increment k, and otherwise output the\nschedule for k labs. To compute a schedule for each value of k, we \ufb01rst allocate the number of experiments\nmi across k labs as uniformly as possible. In particular, (n mod k) labs will have (cid:98)n/k(cid:99) + 1 experiments\nand k \u2212 (n mod k) labs will have (cid:98)n/k(cid:99) experiments. This choice is motivated by the intuition that the\nbest way to maximize the probability of a safe execution is to distribute the work across labs as uniformly\nas possible. Given mi for each lab, we assign all durations of lab i to be h/mi, which can be shown to be\noptimal for log-concave pd. In this way, for each value of k the schedule we compute has just two possible\nvalues of mi and labs with the same mi have the same stage durations.\n\ni\n\n5\n\n\f5 Online Scheduling Approaches\nWe now consider online scheduling, which selects the start time of experiments online. The \ufb02exibility of\nthe online approaches offers the potential to outperform of\ufb02ine schedules by adapting to speci\ufb01c stochastic\noutcomes observed during experimental runs. Below we \ufb01rst describe two baseline online approaches,\nfollowed by our main approach, policy switching, which aims to directly optimize CPE.\nOnline Fastest Completion Policy (OnFCP). This baseline policy simply tries to \ufb01nish all of the n exper-\niments as quickly as possible. As such, it keeps all l labs busy as long as there are experiments left to run.\nSpeci\ufb01cally whenever a lab (or labs) becomes free the policy immediately uses SelectBatch with the latest\ninformation to select new experiments to start right away. This policy will achieve a low value of expected\nCPE since it maximizes concurrency.\nOnline Minimum Eager Lab Policy (OnMEL). One problem with OnFCP is that it does not attempt to\nuse the full time horizon. The OnMEL policy simply restricts OnFCP to use only k labs, where k is the\nminimum number of labs required to guarantee with probability at least p that all n experiments complete\nwithin the horizon. Monte-Carlo simulation is used to estimate p for each k.\nPolicy Switching (PS). Our policy switching approach decides the number of new experiments to start at\neach decision epoch. Decision epochs are assumed to occur every \u2206 units of time, where \u2206 is a small\nconstant relative to the expected experiment durations. The motivation behind policy switching is to exploit\nthe availability of a policy generator that can produce multiple policies at any decision epoch, where at least\none of them is expected to be good. Given such a generator, the goal is to de\ufb01ne a new (switching) policy that\nperforms as well or better than the best of the generated policies in any state. In our case, the objective is to\nimprove CPE, though other objectives can also be used. This is motivated by prior work on policy switching\n[6] over a \ufb01xed policy library, and generalize that work to handle arbitrary policy generators instead of static\npolicy libraries. Below we describe the general approach and then the speci\ufb01c policy generator that we use.\nLet t denote the number of remaining decision epochs (stages-to-go), which is originally equal to (cid:98)h/\u2206(cid:99) and\ndecremented by one each epoch. We use s to denote the experimental state of the scheduling problem, which\nencodes the number of completed experiments and ongoing experiments with their elapsed running time. We\nassume access to a policy generator \u03a0(s, t) which returns a set of base scheduling policies (possibly non-\nstationary) given inputs s and t. Prior work on policy switching [6] corresponds to the case where \u03a0(s, t)\nreturns a \ufb01xed set of policies regardless of s and t. Given \u03a0(s, t), \u00af\u03c0(s, t, \u03c0) denotes the resulting switching\npolicy based on s, t, and the base policy \u03c0 selected in the previous epoch. The decision returned by \u00af\u03c0 is\ncomputed by \ufb01rst conducting N simulations of each policy returned by \u03a0(s, t) along with \u03c0 to estimate their\nCPEs. The base policy with the highest estimated CPE is then selected and its decision is returned by \u00af\u03c0. The\nneed to compare to the previous policy \u03c0 is due to the use of a dynamic policy generator, rather than a \ufb01xed\nlibrary. The base policy passed into policy switching for the \ufb01rst decision epoch can be arbitrary.\nDespite its simplicity, we can make guarantees about the quality of \u00af\u03c0 assuming a bound on the CPE estima-\ntion error. In particular, the CPE of the switching policy will not be much worse than the best of the policies\nproduced by our generator given accurate simulations. We say that a CPE estimator is \u0001-accurate if it can\nestimate the CPE C \u03c0\nt (s) of any base policy \u03c0 for any s and t within an accuracy bound of \u0001. Below we\ndenote the expected CPE of \u00af\u03c0 for s, t, and \u03c0 to be C \u00af\u03c0\nTheorem 2. Let \u03a0(s, t) be a policy generator and \u00af\u03c0 be the switching policy computed with \u0001-accurate\nt (s) \u2212 2t\u0001.\nestimates. For any state s, stages-to-go t, and base policy \u03c0, C \u00af\u03c0\nWe use a simple policy generator \u03a0(s, t) that makes multiple calls to the of\ufb02ine IL scheduler described\nearlier. The intuition is to notice that the produced p-safe schedules are fairly pessimistic in terms of the\nexperiment runtimes.\nIn reality many experiments will \ufb01nish early and we can adaptively exploit such\nsituations. Speci\ufb01cally, rather than follow the \ufb01xed of\ufb02ine schedule we may choose to use fewer labs and\nhence improve CPE. Similarly if experiments run too long, we will increase the number of labs.\n\nt (s, \u03c0) \u2265 max\u03c0(cid:48)\u2208\u03a0(s,t)\u222a{\u03c0} C \u03c0(cid:48)\n\nt (s, \u03c0).\n\n6\n\n\fCosines(2)[1]\n\nHartman(3,6)[7]\nShekel(4)[7]\n\nTable 1: Benchmark Functions\n\n\u03a3i=14\u03b1i exp(cid:2)\u2212\u03a3d\n1 \u2212 (u2 + v2 \u2212 0.3cos(3\u03c0u) \u2212 0.3cos(3\u03c0v)) Rosenbrock(2)[1] 10 \u2212 100(y \u2212 x2)2 \u2212 (1 \u2212 x)2\n(cid:16) i.x2\n(cid:17)(cid:17)20\n\nj=1Aij(xj \u2212 Pij)2(cid:3)\n\nu = 1.6x \u2212 0.5, v = 1.6y \u2212 0.5\n\nMichalewicz(5)[9]\u2212(cid:80)5\n\n(cid:16)\n\ni=1 sin(xi).\n\nsin\n\ni\n\n\u03c0\n\n\u03b11\u00d74, A4\u00d7d, P4\u00d7d are constants\n\n\u03a310\ni=1\n\n\u03b1i+\u03a3j=14(xj\u2212Aji)2\n\n1\n\n\u03b11\u00d710, A4\u00d710 are constants\n\nWe de\ufb01ne \u03a0(s, t) to return k + 1 policies, {\u03c0(s,t,0), . . . , \u03c0(s,t,k)}, where k is the number of experiments\nrunning in s. Policy \u03c0(s,t,i) is de\ufb01ned so that it waits for i current experiments to \ufb01nish, and then uses the\nof\ufb02ine IL scheduler to return a schedule. This amounts to adding a small lookahead to the of\ufb02ine IL scheduler\nwhere different amounts of waiting time are considered 1. Note that the de\ufb01nition of these policies depends\non s and t and hence can not be viewed as a \ufb01xed set of static policies as used by traditional policy switching.\nIn the initial state s0, \u03c0(s0,h,0) corresponds to the of\ufb02ine IL schedule and hence the above theorem guarantees\nthat we will not perform much worse than the of\ufb02ine IL, with the expectation of performing much better.\nWhenever policy switching selects a \u03c0i with i > 0 then no new experiments will be started and we wait for\nthe next decision epoch. For i = 0, it will apply the of\ufb02ine IL scheduler to return a p-safe schedule to start\nimmediately, which may require starting new labs to ensure high probability of completing n experiments.\n\nwith the RBF kernel and the kernel width = 0.01(cid:80)d\n\n6 Experiments\nImplementation of SelectBatch. Given the set of completed experiments O and on-going experiments A,\nSelectBatch selects k new experiments. We implement SelectBatch based on a recent batch BO algorithm\n[2], which greedily selects k experiments considering only O. We modify this greedy algorithm to also\nconsider A by forcing the selected batch to include the ongoing experiments plus k additional experiments.\nSelectBatch makes selections based on a posterior over the unknown function f. We use Gaussian Process\ni=1 li, where li is the input space length in dimension i.\nBenchmark Functions. We evaluate our scheduling policies using 6 well-known synthetic benchmark\nfunctions (shown in Tab. 1 with dimension inside the parenthesis) and two real-world benchmark functions\nHydrogen and FuelCell over [0, 1]2 [2]. The Hydrogen data is produced by a study on biosolar hydrogen\nproduction [5], where the goal was to maximize hydrogen production of a particular bacteria by optimizing\nPH and Nitrogen levels. The FuelCell data was collected in our motivating application mentioned in Sect. 1.\nIn both cases, the benchmark function was created by \ufb01tting regression models to the available data.\nEvaluation. We consider a p-safeness guarantee of p = 0.95 and the number of available labs l is 10. For\npd(x), we use one sided truncated normal distribution such that x \u2208 (0, inf) with \u00b5 = 1, \u03c32 = 0.1, and we\nset the total number of experiments n = 20. We consider three time horizons h of 6, 5, and 4.\nGiven l, n and h, to evaluate policy \u03c0 using function f (with a set of initial observed experiments), we execute\n\u03c0 and get a set X of n or fewer completed experiments. We measure the regret of \u03c0 as the difference between\nthe optimal value of f (known for all eight functions) and the f value of the predicted best experiment in X.\nResults. Table 2 shows the results of our proposed of\ufb02ine and online schedulers. We also include, as a\nreference point, the result of the un-constrained sequential policy (i.e., selecting one experiment at a time)\nusing SelectBatch, which can be viewed as an effective upper bound on the optimal performance of any\nconstrained scheduler because it ignores the time horizon (h = \u221e). The values in the table correspond to\nthe regrets (smaller values are better) achieved by each policy, averaged across 100 independent runs with\nthe same initial experiments (5 for 2-d and 3-d functions and 20 for the rest) for all policies in each run.\n\n1For simplicity our previous discussion of the IL scheduler did not consider states with ongoing experiments, which\nwill occur here. To handle this the scheduler \ufb01rst considers using already executing labs taking into account how long\nthey have been running. If more labs are required to ensure p-safeness new ones are added.\n\n7\n\n\fFunctionh = \u221e OnFCP OfStaged OfIL OnMEL PS OfStaged OfIL OnMEL PS OfStaged OfIL OnMEL PS\n.156\nCosines\n.153\nFuelCell\n.025\nHydro\nRosen\n.009\nHart(3)\n.038\n.480\nMichal\n.510\nShekel\nHart(6)\n.262\nCPE\n138\n\n.274\n.239\n.086\n.011\n.081\n.521\n.682\n.333\n91\n\n.181\n.167\n.071\n.009\n.055\n.500\n.635\n.334\n100\n\n.275\n.258\n.123\n.013\n.096\n.525\n.688\n.354\n66\n\n.205\n.206\n.059\n.008\n.067\n.502\n.623\n.347\n100\n\n.167\n.154\n.036\n.007\n.045\n.477\n.530\n.304\n133\n\n.270\n.230\n.064\n.010\n.070\n.502\n.576\n.301\n120\n\n.181\n.182\n.069\n.010\n.070\n.509\n.630\n.338\n100\n\n.195\n.191\n.070\n.009\n.069\n.508\n.648\n.340\n100\n\n.142\n.160\n.025\n.008\n.037\n.465\n.427\n.265\n190\n\n.339\n.240\n.115\n.013\n.095\n.545\n.660\n.348\n55\n\n.147\n.163\n.035\n.009\n.050\n.460\n.564\n.266\n137\n\nTable 2: The proposed policies results for different horizons.\n\nh=4\n\nh=5\n\nh=6\n\n.194\n.190\n.069\n.008\n.064\n.510\n.645\n.330\n100\n\n.150\n.185\n.042\n.008\n.045\n.494\n.540\n.297\n118\n\nWe \ufb01rst note that the two of\ufb02ine algorithms (OfStages and OfIL) perform similarly across all three horizon\nsettings. This suggests that there is limited bene\ufb01t in these scenarios to using the more \ufb02exible IL schedules,\nwhich were primarily introduced for use in the online scheduling context. Comparing with the two online\nbaselines (OnFCP and OnMEL), the of\ufb02ine algorithms perform signi\ufb01cantly better. This may seem surpris-\ning at \ufb01rst because online policies should offer more \ufb02exibility than \ufb01xed of\ufb02ine schedules. However, the\nof\ufb02ine schedules purposefully wait for experiments to complete before starting up new experiments, which\ntends to improve the CPE values. To see this, the last row of Table 2 gives the average CPEs of each pol-\nicy. Both OnFCP and OnMEL yield signi\ufb01cantly lower CPEs compared to the of\ufb02ine algorithms, which\ncorrelates with their signi\ufb01cantly larger regrets.\nFinally, policy switching consistently outperforms other policies (excluding h = \u221e) on the medium horizon\nsetting and performs similarly in the other settings. This makes sense since the added \ufb02exibility of PS is not\nas critical for long and short horizons. For short horizons, there is less opportunity for scheduling choices and\nfor longer horizons the scheduling problem is easier and hence the of\ufb02ine approaches are more competitive.\nIn addition, looking at Table 2, we see that PS achieves a signi\ufb01cantly higher CPE than of\ufb02ine approaches in\nthe medium horizon, and is similar to them in the other horizons, again correlating with the regret. Further\nexamination of the schedules produced by PS indicates that although it begins with the same number of labs\nas OfIL, PS often selects fewer labs in later steps if early experiments are completed sooner than expected,\nwhich leads to higher CPE and consequently better performance. Note that the variances of the proposed\npolicies are very small which are shown in the supplementary materials.\n7 Summary and Future Work\nMotivated by real-world applications we introduced a novel setting for Bayesian optimization that incorpo-\nrates a budget on the total time and number of experiments and allows for concurrent, stochastic-duration\nexperiments. We considered of\ufb02ine and online approaches for scheduling experiments in this setting, rely-\ning on a black box function to intelligently select speci\ufb01c experiments at their scheduled start times. These\napproaches aimed to optimize a novel objective function, Cumulative Prior Experiments (CPE), which we\nempirically demonstrate to strongly correlate with performance on the original optimization problem. Our\nof\ufb02ine scheduling approaches signi\ufb01cantly outperformed some natural baselines and our online approach of\npolicy switching was the best overall performer.\nFor further work we plan to consider alternatives to CPE, which, for example, incorporate factors such as\ndiminishing returns. We also plan to study further extensions to the experimental model for BO and also for\nactive learning. For example, taking into account varying costs and duration distributions across labs and\nexperiments. In general, we believe that there is much opportunity for more tightly integrating scheduling\nand planning algorithms into BO and active learning to more accurately model real-world conditions.\nAcknowledgments\nThe authors acknowledge the support of the NSF under grants IIS-0905678.\n\n8\n\n\fReferences\n[1] B. S. Anderson, A. Moore, and D. Cohn. A nonparametric approach to noisy and costly optimization. In ICML,\n\n2000.\n\n[2] J. Azimi, A. Fern, and X. Fern. Batch bayesian optimization via simulation matching. In NIPS, 2010.\n[3] D. Bond and D. Lovley. Electricity production by geobacter sulfurreducens attached to electrodes. Applications of\n\nEnvironmental Microbiology, 69:1548\u20131555, 2003.\n\n[4] E. Brochu, M. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with appli-\ncation to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-23, Department\nof Computer Science, University of British Columbia, 2009.\n\n[5] E. H. Burrows, W.-K. Wong, X. Fern, F. W. Chaplen, and R. L. Ely. Optimization of ph and nitrogen for enhanced\nhydrogen production by synechocystis sp. pcc 6803 via statistical and machine learning methods. Biotechnology\nProgress, 25:1009\u20131017, 2009.\n\n[6] H. Chang, R. Givan, and E. Chong. Parallel rollout for online solution of partially observable markov decision\n\nprocesses. Discrete Event Dynamic Systems, 14:309\u2013341, 2004.\n\n[7] L. Dixon and G. Szeg. The Global Optimization Problem: An Introduction Toward Global Optimization. North-\n\nHolland, Amsterdam, 1978.\n\n[8] D. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization,\n\npages 345\u2013383, 2001.\n\n[9] Z. Michalewicz. Genetic algorithms + data structures = evolution programs (2nd, extended ed.). Springer-Verlag\n\nNew York, Inc., New York, NY, USA, 1994.\n\n[10] D. Park and J. Zeikus. Improved fuel cell and electrode designs for producing electricity from microbial degrada-\n\ntion. Biotechnol.Bioeng., 81(3):348\u2013355, 2003.\n\n9\n\n\f", "award": [], "sourceid": 661, "authors": [{"given_name": "Javad", "family_name": "Azimi", "institution": null}, {"given_name": "Alan", "family_name": "Fern", "institution": null}, {"given_name": "Xiaoli", "family_name": "Fern", "institution": null}]}