{"title": "Process-constrained batch Bayesian optimisation", "book": "Advances in Neural Information Processing Systems", "page_first": 3414, "page_last": 3423, "abstract": "Abstract Prevailing batch Bayesian optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations making it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to experimentally change need to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks convergence guarantee. In contrast pc-BO(nested) is slightly more complex, but admits convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of the SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.", "full_text": "Process-constrained batch Bayesian Optimisation\n\nPratibha Vellanki1, Santu Rana1, Sunil Gupta1, David Rubin2\n\nAlessandra Sutti2, Thomas Dorin2, Murray Height2,Paul Sandars3, Svetha Venkatesh1\n\n1Centre for Pattern Recognition and Data Analytics\n\nDeakin University, Geelong, Australia\n\n[pratibha.vellanki, santu.rana, sunil.gupta, svetha.venkatesh@deakin.edu.au]\n\n2Institute for Frontier Materials, GTP Research\n\nDeakin University, Geelong, Australia\n\n[d.rubindecelisleal, alessandra.sutti, thomas.dorin, murray.height@deakin.edu.au]\n\n3Materials Science and Engineering, Michigan Technological University, USA\n\n[sanders@mtu.edu]\n\nAbstract\n\nPrevailing batch Bayesian 
optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations making it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to experimentally change need to be fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex, but admits convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning hyper-parameters of the SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve target hardness, and optimising the short polymer fibre production process.\n\n1 Introduction\n\nExperimental optimisation is used to design almost all products and processes, scientific and industrial, around us. Experimental optimisation involves optimising input control variables in order to achieve a target output. Design of experiments (DOE) [16] is the conventional laboratory and industrial standard methodology used to efficiently plan experiments. The method is rigid: it does not adapt to the experiments completed so far. This is where Bayesian optimisation offers an effective alternative.\n\nBayesian optimisation [13, 17] is a powerful probabilistic framework for efficient, global optimisation of expensive, black box functions. 
The field is undergoing a recent resurgence, spurred by new theory and problems, and is impacting computer science broadly: tuning complex algorithms [3, 22, 18, 21], combinatorial optimisation [24, 12], and reinforcement learning [4]. Usually, a prior belief in the form of a Gaussian process is maintained over the possible set of objective functions, and the posterior is the refined belief after updating the model with experimental data. The updated model is used to seek the most promising location of function extrema by using a variety of criteria, e.g. expected improvement (EI) and upper confidence bound (UCB). The maximiser of such a criterion function is then recommended for the function evaluation. Iteratively the model is updated and recommendations are made till the target outcome is achieved. When concurrent function evaluations are possible, Bayesian optimisation returns multiple suggestions, and this is termed the batch setting. Bayesian optimisation in the batch setting has been investigated by [10, 5, 6, 9, 1], wherein different strategies are used to recommend multiple settings at each iteration. In all these methods, all the control variables are free to be altered at each iteration. However, in some situations needing to change all the variables for a single batch may not be efficient, and this motivates our process-constrained Bayesian optimisation.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nFigure 1: Examples of real-world applications requiring process constraints. (a) Heat treatment for Al-Sc: temperature-time profile. (b) Experimental setup for short polymer fibre production.\n\nThis work has been directly influenced by the way experiments are conducted in many real-world scenarios with a typical limitation on resources. 
For example, in our work with metallurgists, we were given a task to find the optimal heat-treatment schedule of an alloy which maximises the strength. Heat-treatment involves taking the alloy through a series of exposures to different temperatures for varying durations, as shown in Figure 1a. Typically, a heat treatment schedule can last for multiple days, so doing one experiment at a time is not efficient. Fortunately, a furnace is big enough to hold multiple samples at the same time. If we have to perform multiple experiments in one batch yet using only one furnace, then we must design our Bayesian optimisation recommendations in such a way that the temperatures across a batch remain the same, whilst still allowing the durations to vary. Samples would be put in the same oven, but would be taken out after different elapsed times for each step of the heat treatment. Similar examples abound in other domains of process and product design. For short polymer fibre production, a polymer is injected axially within another flow of a solvent in a particular geometric manifold [20]. A representation of the experimental setup marked with the parameters involved is shown in Figure 1b. When optimising for the yield, it is generally easier to change the flow parameters (pump speed setting) than to change the device geometry (opening up the enclosure and modifying the physical configuration). Hence in this case as well, it is beneficial to recommend a batch of suggested experiments at a fixed geometry while allowing the flow parameters to vary. Many such examples, where the batch recommendations are constrained by the processes involved, have been encountered by the authors in realising the potential of Bayesian optimisation for real-world applications.\n\nTo construct a more familiar application we use the hyper-parameter tuning problem for Support Vector Machines (SVM). 
When we use parallel tuning with batch Bayesian optimisation, it may be useful if all the parallel training runs finish at the same time. This would require fixing the cost parameter, while allowing the other hyper-parameters to vary. Whilst this may or may not be a real concern depending on the use case, we use it here as a case study.\n\nWe formulate this unique problem as process-constrained batch Bayesian optimisation. The recommendation schedule needs to constrain the set of control variables that are experimentally expensive (in time, cost or difficulty) to change (the constrained set) and vary all the remaining control variables (the unconstrained set). Our approach involves incorporating constraints on stipulated control parameters and allowing the others to change in an unconstrained manner. The mathematical formulation of our optimisation problem is as follows.\n\nx∗ = argmax_{x∈X} f(x)\n\nand we want a batch Bayesian optimisation sequence\n\n{{x_{t,0}, x_{t,1}, ..., x_{t,K−1}}}_{t=1}^{T} such that ∀t, x_{t,k} = [x^{uc}_{t,k} x^{c}_{t,k}] and x^{c}_{t,k} = x^{c}_{t,k′} ∀k, k′ ∈ [0, ..., K − 1],\n\nwhere x^{c}_{t,k} is the constrained part of the kth element of the tth batch and, similarly, x^{uc}_{t,k} is the unconstrained part of the kth element of the tth batch. 
T is the total number of iterations and K is the batch size.\n\n[Figure 1 labels. (a): Temperature (T) against Time (t), with labels T1 to T4 and t1 to t4. (b): coagulant flow (v_c), polymer flow (v_p), constriction angle (α), channel width (h), device position (d), short nano-fibres.]\n\nWe propose two approaches to solve this problem: basic process-constrained Bayesian optimisation (pc-BO(basic)) and nested process-constrained batch Bayesian optimisation (pc-BO(nested)). pc-BO(basic) is an intuitive modification motivated by the work of [5], and pc-BO(nested) is based on a nested Bayesian optimisation method we will describe in section 3. We formulate the algorithms pc-BO(basic) and pc-BO(nested), and for pc-BO(nested) we present the theoretical analysis to show that the average regret vanishes as the number of iterations grows. We demonstrate the performance of pc-BO(basic) and pc-BO(nested) on both benchmark test functions and real world problems: hyper-parameter tuning for SVM classification on two datasets (breast cancer and bio-degradable waste), the industrial problem of the heat treatment process for an Aluminium-Scandium (Al-Sc) alloy, and another industrial problem of the short polymer fibre production process.\n\n2 Related background\n\n2.1 Bayesian optimisation\n\nBayesian optimisation is a sequential method of global optimisation of an expensive and unknown black-box function f whose domain is X, to find its maximiser x∗ = argmax_{x∈X} f(x) (or minimiser). It is especially powerful when the function is expensive to evaluate and does not have a closed-form expression, but it is possible to generate noisy observations from experiments.\n\nThe Gaussian process (GP) is commonly used as a flexible way to place a prior over the unknown function [14]. 
A GP is completely described by its mean function m(x) and covariance function k(x, x′), which encode our belief and uncertainty about the objective function. Noisy observations from the experiments are sequentially appended into the model, which in turn updates our belief about the objective function.\n\nThe acquisition function is a surrogate utility function that takes a known tractable closed form and allows us to choose the next query point. It is maximised in the place of the unknown objective function and constructed such that it balances between exploiting regions of high predicted value (mean) and exploring regions of high uncertainty (variance) across the objective function.\n\nGaussian process based Upper Confidence Bound (GP-UCB) proposed by [19] is one of the acquisition functions which is shown to achieve sublinear growth in cumulative regret. It is defined at the tth iteration as\n\nα_t^{GP−UCB}(x) = μ_{t−1}(x) + √(β_t) σ_{t−1}(x)    (1)\n\nwhere v = 1 and β_t = 2 log(t^{d/2+2} π^2 / 3δ) is the confidence parameter, wherein t denotes the iteration number, d represents the dimensionality of the data and δ ∈ (0, 1). We are motivated by GP-UCB based methods. Although our approach can be intuitively extended to other acquisition functions, we do not explore this in the current work.\n\n2.2 Batch Bayesian optimisation methods\n\nThe GP exhibits an interesting characteristic: its predictive variance depends only on the input attributes, while updating its mean requires knowledge about the outcome of the experiment. This leads us to a direction of strategies for multiple recommendations. There are several batch Bayesian optimisation algorithms for the unconstrained case. GP-BUCB by [6] recommends multiple batch points using the UCB strategy and the aforementioned characteristic. 
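As a minimal sketch of Equation 1 and of the characteristic just described, the pure-Python example below computes a zero-mean GP posterior with a squared exponential kernel and evaluates the GP-UCB score. The helper names (sq_exp_kernel, solve, gp_posterior, beta_t, gp_ucb), the unit length scale and the small noise term are illustrative assumptions and not the authors' MATLAB implementation; reading 3δ as the denominator inside the logarithm of β_t is likewise an assumption.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    '''Squared exponential kernel between two points (lists of floats).'''
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    '''Solve A x = b by Gaussian elimination with partial pivoting.'''
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(X, y, x_new, noise_var=1e-6):
    '''Posterior mean and variance of a zero-mean GP at x_new.'''
    K = [[sq_exp_kernel(xi, xj) + (noise_var if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    k_star = [sq_exp_kernel(xi, x_new) for xi in X]
    alpha = solve(K, y)       # K^{-1} y: the only place outcomes enter
    v = solve(K, k_star)      # K^{-1} k_*: depends on inputs only
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)


def beta_t(t, d, delta):
    '''Confidence parameter, assuming 3 * delta is the denominator.'''
    return 2.0 * math.log(t ** (d / 2.0 + 2.0) * math.pi ** 2 / (3.0 * delta))


def gp_ucb(X, y, x_new, beta):
    '''GP-UCB score: posterior mean plus sqrt(beta) times posterior std.'''
    mean, var = gp_posterior(X, y, x_new)
    return mean + math.sqrt(beta) * math.sqrt(var)


# Same inputs, different outcomes: the variance is unchanged, the mean is not.
X_obs = [[0.0], [1.0]]
mean_a, var_a = gp_posterior(X_obs, [0.0, 1.0], [0.5])
mean_b, var_b = gp_posterior(X_obs, [5.0, -3.0], [0.5])
```

Because the variance never touches the outcomes, a batch method can condition on the inputs of pending experiments before their results arrive, which is the property the batch strategies in this section rely on.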
To fill up a batch, it updates the variances with the available attribute information and appends the outcomes temporarily by substituting them with the most recently computed posterior mean. A similar strategy is used in GP-UCB-PE by [5], which optimises the unknown function by incorporating some batch elements where uncertainty is high. GP-UCB-PE computes the first batch element by using the UCB strategy and recommends the rest of the points by relying only on the predictive variance, and not the mean. It has been shown that for these GP-UCB based algorithms the regret can be bounded tighter than for the single recommendation methods. To the best of our knowledge these existing batch Bayesian optimisation techniques do not address the process-constrained problem presented in this work. The algorithms proposed in this paper are inspired by the previous approaches but address the process-constrained setting.\n\n2.3 Constrained-batch vs. constrained-space optimisation\n\nWe refer to the parameters that are not allowed to change within a batch (e.g. temperatures for heat treatment, or device geometry for fibre production) as the constrained set and the other parameters (heat treatment durations or flow parameters) as the unconstrained set. We emphasise that our usage of constraint differs from the problem settings presented in the literature, for example in [2, 11, 7, 8], where the parameter values are constrained or the function evaluations are constrained by inequalities. 
In the problem setting that we present, all the parameters exist in unconstrained space; for each individual batch, the constrained variables should have the same value.\n\n3 Proposed method\n\nWe recall the maximisation problem from Section 1 as x∗ = argmax_{x∈X} f(x). In our case X = X^{uc} ∪ X^{c}, where X^{c} is the constrained subspace and X^{uc} is the unconstrained subspace.\n\nAlgorithm 1 pc-BO(basic): Basic process-constrained pure exploration batch Bayesian optimisation algorithm.\nwhile (t < MaxIter)\n    x_{t,0} = [x^{uc}_{t,0} x^{c}_{t,0}] = argmax_{x∈X} α^{GP−UCB}(x | D)\n    for k = 1, ..., K − 1\n        x^{uc}_{t,k} = argmax_{x^{uc}∈X^{uc}} σ(x^{uc} | D, x^{c}_{t,0}, {x^{uc}_{t,k′}}_{k′<k})\n    end\n    D = D ∪ {[x^{uc}_{t,k} x^{c}_{t,0}], f([x^{uc}_{t,k} x^{c}_{t,0}])}_{k=0}^{K−1}\nend\n\nAlgorithm 2 pc-BO(nested): Nested process-constrained batch Bayesian optimisation algorithm.\nwhile (t < MaxIter)\n    x^{c}_{t} = argmax_{x^{c}∈X^{c}} α_c^{GP−UCB}(x^{c} | D_O)\n    x^{uc}_{t,0} = argmax_{x^{uc}∈X^{uc}} α_uc^{GP−UCB}(x^{uc} | D_I, x^{c}_{t})\n    for k = 1, ..., K − 1\n        x^{uc}_{t,k} = argmax_{x^{uc}∈X^{uc}} σ_uc(x^{uc} | D_I, x^{c}_{t}, {x^{uc}_{t,k′}}_{k′<k})\n    end\n    D_O = D_O ∪ {x^{c}_{t}, f([(x^{uc}_{t})^{+} x^{c}_{t}])}\n    D_I = D_I ∪ {[x^{uc}_{t,k} x^{c}_{t}], f([x^{uc}_{t,k} x^{c}_{t}])}_{k=0}^{K−1}\nend\n\nA naïve approach to solving the process-constrained problem is to employ any standard batch Bayesian optimisation algorithm where the first member is generated and then subsequent members are filled up by setting the constrained variables to those of the first member. 
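One possible reading of this naive scheme, sketched in Python over a finite candidate grid, is given below. The function pc_bo_basic_batch and the callables ucb and sigma are hypothetical stand-ins for a fitted GP model, and the toy scoring functions in the usage lines are assumptions chosen only to make the example run; they are not a real surrogate.

```python
def pc_bo_basic_batch(candidates, constrained_idx, ucb, sigma, batch_size):
    '''Build one process-constrained batch from a finite candidate list.

    candidates      : list of tuples, each a full setting x
    constrained_idx : positions in x that must be shared across the batch
    ucb(x)          : acquisition value of x under the current model
    sigma(x, chosen): predictive uncertainty of x after conditioning on the
                      inputs already chosen (no outcomes are needed)
    '''
    # First member: maximise the acquisition over the whole space.
    first = max(candidates, key=ucb)
    fixed = tuple(first[i] for i in constrained_idx)

    # Remaining members may only vary the unconstrained coordinates.
    pool = [x for x in candidates
            if tuple(x[i] for i in constrained_idx) == fixed]

    batch = [first]
    while len(batch) < batch_size:
        remaining = [x for x in pool if x not in batch]
        if not remaining:
            break  # no more settings share this constrained value
        # Pure exploration: pick the most uncertain remaining setting.
        batch.append(max(remaining, key=lambda x: sigma(x, batch)))
    return batch


# Toy stand-ins (assumptions): x = (unconstrained, constrained).
def toy_ucb(x):
    return -(x[0] - 3) ** 2 - (x[1] - 1) ** 2


def toy_sigma(x, chosen):
    # Crude uncertainty proxy: distance to the nearest chosen setting.
    return min(abs(x[0] - z[0]) for z in chosen)


grid = [(u, c) for u in range(5) for c in range(3)]
batch = pc_bo_basic_batch(grid, constrained_idx=[1], ucb=toy_ucb,
                          sigma=toy_sigma, batch_size=3)
```

Every element of the returned batch shares the constrained coordinate of the first element, so the whole batch can be run under one physical setup.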
We describe this approach as the basic process-constrained pure exploration batch Bayesian optimisation (pc-BO(basic)) algorithm as detailed in Algorithm 1, where α^{GP−UCB}(x | D) is the acquisition function as defined in Equation 1. We note that pc-BO(basic) is a modification of the work of [5]. During each iteration, the first batch element is recommended using the UCB strategy. The remaining batch elements, as in GP-UCB-PE, are generated by updating the posterior variance of the GP, after the constrained set attributes are fixed to those of the first batch element.\n\nWe provide an alternate formulation via a nested optimisation problem called nested process-constrained batch Bayesian optimisation (pc-BO(nested)) with two stages. For each batch, in the outer stage optimisation is performed to find the optimal values of the constrained variables and in the inner stage optimisation is performed to find optimal values of the unconstrained variables. The algorithm is detailed in Algorithm 2, where α_c^{GP−UCB}(x | D) is the acquisition function for the outer stage, and α_uc^{GP−UCB}(x | D) is the acquisition function for the inner stage as defined in Equation 1, and (x^{uc}_{t})^{+} = argmax_{x^{uc}_{t} ∈ {x^{uc}_{t,k}}_{k=0}^{K−1}} f([x^{uc}_{t} x^{c}_{t}]) is the unconstrained batch parameter that yields the best target goal for the given constrained parameter x^{c}. We are able to analyse the convergence of pc-BO(nested). It can be expected that in some cases the performance of pc-BO(basic) and pc-BO(nested) is close. The pc-BO(basic) method may be considered simpler, but it lacks guaranteed convergence.\n\n3.1 Convergence analysis for pc-BO(nested)\n\nWe now present the analysis of the convergence of pc-BO(nested) as described in Algorithm 2. 
The outer stage optimisation problem for x^{c} and observation D_O is expressed as follows.\n\n(x^{c})∗ = argmax_{x^{c}∈X^{c}} g(x^{c}),\n\nwhere\n\ng(x^{c}) ≜ max_{x^{uc}∈X^{uc}} f([x^{uc} x^{c}]) ≃ max_{x^{uc}∈X_{uc}} f([x^{uc} x^{c}]) = f([(x^{uc})^{+} x^{c}]),\n\nwhere X_{uc} ≜ {{x_{t,0}, x_{t,1}, ..., x_{t,K−1}}}_{t=1}^{T} such that x^{c}_{t,k} = x^{c}, and\n\nD_O ≜ {x^{c}_{t}, f([(x^{uc}_{t})^{+} x^{c}_{t}])}_{t=1}^{T}.\n\nThe inner stage optimisation problem for x^{uc} and observation D_I is expressed as follows.\n\n(x^{uc})∗ = argmax_{x^{uc}∈X^{uc}} h(x^{uc}),\n\nwhere\n\nh(x^{uc}) ≜ f([x^{uc} x^{c}]),\n\nD_I ≜ {{[x^{uc}_{t,k} x^{c}_{t}], f([x^{uc}_{t,k} x^{c}_{t}])}_{k=0}^{K−1}}_{t=1}^{T}.\n\nThis is solved using a Bayesian optimisation routine. Here, (x^{uc})^{+} is the unconstrained batch parameter that yields the best target goal for the given constrained parameter x^{c}. Unfortunately, as g(x^{c}) is not easily measurable, we use f([(x^{uc})^{+} x^{c}]) as an approximation to it. To address this we use a provable batch Bayesian optimisation such as GP-UCB-PE [5] in the inner stage. The loops are performed together: in each iteration t, the outer loop first recommends a single x^{c}_{t} and then the inner loop suggests a batch, {x^{uc}_{t,k}}_{k=1}^{K}. Combining them we get a process-constrained set of recommendations. We show that together these two Bayesian optimisation loops converge to the optimal solution.\n\nLet us denote (x^{uc}_{t})^{+} = argmax_{x^{uc}∈{x^{uc}_{k}}_{k=1}^{K}} f([x^{uc} x^{c}_{t}]). Following that we can write g(x^{c}_{t}) as,\n\ng(x^{c}_{t}) = f([(x^{uc}_{t})∗ x^{c}_{t}])\n= f([(x^{uc}_{t})^{+} x^{c}_{t}]) + f([(x^{uc}_{t})∗ x^{c}_{t}]) − f([(x^{uc}_{t})^{+} x^{c}_{t}])\n= f([(x^{uc}_{t})^{+} x^{c}_{t}]) + r^{uc}_{t}    (2)\n\nwhere r^{uc}_{t} is the regret of the inner loop. The observational model is given as\n\ny^{c} = g(x^{c}) + ε = f([(x^{uc}_{t})^{+} x^{c}_{t}]) + r^{uc}_{t} + ε    (3)\n\nwhere ε ∼ N(0, σ^2).\n\nLemma 1. For the regret of the inner loop, Σ_{t=1}^{T} (r^{K}_{t})^2 ≤ β^{uc}_{1} C^{uc}_{1} γ^{uc}_{T} + π^2/6.\n\nProof. As we use GP-UCB-PE for unconstrained parameter optimisation, we can say that the regret r^{K}_{t} = min r^{k}_{t} ∀k = 0, ..., K − 1 (Lemma 1, [5]). Hence, r^{K}_{t} ≤ r^{0}_{t} ≤ 2 √(β_1) σ^{0}_{t}. Now, even though every batch recommendation for x^{c} will always be run for one iteration only, σ^{0}_{t}(x_t) is computed from the updated GP. Hence the sum of (σ^{0}_{t})^2 can be upper bounded by γ_T. Thus,\n\nΣ_{t=1}^{T} (r^{K}_{t})^2 ≤ β^{uc}_{1} C^{uc}_{1} γ^{uc}_{T} + π^2/6    (4)\n\nHere, β_1 = 2 log(1^{d/2+2} π^2 / 3δ) is the confidence parameter; C_1 = 8/log(1 + σ^{−2}); and γ_T = max_{A⊂X^{c}, |A|=T} I(y_A : f_A), assuming y = f + ε where ε ∼ N(0, σ^2/2), is the maximum information gain after T rounds. (Please see supplementary material 5 for the derivation.)\n\nLemma 2. The variance of r^{uc}_{t} is of the order σ^2_{r_t} ∼ O(C^{uc}_{1} β^{uc}_{1} γ^{uc}_{t} + C^{uc}_{2}).\n\nProof. We use the PE algorithm [5] to compute the K recommendations, hence the variance of the regret r^{uc}_{t} can be bounded above by\n\nσ^2_{r^{uc}_{t}} ≤ E((r^{uc}_{t})^2) ≤ E((1/t) Σ_{t′=0}^{t} (r^{uc}_{t′})^2) = E((1/t) Σ_{t′=0}^{t} min_{k<K} (r^{uc}_{t′k})^2)\n\nThe second inequality holds since on average the gap r^{uc}_{t} = g(x^{c}) − f([(x^{uc}_{t})^{+} x^{c}]) decreases with iteration t, ∀x^{c} ∈ X^{c}. From equation 3, equation 4 and using Lemmas 4 and 5 of [5] we can write\n\nE((1/t) Σ_{t′=0}^{t} min_{k<K} (r^{uc}_{t′k})^2) ∼ O((1/t) C^{uc}_{1} β^{uc}_{1} γ^{uc}_{t} + C^{uc}_{2})    (5)\n\nfor some C^{uc}_{1}, C^{uc}_{2} ∈ R. γ_t is the maximum information gain over t samples. This concludes the proof.\n\nThe following lemma guarantees the existence of a finite T0 after which the noise variance coming from the inner optimisation loop becomes smaller than the noise in the observation model.\n\nLemma 3. ∃T0 < ∞ for which σ^2_{r^{uc}_{T0}} ≤ σ^2.\n\nProof. In Lemma 1, C^{uc}_{1}, C^{uc}_{2} and β^{uc}_{1} are fixed constants and γ^{uc}_{tK} is sublinear in t. Therefore, any quantity of the form M1 × (1/t) C^{uc}_{1} β^{uc}_{1} γ^{uc}_{t} + C^{uc}_{2} also decreases sublinearly with t for ∀M1 ∈ R. Hence the lemma is proved.\n\nLet us denote the instantaneous regret for the outer Bayesian optimisation loop as r^{c}_{t} = g((x^{c})∗) − g(x^{c}_{t}). We can write the average regret after T iterations as,\n\nR̄_T = (1/T) Σ_{t=0}^{T} r^{c}_{t} ≤ (1/T) Σ_t (2 √(β^{c}_{t}) σ^{c}_{t−1}(x^{c}_{t}) + 1/t^2)    (6)\n≤ 2 √(β^{c}_{T} Σ_t (σ^{c}_{t−1}(x^{c}_{t}))^2 / T) + (1/T) Σ_{t=1}^{T} 1/t^2    (7)\n\nusing Lemma 5.8 of [19] and the Cauchy-Schwarz inequality.\n\nLemma 4. For the outer Bayesian optimisation, R̄_T → 0 as T → ∞.\n\nProof. From equation 6,\n\nR̄_T ≤ 2 √(β^{c}_{T} Σ_{t=1}^{T} (σ^{c}_{t−1}(x^{c}_{t}))^2 / T) + (1/T) Σ_{t=1}^{T} 1/t^2\n= 2 √((β^{c}_{T}/T) (Σ_{t=1}^{T0} (σ^{c}_{t−1}(x^{c}_{t}))^2 + Σ_{t=T0+1}^{T} (σ^{c}_{t−1}(x^{c}_{t}))^2)) + (1/T) Σ_{t=1}^{T} 1/t^2\n≤ 2 √((β^{c}_{T}/T) (A_{T0} + B_T)) + (1/T) Σ_{t=1}^{T} 1/t^2\n\nWe then show that A_{T0} is upper bounded by a constant irrespective of T as long as T ≥ T0, and B_T is sublinear in T. β^{c}_{T} is sublinear in T and lim_{T→∞} Σ_{t=1}^{T} 1/t^2 = π^2/6. Hence the right hand side vanishes as T → ∞. The details of the proof are presented in the supplementary material.\n\nHowever, in reality using regret as the upper bound on r^{uc}_{t} is not necessary, as a tighter upper bound may exist when we know the maximum value of the function (see footnote 1), and we can safely alter the upper bound as,\n\nr^{uc}_{t} ≤ min(f^{max} − f([(x^{uc}_{t})^{+} x^{c}]), 2 √(β_1) σ^{uc}_{t−1}(x^{uc}_{0}))    (8)\n\nThe above results hold since Lemma 2 still holds.\n\nFootnote 1: e.g. 
for hyper-parameter tuning we know that the maximum value of accuracy is 1.\n\nFigure 2: Synthetic test function optimisation using pc-BO(nested), pc-BO(basic) and s-BO. The zoomed area on the respective scale is shown for Branin and Goldstein-Price.\n\n4 Experiments\n\nWe conducted a set of experiments using both synthetic data and real data to demonstrate the performance of pc-BO(basic) and pc-BO(nested). To the best of our knowledge, there are no other methods that can selectively constrain parameters in each batch during Bayesian optimisation. Further, we also show the results for the test function optimisation using sequential BO (s-BO) using GP-UCB.\n\nThe code is implemented in MATLAB and all the experiments are run on an Intel CPU E5-2640 v3 @2.60GHz machine. We use the squared exponential distance kernel. To show the performance, we plot the results as the best outcome so far against the number of iterations performed. The uncertainty bars in the figures pertain to 10 runs of BO algorithms with different initialisations for a batch of 3 recommendations. The error bars show the standard error and the graph shows the mean best outcome until the respective iteration.\n\n4.1 Benchmark test function optimisation\n\nIn this section, we use benchmark test functions and demonstrate the performance of pc-BO(basic) and pc-BO(nested). We apply the test functions by constraining the second parameter and finding the best configuration across the first parameter (unconstrained). The Branin, Ackley, Goldstein-Price and Egg-holder functions were optimised using pc-BO(basic) and pc-BO(nested), and the results are shown in Figure 2. From the results, we note that pc-BO(nested) is marginally better than or similar in performance to pc-BO(basic). 
It also shows that batch Bayesian optimisation is more efficient in terms of number of iterations than a purely sequential approach for the problem at hand.\n\n[Figure 2 panels: Branin, Ackley, Goldstein-Price and Egg-holder (all normalised); x-axis: number of iterations; y-axis: best value so far; curves: pc-BO(nested), pc-BO(basic), s-BO.]\n\n4.2 Hyper-parameter tuning for SVM\n\nSupport vector machines with RBF kernel require hyper-parameter tuning for Cost (C) and Gamma (γ). Out of these parameters, the cost is a critical parameter that trades off error for generalisation. Consider tuning SVMs in parallel. The cost parameter strongly affects the time required for training an SVM. It would be inconvenient if one training process took much longer than another. Thus constraining the cost parameter for a single batch may be a good idea. We use our algorithms to tune both the hyper-parameters C and γ, at each batch only varying γ, but not C. This is demonstrated on SVM classification using two datasets downloaded from the UCI machine learning repository: the Breast cancer dataset (BCW) and the Bio-degradation dataset (QSAR).\n\nBCW has 683 instances with 9 attributes each, where the instances are labelled as benign or malignant tumour as per the diagnosis. The QSAR dataset categorises 1055 chemicals with 42 attributes as ready or not ready biodegradable waste. The results are plotted as best accuracy obtained across number of iterations. 
We observe from the results in Figure 3 that pc-BO(nested) again performs marginally better than pc-BO(basic) for the BCW dataset. For the QSAR dataset, pc-BO(nested) achieves higher accuracy in fewer iterations than pc-BO(basic) requires.\n\n4.3 Heat treatment for an Al-Sc alloy\n\nAlloy casting involves a heat treatment process, exposing the cast to different temperatures for select times, that ensures the target hardness of the alloy. This process is repeated in steps. The underlying physics of heat-treatment of an alloy is based on nucleation and growth. During the nucleation process, “new phases” or precipitates are formed when clusters of atoms self-organise. This is a difficult stochastic process that happens at lower temperatures. These precipitates then diffuse together to achieve the requisite target alloy characteristics in the growth step. KWN [15, 23] is the industrial standard precipitation model for the kinetics of the nucleation and growth steps. As a preliminary study we use this simulator to demonstrate the strength of our algorithm.\n\nAs explained in the introduction, it is cost efficient to test heat treatment in the real world by varying the time and keeping the temperature constrained in each batch. This will allow us to test multiple samples at one go in a single oven. We use the same constraints for our simulator driven study. We consider a two stage heat treatment process. The input to the first stage is the alloy composition, the temperature and time. The nucleation output of this stage is input to the second stage along with the temperature and time for the second stage. The final output is the hardness of the material (strength in kPa). 
To optimise this two stage heat treatment process our inputs are [T1, T2, t1, t2],\nwhere [T1, T2] represent temperatures in Celsius, [t1, t2] represent the time in minutes for each stage.\nFigure 4 shows the results of the heat-treatment process optimisation.\n\n4.4 Short polymer \ufb01bre polymer production\n\nShort polymer \ufb01bre production is a set of experiments we conducted in collaboration with material\nscientists at Deakin University. For production of short polymer \ufb01bres, a polymer rich \ufb02uid is\ninjected coaxially into the \ufb02ow of another solvent in a particular geometric manifold. The parameters\nincluded in this experiment are device position in mm, constriction angle in degrees, channel width\nin mm, polymer \ufb02ow in ml/hr, and coagulant speed in cm/s. The \ufb01nal output, the combined utility is\nthe distance of the length and diameter of the polymer from target polymer. The goal is to optimise\nthe input parameters to obtain a polymer \ufb01bre of a desired length and diameter. As explained in the\nintroduction, it is ef\ufb01cient to test multiple combinations of polymer \ufb02ow and coagulant speed for a\n\ufb01xed geometric setup than in a single batch.\n\nFigure 3: Hyper-parameter tuning for SVM based classi\ufb01cation on Breast Cancer Data (BCW) and\nbio-degradable waste data (QSAR) using pc-BO(nested) and pc-BO(basic)\n\n8\n\n0102060708030 40 50 number of iterations 0.850.90.951accuracySVM with BCWpc-BO(nested)pc-BO(basic)0102060708030 40 50 number of iterations 0.70.750.80.850.9SVM with QSARpc-BO(nested)pc-BO(basic)accuracy\fFigure 4: Results for heat-treatment and short polymer \ufb01bre production processes. (a) Experimental\nresult for Al-Sc heat treatment pro\ufb01le for a two stage heat-treatment process using pc-BO(nested)\nand pc-BO(basic). (b) Optimisation for short polymer \ufb01bre production with position, constriction\nangle and channel width constrained for each batch. 
Polymer flow and coagulant speed are\nunconstrained. The optimisation is shown for the pc-BO(nested) and pc-BO(basic) algorithms.\n\nThe parameters in this experiment are discrete: every parameter takes 3 discrete values,\nexcept the constriction angle, which takes 2 discrete values. Coagulant speed and polymer flow are\nthe unconstrained parameters, and channel width, constriction angle and position are the constrained\nparameters. We conducted the experiment in batches of 3. Figure 4 shows the optimisation\nresults for this experiment over 53 iterations.\n\n5 Conclusion\n\nWe have identified a new problem in batch Bayesian optimisation, motivated by physical\nlimitations in real-world experiments while conducting batch experiments. It is neither feasible nor\nresource-friendly to change all available settings in scientific and industrial experiments for a batch. We\npropose process-constrained batch Bayesian optimisation for such applications, where it is\npreferable to fix the values of some variables in a batch. We propose two approaches to solve the problem\nof process-constrained batches: pc-BO(basic) and pc-BO(nested). We present an analytical proof of\nconvergence for pc-BO(nested). Synthetic functions and real-world experiments (hyper-parameter\ntuning for SVM, the alloy heat treatment process, and the short polymer fibre production process) were\noptimised using the proposed algorithms. We found that pc-BO(nested) in each of these scenarios is\neither more efficient than, or performs as well as, pc-BO(basic).\n\nAcknowledgements\n\nThis research was partially funded by the Australian Government through the Australian Research\nCouncil (ARC) and the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning.\nProf Venkatesh is the recipient of an ARC Australian Laureate Fellowship (FL170100006).\n\nReferences\n[1] J. Azimi, A. Fern, and X. Z. Fern. Batch bayesian optimization via simulation matching. 
In Advances in Neural Information Processing Systems, pages 109\u2013117, 2010.\n\n[2] J. Azimi, X. Fern, and A. Fern. Budgeted optimization with constrained experiments. Journal of Artificial Intelligence Research, 56:119\u2013152, 2016.\n\n[3] J. Bergstra, R. Bardenet, Y. Bengio, and B. K\u00e9gl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546\u20132554, 2011.\n\n[4] E. Brochu, V. M. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599, (UBC TR-2009-023 and arXiv:1012.2599), 2010.\n\n[5] E. Contal, D. Buffoni, A. Robicquet, and N. Vayatis. Parallel gaussian process optimization with upper confidence bound and pure exploration. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 225\u2013240. Springer, 2013.\n\n[Figure 4 plot panels: (a) Al-Sc alloy heat treatment, hardness of the alloy vs. number of iterations; (b) short polymer fibre production, best combined utility of polymer vs. number of iterations.]\n\n[6] T. Desautels, A. Krause, and J. W. Burdick. Parallelizing exploration-exploitation tradeoffs in gaussian process bandit optimization. Journal of Machine Learning Research, 15(1):3873\u20133923, 2014.\n\n[7] J. R. Gardner, M. J. Kusner, Z. E. Xu, K. Q. Weinberger, and J. P. Cunningham. Bayesian optimization with inequality constraints. In International Conference on Machine Learning, pages 937\u2013945, 2014.\n\n[8] M. A. Gelbart, J. Snoek, and R. P. Adams. Bayesian optimization with unknown constraints. In Uncertainty in Artificial Intelligence, pages 250\u2013259, 2014.\n\n[9] D. Ginsbourger, R. Le Riche, and L. Carraro. 
A multi-points criterion for deterministic parallel global optimization based on gaussian processes. Technical report, 2008.\n\n[10] J. Gonz\u00e1lez, Z. Dai, P. Hennig, and N. D. Lawrence. Batch bayesian optimization via local penalization. In Artificial Intelligence and Statistics, pages 648\u2013657, 2015.\n\n[11] J. M. Hern\u00e1ndez-Lobato, M. A. Gelbart, R. P. Adams, M. W. Hoffman, and Z. Ghahramani. A general framework for constrained bayesian optimization using information-based search. Journal of Machine Learning Research, 17(160):1\u201353, 2016.\n\n[12] F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization, pages 507\u2013523, 2011.\n\n[13] D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455\u2013492, 1998.\n\n[14] C. E. Rasmussen. Gaussian processes for machine learning. 2006.\n\n[15] J. Robson, M. Jones, and P. Prangnell. Extension of the n-model to predict competing homogeneous and heterogeneous precipitation in Al-Sc alloys. Acta Materialia, 51(5):1453\u20131468, 2003.\n\n[16] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments. Statistical Science, pages 409\u2013423, 1989.\n\n[17] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148\u2013175, 2016.\n\n[18] J. Snoek, H. Larochelle, and R. P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2960\u20132968, 2012.\n\n[19] N. Srinivas, A. Krause, S. Kakade, and M. W. Seeger. 
Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pages 1015\u20131022, 2010.\n\n[20] A. Sutti, T. Lin, and X. Wang. Shear-enhanced solution precipitation: a simple process to produce short polymeric nanofibers. Journal of Nanoscience and Nanotechnology, 11(10):8947\u20138952, 2011.\n\n[21] K. Swersky, J. Snoek, and R. P. Adams. Multi-task bayesian optimization. In Advances in Neural Information Processing Systems, pages 2004\u20132012, 2013.\n\n[22] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Auto-weka: combined selection and hyper-parameter optimization of classification algorithms. In International Conference on Knowledge Discovery and Data Mining, pages 847\u2013855, 2013.\n\n[23] R. Wagner, R. Kampmann, and P. W. Voorhees. Homogeneous Second-Phase Precipitation. Wiley Online Library, 1991.\n\n[24] Z. Wang, M. Zoghi, F. Hutter, D. Matheson, and N. de Freitas. Bayesian optimization in high dimensions via random embeddings. 
In International Joint Conference on Artificial Intelligence, pages 1778\u20131784, 2013.\n", "award": [], "sourceid": 1946, "authors": [{"given_name": "Pratibha", "family_name": "Vellanki", "institution": "Deakin University"}, {"given_name": "Santu", "family_name": "Rana", "institution": "Deakin University"}, {"given_name": "Sunil", "family_name": "Gupta", "institution": "Deakin University"}, {"given_name": "David", "family_name": "Rubin", "institution": null}, {"given_name": "Alessandra", "family_name": "Sutti", "institution": "Deakin University"}, {"given_name": "Thomas", "family_name": "Dorin", "institution": "Deakin University"}, {"given_name": "Murray", "family_name": "Height", "institution": "Deakin University"}, {"given_name": "Paul", "family_name": "Sanders", "institution": null}, {"given_name": "Svetha", "family_name": "Venkatesh", "institution": "Deakin University"}]}