{"title": "Adaptive Stochastic Optimization: From Sets to Paths", "book": "Advances in Neural Information Processing Systems", "page_first": 1585, "page_last": 1593, "abstract": "Adaptive stochastic optimization optimizes an objective function adaptively under uncertainty. Adaptive stochastic optimization plays a crucial role in planning and learning under uncertainty, but is, unfortunately, computationally intractable in general. This paper introduces two conditions on the objective function, the marginal likelihood rate bound and the marginal likelihood bound, which enable efficient approximate solution of adaptive stochastic optimization. Several interesting classes of functions satisfy these conditions naturally, e.g., the version space reduction function for hypothesis learning. We describe Recursive Adaptive Coverage (RAC), a new adaptive stochastic optimization algorithm that exploits these conditions, and apply it to two planning tasks under uncertainty. In contrast to the earlier submodular optimization approach, our algorithm applies to adaptive stochastic optimization over both sets and paths.", "full_text": "Adaptive Stochastic Optimization: From Sets to Paths\n\nZhan Wei Lim\n\nDavid Hsu\n\nWee Sun Lee\n\nDepartment of Computer Science, National University of Singapore\n\n{limzhanw,dyhsu,leews}@comp.nus.edu.sg\n\nAbstract\n\nAdaptive stochastic optimization (ASO) optimizes an objective function adaptively under uncertainty. It plays a crucial role in planning and learning under uncertainty, but is, unfortunately, computationally intractable in general. This paper introduces two conditions on the objective function, the marginal likelihood rate bound and the marginal likelihood bound, which, together with pointwise submodularity, enable efficient approximate solution of ASO. Several interesting classes of functions satisfy these conditions naturally, e.g., the version space reduction function for hypothesis learning.
We describe Recursive Adaptive Coverage, a new ASO algorithm that exploits these conditions, and apply the algorithm to two robot planning tasks under uncertainty. In contrast to the earlier submodular optimization approach, our algorithm applies to ASO over both sets and paths.\n\n1 Introduction\n\nA hallmark of an intelligent agent is the ability to learn new information as the world unfolds and to improvise by fusing the new information with prior knowledge. Consider an autonomous unmanned aerial vehicle (UAV) searching for a victim lost in a jungle. The UAV acquires new information on the victim\u2019s location by scanning the environment with noisy onboard sensors. How can the UAV plan and adapt its search strategy in order to find the victim as fast as possible? This is an example of stochastic optimization, in which an agent chooses a sequence of actions under uncertainty in order to optimize an objective function. In adaptive stochastic optimization (ASO), the agent\u2019s action choices are conditioned on the outcomes of earlier choices. ASO plays a crucial role in planning and learning under uncertainty, but it is, unfortunately, computationally intractable in general [5].\n\nAdaptive submodular optimization provides a powerful tool for approximate solution of ASO and has several important applications, such as sensor placement and active learning [5]. However, it has so far been restricted to optimization over a set domain: the agent chooses a subset out of a finite set of items. This is inadequate for the UAV search, as the agent\u2019s consecutive choices are constrained to form a path. Our work applies to ASO over both sets and paths.\n\nOur work aims to identify subclasses of ASO and provide conditions that enable efficient near-optimal solution. We introduce two conditions on the objective function, the marginal likelihood rate bound (MLRB) and the marginal likelihood bound (MLB).
They enable efficient approximation of ASO with pointwise submodular objective functions, i.e., functions that satisfy a \u201cdiminishing return\u201d property. MLRB is different from adaptive submodularity; we prove that adaptive submodularity does not imply MLRB and vice versa. While there exist functions that satisfy neither the adaptive submodularity condition nor the MLRB condition, all pointwise submodular functions satisfy the MLB condition, albeit with different constants.\n\nWe propose Recursive Adaptive Coverage (RAC), a polynomial-time approximation algorithm that guarantees a near-optimal solution of ASO over either a set or a path domain, if the objective function satisfies the MLRB or the MLB condition and is pointwise monotone submodular. Since MLRB differs from adaptive submodularity, the new algorithm expands the set of problems that admit efficient approximate solutions, even for ASO over a set domain. We have evaluated RAC in simulation on two robot planning tasks under uncertainty and show that RAC performs well against several commonly used heuristic algorithms, including greedy algorithms that optimize information gain.\n\n2 Related Work\n\nSubmodular set function optimization encompasses many hard combinatorial optimization problems in operations research and decision making. Submodularity implies a diminishing return effect: adding an item to a smaller set is at least as beneficial as adding the same item to a bigger set. For example, adding a new temperature sensor when there are few sensors helps more in mapping the temperature in a building than adding it when there are already many sensors. Monotone submodular functions can be efficiently and approximately maximized using a greedy heuristic [11].
Recent works have incorporated stochasticity into submodular optimization [1, 5] and generalized the problem from set optimization to path optimization [2].\n\nOur work builds on progress in submodular optimization on paths to solve the adaptive stochastic optimization problem on paths. Our RAC algorithm shares a similar structure and analysis with the RAId algorithm in [10], which solves adaptive informative path planning (IPP) problems without noise. In fact, noiseless adaptive IPP is a special case of adaptive stochastic optimization problems on paths that satisfies the marginal likelihood rate bound condition, and we can derive the same approximation bound using the results in Section 6. Both works are inspired by the algorithm in [8] for the Adaptive Traveling Salesperson (ATSP) problem. In the ATSP problem, a salesperson has to service a subset of locations with demand that is not known in advance. However, the salesperson knows the prior probabilities of the demand at each location (possibly correlated), and the goal is to find an adaptive policy to service all locations with demand.\n\nAdaptive submodularity [5] generalizes submodularity to stochastic settings and gives logarithmic approximation bounds using a greedy heuristic. It was also shown that no polynomial-time algorithm can approximate adaptive stochastic optimization problems within a factor of $O(|X|^{1-\\epsilon})$ unless $\\mathrm{PH} = \\Sigma_2^p$, that is, unless the polynomial-time hierarchy collapses to its second level [5].\n\nMany Bayesian active learning problems can be modeled by suitable adaptive submodular objective functions [6, 4, 3]. However, [3] recently proposed a new stochastic set function for active learning with a general loss function that is not adaptive monotone submodular. This new objective function satisfies the marginal likelihood bound with a nontrivial constant $G$.\n\nAdaptive stochastic optimization is a special case of the Partially Observable Markov Decision Process (POMDP), a mathematically principled framework for reasoning under uncertainty [9]. Despite tremendous recent progress in offline [12] and online [14, 13] solvers, most partially observable planning problems remain hard to solve.\n\n3 Preliminaries\n\nWe now describe the adaptive stochastic optimization problem and use the UAV search and rescue task to illustrate our definitions. Let $X$ be the set of actions and let $O$ be the set of observations. The agent operates in a world whose events are determined by a static state called the scenario, denoted as $\\phi: X \\to O$. When the agent takes an action $x \\in X$, it receives an observation $o = \\phi(x) \\in O$ that is determined by an initially unknown scenario $\\phi$. We denote a random scenario as $\\Phi$ and use a prior distribution $p(\\phi) := \\mathbb{P}[\\Phi = \\phi]$ over the scenarios to represent our prior knowledge of the world.\n\nFor example, in the UAV task, the actions are flying to various locations, the observations are the possible sensor readings, and a scenario is a victim\u2019s position. When the UAV flies to a particular location $x$, it observes sensor readings $o$ that depend on the actual victim\u2019s position $\\phi$. Prior knowledge about the victim\u2019s position can be encoded as a probability distribution over the possible victim positions.\n\nAfter taking actions $x_1, x_2, \\ldots$ and receiving observations $o_1, o_2, \\ldots$ after each action, the agent has a history $\\psi = \\{(x_1, o_1), (x_2, o_2), \\ldots\\}$. We say that a scenario $\\phi$ is consistent with a history $\\psi$ when the actions and corresponding observations of the history never contradict $\\phi$, i.e., $\\phi(x) = o$ for all $(x, o) \\in \\psi$. We denote this by $\\phi \\sim \\psi$.
We can also say that a history $\\psi'$ is consistent with another history $\\psi$ if $\\mathrm{dom}(\\psi') \\supseteq \\mathrm{dom}(\\psi)$ and $\\psi'(x) = \\psi(x)$ for all $x \\in \\mathrm{dom}(\\psi)$, where $\\mathrm{dom}(\\psi)$ is the set of actions taken in $\\psi$. For example, a victim\u2019s position $\\phi$ has not been ruled out by the sensor readings at various locations when $\\phi \\sim \\psi$.\n\nAn agent\u2019s goal can be characterized by a stochastic set function $f: 2^X \\times O^X \\to \\mathbb{R}$, which measures progress toward the goal given the actions taken and the true scenario. In this paper, we assume that $f$ is pointwise monotone on a finite domain, i.e., $f(A, \\phi) \\le f(B, \\phi)$ for any $\\phi$ and for all $A \\subseteq B \\subseteq X$. An agent achieves its goal and covers $f$ when $f$ attains its maximum value after taking actions $S \\subseteq X$ in scenario $\\phi$, i.e., $f(S, \\phi) = f(X, \\phi)$. For example, the objective function can be the sum of prior probabilities of the victim positions that are impossible given a history. The UAV finds the victim when all positions except the true one are impossible.\n\nAn agent\u2019s strategy for adaptively taking actions is a policy $\\pi$ that maps a history to its next action. A policy terminates when there is no next action to take for a given history. We say that a policy $\\pi$ covers the function $f$ when the agent executing $\\pi$ always achieves its goal upon termination, that is, $f(\\mathrm{dom}(\\psi), \\phi) = f(X, \\phi)$ for all scenarios $\\phi \\sim \\psi$, where $\\psi$ is the history at the time the agent executing $\\pi$ terminates.
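The consistency and coverage definitions above can be made concrete for a finite scenario set. The following is a minimal Python sketch under our own assumptions. Scenarios and histories are plain dictionaries from actions to observations. The names `consistent`, `history_consistent`, `covers`, and the toy objective `ruled_out` are hypothetical helpers, not part of any published implementation of RAC.

```python
from typing import Callable, Dict, FrozenSet, Iterable

Scenario = Dict[str, int]  # phi: maps every action (location) to its observation
History = Dict[str, int]   # psi: maps the actions taken so far to what was observed


def consistent(phi: Scenario, psi: History) -> bool:
    # phi ~ psi: phi never contradicts an observation recorded in psi.
    return all(phi[x] == o for x, o in psi.items())


def history_consistent(psi2: History, psi: History) -> bool:
    # psi2 ~ psi: dom(psi2) contains dom(psi) and psi2 agrees with psi on dom(psi).
    return all(x in psi2 and psi2[x] == o for x, o in psi.items())


def covers(f: Callable[[FrozenSet[str], Scenario], float], psi: History,
           scenarios: Iterable[Scenario], actions: FrozenSet[str]) -> bool:
    # A terminal history psi covers f if f(dom(psi), phi) = f(X, phi)
    # for every scenario phi that is still consistent with psi.
    dom = frozenset(psi)
    return all(f(dom, phi) == f(actions, phi)
               for phi in scenarios if consistent(phi, psi))


# Hypothetical toy UAV world: the victim is in one of three cells, and
# visiting a cell reveals 1 if the victim is there and 0 otherwise.
X = frozenset({'a', 'b', 'c'})
scenarios = [{x: int(x == v) for x in sorted(X)} for v in sorted(X)]


def ruled_out(S: FrozenSet[str], phi: Scenario) -> int:
    # Stand-in objective: number of other scenarios that disagree with phi on S
    # (proportional to the prior mass ruled out when the prior is uniform).
    return sum(any(other[x] != phi[x] for x in S) for other in scenarios)


print(covers(ruled_out, {'a': 0, 'b': 0}, scenarios, X))  # True: only cell c remains
print(covers(ruled_out, {'a': 0}, scenarios, X))          # False: b and c both remain
```

Coverage is checked only over the scenarios still consistent with the history, which mirrors the quantifier over all scenarios consistent with the history in the definition.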
For example, a policy $\\pi$ tells the UAV where to fly next given the locations visited and whether its sensor reading was positive at each of them, and $\\pi$ covers the objective function when the UAV executing it always finds the victim.\n\nFormally, an adaptive stochastic optimization problem on paths consists of the tuple $(X, d, p, O, r, f)$. The set of actions $X$ is the set of locations the agent can visit, $r$ is the starting location of the agent, and $d$ is a metric that gives the distance between any pair of locations $x, x' \\in X$. The cost of the policy $\\pi$, $C(\\pi, \\phi)$, is the length of the path starting from location $r$ traversed by the agent until the policy terminates when presented with scenario $\\phi$, e.g., the distance traveled by the UAV executing policy $\\pi$ for a particular true victim position. We want to find a policy $\\pi$ that minimizes the cost of traveling to cover the function. We formally state the problem:\n\nProblem 1. Given an adaptive stochastic optimization problem on paths $I = (X, d, p, O, r, f)$, compute an adaptive policy that minimizes the expected cost\n\n$$C(\\pi) = \\mathbb{E}[C(\\pi, \\Phi)] = \\sum_{\\phi} C(\\pi, \\phi)\\, p(\\phi) \\qquad (1)$$\n\nsubject to $f(\\mathrm{dom}(\\psi), \\phi') = f(X, \\phi')$ for all $\\phi'$, where $\\psi$ is the history encountered when executing $\\pi$ on $\\phi'$.\n\nAdaptive stochastic optimization problems on sets can be formally defined by a tuple $(X, c, p, O, f)$. The set of actions $X$ is a set of items that an agent may select. Instead of a distance metric, the cost of selecting an item is defined by a cost function $c: X \\to \\mathbb{R}$, and the cost of a policy is $C(\\pi, \\phi) = \\sum_{x \\in S} c(x)$, where $S$ is the subset of items selected by $\\pi$ when presented with scenario $\\phi$.\n\n4 Classes of Functions\n\nThis section introduces the classes of objective functions for adaptive stochastic optimization problems and gives the relationships between them.\n\nGiven a finite set $X$ and a function on subsets of $X$, $f: 2^X \\to \\mathbb{R}$, the function $f$ is submodular if $f(A) + f(B) \\ge f(A \\cup B) + f(A \\cap B)$ for all $A, B \\subseteq X$. Let $f(S, \\phi)$ be a stochastic set function. If $f(S, \\phi)$ is submodular for each fixed scenario $\\phi \\in O^X$, then $f$ is pointwise submodular.\n\nAdaptive submodularity and monotonicity generalize submodularity and monotonicity to stochastic settings where we receive random observations after selecting each item [6]. We define the expected marginal value of an item $x$ given a history $\\psi$ as\n\n$$\\Delta(x \\mid \\psi) = \\mathbb{E}\\left[f(\\mathrm{dom}(\\psi) \\cup \\{x\\}, \\Phi) - f(\\mathrm{dom}(\\psi), \\Phi) \\mid \\Phi \\sim \\psi\\right].$$\n\nA function $f: 2^X \\times O^X \\to \\mathbb{R}$ is adaptive monotone with respect to a prior distribution $p(\\phi)$ if, for all $\\psi$ such that $\\mathbb{P}[\\Phi \\sim \\psi] > 0$ and all $x \\in X$, it holds that $\\Delta(x \\mid \\psi) \\ge 0$, i.e., the expected marginal value of any fixed item is nonnegative. Function $f$ is adaptive submodular with respect to a prior distribution $p(\\phi)$ if, for all $\\psi$ and $\\psi'$ such that $\\psi' \\sim \\psi$ and for all $x \\in X \\setminus \\mathrm{dom}(\\psi')$, it holds that $\\Delta(x \\mid \\psi) \\ge \\Delta(x \\mid \\psi')$, i.e., the expected marginal value of any fixed item does not increase as more items are selected. A function can be adaptive submodular with respect to a certain distribution $p$ but not be pointwise submodular. However, it must be pointwise submodular if it is adaptive submodular with respect to all distributions.\n\nWe denote by $\\hat{f}(S, \\psi) = \\min_{\\phi \\sim \\psi} f(S, \\phi)$ the worst-case value of $f$ given a history $\\psi$, and by $p(\\psi) := \\mathbb{P}[\\Phi \\sim \\psi]$ the marginal likelihood of a history.
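For a finite prior, the worst-case value, the marginal likelihood, and the expected marginal value defined above can all be computed by enumerating the scenarios consistent with a history. The sketch below is illustrative only. It assumes the prior is a list of (scenario, probability) pairs with exact `Fraction` weights, and it uses the version space reduction function of Equation (3) as the test objective. All helper names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, List, Tuple

Scenario = Dict[str, int]
History = Dict[str, int]
Prior = List[Tuple[Scenario, Fraction]]
Objective = Callable[[FrozenSet[str], Scenario], Fraction]


def consistent(phi: Scenario, psi: History) -> bool:
    return all(phi[x] == o for x, o in psi.items())


def marginal_likelihood(prior: Prior, psi: History) -> Fraction:
    # p(psi) = P[Phi ~ psi]: prior mass of the scenarios consistent with psi.
    return sum((w for phi, w in prior if consistent(phi, psi)), Fraction(0))


def worst_case_value(f: Objective, S: FrozenSet[str], prior: Prior, psi: History) -> Fraction:
    # f_hat(S, psi) = min over scenarios phi ~ psi of f(S, phi).
    return min(f(S, phi) for phi, _ in prior if consistent(phi, psi))


def expected_marginal_value(f: Objective, x: str, prior: Prior, psi: History) -> Fraction:
    # Delta(x | psi) = E[f(dom(psi) + {x}, Phi) - f(dom(psi), Phi) | Phi ~ psi].
    dom = frozenset(psi)
    gain = sum((w * (f(dom | {x}, phi) - f(dom, phi))
                for phi, w in prior if consistent(phi, psi)), Fraction(0))
    return gain / marginal_likelihood(prior, psi)


# Hypothetical three-cell example with a non-uniform prior.
cells = ['a', 'b', 'c']
scenarios = [{x: int(x == v) for x in cells} for v in cells]
prior = list(zip(scenarios, [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))


def version_space_reduction(S: FrozenSet[str], phi: Scenario) -> Fraction:
    # V(S, phi) = 1 - prior mass of scenarios that agree with phi on S.
    return 1 - marginal_likelihood(prior, {x: phi[x] for x in S})


psi = {'a': 0}
print(marginal_likelihood(prior, psi))                                         # 1/2
print(worst_case_value(version_space_reduction, frozenset(psi), prior, psi))  # 1/2
print(expected_marginal_value(version_space_reduction, 'b', prior, psi))      # 1/4
```

In this example, 1 minus the worst-case value of V equals p(psi). One can check that this holds for V in general. That identity is the intuition behind Proposition 1 (Q = 1, K = 2): halving the marginal likelihood halves the remaining worst-case value.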
The marginal likelihood rate bound (MLRB) condition requires a function $f$ such that for all $\\psi' \\sim \\psi$, if $p(\\psi') \\le 0.5\\, p(\\psi)$, then\n\n$$Q - \\hat{f}(\\mathrm{dom}(\\psi'), \\psi') \\le \\frac{1}{K}\\left(Q - \\hat{f}(\\mathrm{dom}(\\psi), \\psi)\\right) \\qquad (2)$$\n\nexcept for scenarios already covered, where $K > 1$ and $Q \\ge \\max_\\phi f(X, \\phi)$ is a constant upper bound on the maximum value of $f$ over all scenarios.\n\nIntuitively, this condition means that the worst-case remaining objective value decreases by a constant fraction whenever the marginal likelihood of the history decreases by at least half.\n\nExample: The version space reduction function $V$ with an arbitrary prior is adaptive submodular and monotone [5]. Furthermore, it satisfies the MLRB. The version space reduction function $V$ is defined as\n\n$$V(S, \\phi) = 1 - \\sum_{\\phi' \\sim \\phi(S)} p(\\phi') \\qquad (3)$$\n\nfor all scenarios $\\phi$ and $S \\subseteq X$, where $\\phi(S)$ is the history of visiting the locations $x$ in $S$ when the scenario is $\\phi$. The version space reduction function is often used for active learning, where the true hypothesis is identified once all the scenarios are covered. We present the proof that the version space reduction function satisfies the MLRB condition (and all other proofs) in the supplementary material.\n\nProposition 1. The version space reduction function $V$ satisfies the MLRB with constants $Q = 1$ and $K = 2$.\n\nThe following proposition teases apart the relationship between the MLRB condition and adaptive submodularity.\n\nProposition 2. Adaptive submodularity does not imply the MLRB condition, and vice versa.\n\nThe marginal likelihood bound (MLB) condition requires that there exists some constant $G$ such that for all $\\psi$ (and all scenarios $\\phi \\sim \\psi$),\n\n$$f(X, \\phi) - \\hat{f}(\\mathrm{dom}(\\psi), \\psi) \\le G \\cdot p(\\psi). \\qquad (4)$$\n\nIn other words, the worst-case remaining objective value must be at most the marginal likelihood of its history multiplied by some constant $G$. The quality of our solution depends on the constant $G$.
The smaller the constant $G$, the better the approximation bound.\n\nWe can make any adaptive stochastic optimization problem satisfy the MLB with a large enough constant $G$. To trivially ensure the MLB, we set $G = Q \\cdot 1/\\delta$, where $\\delta = \\min_\\phi p(\\phi)$. Hence, $Q \\le G \\cdot p(\\psi)$ unless we have visited all locations and covered the function by definition.\n\nExample: The version space reduction function $V$ can be interpreted as the expected 0-1 loss of a random scenario $\\phi'$ (drawn from the prior) differing from the true scenario $\\phi$. The loss is counted as one whenever $\\phi' \\ne \\phi$. For example, a pair of scenarios that differ in their observation at one location incurs the same loss of 1 as another pair that differs in all observations. Thus, it can be useful to assign different losses to different pairs of scenarios with a general loss function. The generalized version space reduction function is defined as $f_L(S, \\phi) = \\mathbb{E}_{\\phi'}\\left[L(\\phi, \\phi')\\, \\mathbf{1}(\\phi(S) \\ne \\phi'(S))\\right]$, where $\\mathbf{1}(\\cdot)$ is an indicator function and $L: O^X \\times O^X \\to \\mathbb{R}_{\\ge 0}$ is a general loss function that satisfies $L(\\phi', \\phi) = L(\\phi, \\phi')$, and $L(\\phi, \\phi') = 0$ if $\\phi = \\phi'$. The generalized version space reduction function is not adaptive submodular [3] and does not satisfy the MLRB condition. However, it satisfies the MLB condition with a nontrivial constant $G$.\n\nProposition 3. The generalized version space reduction function $f_L$ satisfies MLB with $G = \\max_{\\phi, \\phi'} L(\\phi, \\phi')$.\n\n5 Algorithm\n\nAdaptive planning is computationally hard due to the need to consider every possible observation after each action. To simplify adaptive planning, RAC assumes that it always receives the most likely observation. RAC is a recursive algorithm that partially covers the function in each step and repeats on the residual function until the entire function is covered.\n\nIn each recursive step, RAC uses the most likely observation assumption to transform the adaptive stochastic optimization problem into a submodular orienteering problem. It then generates a tour and traverses it.
If the assumption holds throughout the tour, then RAC achieves the required partial coverage. Otherwise, RAC receives an observation other than the most likely one. Such an observation has probability at most half, since it is no more likely than the most likely observation and the two probabilities sum to at most one. The marginal likelihood of the history therefore decreases by at least half, and the MLRB and MLB conditions ensure that substantial progress is made towards covering the function.\n\nSubmodular orienteering takes a submodular function $g: 2^X \\to \\mathbb{R}$ and a metric on $X$, and gives the minimum-cost path $\\tau$ that covers function $g$, i.e., $g(\\tau) = g(X)$. We now describe the submodular orienteering problem used in each recursive step. Given the current history $\\psi$, we construct a restricted set of location-observation pairs, $Z = \\{(x, o) : (x, o) \\notin \\psi,\\ o \\text{ is the most likely observation at } x \\text{ given } \\psi\\}$. Using ideas from [7], we construct a submodular function $g^*_\\nu: 2^Z \\to \\mathbb{R}$ to be used in the submodular orienteering problem. Upon completion of the recursive step, we would like the function to be either covered or to have value at least $\\nu$ for all scenarios consistent with $\\psi \\cup Z'$, where $Z'$ is the selected subset of $Z$. We first restrict attention to the scenarios that are consistent with $\\psi$; denote this set by $\\Phi_\\psi$. To simplify, we transform the function so that its maximum value for all $\\phi$ is at least $\\nu$ by defining $f_\\nu(S, \\phi) = f(S, \\phi) + (\\nu - f(X, \\phi))$ whenever $f(X, \\phi) < \\nu$, and $f_\\nu(S, \\phi) = f(S, \\phi)$ otherwise. For $Z' \\subseteq Z$, we now define $g_\\nu(Z', \\phi) = f_\\nu(\\mathrm{dom}(\\psi \\cup Z'), \\phi)$ if $Z'$ is consistent with $\\phi$, and $g_\\nu(Z', \\phi) = f_\\nu(X, \\phi)$ otherwise. Finally, we construct the submodular function $g^*_\\nu(Z') = \\frac{1}{|\\Phi_\\psi|} \\sum_{\\phi \\in \\Phi_\\psi} \\min(\\nu, g_\\nu(Z', \\phi))$. These constructions have the following properties, which guarantee the effectiveness of the recursive steps of RAC.\n\nProposition 4. Let $f$ be a pointwise monotone submodular function. Then $g_\\nu$ is pointwise monotone submodular and $g^*_\\nu$ is monotone submodular.
In addition, $g^*_\\nu(Z') \\ge \\nu$ if and only if $f$ is either covered or has value at least $\\nu$ for all scenarios consistent with $\\psi \\cup Z'$.\n\nWe can replace $g^*_\\nu$ by a simpler function if $f$ satisfies a minimal dependency property, under which the value of $f$ depends only on the history, i.e., $f(\\mathrm{dom}(\\psi), \\phi') = f(\\mathrm{dom}(\\psi), \\phi)$ for all $\\phi, \\phi' \\sim \\psi$. We define a new submodular set function $g^m_\\nu(Z') = g_\\nu(Z', Z \\cup \\psi)$.\n\nProposition 5. When $f$ satisfies minimal dependency, $g^m_\\nu(Z') \\ge \\nu$ implies $g^*_\\nu(Z') \\ge \\nu$.\n\nRAC needs to guard against committing to a costly plan made under the most likely observation assumption, which is bound to be wrong eventually. RAC uses two different mechanisms for hedging. For MLRB, instead of requiring complete coverage, we solve partial coverage using the submodular path optimization problem for $g^*_{(1-1/K)Q}$. This ensures that $f(S, \\phi) \\ge (1 - 1/K)Q$ for all consistent scenarios under the most likely observation assumption in each recursive step. For MLB, we solve submodular orienteering for complete coverage of $g^*_Q$. We also solve for the version space reduction function with 0.5 as the target, $V^*_{0.5}$, as a hedge against over-commitment by the first tour when the function is not well aligned with the probability of observations. RAC then traverses the cheaper of the two tours in each recursive step.\n\nWe define the informative observation set $\\Omega_x$ for every location $x \\in X$ as $\\Omega_x = \\{o \\mid p(o \\mid x) \\le 0.5\\}$. RAC traverses the tour and adaptively terminates when it encounters an informative observation. Subsequent recursive calls work on the residual function $f'$ and the normalized prior $p'$. Let $\\psi$ be the history encountered so far, just before the recursive call. Then for any set $S \\supseteq \\mathrm{dom}(\\psi)$, $f'(S, \\phi) = f(S, \\phi) - f(\\mathrm{dom}(\\psi), \\phi)$. We assume that the function $f$ is integer-valued. The recursive step is repeated until the residual value $Q' = 0$. We give the pseudocode of RAC in Algorithm 1.
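The tour-execution step can be sketched as follows: traverse the tour, stop at the first informative observation, then renormalize the prior for the next recursive call. This is a simplified illustration under stated assumptions, not the authors' implementation. The tour is a given list of locations, observations come from a known true scenario, the prior is a finite list of (scenario, Fraction) pairs, and `execute_plan` and `renormalize` are hypothetical names. Informativeness is judged against the prior of the current recursive call. So if an informative observation o is received at x, the marginal likelihood of the gathered history is at most p(o | x), which is at most one half.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Scenario = Dict[str, int]
History = Dict[str, int]
Prior = List[Tuple[Scenario, Fraction]]


def consistent(phi: Scenario, psi: History) -> bool:
    return all(phi[x] == o for x, o in psi.items())


def obs_prob(prior: Prior, x: str, o: int) -> Fraction:
    # p(o | x) under the (normalized) prior of the current recursive call.
    return sum((w for phi, w in prior if phi[x] == o), Fraction(0))


def execute_plan(tour: List[str], prior: Prior, true_phi: Scenario) -> History:
    # Visit locations in tour order and stop at the first informative
    # observation, i.e. one in Omega_x = {o : p(o | x) <= 0.5}.
    psi: History = {}
    for x in tour:
        o = true_phi[x]
        psi[x] = o
        if obs_prob(prior, x, o) <= Fraction(1, 2):
            break
    # The agent would now return to the start location r (not modeled here).
    return psi


def renormalize(prior: Prior, psi: History) -> Prior:
    # p'(phi) = p(psi | phi) p(phi) / p(psi): drop inconsistent scenarios, rescale.
    kept = [(phi, w) for phi, w in prior if consistent(phi, psi)]
    mass = sum((w for _, w in kept), Fraction(0))
    return [(phi, w / mass) for phi, w in kept]


cells = ['a', 'b', 'c']
scenarios = [{x: int(x == v) for x in cells} for v in cells]
prior = list(zip(scenarios, [Fraction(3, 5), Fraction(1, 5), Fraction(1, 5)]))

psi = execute_plan(cells, prior, true_phi=scenarios[2])
print(psi)                                      # {'a': 0}: p(0 | a) = 2/5 is informative
print([w for _, w in renormalize(prior, psi)])  # [Fraction(1, 2), Fraction(1, 2)]
```

When the true scenario matches the most likely observation at every step (here, the victim in cell a), the whole tour is traversed and no early stop occurs.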
We give details of the SUBMODULARPATH procedure and prove its approximation bound in the supplementary material.\n\nAlgorithm 1 RAC\n\nprocedure RECURSERAC(p, f, Q)\n    if max_{φ ∈ {φ' | p(φ') > 0}} f(X, φ) = 0 then\n        return\n    τ ← GENTOUR(p, f, Q)\n    ψ ← EXECUTEPLAN(τ)\n    p'(φ) ← p(ψ | φ) p(φ) / p(ψ)\n    f'(Y, φ) ← f(Y, φ) - f(τ, φ)\n    Q' ← Q - min f(τ, φ) over all φ ∼ ψ\n    RECURSERAC(p', f', Q')\n\nprocedure EXECUTEPLAN(τ)\n    repeat\n        Visit the next location x in τ and observe o.\n    until o ∈ Ω_x or end of tour\n    Move to location x_t = r.\n    return the history ψ encountered\n\nprocedure GENTOUR(p, f, Q)\n    if f satisfies MLB then\n        τ_f ← SUBMODULARPATH(g*_Q)\n        if max_φ p(φ) ≤ 0.5 then\n            τ_vs ← SUBMODULARPATH(V*_{0.5})\n            τ ← argmin_{τ' ∈ {τ_f, τ_vs}} W(τ')\n        else\n            τ ← τ_f\n    else\n        τ ← SUBMODULARPATH(g*_{(1-1/K)Q})\n    return τ, where τ = (x_0, x_1, ..., x_t) and x_0 = x_t = r\n\n6 Analysis\n\nWe give the performance guarantees for applying RAC to adaptive stochastic optimization problems on paths that satisfy MLRB and MLB.\n\nTheorem 1. Assume that $f$ is an integer-valued pointwise submodular monotone function. If $f$ satisfies the MLRB condition, then for any constant $\\epsilon > 0$ and any instance of the adaptive stochastic optimization problem on paths optimizing $f$, RAC computes a policy $\\pi$ in polynomial time such that\n\n$$C(\\pi) = O\\left((\\log |X|)^{2+\\epsilon} \\log Q \\log_K Q\\right) C(\\pi^*),$$\n\nwhere $Q$ and $K > 1$ are constants that satisfy Equation (2).\n\nTheorem 2.
Assume that the prior probability distribution $p$ is represented as non-negative integers with $\\sum_\\phi p(\\phi) = P$, and that $f$ is an integer-valued pointwise submodular monotone function. If $f$ satisfies MLB, then for any constant $\\epsilon > 0$ and any instance of the adaptive stochastic optimization problem on paths optimizing $f$, RAC computes a policy $\\pi$ in polynomial time such that\n\n$$C(\\pi) = O\\left((\\log |X|)^{2+\\epsilon} (\\log P + \\log Q) \\log G\\right) C(\\pi^*),$$\n\nwhere $Q = \\max_\\phi f(X, \\phi)$.\n\nFor adaptive stochastic optimization problems on subsets, we achieve tighter approximation bounds by replacing the bound of submodular orienteering with that of greedy submodular set cover.\n\nTheorem 3. Assume $f$ is an integer-valued pointwise submodular and monotone function. If $f$ satisfies the MLRB condition, then for any instance of the adaptive stochastic optimization problem on subsets optimizing $f$, RAC computes a policy $\\pi$ in polynomial time such that\n\n$$C(\\pi) \\le 4(\\ln Q + 1)(\\log_K Q + 1)\\, C(\\pi^*),$$\n\nwhere $Q$ and $K > 1$ are constants that satisfy Equation (2).\n\nTheorem 4. Assume $f$ is an integer-valued pointwise submodular and monotone function and $\\delta = \\min_\\phi p(\\phi)$. If $f$ satisfies the MLB condition, then for any instance of the adaptive stochastic optimization problem on subsets optimizing $f$, RAC computes a policy $\\pi$ in polynomial time such that\n\n$$C(\\pi) \\le 4(\\ln(1/\\delta) + \\ln Q + 2)(\\log G + 1)\\, C(\\pi^*),$$\n\nwhere $Q = \\max_\\phi f(X, \\phi)$.\n\n7 Application: Noisy Informative Path Planning\n\nIn this section, we apply RAC to solve adaptive informative path planning (IPP) problems with noisy observations. We reduce an adaptive noisy IPP problem to an Equivalence Class Determination (ECD) problem [6] and apply RAC to solve it near-optimally using an objective function that satisfies the MLRB condition. We evaluate this approach on two IPP tasks with noisy observations.\n\nIn an informative path planning (IPP) problem, an agent seeks a path to sense and gather information from its environment.
An IPP problem is specified as a tuple $I = (X, d, H, p_H, O, Z_H, r)$, where the definitions of $X$, $d$, $O$, and $r$ are the same as for the adaptive stochastic optimization problem on paths. In addition, there is a finite set of hypotheses $H$ and a prior probability $p(h)$ over them. We also have a set of probabilistic observation functions $Z_H = \\{Z_x \\mid x \\in X\\}$, with one observation function $Z_x(h, o) = p(o \\mid x, h)$ for each location $x$. The goal of the IPP problem is to identify the true hypothesis.\n\n7.1 Equivalence Class Determination Problem\n\nAn Equivalence Class Determination (ECD) problem consists of a set of hypotheses $H$ and a set of equivalence classes $\\{H_1, H_2, \\ldots, H_m\\}$ that partitions $H$. Its goal is to identify which equivalence class the true hypothesis lies in by moving to locations and making observations with the minimum expected movement cost. The ECD problem has been applied to noisy Bayesian active learning to achieve near-optimal performance. A noisy adaptive IPP problem can also be reduced to an ECD instance when it is always possible to identify the true hypothesis in the IPP problem.\n\nTo differentiate between the equivalence classes, we use the Gibbs error objective function (called the edge-cutting function in [6]). The idea is to consider the ambiguities between pairs of hypotheses in different equivalence classes, and to visit locations and make observations that disambiguate between them. The set of pairs of hypotheses in different classes is $E = \\bigcup_{1 \\le i < j \\le m} \\{\\{h, h'\\} : h \\in H_i, h' \\in H_j\\}$. [\u2026] that consists of all observation vectors $h' = (o_1, o_2, \\ldots, o_{|X|}) \\in H'$ that are possible with hypothesis $H_i$. When we can always identify the true underlying hypothesis $h \\in H$, the equivalence classes form a partition of the set $H'$.\n\n7.2 Experiments\n\nWe evaluate RAC in simulation on two noisy IPP tasks modified from [10]. We highlight the modifications here and give the full description in the supplementary material.
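As a small numerical illustration of the edge-cutting view of Section 7.1, the following sketch computes two quantities. The first is the Gibbs error of the equivalence classes, one minus the sum of squared class probabilities. The second is the total weight of edges between hypotheses in different classes. It assumes the edge weighting w({h, h'}) = p(h) p(h') used for edge cutting in [6]. Under that assumption, the Gibbs error equals twice the normalized edge weight, so cutting edges and reducing the Gibbs error track each other. Function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, List

Posterior = Dict[str, Fraction]  # (possibly unnormalized) p(h | history)


def gibbs_error(post: Posterior, classes: List[List[str]]) -> Fraction:
    # 1 - sum_i P(H_i)^2: probability that a Gibbs classifier, which samples a
    # hypothesis from the posterior, names the wrong equivalence class.
    total = sum(post.values(), Fraction(0))
    masses = [sum((post[h] for h in c), Fraction(0)) / total for c in classes]
    return 1 - sum((m * m for m in masses), Fraction(0))


def edge_weight(post: Posterior, classes: List[List[str]]) -> Fraction:
    # Total weight of edges {h, h2} with h and h2 in different classes, using the
    # assumed weight p(h) p(h2); hypotheses ruled out (mass 0) contribute nothing.
    total = sum(post.values(), Fraction(0))
    weight = Fraction(0)
    for ci, cj in combinations(classes, 2):
        for h in ci:
            for h2 in cj:
                weight += post[h] * post[h2]
    return weight / (total * total)


post = {'h1': Fraction(1, 2), 'h2': Fraction(1, 4), 'h3': Fraction(1, 4)}
classes = [['h1', 'h2'], ['h3']]
print(gibbs_error(post, classes))      # 3/8
print(2 * edge_weight(post, classes))  # 3/8
```

Once every hypothesis outside a single class has zero posterior mass, both quantities drop to zero. This matches the stopping rule in terms of Gibbs error used in the experiments.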
In a variant of the UAV search and rescue task (see Figure 1), there is a safe zone (marked grey in Figure 1). The victim is deemed to be safe if we know that he is in it; otherwise, we need to know the exact location of the victim. The equivalence classes in this task are the safe zone and every location outside of it. Furthermore, the long-range sensor may report the wrong reading with probability 0.03.\n\nIn a noisy variant of the grasping task, the laser range finder has a 0.85 chance of detecting the correct discretized value $x$, a 0.05 chance each of a $\\pm 1$ error, and a 0.025 chance each of a $\\pm 2$ error. The robot gripper is fairly robust to estimation error in the cup handle\u2019s orientation. For each cup, we partition the cup handle orientation into regions of 20 degrees each. We only need to know the region that contains the cup handle, so the equivalence classes here are the regions. However, it is not always possible to identify the true region due to observation noise. We can still reduce the task to an ECD problem by associating each observation vector with its most likely equivalence class.\n\nWe now describe our baseline algorithms. Define information gain to be the reduction in Shannon entropy of the equivalence classes. The information gain (IG) algorithm greedily picks the location that maximizes the expected information gain, where the expectation is taken over all possible observations at the location. To account for movement cost, the IG-Cost algorithm greedily picks the location that maximizes the expected information gain per unit movement cost. Neither IG nor IG-Cost reasons over the long term; both achieve limited adaptivity by replanning in each step. The Sampled-RAId algorithm is as described in [10].\n\nWe evaluate IG, IG-Cost, Sampled-RAId, and RAC with the version space reduction (RAC-V) and Gibbs error (RAC-GE) objectives. RAC-GE has theoretical performance guarantees for the noisy adaptive IPP problem.
Under the MLRB condition, RAC-V can also be shown to have a similar performance bound. However, RAC-GE optimizes the target function directly, and we expect this to usually give better performance in practice. Even though the version space reduction function and the Gibbs error function are adaptive submodular, the greedy policy in [5] is not applicable, because the movement cost per step depends on the path and is not fixed. If we ignore movement cost, a greedy policy on the version space reduction function is equivalent to generalized binary search. For the UAV task, where the prior is uniform and there are two observations, this is in turn equivalent to IG [15].\n\nWe set all algorithms to terminate when the Gibbs error of the equivalence classes is less than $\\eta = 10^{-5}$. The Gibbs error corresponds to the exponentiated R\u00e9nyi entropy of order 2. It is also the prediction error of a Gibbs classifier that predicts by sampling a hypothesis from the prior. We run 1000 trials with the true hypothesis sampled randomly from the prior for the UAV search task, and 3000 trials for the grasping task, as its variance is higher. For Sampled-RAId, we set the number of samples to be three times the number of hypotheses.\n\nFor performance comparison, we pick 15 different thresholds for the Gibbs error of the equivalence classes, starting from $1 \\times 10^{-5}$ and doubling at each step. For each algorithm, we compute the average cost incurred to reduce the Gibbs error below each threshold level. We plot the average cost with 95% confidence intervals for the two IPP tasks in Figures 3 and 4. For the grasping task, when computing the average cost for a specific threshold, we omit trials in which the minimum possible Gibbs error is greater than that threshold. For readability, we omit results for IG from the plots when it is worse than the other algorithms by a large margin, which is the case for all of IG\u2019s results in the grasping task.
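The two greedy baselines described above can be sketched as below. The sketch assumes a finite hypothesis set with a known class label per hypothesis, and an observation model Z(x, h) that returns a dictionary mapping each observation o to p(o | x, h). The toy two-location example is hypothetical: a nearby noisy sensor versus a distant noiseless one. It is enough to show how dividing by movement cost changes the greedy choice. `class_entropy`, `expected_info_gain`, and `greedy_choice` are illustrative names, not code from the paper.

```python
import math
from typing import Callable, Dict, List

Posterior = Dict[str, float]                       # p(h | history)
ObsModel = Callable[[str, str], Dict[int, float]]  # Z(x, h) -> {o: p(o | x, h)}


def class_entropy(post: Posterior, cls: Dict[str, int]) -> float:
    # Shannon entropy (bits) of the induced distribution over equivalence classes.
    total = sum(post.values())
    mass: Dict[int, float] = {}
    for h, p in post.items():
        mass[cls[h]] = mass.get(cls[h], 0.0) + p / total
    return -sum(q * math.log2(q) for q in mass.values() if q > 0)


def expected_info_gain(post: Posterior, cls: Dict[str, int], x: str, Z: ObsModel) -> float:
    # H(classes) minus E_o[H(classes | o observed at x)], expectation over p(o | x).
    split: Dict[int, Posterior] = {}
    for h, p in post.items():
        for o, q in Z(x, h).items():
            split.setdefault(o, {})[h] = p * q
    total = sum(post.values())
    after = 0.0
    for post_o in split.values():
        p_o = sum(post_o.values()) / total
        if p_o > 0:
            after += p_o * class_entropy(post_o, cls)
    return class_entropy(post, cls) - after


def greedy_choice(post: Posterior, cls: Dict[str, int], here: str,
                  candidates: List[str], Z: ObsModel, dist=None) -> str:
    # IG: maximize expected information gain. IG-Cost: pass dist to divide
    # the gain by the movement cost dist(here, x), assumed positive.
    def score(x: str) -> float:
        gain = expected_info_gain(post, cls, x, Z)
        return gain if dist is None else gain / dist(here, x)
    return max(candidates, key=score)


def Z(x: str, h: str) -> Dict[int, float]:
    # Hypothetical sensors: far is noiseless, near flips its reading 10% of the time.
    o = 1 if h == 'h1' else 0
    return {o: 1.0} if x == 'far' else {o: 0.9, 1 - o: 0.1}


post = {'h0': 0.5, 'h1': 0.5}
cls = {'h0': 0, 'h1': 1}
cost = {'near': 1.0, 'far': 10.0}
print(greedy_choice(post, cls, 'start', ['near', 'far'], Z))                         # far
print(greedy_choice(post, cls, 'start', ['near', 'far'], Z, lambda a, b: cost[b]))  # near
```

Both baselines replan after every observation by recomputing the posterior and calling the greedy step again. Neither looks beyond the next location, which is the limitation noted above.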
From our experiments, RAC-GE has the lowest average cost for both tasks at almost every threshold. RAC-V has very close results, while the other algorithms (Sampled-RAId, IG-Cost, and IG) do not perform as well on either the UAV search task or the grasping task.\n\nFigure 3: UAV search and rescue. Cost versus Gibbs error (log scale, 10^-5 to 10^0) for IG-Cost, RAC-GE, RAC-V, Sampled-RAId, and IG.\n\nFigure 4: Grasping. Cost versus Gibbs error (log scale, 10^-5 to 10^0) for IG-Cost, RAC-V, Sampled-RAId, and RAC-GE.\n\n8 Conclusion\n\nWe study approximation algorithms for adaptive stochastic optimization over both sets and paths. We give two conditions on pointwise monotone submodular functions that are useful for understanding the performance of approximation algorithms on these problems: the MLB and the MLRB. Our algorithm, RAC, runs in polynomial time with an approximation ratio that depends on the constants characterizing these two conditions. The results extend known results for adaptive stochastic optimization problems on sets to paths, and enlarge the class of functions known to be efficiently approximable for both problems. We apply the algorithm to two adaptive informative path planning applications with promising results.\n\nAcknowledgement. This work is supported in part by NUS AcRF grant R-252-000-587-112, National Research Foundation Singapore through the SMART Phase 2 Pilot Program (Subaward Agreement No. 09), and US Air Force Research Laboratory under agreement number FA2386-15-1-4010.\n\nReferences\n\n[1] Arash Asadpour, Hamid Nazerzadeh, and Amin Saberi. Stochastic submodular maximization. In Internet and Network Economics, pages 477\u2013489, 2008.\n\n[2] Gruia Calinescu and Alexander Zelikovsky. The polymatroid Steiner problems.
Journal of Combinatorial Optimization, 9(3):281–294, 2005.
[3] Nguyen Viet Cuong, Wee Sun Lee, and Nan Ye. Near-optimal Adaptive Pool-based Active Learning with General Loss. In Proc. Uncertainty in Artificial Intelligence, 2014.
[4] Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, and Hai Leong Chieu. Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion. In Advances in Neural Information Processing Systems (NIPS), 2013.
[5] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artificial Intelligence Research, 42(1):427–486, 2011.
[6] Daniel Golovin, Andreas Krause, and Debajyoti Ray. Near-optimal Bayesian active learning with noisy observations. In Advances in Neural Information Processing Systems (NIPS), pages 766–774, 2010.
[7] Andrew Guillory and Jeff Bilmes. Interactive submodular set cover. In International Conference on Machine Learning (ICML), Haifa, Israel, 2010.
[8] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. Approximation Algorithms for Optimal Decision Trees and Adaptive TSP Problems. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming, number 6198 in Lecture Notes in Computer Science, pages 690–701. Springer Berlin Heidelberg, January 2010.
[9] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, January 1998.
[10] Zhan Wei Lim, David Hsu, and Wee Sun Lee. Adaptive informative path planning in metric spaces. In Workshop on the Algorithmic Foundations of Robotics, 2014.
[11] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions—I.
Mathematical Programming, 14(1):265–294, 1978.
[12] Sylvie C. W. Ong, Shao Wei Png, David Hsu, and Wee Sun Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robotics Research, 29(8):1053–1068, 2010.
[13] David Silver and Joel Veness. Monte-Carlo Planning in Large POMDPs. In Advances in Neural Information Processing Systems (NIPS), 2010.
[14] Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. DESPOT: Online POMDP planning with regularization. In Advances in Neural Information Processing Systems (NIPS), pages 1772–1780, 2013.
[15] Alice X. Zheng, Irina Rish, and Alina Beygelzimer. Efficient Test Selection in Active Diagnosis via Entropy Approximation. In Proc. Uncertainty in Artificial Intelligence, 2005.", "award": [], "sourceid": 987, "authors": [{"given_name": "Zhan Wei", "family_name": "Lim", "institution": "NUS"}, {"given_name": "David", "family_name": "Hsu", "institution": "National University of Singapore"}, {"given_name": "Wee Sun", "family_name": "Lee", "institution": "National University of Singapore"}]}