{"title": "Auction Mechanism Design for Multi-Robot Coordination", "book": "Advances in Neural Information Processing Systems", "page_first": 879, "page_last": 886, "abstract": "", "full_text": "Auction Mechanism Design for Multi-Robot Coordination\n\nCurt Bererton, Geoff Gordon, Sebastian Thrun, Pradeep Khosla\n\n{curt,ggordon,thrun,pkk}@cs.cmu.edu\n\nCarnegie Mellon University\n\n5000 Forbes Ave\n\nPittsburgh, PA 15217\n\nAbstract\n\nThe design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball.\n\n1 Introduction\n\nIn recent years, the design of cooperative multi-robot systems has become a highly active research area within robotics [1, 2, 3, 4, 5, 6]. Many planning problems in robotics are best phrased as MDPs, defined over world states or\u2014in case of partial observability\u2014belief states [7]. 
However, existing MDP planning techniques generally scale poorly to multi-robot systems because of the curse of dimensionality: in general, it is exponentially harder to solve an MDP for N agents than it is to solve a single-agent MDP, because the state and action space for N robots can be exponentially larger than for a single-robot system. This enormous complexity has confined MDP planning techniques largely to single-robot systems.\n\nIn many cases, robots in a multi-robot system interact only in limited ways. Robots might seek not to collide with each other [1], coordinate their locations to carry out a joint task [4, 6], or consume a joint resource with limited availability [8, 9, 10]. While these problems are not trivially decomposed, they do not necessarily have the worst-case exponential complexity that characterizes the general case. However, so far we lack effective mechanisms for cooperatively solving such MDPs.\n\nHandling this sort of limited interaction is exactly the strength of market-based planning algorithms [10, 12]: by focusing their attention on a limited set of important resources and ignoring all other interactions, these algorithms reduce the problem of cooperating with other robots to the problem of deciding which resources to produce or consume. 
Market-based algorithms are particularly attractive for multi-robot planning because many common types of interactions can be phrased as constraints on resources such as space (two robots can\u2019t occupy the same location at once) and time (a robot can only work on a limited number of tasks at once).\n\nFrom the point of view of these auction algorithms, the difficult part of the multi-robot planning problem is to compute the probability distribution of the price of each resource at every time step: the optimal price for a resource at time t depends on how much each robot produces or consumes between now and time t, and what each robot\u2019s state is at time t. The resource usage and state depend on the robots\u2019 plans between now and time t, which in turn depend on the price. Worse yet, future resource usage depends on random events which can\u2019t be predicted exactly.\n\nIn this paper, we bring together resource-allocation techniques from the auction and MDP literature. In particular, we propose a general technique for decomposing multi-robot MDP problems into \u201cloosely coupled\u201d MDPs which interact only through resource production and consumption constraints. The decomposition works by turning all interactions into streams of payments between robots, thereby allowing each robot to learn its own local value function. Prices can be attached to any function of the visitation frequencies of each robot\u2019s states and actions. The actual prices for these resources are set by a \u201cmaster\u201d agent; the master agent takes into account the possibility of re-allocating resources at each step, but it approximates the effect of interactions between robots.\n\nOur approach generalizes a large body of previous literature in multi-robot systems, including prior work by Guestrin and Gordon [11]. 
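The master/robot price-setting interaction described above can be sketched as a small dual-decomposition loop: each robot plans selfishly given current prices, and a master agent nudges the price of a shared resource up when combined demand exceeds capacity and down otherwise. Everything here (plan_robot, coordinate, the toy concave reward curves) is an invented illustration of that general idea, not the paper's actual algorithm.

```python
def plan_robot(reward, price, max_units=5):
    """Toy 'robot planner': pick a consumption level u that maximizes
    local reward minus the payment price * u for the shared resource."""
    best_u, best_val = 0, float("-inf")
    for u in range(max_units + 1):
        val = reward(u) - price * u
        if val > best_val:
            best_u, best_val = u, val
    return best_u

def coordinate(rewards, capacity, steps=200, lr=0.05):
    """Master agent: subgradient price update. Raise the price when the
    robots' total demand exceeds capacity; lower it (never below zero)
    when the resource is under-used."""
    price, demand = 0.0, 0
    for _ in range(steps):
        demand = sum(plan_robot(r, price) for r in rewards)
        price = max(0.0, price + lr * (demand - capacity))
    return price, demand
```

With two robots whose rewards are concave in consumption, the price climbs until the robots' combined demand hovers around the capacity constraint, which is the behavior a market mechanism is meant to produce.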
Our algorithm can be distributed so that each robot reasons only about its own local interactions, and it always produces the same answer as a particular centralized planning algorithm.\n\n2 MDPs, linear programs, and duals\n\nA Markov Decision Process (MDP) is a tuple M = {S, A, T, c, \u03b3, s_0}. S is a set of N states. A is a set of M actions. T is the dynamics, T(s', a, s) = p(s' | s, a). The reward function is c : S \u00d7 A \u2192 \u211d. The discount factor is \u03b3 \u2208 [0, 1]. Finally, s_0 \u2208 S is the initial state. For any MDP there is a value function which indicates how desirable any state is. It is defined as V(s) = max_a (c(s, a) + \u03b3 \u03a3_{s'} p(s' | s, a) V(s')). We can compute V by solving the Bellman linear program (1). Once we have V, we can compute the optimal policy by one-step lookahead. Here V \u2208
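For small MDPs, the fixed point of the Bellman equation above can also be found by plain value iteration rather than by solving the linear program; the sketch below iterates the backup V(s) <- max_a [c(s,a) + gamma * sum_s' p(s'|s,a) V(s')] and then recovers a greedy policy by one-step lookahead. The two-state transition table and rewards are invented for illustration and are not from the paper.

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. T[a, s, s'] = p(s' | s, a); c[s, a] is the
# reward for taking action a in state s. Numbers are illustrative only.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],     # dynamics under action 0
              [[0.1, 0.9], [0.7, 0.3]]])    # dynamics under action 1
c = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def value_iteration(T, c, gamma, tol=1e-8):
    """Iterate the Bellman backup until V stops changing, then return the
    converged values and the greedy (one-step-lookahead) policy."""
    n_actions, n_states, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = c(s, a) + gamma * sum_s' T[a, s, s'] * V[s']
        Q = c + gamma * np.einsum('ast,t->sa', T, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

Because the backup is a gamma-contraction, the loop converges for any starting V; the returned policy is exactly the one-step lookahead the text mentions.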