Ciamac C. Moallemi, Benjamin Roy
We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of communication switches. This protocol requires only local communication and simple computa- tions which are distributed among devices. The protocol is scalable to large networks. As a motivating example, we discuss a problem involv- ing optimization of power consumption, delay, and buffer overﬂow in a sensor network. Our approach builds on policy gradient methods for optimization of Markov decision processes. The protocol can be viewed as an extension of policy gradient methods to a context involving a team of agents op- timizing aggregate performance through asynchronous distributed com- munication and computation. We establish that the dynamics of the pro- tocol approximate the solution to an ordinary differential equation that follows the gradient of the performance objective.