{"title": "Scan Strategies for Meteorological Radars", "book": "Advances in Neural Information Processing Systems", "page_first": 993, "page_last": 1000, "abstract": "We address the problem of adaptive sensor control in dynamic resource-constrained sensor networks. We focus on a meteorological sensing network comprising radars that can perform sector scanning rather than always scanning 360 degrees. We compare three sector scanning strategies. The sit-and-spin strategy always scans 360 degrees. The limited lookahead strategy additionally uses the expected environmental state K decision epochs in the future, as predicted from Kalman filters, in its decision-making. The full lookahead strategy uses all expected future states by casting the problem as a Markov decision process and using reinforcement learning to estimate the optimal scan strategy. We show that the main benefits of using a lookahead strategy are when there are multiple meteorological phenomena in the environment, and when the maximum radius of any phenomenon is sufficiently smaller than the radius of the radars. We also show that there is a trade-off between the average quality with which a phenomenon is scanned and the number of decision epochs before which a phenomenon is rescanned.", "full_text": "Scan Strategies for Adaptive Meteorological Radars\n\nVictoria Manfredi, Jim Kurose\nDepartment of Computer Science\n\nUniversity of Massachusetts\n\n{vmanfred,kurose}@cs.umass.edu\n\nAmherst, MA USA\n\nAbstract\n\nWe address the problem of adaptive sensor control\nin dynamic resource-\nconstrained sensor networks. We focus on a meteorological sensing network com-\nprising radars that can perform sector scanning rather than always scanning 360\u25e6\n.\nWe compare three sector scanning strategies. The sit-and-spin strategy always\nscans 360\u25e6\n. 
The limited lookahead strategy additionally uses the expected environmental state K decision epochs in the future, as predicted from Kalman filters, in its decision-making. The full lookahead strategy uses all expected future states by casting the problem as a Markov decision process and using reinforcement learning to estimate the optimal scan strategy. We show that the main benefits of using a lookahead strategy are when there are multiple meteorological phenomena in the environment, and when the maximum radius of any phenomenon is sufficiently smaller than the radius of the radars. We also show that there is a trade-off between the average quality with which a phenomenon is scanned and the number of decision epochs before which a phenomenon is rescanned.\n\n1 Introduction\n\nTraditionally, meteorological radars, such as the National Weather Service NEXRAD system, are tasked to always scan 360 degrees. In contrast, the Collaborative Adaptive Sensing of the Atmosphere (CASA) Engineering Research Center [5] is developing a new generation of small, low-power but agile radars that can perform sector scanning, targeting sensing when and where the user needs are greatest. Since all meteorological phenomena cannot now be observed all of the time with the highest degree of fidelity, the radars must decide how best to perform scanning. While we focus on the problem of how to perform sector scanning in such an adaptive meteorological sensing network, it is an instance of the larger class of problems of adaptive sensor control in dynamic resource-constrained sensor networks.\n\nGiven the ability of a network of radars to perform sector scanning, how should scanning be adapted at each decision epoch? 
Any scan strategy must consider, for each scan action, both the expected quality with which phenomena would be observed, and the expected number of decision epochs before which phenomena would be first observed (for new phenomena) or rescanned, since not all regions are scanned every epoch under sectored scanning. Another consideration is whether to optimize myopically only over current and possibly past environmental state, or whether to additionally optimize over expected future states. In this work we examine three methods for adapting the radar scan strategy. The methods differ in the information they use to select a scan configuration at a particular decision epoch. The sit-and-spin strategy of always scanning 360 degrees is independent of any external information. The limited lookahead strategies additionally use the expected environmental state K decision epochs in the future in their decision-making. Finally, the full lookahead strategy has an infinite horizon: it uses all expected future states by casting the problem as a Markov decision process and using reinforcement learning to estimate the optimal scan strategy. All strategies, excluding sit-and-spin, work by optimizing the overall \u201cquality\u201d (a term we will define precisely shortly) of the sensed information about phenomena in the environment, while restricting or penalizing long inter-scan intervals.\n\nOur contributions are two-fold. We first introduce the meteorological radar control problem and show how to constrain the problem so that it is amenable to reinforcement learning methods. We then identify conditions under which the computational cost of an infinite horizon radar scan strategy such as reinforcement learning is necessary. 
With respect to the radar meteorological application,\nwe show that the main bene\ufb01ts of considering expected future states are when there are multiple\nmeteorological phenomena in the environment, and when the maximum radius of any phenomenon\nis suf\ufb01ciently smaller than the radius of the radars. We also show that there is a trade-off between\nthe average quality with which a phenomenon is scanned and the number of decision epochs before\nwhich a phenomenon is rescanned. Finally, we show that for some environments, a limited looka-\nhead strategy is suf\ufb01cient. In contrast to other work on radar control (see Section 5), we focus on\ntracking meteorological phenomena and the time frame over which to evaluate control decisions.\n\nThe rest of this paper is organized as follows. Section 2 de\ufb01nes the radar control problem. Section\n3 describes the scan strategies we consider. Section 4 describes our evaluation framework and\npresents results. Section 5 reviews related work on control and resource allocation in radar and\nsensor networks. Finally, Section 6 summarizes this work and outlines future work.\n\n2 Meteorological Radar Control Problem\n\nMeteorological radar sensing characteristics are such that the smaller the sector that a radar scans\n(until a minimum sector size is reached), the higher the quality of the data collected, and thus, the\nmore likely it is that phenomena located within the sector are correctly identi\ufb01ed [2]. The multi-\nradar meteorological control problem is then as follows. We have a set of radars, with \ufb01xed locations\nand possibly overlapping footprints. Each radar has a set of scan actions from which it chooses. In\nthe simplest case, a radar scan action determines the size of the sector to scan, the start angle, the\nend angle, and the angle of elevation. We will not consider elevation angles here. Our goal is\nto determine which scan actions to use and when to use them. 
An effective scanning strategy must balance scanning small sectors (thus implicitly not scanning other sectors), to ensure that phenomena are correctly identified, with scanning a variety of sectors, to ensure that no phenomena are missed.\n\nWe will evaluate the performance of different scan strategies based on inter-scan time, quality, and cost. Inter-scan time is the number of decision epochs before a phenomenon is either first observed or rescanned; we would like this value to be below some threshold. Quality measures how well a phenomenon is observed, with quality depending on the amount of time a radar spends sampling a voxel in space, the degree to which a meteorological phenomenon is scanned in its (spatial) entirety, and the number of radars observing a phenomenon; higher quality scans are better. Cost is a meta-metric that combines inter-scan time and quality, and that additionally considers whether a phenomenon was never scanned. The radar control problem is that of dynamically choosing the scan strategy of the radars over time to maximize quality while minimizing inter-scan time.\n\n3 Scan Strategies\n\nWe define a radar configuration to be the start and end angles of the sector to be scanned by an individual radar for a fixed interval of time. We define a scan action to be a set of radar configurations (one configuration for each radar in the meteorological sensing network). We define a scan strategy to be an algorithm for choosing scan actions. In Section 3.1 we define the quality function associated with different radar configurations and in Section 3.2 we define the quality functions associated with different scan strategies.\n\n3.1 Quality Function\n\nThe quality function associated with a given scan action was proposed by radar meteorologists in [5] and has two components. There is a quality component Up associated with scanning a particular phenomenon p. 
There is also a quality component Us associated with scanning a sector, which is independent of any phenomena in that sector. Let sr be the radar configuration for a single radar r and let Sr be the scan action under consideration. From [5], we compute the quality Up(p, Sr) of scanning a phenomenon p using scan action Sr with the following equations,\n\nUp(p, sr) = Fc(c(p, sr)) \u00d7 [\u03b2 Fd(d(r, p)) + (1 \u2212 \u03b2) Fw(w(sr)/360)]    (1)\nUp(p, Sr) = max_{sr \u2208 Sr} Up(p, sr)\n\nwhere\n\nw(sr) = size of sector sr scanned by r\na(r, p) = minimal angle that would allow r to cover p\nc(p, sr) = w(sr)/a(r, p) = coverage of p by r scanning sr\nh(r, p) = distance from r to geometric center of p\nhmax(r) = range of radar r\nd(r, p) = h(r, p)/hmax(r) = normalized distance from r to p\n\u03b2 = tunable parameter\n\nFigure 1: Step functions used by the Up and Us quality functions, from [5]\n\nUp(p, Sr) is the maximum quality obtained for scanning phenomenon p over all possible radars and their associated radar configurations sr. Up(p, sr) is the quality obtained for scanning phenomenon p using a specific radar r and radar configuration sr. The functions Fc(\u00b7), Fw(\u00b7), and Fd(\u00b7) from [5] are plotted in Figure 1. Fc captures the effect on quality due to the percentage of the phenomenon covered; to usefully scan a phenomenon, at least 95% of the phenomenon must be scanned. Fw captures the effect of radar rotation speed on quality; as rotation speed is reduced, quality increases. Fd captures the effect of the distance from the radar to the geometric center of the phenomenon on quality; the farther the radar is from the phenomenon being scanned, the more the scan quality is degraded due to attenuation. Due to the Fw function, the quality function Up(p, sr) outputs the same quality for scan angles of 181\u25e6 to 360\u25e6. The quality Us(ri, sr) for scanning a subsector i of radar r scanned using configuration sr is,\n\nUs(ri, sr) = Fw(w(sr)/360)    (2)\n\nIntuitively, a sector scanning strategy is only preferable when the quality function is such that the quality gained for scanning a sector is greater than the quality lost for not scanning another sector.\n\n3.2 Scan Strategies\n\nWe compare the performance of the following three scan strategies. The strategies differ in whether they optimize quality over only current or also future expected states. For example, suppose a storm cell is about to move into a high-quality multi-doppler region (i.e., the area where multiple radar footprints overlap). By considering future expected states, a lookahead strategy can anticipate this event and have all radars focused on the storm cell when it enters the multi-doppler region, rather than expending resources (with little \u201creward\u201d) to scan the storm cell just before it enters this region.\n\n(i) Sit-and-spin strategy. All radars always scan 360\u25e6.\n\n(ii) Limited \u201clookahead\u201d strategy. We examine both a 1-step and a 2-step look-ahead scan strategy. Although we do not have an exact model of the dynamics of different phenomena, to perform the look-ahead we estimate the future attributes of each phenomenon using a separate Kalman filter. For each filter, the true state x is a vector comprising the (x, y) location and velocity of the phenomenon, and the measurement y is a vector comprising only the (x, y) location. 
The Kalman filter assumes that the state at time t is a linear function of the state at time t \u2212 1 plus some Gaussian noise, and that the measurement at time t is a linear function of the state at time t plus some Gaussian noise. In particular, xt = Axt\u22121 + N[0, Q] and yt = Bxt + N[0, R].\n\nFollowing work by [8], we initialize each Kalman filter as follows. The A matrix reflects that storm cells typically move to the north-east. The B matrix, which extracts the (x, y) location from xt, assumes that the observed state yt is directly the true location plus some Gaussian noise. The Q matrix assumes that there is little noise in the true state dynamics. Finally, the measurement error covariance matrix R is a function of the quality Up with which phenomenon p was scanned at time t. We discuss how to compute the \u03c3t\u2019s in Section 4. We use the first location measurement of a storm cell y0, augmented with the observed velocity, as the initial state x0. We assume that our estimate of x0 has little noise and use .0001 \u2217 I for the initial covariance P0.\n\nA = [1 0 1 0; 0 1 0 1; 0 0 1 0; 0 0 0 1], B = [1 0 0 0; 0 1 0 0], Q = .0001 \u2217 I, R = [\u03c3t 0; 0 \u03c3t]\n\nWe compute the k-step look-ahead quality for different sets of radar configurations Sr with,\n\nUK(Sr,1|Tr) = \u2211_{k=1}^{K} \u03c6^{k\u22121} \u2211_{i=1}^{Np} Up(pi,k, Sr,k|Tr)\n\nwhere Np is the number of phenomena in the environment in the current decision epoch, pi,0 is the current set of observed attributes for phenomenon i, pi,k is the k-step set of predicted attributes for phenomenon i, Sr,k is the set of radar configurations for the kth decision epoch in the future, and \u03c6 is a tunable discount factor between 0 and 1. 
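The limited-lookahead machinery just described can be sketched as follows. This is a simplified stand-in, not the authors' code: the per-phenomenon quality Up is passed in as a function, phenomena are represented directly by their state vectors, and A is a constant-velocity transition matrix consistent with the state described above (position plus velocity).

```python
# Sketch of the K-step lookahead quality of Section 3.2 (illustrative names).
# A state vector is [x, y, vx, vy]; x_t = A x_{t-1} advances position by velocity.

A = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

def predict(state, steps=1):
    """Noise-free mean prediction: apply x_t = A x_{t-1} `steps` times."""
    for _ in range(steps):
        state = [sum(a * s for a, s in zip(row, state)) for row in A]
    return state

def lookahead_quality(states, quality, K=2, phi=0.75):
    """U_K = sum over k = 1..K of phi^(k-1) * (sum of U_p over phenomena)."""
    return sum(phi ** (k - 1) * sum(quality(predict(s, k)) for s in states)
               for k in range(1, K + 1))
```

Choosing a scan action would then wrap `lookahead_quality` in an argmax over the restricted set of candidate actions.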
The optimal set of radar configurations is then S\u2217r,1 = argmax_{Sr,1} UK(Sr,1|Tr). To account for the decay of quality for unscanned sectors and phenomena, and to consider the possibility of new phenomena appearing, we restrict Sr to be those scan actions that ensure that every sector has been scanned at least once in the last Tr decision epochs. Tr is a tunable parameter whose purpose is to satisfy the meteorological dictate found in [5], that all sectors be scanned, for instance by a 360\u25e6 scan, at most every 5 minutes.\n\n(iii) Full \u201clookahead\u201d strategy. We formulate the radar control problem as a Markov decision process (MDP) and use reinforcement learning to obtain a lookahead scan strategy as follows. While a POMDP (partially observable MDP) could be used to model the environmental uncertainty, due to the cost of solving a POMDP with a large state space [9], we choose to formulate the radar control problem as an MDP with quality (or uncertainty) variables as in an augmented MDP [6].\n\nS is the observed state of the environment. The state is a function of the observed number of storms, the observed x, y velocity of each storm, and the observed dimensions of each storm cell given by x, y center of mass and radius. To model the uncertainty in the environment, we additionally define as part of the state quality variables up and us based on the Up and Us quality functions defined in Equations (1) and (2) in Section 3.1. up is the quality Up(\u00b7) with which each storm cell was observed, and us is the current quality Us(\u00b7) of each 90\u25e6 subsector, starting at 0, 90, 180, or 270\u25e6.\n\nA is the set of actions available to the radars. This is the set of radar configurations for a given decision epoch. 
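Under the restriction described next (sectors a multiple of 90\u25e6 wide, starting at 0, 90, 180, or 270\u25e6, plus the full scan), the per-radar configuration set can be enumerated as a sketch; the (start, width) encoding is our own illustration, not the paper's representation:

```python
from itertools import product

def radar_configs():
    """All sector configurations for one radar: widths 90/180/270 degrees at
    four start angles, plus the full 360-degree scan -> 13 configurations."""
    configs = [(start, width) for start in (0, 90, 180, 270)
               for width in (90, 180, 270)]
    configs.append((0, 360))  # the sit-and-spin configuration
    return configs

def joint_actions(n_radars):
    """Joint action set: one configuration per radar, 13**n_radars in all."""
    return list(product(radar_configs(), repeat=n_radars))
```

With two radars this already gives 169 joint actions per decision epoch.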
We restrict each radar to scanning subsectors that are a multiple of 90\u25e6, starting at 0, 90, 180, or 270\u25e6. Thus, with N radars there are 13^N possible actions at each decision epoch.\n\nThe transition function T(S \u00d7 A \u00d7 S) \u2192 [0, 1] encodes the observed environment dynamics: specifically the appearance, disappearance, and movement of storm cells and their associated attributes. For meteorological radar control, the next state is a function of not just the current state but also the action executed in the current state. For instance, if a radar scans 180 degrees rather than 360 degrees, then any new storm cells that appear in the unscanned areas will not be observed. Thus, the new storm cells that will be observed will depend on the scanning action of the radar.\n\nThe cost function C(S, A, S) \u2192 R encodes the goals of the radar sensing network. C is a function of the error between the true state and the observed state, whether all storms have been observed, and a penalty term for not rescanning a storm within Tr decision epochs. More precisely,\n\nC = \u2211_{i=1}^{N^o_p} \u2211_{j=1}^{Nd} |d^o_{ij} \u2212 d_{ij}| + (Np \u2212 N^o_p) Pm + \u2211_{i=1}^{Np} I(ti) Pr    (3)\n\nwhere N^o_p is the observed number of storms, Nd is the number of attributes per storm, d^o_{ij} is the observed value of attribute j of storm i, d_{ij} is the true value of attribute j of storm i, Np is the true number of storms, Pm is the penalty for missing a storm, ti is the number of decision epochs since storm i was last scanned, Pr is the penalty for not scanning a storm at least once within Tr decision epochs, and I(ti) is an indicator function that equals 1 when ti \u2265 Tr. 
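The cost function of Equation (3) can be sketched directly. In this sketch `observed` and `true` are per-storm lists of attribute values; matching observed to true storms by index is a simplifying assumption made here for illustration:

```python
def cost(observed, true, epochs_since_scan, Pm=15.5667, Pr=200.0, Tr=4):
    """C = sum of per-attribute errors over observed storms
         + Pm for each storm that was not observed at all
         + Pr for each storm not rescanned within Tr decision epochs."""
    error = sum(abs(o - d)
                for obs_storm, true_storm in zip(observed, true)
                for o, d in zip(obs_storm, true_storm))
    missed = (len(true) - len(observed)) * Pm
    rescan = sum(Pr for t in epochs_since_scan if t >= Tr)
    return error + missed + rescan
```

With the values used in Section 4.1 (Pm = 15.5667, Pr = 200, Tr = 4), one missed storm plus one overdue rescan contributes 215.5667 before any attribute error.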
The quality with which a storm is observed determines the difference between the observed and true values of its attributes.\n\nWe use linear Sarsa(\u03bb) [15] as the reinforcement learning algorithm to solve the MDP for the radar control problem. To obtain the basis functions, we use tile coding [13, 14]. Rather than defining tilings over the entire state space, we define a separate set of tilings for each of the state variables.\n\n4 Evaluation\n\n4.1 Simulation Environment\n\nWe consider radars with both 10 and 30km radii as in [5, 17]. Two overlapping radars are placed in a 90km \u00d7 60km rectangle, one at (30km, 30km) and one at (60km, 30km). A new storm cell can appear anywhere within the rectangle and a maximum number of cells can be present on any decision epoch. When the (x, y) center of a storm cell is no longer within range of any radar, the cell is removed from the environment. Following [5], we use a 30-second decision epoch.\n\nWe derive the maximum storm cell radius from [11], which uses 2.83km as \u201cthe radius from the cell center within which the intensity is greater than e^\u22121 of the cell center intensity.\u201d We then permit a storm cell\u2019s radius to range from 1 to 4 km. To determine the range of storm cell velocities, we use 39 real storm cell tracks obtained from meteorologists. Each track is a series of (latitude, longitude) coordinates. We first compute the differences in latitude and longitude, and in time, between successive pairs of points. We then fit the differences using Gaussian distributions. We obtain, in units of km/hour, that the latitude (or x) velocity has mean 9.1 km/hr and std. dev. of 35.6 km/hr and that the longitude (or y) velocity has mean 16.7 km/hr and std. dev. of 28.8 km/hr. 
To obtain a storm\ncell\u2019s (x, y) velocity, we then sample the appropriate Gaussian distribution.\nTo simulate the environment transitions we use a stochastic model of rainfall in which storm cell\narrivals are modeled using a spatio-temporal Poisson process, see [11, 1]. To determine the number\nof new storm cells to add during a decision epoch, we sample a Poisson random variable with rate\n\u03bb\u03b7\u03b4a\u03b4t with \u03bb = 0.075 storm cells/km2 and \u03b7 = 0.006 storm cells/minute from [11]. From the\nradar setup we have \u03b4a = 90 \u00b7 60 km2, and from the 30-second decision epoch we have \u03b4t = 0.5\nminutes. New storm cells are uniformly randomly distributed in the 90km \u00d7 60km region and we\nuniformly randomly choose new storm cell attributes from their range of values. This simulates the\ntrue state of the environment over time. The following simpli\ufb01ed radar model determines how well\nthe radars observe the true environmental state under a given set of radar con\ufb01gurations. If a storm\ncell p is scanned using a set of radar con\ufb01gurations Sr, the location, velocity, and radius attributes\nare observed as a function of the Up(p, Sr) quality de\ufb01ned in Section 3.1. Up(p, Sr) returns a value\nu between zero and one. Then the observed value of the attribute is the true value of the attribute\nplus some Gaussian noise distributed with mean zero and standard deviation (1 \u2212 u)V max/\u03c1 where\nV max is the largest positive value the attribute can take and \u03c1 is a scaling term that will allow us to\nadjust the noise variability. Since u depends on the decision epoch t, for the k-step look-ahead scan\nstrategy we also use \u03c3t = (1 \u2212 ut)V max/\u03c1 to compute the measurement error covariance matrix,\nR, in our Kalman \ufb01lter.\nWe parameterize the MDP cost function as follows. We assume that any unobserved storm cell has\nbeen observed with quality 0, hence u = 0. 
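The arrival and observation model just described can be sketched with the standard library alone. The helper names are our own; the Poisson draw uses Knuth's inversion method, which is adequate here since the rate \u03bb\u03b7\u03b4a\u03b4t \u2248 1.215 is small:

```python
import math
import random

def num_new_cells(rng, lam=0.075, eta=0.006, da=90 * 60, dt=0.5):
    """Number of new storm cells in one epoch: Poisson with rate
    lam * eta * da * dt (Knuth's multiplication method)."""
    threshold = math.exp(-(lam * eta * da * dt))
    k, p = 0, 1.0
    while p > threshold:
        p *= rng.random()
        k += 1
    return k - 1

def observe(true_value, u, v_max, rho, rng):
    """Observed attribute = true value + N(0, (1 - u) * v_max / rho)."""
    return true_value + rng.gauss(0.0, (1.0 - u) * v_max / rho)
```

A perfectly scanned attribute (u = 1) is observed exactly, while an unscanned one (u = 0) gets noise with standard deviation V max/\u03c1, which is also the \u03c3t fed back into the Kalman filter's R matrix.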
Summing over (1 \u2212 u)V max/\u03c1 for all attributes with u = 0 gives the value Pm = 15.5667, and thus a penalty of 15.5667 is received for each unobserved storm cell. If a storm cell is not seen within Tr = 4 decision epochs a penalty of Pr = 200 is given. Using the value 200 ensures that if a storm cell has not been rescanned within the appropriate amount of time, this part of the cost function will dominate.\n\nWe distinguish the true environmental state known only to the simulator from the observed environmental state used by the scan strategies for several reasons. Although radars provide measurements about meteorological phenomena, the true attributes of the phenomena are unknown. Poor overlap in a dual-Doppler area, scanning a subsector too quickly or slowly, or being unable to obtain a sufficient number of elevation scans will degrade the quality of the measurements. Consequently, models of previously existing phenomena may contain estimation errors such as incorrect velocity, propagating error into the future predicted locations of the phenomena. Additionally, when a radar scans a subsector, it obtains more accurate estimates of the phenomena in that subsector than if it had scanned a full 360\u25e6, but less accurate estimates of the phenomena outside the subsector.\n\n4.2 Results\n\nIn this section we present experimental results obtained using the simulation model of the previous section and the scan strategies described in Section 3. For the limited lookahead strategy we use \u03b2 = 0.5, \u03bap = 0.25, \u03bas = 0.25, and \u03c6 = 0.75. For Sarsa(\u03bb), we use a learning rate \u03b1 = 0.0005, exploration rate \u03b5 = 0.01, discount factor \u03b3 = 0.9, and eligibility decay \u03bb = 0.3. Additionally, we use a single tiling for each state variable. 
For the (x, y) location and radius tilings, we use\na granularity of 1.0; for the (x, y) velocity, phenomenon con\ufb01dence, and radar sector con\ufb01dence\ntilings, we use a granularity of 0.1. When there are a maximum of four storms, we restrict Sarsa(\u03bb)\nto scanning only 180 or 360 degree sectors to reduce the time needed for convergence. Finally, all\nstrategies are always compared over the same true environmental state.\n\nFigure 2(a) shows an example convergence pro\ufb01le of Sarsa(\u03bb) when there are at most four storms\nin the environment. Figure 2(b) shows the average difference in scan quality between the learned\nSarsa(\u03bb) strategy and sit-and-spin and 2-step strategies. When 1/\u03c1 = 0.001 (i.e., little measurement\nnoise) Sarsa(\u03bb) has the same or higher relative quality than does sit-and-spin, but signi\ufb01cantly lower\nrelative quality (0.05 to 0.15) than does the 2-step. This in part re\ufb02ects the dif\ufb01culty of learning\nto perform as well as or better than Kalman \ufb01ltering. Examining the learned strategy showed that\nwhen there was at most one storm with observation noise 1/\u03c1 = 0.001, Sarsa(\u03bb) learned to simply\nsit-and-spin, since sector scanning conferred little bene\ufb01t. As the observation noise increases, the\nrelative difference increases for sit-and-spin, and decreases for the 2-step. Figure 2(c) shows the\naverage difference in cost between the learned Sarsa(\u03bb) scan strategy and the sit-and-spin and 2-step\nstrategies for a 30 km radar radius. Sarsa(\u03bb) has the lowest average cost.\n\nLooking at the Sarsa(\u03bb) inter-scan times, Figure 2 (d) shows that, as a consequence of the penalty for\nnot scanning a storm within Tr = 4 time-steps, while Sarsa(\u03bb) may rescan fewer storm cells within\n1, 2, or 3 decision epochs than do the other scan strategies, it scans almost all storm cells within\n4 epochs. 
Note that for the sit-and-spin CDF, P[X \u2264 1] is not 1; due to noise, for example, the measured location of a storm cell may be estimated to lie outside any radar footprint and consequently the storm cell will not be observed. Similarly, the 2-step strategy has more inter-scan times greater than Tr = 4 than does Sarsa(\u03bb). Together with Figure 2(b) and (c), this implies that there is a trade-off between inter-scan time and scan quality. We hypothesize that this trade-off occurs because increasing the size of the scan sectors ensures that inter-scan time is minimized, but decreases the scan quality.\n\nOther results (not shown, see [7]) examine the average difference in quality between the 1-step and 2-step strategies for 10 km and 30 km radar radii. With a 10 km radius, the 1-step quality is essentially the same as the 2-step quality. We hypothesize that this is a consequence of the maximum storm cell radius, 4 km, relative to the 10 km radar radius. With a 30 km radius and at most eight storm cells, the 2-step quality is about 0.005 better than the 1-step and about 0.07 better than sit-and-spin (recall that quality is a value between 0 and 1). Now recall that Figure 2(b) shows that with a 30 km radius and at most four storm cells, the 2-step quality is as much as 0.12 higher than sit-and-spin. This indicates that there may be some maximum number of storms above which it is best to sit-and-spin.\n\nOverall, depending on the environment in which the radars are deployed, there are decreasing marginal returns for considering more than 1 or 2 future expected states. Instead, the primary value of reinforcement learning for the radar control problem is balancing multiple conflicting goals, i.e., maximizing scan quality while minimizing inter-scan time. 
Implementing the learned reinforcement learning scan strategy in a real meteorological radar network requires addressing the differences between the offline environment in which the learned strategy is trained, and the online environment in which the strategy is deployed. Given the slow convergence time for Sarsa(\u03bb) (on the order of days), training solely online is likely infeasible, although the time complexity could be mitigated by using hierarchical reinforcement learning methods and semi-Markov decision processes. Some online training could be achieved by treating 360\u25e6 scans as the true environment state. Then when unknown states are entered, learning could be performed, alternating between 360\u25e6 scans to gauge the true state of the environment and exploratory scans by the reinforcement learning algorithm.\n\nFigure 2: (a)-(d) Comparing the scan strategies based on quality, cost, and inter-scan time. Recall that \u03c1 is a scaling term used to determine measurement noise, see Section 4.1.\n\n5 Related Work\n\nOther reinforcement learning applications in large state spaces include robot soccer [12] and helicopter control [10]. With respect to radar control, [4] examines the problem of using agile radars on airplanes to detect and track ground targets. They show that lookahead scan strategies for radar tracking of a ground target outperform myopic strategies. In comparison, we consider the problem of tracking meteorological phenomena using ground radars. [4] uses an information theoretic measure to define the reward metric and proposes both an approximate solution to solving the MDP Bellman equations as well as a Q-learning reinforcement learning-based solution. [16] examines where to target radar beams and which waveform to use for electronically steered phased array radars. 
They maintain a set of error covariance matrices and dynamical models for existing targets, as well as track existence probability density functions to model the probability that targets appear. They then choose the scan mode for each target that has both the longest revisit time for scanning a target and error covariance below a threshold. They do this for 1-step and 2-step lookahead control and show that considering the environment two decision epochs ahead outperforms a 1-step look-ahead for tracking of multiple targets.\n\n6 Conclusions and Future Work\n\nIn this work we compared the performance of myopic and lookahead scan strategies in the context of the meteorological radar control problem. We showed that the main benefits of using a lookahead strategy are when there are multiple meteorological phenomena in the environment, and when the maximum radius of any phenomenon is sufficiently smaller than the radius of the radars. 
We also\nshowed that there is a trade-off between the average quality with which a phenomenon is scanned\nand the number of decision epochs before which a phenomenon is rescanned. Overall, considering\nonly scan quality, a simple lookahead strategy is suf\ufb01cient. To additionally consider inter-scan time\n(or optimize over multiple metrics of interest), a reinforcement learning strategy is useful. For future\nwork, rather than identifying a policy that chooses the best action to execute in a state for a single\ndecision epoch, it may be useful to consider actions that cover multiple epochs, as in semi-Markov\ndecision processes or to use controllers from robotics [3]. We would also like to incorporate more\nradar and meteorological information into the transition, measurement, and cost functions.\n\nAcknowledgments\n\nThe authors thank Don Towsley for his input. This work was supported in part by the National Sci-\nence Foundation under the Engineering Research Centers Program, award number EEC-0313747.\nAny opinions, \ufb01ndings and conclusions or recommendations expressed in this material are those of\nthe author(s) and do not necessarily re\ufb02ect those of the National Science Foundation.\n\nReferences\n[1] D. Cox and V. Isham. A simple spatial-temporal model of rainfall. Proceedings of the Royal Society of London. Series A, Mathematical\n\nand Physical Sciences, 415:1849:317\u2013328, 1988.\n\n[2] B. Donovan and D. J. McLaughlin. Improved radar sensitivity through limited sector scanning: The DCAS approach. In Proceedings of\n\nAMS Radar Meteorology, 2005.\n\n[3] M. Huber and R. Grupen. A feedback control structure for on-line learning tasks. Robotics and Autonomous Systems, 22(3-4):303\u2013315,\n\n1997.\n\n[4] C. Kreucher and A. O. H. III. Non-myopic approaches to scheduling agile sensors for multistage detection, tracking and identi\ufb01cation.\n\nIn Proceedings of ICASSP, pages 885\u2013888, 2005.\n\n[5] J. Kurose, E. Lyons, D. McLaughlin, D. 
Pepyne, B. Phillips, D. Westbrook, and M. Zink. An end-user-responsive sensor network\n\narchitecture for hazardous weather detection, prediction and response. AINTEC, 2006.\n\n[6] C. Kwok and D. Fox. Reinforcement learning for sensing strategies. In IROS, 2004.\n\n[7] V. Manfredi and J. Kurose. Comparison of myopic and lookahead scan strategies for meteorological radars. Technical Report U of\n\nMassachusetts Amherst, 2006-62, 2006.\n\n[8] V. Manfredi, S. Mahadevan, and J. Kurose. Switching kalman \ufb01lters for prediction and tracking in an adaptive meteorological sensing\n\nnetwork. In IEEE SECON, 2005.\n\n[9] K. Murphy. A survey of POMDP solution techniques. Technical Report U.C. Berkeley, 2000.\n\n[10] A. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang.\n\nInverted autonomous helicopter \ufb02ight via\n\nreinforcement learning. In International Symposium on Experimental Robotics, 2004.\n\n[11]\n\nI. Rodrigues-Iturbe and P. Eagleson. Mathematical models of rainstorm events in space and time. Water Resources Research, 23:1:181\u2013\n190, 1987.\n\n[12] P. Stone, R. Sutton, and G. Kuhlmann. Reinforcement learning for robocup-soccer keepaway. Adaptive Behavior, 3, 2005.\n\n[13] R. Sutton. Tile coding software. http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tiles.html.\n\n[14] R. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In NIPS, 1996.\n\n[15] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.\n\n[16] S. Suvorova, D. Musicki, B. Moran, S. Howard, and B. L. Scala. Multi step ahead beam and waveform scheduling for tracking of\n\nmanoeuvering targets in clutter. In Proceedings of ICASSP, 2005.\n\n[17] J. M. Trabal, B. C. Donovan, M. Vega, V. Marrero, D. J. McLaughlin, and J. G. Colom. Puerto Rico student test bed applications and\n\nsystem requirements document development. 
In Proceedings of the 9th International Conference on Engineering Education, 2006.", "award": [], "sourceid": 349, "authors": [{"given_name": "Victoria", "family_name": "Manfredi", "institution": null}, {"given_name": "Jim", "family_name": "Kurose", "institution": null}]}