{"title": "When are Kalman-Filter Restless Bandits Indexable?", "book": "Advances in Neural Information Processing Systems", "page_first": 1711, "page_last": 1719, "abstract": "We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is {\\it indexable} in the sense that the {\\it Whittle index} is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about {\\it Schur-convexity} and {\\it mechanical words}, which are particular binary strings intimately related to {\\it palindromes}.", "full_text": "When are Kalman-Filter Restless Bandits Indexable?\n\nChristopher Dance and Tomi Silander\n\nXerox Research Centre Europe\n\n6 chemin de Maupertuis, Meylan, Is\u00e8re, France\n{dance,silander}@xrce.xerox.com\n\nAbstract\n\nWe study the restless bandit associated with an extremely simple scalar Kalman\nfilter model in discrete time. Under certain assumptions, we prove that the problem\nis indexable in the sense that the Whittle index is a non-decreasing function of\nthe relevant belief state. In spite of the long history of this problem, this appears\nto be the first such proof. We use results about Schur-convexity and mechanical\nwords, which are particular binary strings intimately related to palindromes.\n\n1 Introduction\n\nWe study the problem of monitoring several time series so as to maintain a precise belief while minimising\nthe cost of sensing. Such problems can be viewed as POMDPs with belief-dependent rewards\n[3] and their applications include active sensing [7], attention mechanisms for multiple-object\ntracking [22], as well as online summarisation of massive data from time-series [4]. Specifically, we\ndiscuss the restless bandit [24] associated with the discrete-time Kalman filter [19]. 
Restless bandits\ngeneralise bandit problems [6, 8] to situations where the state of each arm (project, site or target)\ncontinues to change even if the arm is not played. As with bandit problems, the states of the arms\nevolve independently given the actions taken, suggesting that there might be efficient algorithms for\nlarge-scale settings, based on calculating an index for each arm, which is a real number associated\nwith the (belief-)state of that arm alone. However, while bandits always have an optimal index policy\n(select the arm with the largest index), it is known that no index policy can be optimal for some\ndiscrete-state restless bandits [17] and such problems are in general PSPACE-hard even to approximate\nto any non-trivial factor [10]. Further, in this paper we address restless bandits with real-valued\nrather than discrete states. On the other hand, Whittle proposed a natural index policy for restless\nbandits [24], but this policy only makes sense when the restless bandit is indexable (Section 2).\nBriefly, a restless bandit is said to be indexable when an optimal solution to a relaxed version of the\nproblem consists in playing all arms whose indices exceed a given threshold. (The relaxed version\nof the problem relaxes the constraint on the number of arms pulled per turn to a constraint on the\naverage number of arms pulled per turn.) Under certain conditions, indexability implies a form of\nasymptotic optimality of Whittle\u2019s policy for the original problem [23, 20].\nRestless bandits associated with scalar Kalman(-Bucy) filters in continuous time were recently\nshown to be indexable [12] and the corresponding discrete-time problem has attracted considerable\nattention over a long period [15, 11, 16, 21]. 
However, that attention has produced no satisfactory\nproof of indexability \u2013 even for scalar time-series and even if we assume that there is a monotone\noptimal policy for the single-arm problem, which is a policy that plays the arm if and only if the\nrelevant belief-state exceeds some threshold (here the relevant belief-state is a posterior variance).\nTheorem 1 of this paper addresses that gap. After formalising the problem (Section 2), we describe\nthe concepts and intuition (Section 3) behind the main result (Section 4). The main tools\nare mechanical words (which are not sufficiently well-known) and Schur convexity. As these tools\nare associated with rather general theorems, we believe that future work (Section 5) should enable\nsubstantial generalisation of our results.\n\n2 Problem and Index\n\nWe consider the problem of tracking N time-series, which we call arms, in discrete time. The state\n$Z_{i,t} \\in \\mathbb{R}$ of arm i at time $t \\in \\mathbb{Z}_+$ evolves as a standard-normal random walk independent of everything\nbut its immediate past ($\\mathbb{Z}_+$, $\\mathbb{R}_-$ and $\\mathbb{R}_+$ all include zero). The action space is $U := \\{1, \\ldots, N\\}$.\nAction $u_t = i$ makes an expensive observation $Y_{i,t}$ of arm i which is normally-distributed about $Z_{i,t}$\nwith precision $b_i \\in \\mathbb{R}_+$ and we receive cheap observations $Y_{j,t}$ of each other arm j with precision\n$a_j \\in \\mathbb{R}_+$ where $a_j < b_j$ and $a_j = 0$ means no observation at all.\nLet $Z_t, Y_t, H_t, F_t$ be the state, observation, history and observed history, so that $Z_t := (Z_{1,t}, \\ldots, Z_{N,t})$,\n$Y_t := (Y_{1,t}, \\ldots, Y_{N,t})$, $H_t := ((Z_0, u_0, Y_0), \\ldots, (Z_t, u_t, Y_t))$ and $F_t := ((u_0, Y_0), \\ldots, (u_t, Y_t))$. 
Then we formalise the above as ($1_{\\cdot}$ is the indicator function)\n\n$$Z_{i,0} \\sim N(0, 1), \\quad Z_{i,t+1} \\mid H_t \\sim N(Z_{i,t}, 1), \\quad Y_{i,t} \\mid H_{t-1}, Z_t, u_t \\sim N\\left(Z_{i,t},\\ \\frac{1_{u_t \\neq i}}{a_i} + \\frac{1_{u_t = i}}{b_i}\\right).$$\n\nNote that this setting is readily generalised to $E[(Z_{i,t+1} - Z_{i,t})^2] \\neq 1$ by a change of variables.\nThus the posterior belief is given by the Kalman filter as $Z_{i,t} \\mid F_t \\sim N(\\hat{Z}_{i,t}, x_{i,t})$ where the\nposterior mean is $\\hat{Z}_{i,t} \\in \\mathbb{R}$ and the error variance $x_{i,t} \\in \\mathbb{R}_+$ satisfies\n\n$$x_{i,t+1} = \\phi_{i,1_{u_{t+1}=i}}(x_{i,t}) \\quad \\text{where} \\quad \\phi_{i,0}(x) := \\frac{x + 1}{a_i x + a_i + 1} \\quad \\text{and} \\quad \\phi_{i,1}(x) := \\frac{x + 1}{b_i x + b_i + 1}. \\tag{1}$$\n\nProblem KF1. Let $\\pi$ be a policy so that $u_t = \\pi(F_{t-1})$. Let $x^\\pi_{i,t}$ be the error variance under $\\pi$. The\nproblem is to choose $\\pi$ so as to minimise the following objective for discount factor $\\beta \\in [0, 1)$. The\nobjective consists of a weighted sum of error variances $x^\\pi_{i,t}$ with weights $w_i \\in \\mathbb{R}_+$ plus observation\ncosts $h_i \\in \\mathbb{R}_+$ for $i = 1, \\ldots, N$:\n\n$$E\\left[\\sum_{t=0}^{\\infty} \\sum_{i=1}^{N} \\beta^t \\{h_i 1_{u_t=i} + w_i x^\\pi_{i,t}\\}\\right] = \\sum_{t=0}^{\\infty} \\sum_{i=1}^{N} \\beta^t \\{h_i 1_{u_t=i} + w_i x^\\pi_{i,t}\\}$$\n\nwhere the equality follows as (1) is a deterministic mapping (and assuming $\\pi$ is deterministic).\nSingle-Arm Problem and Whittle Index. Now fix an arm i and write $x^\\pi_t, \\phi_0(\\cdot), \\ldots$ instead of\n$x^\\pi_{i,t}, \\phi_{i,0}(\\cdot), \\ldots$. Say there are now two actions $u_t = 0, 1$ corresponding to cheap and expensive\nobservations respectively and the expensive observation now costs $h + \\nu$ where $\\nu \\in \\mathbb{R}$. The single-arm\nproblem is to choose a policy, which here is an action sequence, \u03c0 := (u0, u1, . . . 
)\nso as to minimise\n\n$$V^\\pi(x|\\nu) := \\sum_{t=0}^{\\infty} \\beta^t \\{(h + \\nu) u_t + w x^\\pi_t\\} \\quad \\text{where } x_0 = x. \\tag{2}$$\n\nLet $Q(x, \\alpha|\\nu)$ be the optimal cost-to-go in this problem if the first action must be $\\alpha$ and let $\\pi^*$ be an\noptimal policy, so that\n\n$$Q(x, \\alpha|\\nu) := (h + \\nu)\\alpha + wx + \\beta V^{\\pi^*}(\\phi_\\alpha(x)|\\nu).$$\n\nFor any fixed $x \\in \\mathbb{R}_+$, the value of $\\nu$ for which actions $u_0 = 0$ and $u_0 = 1$ are both optimal is\nknown as the Whittle index $\\lambda^W(x)$ assuming it exists and is unique. In other words\n\n$$\\text{The Whittle index } \\lambda^W(x) \\text{ is the solution to } Q(x, 0|\\lambda^W(x)) = Q(x, 1|\\lambda^W(x)). \\tag{3}$$\n\nLet us consider a policy which takes action $u_0 = \\alpha$ then acts optimally producing actions $u^{\\alpha*}_t(x)$\nand error variances $x^{\\alpha*}_t(x)$. Then (3) gives\n\n$$\\sum_{t=0}^{\\infty} \\beta^t \\{(h + \\lambda^W(x)) u^{0*}_t(x) + w x^{0*}_t(x)\\} = \\sum_{t=0}^{\\infty} \\beta^t \\{(h + \\lambda^W(x)) u^{1*}_t(x) + w x^{1*}_t(x)\\}.$$\n\nSolving this linear equation for the index $\\lambda^W(x)$ gives\n\n$$\\lambda^W(x) = w \\frac{\\sum_{t=1}^{\\infty} \\beta^t (x^{0*}_t(x) - x^{1*}_t(x))}{\\sum_{t=0}^{\\infty} \\beta^t (u^{1*}_t(x) - u^{0*}_t(x))} - h. \\tag{4}$$\n\nWhittle [24] recognised that for his index policy (play the arm with the largest $\\lambda^W(x)$) to make\nsense, any arm which receives an expensive observation for added cost $\\nu$, must also receive an\nexpensive observation for added cost $\\nu' < \\nu$. Such problems are said to be indexable. The question\nresolved by this paper is whether Problem KF1 is indexable. Equivalently, is $\\lambda^W(x)$ non-decreasing\nin $x \\in \\mathbb{R}_+$?\n\nFigure 1: Orbit $x^{0*}_t(x)$ traces the path ABCDE... for the word 01w = 01101. Orbit $x^{1*}_t(x)$ traces\nthe path FGHIJ... for the word 10w = 10101. Word w = 101 is a palindrome.\n\n3 Main Result, Key Concepts and Intuition\n\nWe make the following intuitive assumption about threshold (monotone) policies.\nA1. For some $x \\in \\mathbb{R}_+$ depending on $\\nu \\in \\mathbb{R}$, the policy $u_t = 1_{x_t \\geq x}$ is optimal for problem (2).\nNote that under A1, definition (3) means the policy $u_t = 1_{x_t > x}$ is also optimal, so we can choose\n\n$$\\begin{aligned} u^{0*}_t(x) &:= \\begin{cases} 0 & \\text{if } x^{0*}_{t-1}(x) \\leq x \\\\ 1 & \\text{otherwise} \\end{cases} & \\text{and} \\quad x^{0*}_t(x) &:= \\begin{cases} \\phi_0(x^{0*}_{t-1}(x)) & \\text{if } x^{0*}_{t-1}(x) \\leq x \\\\ \\phi_1(x^{0*}_{t-1}(x)) & \\text{otherwise} \\end{cases} \\\\ u^{1*}_t(x) &:= \\begin{cases} 0 & \\text{if } x^{1*}_{t-1}(x) < x \\\\ 1 & \\text{otherwise} \\end{cases} & \\text{and} \\quad x^{1*}_t(x) &:= \\begin{cases} \\phi_0(x^{1*}_{t-1}(x)) & \\text{if } x^{1*}_{t-1}(x) < x \\\\ \\phi_1(x^{1*}_{t-1}(x)) & \\text{otherwise} \\end{cases} \\end{aligned} \\tag{5}$$\n\nwhere $x^{0*}_0(x) = x^{1*}_0(x) = x$. We refer to $x^{0*}_t(x), x^{1*}_t(x)$ as the x-threshold orbits (Figure 1).\nWe are now ready to state our main result.\nTheorem 1. Suppose a threshold policy (A1) is optimal for the single-arm problem (2). Then\nProblem KF1 is indexable. Specifically, for any $b > a \\geq 0$ let\n\n$$\\phi_0(x) := \\frac{x + 1}{ax + a + 1}, \\qquad \\phi_1(x) := \\frac{x + 1}{bx + b + 1},$$\n\nand for any $w \\in \\mathbb{R}_+$, $h \\in \\mathbb{R}$ and $0 < \\beta < 1$, let\n\n$$\\lambda^W(x) := w \\frac{\\sum_{t=1}^{\\infty} \\beta^t (x^{0*}_t(x) - x^{1*}_t(x))}{\\sum_{t=0}^{\\infty} \\beta^t (u^{1*}_t(x) - u^{0*}_t(x))} - h \\tag{6}$$\n\nin which action sequences $u^{0*}_t(x), u^{1*}_t(x)$ and error variance sequences $x^{0*}_t(x), x^{1*}_t(x)$ are given\nin terms of $\\phi_0, \\phi_1$ by (5). Then $\\lambda^W(x)$ is a continuous and non-decreasing function of $x \\in \\mathbb{R}_+$.\nWe are now ready to describe the key concepts underlying this result.\nWords. In this paper, a word w is a string on $\\{0, 1\\}^*$ with kth letter $w_k$ and $w_{i:j} := w_i w_{i+1} \\ldots w_j$.\nThe empty word is $\\epsilon$, the concatenation of words u, v is uv, the word that is the n-fold repetition\nof w is $w^n$, the infinite repetition of w is $w^\\omega$ and $\\tilde{w}$ is the reverse of w, so $w = \\tilde{w}$ means w is\na palindrome. The length of w is $|w|$ and $|w|_u$ is the number of times that word u appears in w,\noverlaps included.\nChristoffel, Sturmian and Mechanical Words. It turns out that the action sequences in (5) are\ngiven by such words, so the following definitions are central to this paper.\n\nFigure 2: Part of the Christoffel tree.\n\nThe Christoffel tree (Figure 2) is an infinite complete binary tree [5] in which each node is labelled\nwith a pair (u, v) of words. The root is (0, 1) and the children of (u, v) are (u, uv) and (uv, v).\nThe Christoffel words are the words 0, 1 and the concatenations uv for all (u, v) in that tree. The\nfractions $|uv|_1/|uv|_0$ form the Stern-Brocot tree [9] which contains each positive rational number\nexactly once. Also, infinite paths in the Stern-Brocot tree converge to the positive irrational numbers.\nAnalogously, Sturmian words could be thought of as infinitely-long Christoffel words.\nAlternatively, among many known characterisations, the Christoffel words can be defined as the\nwords 0, 1 and the words 0w1 where $a := |0w1|_1/|0w1|$ and\n\n$$(01w)_n := \\lfloor (n + 1)a \\rfloor - \\lfloor na \\rfloor$$\n\nfor any relatively prime natural numbers $|0w1|_0$ and $|0w1|_1$ and for $n = 1, 2, \\ldots, |0w1|$. The\nSturmian words are then the infinite words $0w_1w_2\\cdots$ where, for $n = 1, 2, \\ldots$ and $a \\in (0, 1) \\setminus \\mathbb{Q}$,\n\n$$(01w_1w_2\\cdots)_n := \\lfloor (n + 1)a \\rfloor - \\lfloor na \\rfloor.$$\n\nWe use the notation 0w1 for Sturmian words although they are infinite.\nThe set of mechanical words is the union of the Christoffel and Sturmian words [13]. 
(Note that the\nmechanical words are sometimes defined in terms of infinite repetitions of the Christoffel words.)\nMajorisation. As in [14], let $x, y \\in \\mathbb{R}^m$ and let $x_{(i)}$ and $y_{(i)}$ be their elements sorted in ascending\norder. We say x is weakly supermajorised by y and write $x \\prec^w y$ if\n\n$$\\sum_{k=1}^{j} x_{(k)} \\geq \\sum_{k=1}^{j} y_{(k)} \\quad \\text{for all } j = 1, \\ldots, m.$$\n\nIf this is an equality for j = m we say x is majorised by y and write $x \\prec y$. It turns out that\n\n$$x \\prec y \\iff \\sum_{k=1}^{j} x_{[k]} \\leq \\sum_{k=1}^{j} y_{[k]} \\quad \\text{for } j = 1, \\ldots, m - 1 \\text{ with equality for } j = m$$\n\nwhere $x_{[k]}, y_{[k]}$ are the sequences sorted in descending order. For $x, y \\in \\mathbb{R}^m$ we have [14]\n\n$$x \\prec y \\iff \\sum_{i=1}^{m} f(x_i) \\leq \\sum_{i=1}^{m} f(y_i) \\quad \\text{for all convex functions } f : \\mathbb{R} \\to \\mathbb{R}.$$\n\nMore generally, a real-valued function $\\phi$ defined on a subset A of $\\mathbb{R}^m$ is said to be Schur-convex on\nA if $x \\prec y$ implies that $\\phi(x) \\leq \\phi(y)$.\nM\u00f6bius Transformations. Let $\\mu_A(x)$ denote the M\u00f6bius transformation $\\mu_A(x) := \\frac{A_{11}x + A_{12}}{A_{21}x + A_{22}}$\nwhere $A \\in \\mathbb{R}^{2 \\times 2}$. M\u00f6bius transformations such as $\\phi_0(\\cdot), \\phi_1(\\cdot)$ are closed under composition, so\nfor any word w we define $\\phi_w(x) := \\phi_{w_{|w|}} \\circ \\cdots \\circ \\phi_{w_2} \\circ \\phi_{w_1}(x)$ and $\\phi_\\epsilon(x) := x$.\nIntuition. Here is the intuition behind our main result.\nFor any $x \\in \\mathbb{R}_+$, the orbits in (5) correspond to a particular mechanical word 0, 1 or 0w1 depending\non the value of x (Figure 1). Specifically, for any word u, let $y_u$ be the fixed point of the mapping $\\phi_u$\non $\\mathbb{R}_+$ so that $\\phi_u(y_u) = y_u$ and $y_u \\in \\mathbb{R}_+$. 
Then the word corresponding to x is 1 for $0 \\leq x \\leq y_1$,\n0w1 for $x \\in [y_{01w}, y_{10w}]$ and 0 for $y_0 \\leq x < \\infty$.\nIn passing we note that these fixed points\nare sorted in ascending order by the ratio $\\rho := |01w|_0/|01w|_1$ of counts of 0s to counts of 1s, as\nillustrated by Figure 3. Interestingly, it turns out that ratio $\\rho$ is a piecewise-constant yet continuous\nfunction of x, reminiscent of the Cantor function.\n\nFigure 3: Lower fixed points $y_{01w}$ of Christoffel words (black dots), majorisation points for those\nwords (black circles) and the tree of $\\phi_w(0)$ (blue).\n\nAlso, composition of M\u00f6bius transformations is homomorphic to matrix multiplication so that\n\n$$\\mu_A \\circ \\mu_B(x) = \\mu_{AB}(x) \\quad \\text{for any } A, B \\in \\mathbb{R}^{2 \\times 2}.$$\n\nThus, the index (6) can be written in terms of the orbits of a linear system (11) given by 0, 1 or 0w1.\nFurther, if $A \\in \\mathbb{R}^{2 \\times 2}$ and det(A) = 1 then the gradient of the corresponding M\u00f6bius transformation\nis the convex function\n\n$$\\frac{d\\mu_A(x)}{dx} = \\frac{1}{(A_{21}x + A_{22})^2}.$$\n\nSo the gradient of the index is the difference of the sums of a convex function of the linear-system\norbits. However, such sums are Schur-convex functions and it follows that the index is increasing\nbecause one orbit weakly supermajorises the other, as we now show for the case 0w1 (noting that\nthe proof is easier for words 0, 1). As 0w1 is a mechanical word, w is a palindrome. Further, if w is\na palindrome, it turns out that the difference between the linear-system orbits increases with x. 
So,\nwe might define the majorisation point for w as the x for which one orbit majorises the other. Quite\nremarkably, if w is a palindrome then the majorisation point is $\\phi_w(0)$ (Proposition 7). Indeed the\nblack circles and blue dots of Figure 3 coincide. Finally, $\\phi_w(0)$ is less than or equal to $y_{01w}$ which\nis the least x for which the orbits correspond to the word 0w1. Indeed, the blue dots of Figure 3 are\nbelow the corresponding black dots. Thus one orbit does indeed supermajorise the other.\n\n4 Proof of Main Result\n\n4.1 Mechanical Words\nThe M\u00f6bius transformations of (1) satisfy the following assumption for $I := \\mathbb{R}_+$. We prove that the\nfixed point $y_w$ of word w (the solution to $\\phi_w(x) = x$ on I) is unique in the supplementary material.\nAssumption A2. Functions $\\phi_0 : I \\to I$, $\\phi_1 : I \\to I$, where I is an interval of $\\mathbb{R}$, are increasing\nand non-expansive, so for all $x, y \\in I : x < y$ and for $k \\in \\{0, 1\\}$ we have\n\n$$\\underbrace{\\phi_k(x) < \\phi_k(y)}_{\\text{increasing}} \\quad \\text{and} \\quad \\underbrace{\\phi_k(y) - \\phi_k(x) < y - x}_{\\text{non-expansive}}.$$\n\nFurthermore, the fixed points $y_0, y_1$ of $\\phi_0, \\phi_1$ on I satisfy $y_1 < y_0$.\nHence the following two propositions (supplementary material) apply to $\\phi_0, \\phi_1$ of (1) on $I = \\mathbb{R}_+$.\n\nProposition 1. Suppose A2 holds, $x \\in I$ and w is a non-empty word. Then\n\n$$x < \\phi_w(x) \\iff \\phi_w(x) < y_w \\iff x < y_w \\quad \\text{and} \\quad x > \\phi_w(x) \\iff \\phi_w(x) > y_w \\iff x > y_w.$$\n\nFor a given x, in the notation of (5), we call the shortest word u such that $(u^{1*}_1, u^{1*}_2, \\ldots) = u^\\omega$\nthe x-threshold word. Proposition 2 generalises a recent result about x-threshold words in a setting\nwhere $\\phi_0, \\phi_1$ are linear [18].\nProposition 2. Suppose A2 holds and 0w1 is a mechanical word. Then\n\n$$0w1 \\text{ is the } x\\text{-threshold word} \\iff x \\in [y_{01w}, y_{10w}].$$\n\nAlso, if $x_0, x_1 \\in I$ with $x_0 \\geq y_0$ and $x_1 \\leq y_1$ then the $x_0$- and $x_1$-threshold words are 0 and 1.\nWe also use the following very interesting fact (Proposition 4.2 on p.28 of [5]).\nProposition 3. Suppose 0w1 is a mechanical word. Then w is a palindrome.\n\n4.2 Properties of the Linear-System Orbits M(w) and Prefix Sums S(w)\nDefinition. Assume that $a, b \\in \\mathbb{R}_+$ and $a < b$. Consider the matrices\n\n$$F := \\begin{pmatrix} 1 & 1 \\\\ a & 1 + a \\end{pmatrix}, \\quad G := \\begin{pmatrix} 1 & 1 \\\\ b & 1 + b \\end{pmatrix} \\quad \\text{and} \\quad K := \\begin{pmatrix} -1 & -1 \\\\ 0 & 1 \\end{pmatrix}$$\n\nso that the M\u00f6bius transformations $\\mu_F, \\mu_G$ are the functions $\\phi_0, \\phi_1$ of (1) and $GF - FG = (b - a)K$.\nGiven any word $w \\in \\{0, 1\\}^*$, we define the matrix product M(w)\n\n$$M(w) := M(w_{|w|}) \\cdots M(w_1), \\quad \\text{where } M(\\epsilon) := I,\\ M(0) := F \\text{ and } M(1) := G$$\n\nwhere $I \\in \\mathbb{R}^{2 \\times 2}$ is the identity and the prefix sum S(w) as the matrix polynomial\n\n$$S(w) := \\sum_{k=1}^{|w|} M(w_{1:k}), \\quad \\text{where } S(\\epsilon) := 0 \\text{ (the all-zero matrix)}. \\tag{7}$$\n\nFor any $A \\in \\mathbb{R}^{2 \\times 2}$, let tr(A) be the trace of A, let $A_{ij} = [A]_{ij}$ be the entries of A and let $A \\geq 0$\nindicate that all entries of A are non-negative.\nRemark. Clearly, det(F) = det(G) = 1 so that det(M(w)) = 1 for any word w. Also, S(w)\ncorresponds to the partial sums of the linear-system orbits, as hinted in the previous section.\nThe following proposition captures the role of palindromes (proof in the supplementary material).\nProposition 4. Suppose w is a word, p is a palindrome and $n \\in \\mathbb{Z}_+$. Then\n\n1. $M(p) = \\begin{pmatrix} \\frac{fh + 1}{h + f} & f \\\\ \\frac{h^2 - 1}{h + f} & h \\end{pmatrix}$ for some $f, h \\in \\mathbb{R}$,\n2. $\\mathrm{tr}(M(10p)) = \\mathrm{tr}(M(01p))$,\n3. If $u \\in \\{p(10p)^n, (10p)^n 10\\}$ then $M(u) - M(\\tilde{u}) = \\lambda K$ for some $\\lambda \\in \\mathbb{R}_-$,\n4. 
If w is a prefix of p then $[M(p(10p)^n 10w)]_{22} \\leq [M(p(01p)^n 01w)]_{22}$,\n5. $[M((10p)^n 10w)]_{21} \\geq [M((01p)^n 01w)]_{21}$,\n6. $[M((10p)^n 1)]_{21} \\geq [M((01p)^n 0)]_{21}$.\n\nWe now demonstrate a surprisingly simple relation between S(w) and M(w).\nProposition 5. Suppose w is a palindrome. Then\n\n$$S_{21}(w) = M_{22}(w) - 1 \\quad \\text{and} \\quad S_{22}(w) = M_{12}(w) + S_{21}(w). \\tag{8}$$\n\nFurthermore, if $\\Delta_k := [S(10w)M(w(10w)^k) - S(01w)M(w(01w)^k)]_{22}$ then\n\n$$\\Delta_k = 0 \\quad \\text{for all } k \\in \\mathbb{Z}_+. \\tag{9}$$\n\nProof. Let us write M := M(w), S := S(w). We prove (8) by induction on $|w|$. In the base\ncase $w \\in \\{\\epsilon, 0, 1\\}$. For $w = \\epsilon$, $M_{22} - 1 = 0 = S_{21}$, $M_{12} + S_{21} = 0 = S_{22}$. For $w \\in \\{0, 1\\}$,\n$M_{22} - 1 = c = S_{21}$, $M_{12} + S_{21} = 1 + c = S_{22}$ for some $c \\in \\{a, b\\}$. For the inductive step, in\naccordance with Claim 1 of Proposition 4, assume $w \\in \\{0v0, 1v1\\}$ for some word v satisfying\n\n$$M(v) = \\begin{pmatrix} \\frac{fh + 1}{h + f} & f \\\\ \\frac{h^2 - 1}{h + f} & h \\end{pmatrix}, \\quad S(v) = \\begin{pmatrix} c & d \\\\ h - 1 & f + h - 1 \\end{pmatrix} \\quad \\text{for some } c, d, f, h \\in \\mathbb{R}.$$\n\nFor w = 1v1, M := M(1v1) = GM(v)G and S := S(1v1) = GM(v)G + S(v)G + G. Calculating\nthe corresponding matrix products and sums gives\n\n$$S_{21} = (bh + h + bf - 1)(bh + 2h + bf + f + 1)(h + f)^{-1} = M_{22} - 1$$\n$$S_{22} - S_{21} = bh + 2h + bf + f = M_{12}$$\n\nas claimed. For w = 0v0 the claim also holds as $F = G|_{b=a}$. This completes the proof of (8).\nFurthermore Part. Let A := S(w)FG + FG + G and B := S(w)GF + GF + F. Then\n\n$$\\Delta_k = [(A(M(w)FG)^k - B(M(w)GF)^k)M(w)]_{22} \\tag{10}$$\n\nby definition of $S(\\cdot)$. By Claim 1 of Proposition 4 and (8) we know that\n\n$$M(w) = \\begin{pmatrix} \\frac{fh + 1}{h + f} & f \\\\ \\frac{h^2 - 1}{h + f} & h \\end{pmatrix}, \\quad S(w) = \\begin{pmatrix} c & d \\\\ h - 1 & f + h - 1 \\end{pmatrix} \\quad \\text{for some } c, d, f, h \\in \\mathbb{R}.$$\n\nSubstituting these expressions and the definitions of F, G into the definitions of A, B and then\ninto (10) for $k \\in \\{0, 1\\}$ directly gives $\\Delta_0 = \\Delta_1 = 0$ (although this calculation is long).\nNow consider the case $k \\geq 2$. Claim 2 of Proposition 4 says tr(M(10w)) = tr(M(01w)) and clearly\ndet(M(10w)) = det(M(01w)) = 1. Thus we can diagonalise as\n\n$$M(w)FG =: UDU^{-1}, \\quad M(w)GF =: VDV^{-1}, \\quad D := \\mathrm{diag}(\\lambda, 1/\\lambda) \\quad \\text{for some } \\lambda \\geq 1$$\n\nso that $\\Delta_k = [AUD^kU^{-1}M(w) - BVD^kV^{-1}M(w)]_{22} =: \\gamma_1\\lambda^k + \\gamma_2\\lambda^{-k}$. So, if $\\lambda = 1$ then\n$\\Delta_k = \\gamma_1 + \\gamma_2 = \\Delta_0$ and we already showed that $\\Delta_0 = 0$. Otherwise $\\lambda \\neq 1$, so $\\Delta_0 = \\Delta_1 = 0$\nimplies $\\gamma_1 + \\gamma_2 = \\gamma_1\\lambda + \\gamma_2\\lambda^{-1} = 0$ which gives $\\gamma_1 = \\gamma_2 = 0$. Thus for any $k \\in \\mathbb{Z}_+$ we have\n$\\Delta_k = \\gamma_1\\lambda^k + \\gamma_2\\lambda^{-k} = 0$.\n\n4.3 Majorisation\n\nThe following is a straightforward consequence of results in [14] proved in the supplementary material.\nWe emphasize that the notation $\\prec^w$ has nothing to do with the notion of w as a word.\nProposition 6. Suppose $x, y \\in \\mathbb{R}^m_+$ and $f : \\mathbb{R} \\to \\mathbb{R}$ is a symmetric function that is convex and\ndecreasing on $\\mathbb{R}_+$. Then\n\n$$x \\prec^w y \\text{ and } \\beta \\in [0, 1] \\implies \\sum_{i=1}^{m} \\beta^i f(x_{(i)}) \\geq \\sum_{i=1}^{m} \\beta^i f(y_{(i)}).$$\n\nFor any $x \\in \\mathbb{R}$ and any fixed word w, define the sequences for $n \\in \\mathbb{Z}_+$ and $k = 1, \\ldots, m$\n\n$$\\begin{aligned} x_{nm+k}(x) &:= [M((10w)^n (10w)_{1:k}) v(x)]_2, & \\sigma^{(n)}_x &:= (x_{nm+1}(x), \\ldots, x_{nm+m}(x)) \\\\ y_{nm+k}(x) &:= [M((01w)^n (01w)_{1:k}) v(x)]_2, & \\sigma^{(n)}_y &:= (y_{nm+1}(x), \\ldots, y_{nm+m}(x)) \\end{aligned} \\tag{11}$$\n\nwhere $m := |10w|$ and $v(x) := (x, 1)^T$.\nProposition 7. Suppose w is a palindrome and $x \\geq \\phi_w(0)$. Then $\\sigma^{(n)}_x$ and $\\sigma^{(n)}_y$ are ascending\nsequences on $\\mathbb{R}_+$ and $\\sigma^{(n)}_x \\prec^w \\sigma^{(n)}_y$ for any $n \\in \\mathbb{Z}_+$.\nProof. Clearly $\\phi_w(0) \\geq 0$ so $x \\geq 0$ and hence $v(x) \\geq 0$. So for any word u and letter $c \\in \\{0, 1\\}$ we\nhave $M(uc)v(x) = M(c)M(u)v(x) \\geq M(u)v(x) \\geq 0$ as $M(c) \\geq I$. Thus $x_{k+1}(x) \\geq x_k(x) \\geq 0$\nand $y_{k+1}(x) \\geq y_k(x) \\geq 0$. In conclusion, $\\sigma^{(n)}_x$ and $\\sigma^{(n)}_y$ are ascending sequences on $\\mathbb{R}_+$.\nNow $\\phi_w(0) = \\frac{[M(w)]_{12}}{[M(w)]_{22}}$. Thus $[Av(\\phi_w(0))]_2 = \\frac{[AM(w)]_{22}}{[M(w)]_{22}}$ for any $A \\in \\mathbb{R}^{2 \\times 2}$. So\n\n$$x_{nm+k}(\\phi_w(0)) - y_{nm+k}(\\phi_w(0)) = \\frac{1}{[M(w)]_{22}} [(M((10w)^n (10w)_{1:k}) - M((01w)^n (01w)_{1:k}))M(w)]_{22} \\leq 0$$\n\nfor $k = 2, \\ldots, m$ by Claim 4 of Proposition 4. So all but the first term of the sum $T_m(\\phi_w(0))$ is\nnon-positive where\n\n$$T_j(x) := \\sum_{k=1}^{j} (x_{nm+k}(x) - y_{nm+k}(x)).$$\n\nThus $T_1(\\phi_w(0)) \\geq T_2(\\phi_w(0)) \\geq \\ldots \\geq T_m(\\phi_w(0))$. But\n\n$$T_m(\\phi_w(0)) = \\frac{1}{[M(w)]_{22}} \\sum_{k=1}^{m} [(M((10w)^n (10w)_{1:k}) - M((01w)^n (01w)_{1:k}))M(w)]_{22} = \\frac{1}{[M(w)]_{22}} [S(10w)M(w(10w)^n) - S(01w)M(w(01w)^n)]_{22} = 0$$\n\nwhere the last step follows from (9). So $T_j(\\phi_w(0)) \\geq 0$ for $j = 1, \\ldots, m$. Yet Claims 5 and 6 of\nProposition 4 give $\\frac{d}{dx} T_j(x) = \\sum_{k=1}^{j} [M((10w)^n (10w)_{1:k}) - M((01w)^n (01w)_{1:k})]_{21} \\geq 0$. So for\n$x \\geq \\phi_w(0)$ we have $T_j(x) \\geq 0$ for $j = 1, \\ldots, m$ which means that $\\sigma^{(n)}_x \\prec^w \\sigma^{(n)}_y$.\n\n4.4 Indexability\nTheorem 1. The index $\\lambda^W(x)$ of (6) is continuous and non-decreasing for $x \\in \\mathbb{R}_+$.\n\nProof. As weight w is non-negative and cost h is a constant we only need to prove the result for\n$\\lambda(x) := \\lambda^W(x)|_{w=1,h=0}$ and we can use w to denote a word. By Proposition 2, $x \\in [y_{01w}, y_{10w}]$\nfor some mechanical word 0w1. (Cases $x \\notin (y_1, y_0)$ are clarified in the supplementary material.)\nLet us show that the hypotheses of Proposition 7 are satisfied by w and x. Firstly, w is a palindrome\nby Proposition 3. Secondly, $\\phi_{w01}(0) \\geq 0$ and as $\\phi_w(\\cdot)$ is monotonically increasing, it follows that\n$\\phi_w \\circ \\phi_{w01}(0) \\geq \\phi_w(0)$. Equivalently, $\\phi_{01w} \\circ \\phi_w(0) \\geq \\phi_w(0)$ so that $\\phi_w(0) \\leq y_{01w}$ by Proposition 1.\nHence $x \\geq y_{01w} \\geq \\phi_w(0)$.\nThus Proposition 7 applies, showing that the sequences $\\sigma^{(n)}_x$ and $\\sigma^{(n)}_y$, with elements $x_{nm+k}(x)$ and\n$y_{nm+k}(x)$ as defined in (11), are non-decreasing sequences on $\\mathbb{R}_+$ with $\\sigma^{(n)}_x \\prec^w \\sigma^{(n)}_y$. Also, $1/x^2$\nis a symmetric function that is convex and decreasing on $\\mathbb{R}_+$. Therefore Proposition 6 applies giving\n\n$$\\sum_{k=1}^{m} \\left( \\frac{\\beta^{nm+k-1}}{(x_{nm+k}(x))^2} - \\frac{\\beta^{nm+k-1}}{(y_{nm+k}(x))^2} \\right) \\geq 0 \\quad \\text{for any } n \\in \\mathbb{Z}_+ \\text{ where } m := |01w|. \\tag{12}$$\n\nAlso Proposition 2 shows that the x-threshold orbits are $(\\phi_{u_1}(x), \\ldots, \\phi_{u_{1:k}}(x), \\ldots)$ and\n$(\\phi_{l_1}(x), \\ldots, \\phi_{l_{1:k}}(x), \\ldots)$ where $u := (01w)^\\omega$ and $l := (10w)^\\omega$. So the denominator of (6) is\n\n$$\\sum_{k=0}^{\\infty} \\beta^k (1_{l_{k+1}=1} - 1_{u_{k+1}=1}) = \\sum_{k=0}^{\\infty} \\beta^{mk}(1 - \\beta) \\implies \\lambda(x) = \\frac{1 - \\beta^m}{1 - \\beta} \\sum_{k=1}^{\\infty} \\beta^{k-1} (\\phi_{u_{1:k}}(x) - \\phi_{l_{1:k}}(x)).$$\n\nNote that $\\frac{d}{dx} \\frac{ex + f}{gx + h} = \\frac{1}{(gx + h)^2}$ for any $eh - fg = 1$. Then (12) gives\n\n$$\\frac{d\\lambda(x)}{dx} = \\frac{1 - \\beta^m}{1 - \\beta} \\sum_{n=0}^{\\infty} \\sum_{k=1}^{m} \\left( \\frac{\\beta^{nm+k-1}}{(x_{nm+k}(x))^2} - \\frac{\\beta^{nm+k-1}}{(y_{nm+k}(x))^2} \\right) \\geq 0.$$\n\nBut $\\lambda(x)$ is continuous for $x \\in \\mathbb{R}_+$ (as shown in the supplementary material). 
Therefore we conclude\nthat $\\lambda(x)$ is non-decreasing for $x \\in \\mathbb{R}_+$.\n\n5 Further Work\n\nOne might attempt to prove that assumption A1 holds using general results about monotone optimal\npolicies for two-action MDPs based on submodularity [2] or multimodularity [1]. However, we find\ncounter-examples to the required submodularity condition. Rather, we are optimistic that the ideas\nof this paper themselves offer an alternative approach to proving A1. It would then be natural to\nextend our results to settings where the underlying state evolves as $Z_{t+1} \\mid H_t \\sim N(mZ_t, 1)$ for\nsome multiplier $m \\neq 1$ and to cost functions other than the variance. Finally, the question of the\nindexability of the discrete-time Kalman filter in multiple dimensions remains open.\n\nReferences\n[1] E. Altman, B. Gaujal, and A. Hordijk. Multimodularity, convexity, and optimization properties. Mathematics of Operations Research, 25(2):324\u2013347, 2000.\n[2] E. Altman and S. Stidham Jr. Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems, 21(3-4):267\u2013291, 1995.\n[3] M. Araya, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards. In Neural Information Processing Systems, pages 64\u201372, 2010.\n[4] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 671\u2013680, 2014.\n[5] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words. CRM Monograph Series, 2008.\n[6] S. Bubeck and N. Cesa-Bianchi. 
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, Vol. 5. NOW, 2012.\n[7] Y. Chen, H. Shioi, C. Montesinos, L. P. Koh, S. Wich, and A. Krause. Active detection via adaptive submodularity. In Proceedings of The 31st International Conference on Machine Learning, pages 55\u201363, 2014.\n[8] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.\n[9] R. Graham, D. Knuth, and O. Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.\n[10] S. Guha, K. Munagala, and P. Shi. Approximation algorithms for restless bandit problems. Journal of the ACM, 58(1):3, 2010.\n[11] B. La Scala and B. Moran. Optimal target tracking with restless bandits. Digital Signal Processing, 16(5):479\u2013487, 2006.\n[12] J. Le Ny, E. Feron, and M. Dahleh. Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control, 56(6):1381\u20131394, 2011.\n[13] M. Lothaire. Algebraic combinatorics on words. Cambridge University Press, 2002.\n[14] A. Marshall, I. Olkin, and B. Arnold. Inequalities: Theory of majorization and its applications. Springer Science & Business Media, 2010.\n[15] L. Meier, J. Peschon, and R. Dressler. Optimal control of measurement subsystems. IEEE Trans. Automatic Control, 12(5):528\u2013536, 1967.\n[16] J. Ni\u00f1o-Mora and S. Villar. Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. In Proceedings of the 48th IEEE Conference on Decision and Control, pages 2905\u20132910, 2009.\n[17] R. Ortner, D. Ryabko, P. Auer, and R. Munos. Regret bounds for restless Markov bandits. In Algorithmic Learning Theory, pages 214\u2013228. Springer, 2012.\n[18] B. Rajpathak, H. Pillai, and S. Bandyopadhyay. Analysis of stable periodic orbits in the one dimensional linear piecewise-smooth discontinuous map. 
Chaos, 22(3):033126, 2012.\n[19] T. Thiele. Sur la compensation de quelques erreurs quasi-syst\u00e9matiques par la m\u00e9thode des moindres carr\u00e9s. CA Reitzel, 1880.\n[20] I. Verloop. Asymptotic optimal control of multi-class restless bandits. CNRS Technical Report, hal-00743781, 2014.\n[21] S. Villar. Restless bandit index policies for dynamic sensor scheduling optimization. PhD thesis, Statistics Department, Universidad Carlos III de Madrid, 2012.\n[22] E. Vul, G. Alvarez, J. B. Tenenbaum, and M. J. Black. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In Neural Information Processing Systems, pages 1955\u20131963, 2009.\n[23] R. R. Weber and G. Weiss. On an index policy for restless bandits. Journal of Applied Probability, pages 637\u2013648, 1990.\n[24] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, pages 287\u2013298, 1988.\n", "award": [], "sourceid": 1038, "authors": [{"given_name": "Christopher", "family_name": "Dance", "institution": "Xerox Research Centre Europe"}, {"given_name": "Tomi", "family_name": "Silander", "institution": "Xerox Research Centre Europe"}]}