{"title": "On Elicitation Complexity", "book": "Advances in Neural Information Processing Systems", "page_first": 3258, "page_last": 3266, "abstract": "Elicitation is the study of statistics or properties which are computable via empirical risk minimization. While several recent papers have approached the general question of which properties are elicitable, we suggest that this is the wrong question---all properties are elicitable by first eliciting the entire distribution or data set, and thus the important question is how elicitable. Specifically, what is the minimum number of regression parameters needed to compute the property?Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations for a calculus of elicitation. We establish several general results and techniques for proving upper and lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk of any loss, a large class of properties which includes spectral risk measures and several new properties of interest.", "full_text": "On Elicitation Complexity\n\nRafael Frongillo\n\nUniversity of Colorado, Boulder\n\nraf@colorado.edu\n\nIan A. Kash\n\nMicrosoft Research\n\niankash@microsoft.com\n\nAbstract\n\nElicitation is the study of statistics or properties which are computable via em-\npirical risk minimization. While several recent papers have approached the gen-\neral question of which properties are elicitable, we suggest that this is the wrong\nquestion\u2014all properties are elicitable by \ufb01rst eliciting the entire distribution or\ndata set, and thus the important question is how elicitable. Speci\ufb01cally, what is\nthe minimum number of regression parameters needed to compute the property?\nBuilding on previous work, we introduce a new notion of elicitation complexity\nand lay the foundations for a calculus of elicitation. 
We establish several general results and techniques for proving upper and lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk of any loss, a large class of properties which includes spectral risk measures and several new properties of interest.\n\n1 Introduction\n\nEmpirical risk minimization (ERM) is a dominant framework for supervised machine learning, and a key component of many learning algorithms. A statistic or property is simply a functional assigning a vector of values to each distribution. We say that such a property is elicitable if for some loss function it can be represented as the unique minimizer of the expected loss under the distribution. Thus, the study of which properties are elicitable can be viewed as the study of which statistics are computable via ERM [1, 2, 3].\n\nThe study of property elicitation began in statistics [4, 5, 6, 7], and is gaining momentum in machine learning [8, 1, 2, 3], economics [9, 10], and most recently, finance [11, 12, 13, 14, 15]. A sequence of papers starting with Savage [4] has looked at the full characterization of losses which elicit the mean of a distribution, or more generally the expectation of a vector-valued random variable [16, 3]. The case of real-valued properties is also now well in hand [9, 1]. The general vector-valued case is still largely open, with recent progress in [3, 2, 15]. Recently, a parallel thread of research has been underway in finance, to understand which financial risk measures, among several in use or proposed to help regulate the risks of financial institutions, are computable via regression, i.e., elicitable (cf. references above). More often than not, these papers have concluded that most risk measures under consideration are not elicitable, notable exceptions being generalized quantiles (e.g. 
value-at-risk, expectiles) and expected utility [13, 12].\n\nThroughout this growing body of work, one question has been central: which properties are elicitable? It is clear, however, that all properties are “indirectly” elicitable if one first elicits the distribution using a standard proper scoring rule. Therefore, in the present work, we suggest replacing this question with a more nuanced one: how elicitable are various properties? Specifically, heeding the suggestion of Gneiting [7], we adapt to our setting the notion of elicitation complexity introduced by Lambert et al. [17], which captures how many parameters one needs to maintain in an ERM procedure for the property in question. Indeed, if a real-valued property is found not to be elicitable, such as the variance, one should not abandon it, but rather ask how many parameters are required to compute it via ERM.\n\nOur work is heavily inspired by the recent progress along these lines of Fissler and Ziegel [15], who show that spectral risk measures of support k have elicitation complexity at most k + 1. Spectral risk measures are among those under consideration in the finance community, and this result shows that while not elicitable in the classical sense, their elicitation complexity is still low, and hence one can develop reasonable regression procedures for them. Our results extend to these and many other risk measures (see § 4.6), often providing matching lower bounds on the complexity as well.\n\nOur contributions are the following. We first introduce an adapted definition of elicitation complexity which we believe to be the right notion to focus on going forward. We establish a few simple but useful results which allow for a kind of calculus of elicitation; for example, conditions under which the complexity of eliciting two properties in tandem is the sum of their individual complexities. 
In\n\u00a7 3, we derive several techniques for proving both upper and lower bounds on elicitation complexity\nwhich apply primarily to the Bayes risks from decision theory, or optimal expected loss functions.\nThe class includes spectral risk measures among several others; see \u00a7 4. We conclude with brief\nremarks and open questions.\n\n2 Preliminaries and Foundation\nLet \u2126 be a set of outcomes and P \u2286 \u2206(\u2126) be a convex set of probability measures. The goal of\nelicitation is to learn something about the distribution p \u2208 P, speci\ufb01cally some function \u0393(p) such\nas the mean or variance, by minimizing a loss function.\nDe\ufb01nition 1. A property is a function \u0393 : P \u2192 Rk, for some k \u2208 N, which associates a desired\n= {p \u2208 P | r = \u0393(p)} denote the set of distributions p\n.\nreport value to each distribution.1 We let \u0393r\ncorresponding to report value r.\n\nGiven a property \u0393, we want to ensure that the best result is to reveal the value of the property using\na loss function that evaluates the report using a sample from the distribution.\nDe\ufb01nition 2. A loss function L : Rk \u00d7 \u2126 \u2192 R elicits a property \u0393 : P \u2192 Rk if for all p \u2208 P,\n= Ep[L(r,\u00b7)]. A property is elicitable if some loss elicits it.\n.\n\u0393(p) = arginf r L(r, p), where L(r, p)\nFor example, when \u2126 = R, the mean \u0393(p) = Ep[\u03c9] is elicitable via squared loss L(r, \u03c9) = (r\u2212\u03c9)2.\nA well-known necessary condition for elicitability is convexity of the level sets of \u0393.\nProposition 1 (Osband [5]). 
If \u0393 is elicitable, the level sets \u0393r are convex for all r \u2208 \u0393(P).\nOne can easily check that the mean \u0393(p) = Ep[\u03c9] has convex level sets, yet the variance \u0393(p) =\nEp[(\u03c9 \u2212 Ep[\u03c9])2] does not, and hence is not elicitable [9].\nIt is often useful to work with a stronger condition, that not only is \u0393r convex, but it is the intersection\nof a linear subspace with P. This condition is equivalent the existence of an identi\ufb01cation function,\na functional describing the level sets of \u0393 [17, 1].\nDe\ufb01nition 3. A function V : R\u00d7\u2126 \u2192 Rk is an identi\ufb01cation function for \u0393 : P \u2192 Rk, or identi\ufb01es\n\u0393, if for all r \u2208 \u0393(P) it holds that p \u2208 \u0393r \u21d0\u21d2 V (r, p) = 0 \u2208 Rk, where as with L(r, p) above we\nwrite V (r, p)\nOne can check for example that V (r, \u03c9) = \u03c9 \u2212 r identi\ufb01es the mean.\nWe can now de\ufb01ne the classes of identi\ufb01able and elicitable properties, along with the complexity\nof identifying or eliciting a given property. Naturally, a property is k-identi\ufb01able if it is the link of\na k-dimensional identi\ufb01able property, and k-elicitable if it is the link of a k-dimensional elicitable\nproperty. The elicitation complexity of a property is then simply the minimum dimension k needed\nfor it to be k-elicitable.\nDe\ufb01nition 4. Let Ik(P) denote the class of all identi\ufb01able properties \u0393 : P \u2192 Rk, and Ek(P)\nk\u2208N Ik(P) and\n\ndenote the class of all elicitable properties \u0393 : P \u2192 Rk. We write I(P) = (cid:83)\nE(P) =(cid:83)\n\n.\n= Ep[V (r, \u03c9)]. \u0393 is identi\ufb01able if there exists a V identifying it.\n\nk\u2208N Ek(P).\n\nDe\ufb01nition 5. 
A property \u0393 is k-identi\ufb01able if there exists \u02c6\u0393 \u2208 Ik(P) and f such that \u0393 = f \u25e6 \u02c6\u0393.\nThe identi\ufb01cation complexity of \u0393 is de\ufb01ned as iden(\u0393) = min{k : \u0393 is k-identi\ufb01able}.\n\n1We will also consider \u0393 : P \u2192 RN.\n\n2\n\n\fDe\ufb01nition 6. A property \u0393 is k-elicitable if there exists \u02c6\u0393 \u2208 Ek(P) and f such that \u0393 = f \u25e6 \u02c6\u0393. The\nelicitation complexity of \u0393 is de\ufb01ned as elic(\u0393) = min{k : \u0393 is k-elicitable}.\nTo make the above de\ufb01nitions concrete, recall that the variance \u03c32(p) = Ep[(Ep[\u03c9]\u2212\u03c9)2] is not elic-\nitable, as its level sets are not convex, a necessary condition by Prop. 1. Note however that we may\nwrite \u03c32(p) = Ep[\u03c92]\u2212Ep[\u03c9]2, which can be obtained from the property \u02c6\u0393(p) = (Ep[\u03c9], Ep[\u03c92]). It\nis well-known [4, 7] that \u02c6\u0393 is both elicitable and identi\ufb01able as the expectation of a vector-valued ran-\ndom variable X(\u03c9) = (\u03c9, \u03c92), using for example L(r, \u03c9) = (cid:107)r\u2212X(\u03c9)(cid:107)2 and V (r, \u03c9) = r\u2212X(\u03c9).\nThus, we can recover \u03c32 as a link of the elicitable and identi\ufb01able \u02c6\u0393 : P \u2192 R2, and as no such\n\u02c6\u0393 : P \u2192 R exists, we have iden(\u03c32) = elic(\u03c32) = 2.\nIn this example, the variance has a stronger property than merely being 2-identi\ufb01able and 2-\nelicitable, namely that there is a single \u02c6\u0393 that satis\ufb01es both of these simultaneously. In fact this\nis quite common, and identi\ufb01ability provides geometric structure that we make use of in our lower\nbounds. Thus, most of our results use this re\ufb01ned notion of elicitation complexity.\nDe\ufb01nition 7. 
A property \u0393 has (identi\ufb01able) elicitation complexity\nelicI(\u0393) = min{k : \u2203\u02c6\u0393, f such that \u02c6\u0393 \u2208 Ek(P) \u2229 Ik(P) and \u0393 = f \u25e6 \u02c6\u0393}.\nNote that restricting our attention to elicI effectively requires elicI(\u0393) \u2265 iden(\u0393); speci\ufb01cally, if \u0393\nis derived from some elicitable \u02c6\u0393, then \u02c6\u0393 must be identi\ufb01able as well. This restriction is only relevant\nfor our lower bounds, as our upper bounds give losses explicitly.2 Note however that some restriction\non Ek(P) is necessary, as otherwise pathological constructions giving injective mappings from R to\nRk would render all properties 1-elicitable. To alleviate this issue, some authors require continuity\n(e.g. [1]) while others like we do require identi\ufb01ability (e.g. [15]), which can be motivated by the\nfact that for any differentiable loss L for \u0393, V (r, \u03c9) = \u2207rL(\u00b7, \u03c9) will identify \u0393 provided Ep[L]\nhas no in\ufb02ection points or local minima. An important future direction is to relax this identi\ufb01ability\nassumption, as there are very natural (set-valued) properties with iden > elic.3\nOur de\ufb01nition of elicitation complexity differs from the notion proposed by Lambert et al. [17], in\nthat the components of \u02c6\u0393 above do not need to be individually elicitable. This turns out to have\na large impact, as under their de\ufb01nition the property \u0393(p) = max\u03c9\u2208\u2126 p({\u03c9}) for \ufb01nite \u2126 has\nelicitation complexity |\u2126| \u2212 1, whereas under our de\ufb01nition elicI(\u0393) = 2; see Example 4.3. Fissler\nand Ziegel [15] propose a closer but still different de\ufb01nition, with the complexity being the smallest\nk such that \u0393 is a component of a k-dimensional elicitable property. 
Again, this de\ufb01nition can lead\nto larger complexities than necessary; take for example the squared mean \u0393(p) = Ep[\u03c9]2 when\n\u2126 = R, which has elicI(\u0393) = 1 with \u02c6\u0393(p) = Ep[\u03c9] and f (x) = x2, but is not elicitable and thus has\ncomplexity 2 under [15]. We believe that, modulo regularity assumptions on Ek(P), our de\ufb01nition is\nbetter suited to studying the dif\ufb01culty of eliciting properties: viewing f as a (potentially dimension-\nreducing) link function, our de\ufb01nition captures the minimum number of parameters needed in an\nERM computation of the property in question, followed by a simple one-time application of f.\n\n2.1 Foundations of Elicitation Complexity\n\nIn the remainder of this section, we make some simple, but useful, observations about iden(\u0393) and\nelicI(\u0393). We have already discussed one such observation after De\ufb01nition 7: elicI(\u0393) \u2265 iden(\u0393).\nIt is natural to start with some trivial upper bounds. Clearly, whenever p \u2208 P can be uniquely deter-\nmined by some number of elicitable parameters then the elicitation complexity of every property is\nat most that number. The following propositions give two notable applications of this observation.4\nProposition 2. When |\u2126| = n, every property \u0393 has elicI(\u0393) \u2264 n \u2212 1.\nProof. The probability distribution is determined by the probability of any n \u2212 1 outcomes, and the\nprobability associated with a given outcome is both elicitable and identi\ufb01able.\n\n2Our main lower bound (Thm 2) merely requires \u0393 to have convex level sets, which is necessary by Prop. 1.\n3One may take for example \u0393(p) = argmaxi p(Ai) for a \ufb01nite measurable partition A1, . . . , An of \u2126.\n4Note that these restrictions on \u2126 may easily be placed on P instead; e.g. 
\ufb01nite \u2126 is equivalent to P having\n\nsupport on a \ufb01nite subset of \u2126, or even being piecewise constant on some disjoint events.\n\n3\n\n\fThen elicI(\u0393) =\n\nProposition 3. When \u2126 = R,5 every property \u0393 has elicI(\u0393) \u2264 \u03c9 (countable).6\nOne well-studied class of properties are those where \u0393 is linear, i.e., the expectation of some vector-\nvalued random variable. All such properties are elicitable and identi\ufb01able (cf. [4, 8, 3]), with\nelicI(\u0393) \u2264 k, but of course the complexity can be lower if the range of \u0393 is not full-dimensional.\nLemma 1. Let X : \u2126 \u2192 Rk be P-integrable and \u0393(p) = Ep[X].\ndim(a\ufb00hull(\u0393(P))), the dimension of the af\ufb01ne hull of the range of \u0393.\nIt is easy to create redundant properties in various ways. For example, given elicitable properties\n= {\u03931, \u03932, \u03931 + \u03932} clearly contains redundant information. A concrete\n.\n\u03931 and \u03932 the property \u0393\ncase is \u0393 = {mean squared, variance, 2nd moment}, which, as we have seen, has elicI(\u0393) = 2. The\nfollowing de\ufb01nitions and lemma capture various aspects of a lack of such redundancy.\nDe\ufb01nition 8. Property \u0393 : P \u2192 Rk in I(P) is of full rank if iden(\u0393) = k.\nNote that there are two ways for a property to fail to be full rank. First, as the examples above\nsuggest, \u0393 can be \u201credundant\u201d so that it is a link of a lower-dimensional identi\ufb01able property. Full\nrank can also be violated if more dimensions are needed to identify the property than to specify it.\nThis is the case with, e.g., the variance which is a 1 dimensional property but has iden(\u03c32) = 2.\nDe\ufb01nition 9. Properties \u0393, \u0393(cid:48) \u2208 I(P) are independent if iden({\u0393, \u0393(cid:48)}) = iden(\u0393) + iden(\u0393(cid:48)).\nLemma 2. 
If \u0393, \u0393(cid:48)\u2208 E(P) are full rank and independent, then elicI({\u0393, \u0393(cid:48)}) = elicI(\u0393)+elicI(\u0393(cid:48)).\nTo illustrate the lemma, elicI(variance) = 2, yet \u0393 = {mean,variance} has elicI(\u0393) = 2, so clearly\nthe mean and variance are not both independent and full rank. (As we have seen, variance is not full\nrank.) However, the mean and second moment satisfy both by Lemma 1.\nAnother important case is when \u0393 consists of some number of distinct quantiles. Osband [5] essen-\ntially showed that quantiles are independent and of full rank, so their elicitation complexity is the\nnumber of quantiles being elicited.\nLemma 3. Let \u2126 = R and P be a class of probability measures with continuously differen-\ntiable and invertible CDFs F , which is suf\ufb01ciently rich in the sense that for all x1, . . . , xk \u2208 R,\nspan({F \u22121(x1), . . . , F \u22121(xk)}, F \u2208 P) = Rk. Let q\u03b1, denote the \u03b1-quantile function. Then if\n\u03b11, . . . , \u03b1k are all distinct, \u0393 = {q\u03b11, . . . , q\u03b1k} has elicI(\u0393) = k.\nThe quantile example in particular allows us to see that all complexity classes, including \u03c9, are\noccupied. In fact, our results to follow will show something stronger: even for real-valued properties\n\u0393 : P \u2192 R, all classes are occupied; we give here the result that follows from our bounds on spectral\nrisk measures in Example 4.4, but this holds for many other P; see e.g. Example 4.2.\nProposition 4. Let P as in Lemma 3. Then for all k \u2208 N there exists \u03b3 : P \u2192 R with elicI(\u03b3) = k.\n\n3 Eliciting the Bayes Risk\n\nIn this section we prove two theorems that provide our main tools for proving upper and lower\nbounds respectively on elicitation complexity. Of course many properties are known to be elic-\nitable, and the losses that elicit them provide such an upper bound for that case. 
We provide such a construction for properties that can be expressed as the pointwise minimum of an indexed set of functions. Interestingly, our construction does not elicit the minimum directly, but as a joint elicitation of the value and the function that realizes this value. The form (1) is that of a scoring rule for the linear property p ↦ E_p[X_a], except that here the index a itself is also elicited.⁷\n\nTheorem 1. Let {X_a : Ω → R}_{a∈A} be a set of P-integrable functions indexed by A ⊆ R^k. Then if inf_a E_p[X_a] is attained, the property γ(p) = min_a E_p[X_a] is (k + 1)-elicitable. In particular,\n\n    L((r, a), ω) = H(r) + h(r)(X_a(ω) − r)    (1)\n\nelicits p ↦ {(γ(p), a) : E_p[X_a] = γ(p)} for any strictly decreasing h : R → R_+ with (d/dr)H = h.\n\n⁵Here and throughout, when Ω = R^k we assume the Borel σ-algebra.\n⁶Omitted proofs can be found in the appendix of the full version of this paper.\n⁷As we focus on elicitation complexity, we have not tried to characterize all ways to elicit this joint property, or other properties we give explicit losses for. See § 4.1 for an example where additional losses are possible.\n\nProof. We will work with gains instead of losses, and show that S((r, a), ω) = g(r) + dg_r(X_a(ω) − r) elicits p ↦ {(γ(p), a) : E_p[X_a] = γ(p)} for γ(p) = max_a E_p[X_a]. Here g is convex with strictly increasing and positive subgradient dg.\n\nFor any fixed a, we have by the subgradient inequality,\n\n    S((r, a), p) = g(r) + dg_r(E_p[X_a] − r) ≤ g(E_p[X_a]) = S((E_p[X_a], a), p),\n\nand as dg is strictly increasing, g is strictly convex, so r = E_p[X_a] is the unique maximizer. Now letting S̃(a, p) = S((E_p[X_a], a), p), we have\n\n    argmax_{a∈A} S̃(a, p) = argmax_{a∈A} g(E_p[X_a]) = argmax_{a∈A} E_p[X_a]\n\nbecause g is strictly increasing. We now have\n\n    argmax_{a∈A, r∈R} S((r, a), p) = {(E_p[X_a], a) : a ∈ argmax_{a∈A} E_p[X_a]}.\n\nOne natural way to get such an indexed set of functions is to take an arbitrary loss function L(r, ω), in which case this pointwise minimum corresponds to the Bayes risk, which is simply the minimum possible expected loss under some distribution p.\n\nDefinition 10. Given loss function L : A × Ω → R on some prediction set A, the Bayes risk of L is defined as L̄(p) := inf_{a∈A} L(a, p).\n\nOne illustration of the power of Theorem 1 is that the Bayes risk of a loss eliciting a k-dimensional property is itself (k + 1)-elicitable.\n\nCorollary 1. If L : R^k × Ω → R is a loss function eliciting Γ : P → R^k, then the loss\n\n    L((r, a), ω) = L′(a, ω) + H(r) + h(r)(L(a, ω) − r)    (2)\n\nelicits {L̄, Γ}, where h : R → R_+ is any positive strictly decreasing function, H(r) = ∫_0^r h(x)dx, and L′ is any surrogate loss eliciting Γ.⁸ If Γ ∈ I_k(P), elic_I(L̄) ≤ k + 1.\n\nWe now turn to our second theorem, which provides lower bounds for the elicitation complexity of the Bayes risk. A first observation, which follows from standard convex analysis, is that L̄ is concave, and thus it is unlikely to be elicitable directly, as the level sets of L̄ are likely to be non-convex. To show a lower bound greater than 1, however, we will need much stronger techniques. In particular, while L̄ must be concave, it may not be strictly so, thus enabling level sets which are potentially amenable to elicitation. In fact, L̄ must be flat between any two distributions which share a minimizer. Crucial to our lower bound is the fact that whenever the minimizer of L differs between two distributions, L̄ is essentially strictly concave between them.\n\nLemma 4. Suppose loss L with Bayes risk L̄ elicits Γ : P → R^k. Then for any p, p′ ∈ P with Γ(p) ≠ Γ(p′), we have L̄(λp + (1 − λ)p′) > λL̄(p) + (1 − λ)L̄(p′) for all λ ∈ (0, 1).\n\nWith this lemma in hand we can prove our lower bound. The crucial insight is that an identification function for the Bayes risk of a loss eliciting a property can, through a link, be used to identify that property. Corollary 1 tells us that k + 1 parameters suffice for the Bayes risk of a k-dimensional property, and our lower bound shows this is often necessary. Only k parameters suffice, however, when the property value itself provides all the information required to compute the Bayes risk; for example, dropping the y² term from squared loss gives L(x, y) = x² − 2xy and L̄(p) = −E_p[y]², giving elic(L̄) = 1. Thus the theorem splits the lower bound into two cases.\n\nTheorem 2. If a loss L elicits some Γ ∈ E_k(P) with elicitation complexity elic_I(Γ) = k, then its Bayes risk L̄ has elic_I(L̄) ≥ k. Moreover, if we can write L̄ = f ∘ Γ for some function f : R^k → R, then we have elic_I(L̄) = k; otherwise, elic_I(L̄) = k + 1.\n\nProof. Let Γ̂ ∈ E_ℓ such that L̄ = g ∘ Γ̂ for some g : R^ℓ → R.\n\n⁸Note that one could easily lift the requirement that Γ be a function, and allow Γ(p) to be the set of minimizers of the loss (cf. [18]). We will use this additional power in Example 4.4.\n\nWe show by contradiction that for all p, p′ ∈ P, Γ̂(p) = Γ̂(p′) implies Γ(p) = Γ(p′). Otherwise, we have p, p′ with Γ̂(p) = Γ̂(p′), and thus L̄(p) = L̄(p′), but Γ(p) ≠ Γ(p′). Lemma 4 would then give us some p_λ = λp + (1 − λ)p′ with L̄(p_λ) > L̄(p). 
But as the level sets Γ̂_r̂ are convex by Prop. 1, we would have Γ̂(p_λ) = Γ̂(p), which would imply L̄(p_λ) = L̄(p).\n\nWe can now conclude that there exists h : R^ℓ → R^k such that Γ = h ∘ Γ̂. But as Γ̂ ∈ E_ℓ, this implies elic_I(Γ) ≤ ℓ, so clearly we need ℓ ≥ k. Finally, if ℓ = k we have L̄ = g ∘ Γ̂ = g ∘ h⁻¹ ∘ Γ. The upper bounds follow from Corollary 1.\n\n4 Examples and Applications\n\nWe now give several applications of our results. Several upper bounds are novel, as well as all lower bounds greater than 2. In the examples, unless we refer to Ω explicitly we will assume Ω = R and write y ∈ Ω so that y ∼ p. In each setting, we also make several standard regularity assumptions which we suppress for ease of exposition; for example, for the variance and variantile we assume finite first and second moments (which must span R²), and whenever we discuss quantiles we will assume that P is as in Lemma 3, though we will not require as much regularity for our upper bounds.\n\n4.1 Variance\n\nIn Section 2 we showed that elic_I(σ²) = 2. As a warm-up, let us see how to recover this statement using our results on the Bayes risk. We can view σ² as the Bayes risk of squared loss L(x, y) = (x − y)², which of course elicits the mean: L̄(p) = min_{x∈R} E_p[(x − y)²] = E_p[(E_p[y] − y)²] = σ²(p). This gives us elic_I(σ²) ≤ 2 by Corollary 1, with a matching lower bound by Theorem 2, as the variance is not simply a function of the mean. 
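As a quick numerical sanity check (our sketch, not part of the paper; the exponential sample and grid search are illustrative choices), the variance emerges as the minimum value of the empirical squared-loss risk:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=50_000)  # any distribution with a finite 2nd moment

# ERM for the mean under squared loss L(x, y) = (x - y)^2 ...
grid = np.linspace(y.mean() - 0.5, y.mean() + 0.5, 401)
risks = np.array([np.mean((x - y) ** 2) for x in grid])
x_star = grid[risks.argmin()]    # minimizer: the (empirical) mean

# ... and the minimum expected loss, the Bayes risk, is the variance.
bayes_risk = risks.min()
```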
Corollary 1 gives losses such as L((x, v), y) = e^{−v}((x − y)² − v) − e^{−v} which elicit {E_p[y], σ²(p)}, but in fact there are losses which cannot be represented by the form (2), showing that we do not have a full characterization; for example, L̂((x, v), y) = v² + v(x − y)(2(x + y) + 1) + (x − y)²((x + y)² + x + y + 1). This L̂ was generated via squared loss ‖z − (y, y²)‖² with respect to the norm ‖z‖² = zᵀ M z, where M = [[1, −1/2], [−1/2, 1]], which elicits the first two moments, and link function (z_1, z_2) ↦ (z_1, z_2 − z_1²).\n\n4.2 Convex Functions of Means\n\nAnother simple example is γ(p) = G(E_p[X]) for some strictly convex function G : R^k → R and P-integrable X : Ω → R^k. To avoid degeneracies, we assume dim affhull{E_p[X] : p ∈ P} = k, i.e., Γ is full rank. Letting {dG_r} be a selection of subgradients of G, the loss L(r, ω) = −(G(r) + dG_r(X(ω) − r)) elicits Γ : p ↦ E_p[X] (cf. [3]), and moreover we have γ(p) = −L̄(p). By Lemma 1, elic_I(Γ) = k. One easily checks that L̄ = (−G) ∘ Γ, so now by Theorem 2, elic_I(γ) = k as well. Letting {X_k}_{k∈N} be a family of such “full rank” random variables, this gives us a sequence of real-valued properties γ_k(p) = ‖E_p[X_k]‖² with elic_I(γ_k) = k, proving Proposition 4.\n\n4.3 Modal Mass\n\nWith Ω = R consider the property γ_β(p) = max_{x∈R} p([x − β, x + β]), namely, the maximum probability mass contained in an interval of width 2β. Theorem 1 easily shows elic_I(γ_β) ≤ 2, as γ̂_β(p) = argmax_{x∈R} p([x − β, x + β]) is elicited by L(x, y) = 1_{|x−y|>β}, and γ_β(p) = 1 − L̄(p). 
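To make this construction concrete, here is a small empirical sketch (ours; the mixture distribution and grid search are illustrative assumptions): minimizing the loss 1_{|x−y|>β} over candidate centers x recovers both the best window location and, via the Bayes risk, the modal mass.

```python
import numpy as np

rng = np.random.default_rng(2)
# Mixture: a sharp mode near 0 and a diffuse one near 4.
y = np.concatenate([rng.normal(0.0, 0.5, 30_000), rng.normal(4.0, 2.0, 20_000)])

beta = 0.5
grid = np.linspace(-2.0, 8.0, 501)
# Empirical risk of L(x, y) = 1{|x - y| > beta} at each candidate center x.
risks = np.array([np.mean(np.abs(x - y) > beta) for x in grid])

x_hat = grid[risks.argmin()]     # elicited center of the best width-2*beta window
modal_mass = 1.0 - risks.min()   # gamma_beta(p) = 1 - Bayes risk
```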
Similarly, in the case of finite Ω, γ(p) = max_{ω∈Ω} p({ω}) is simply the expected score (gain rather than loss) of the mode γ̂(p) = argmax_{ω∈Ω} p({ω}), which is elicitable for finite Ω (but not otherwise; see Heinrich [19]).\n\nIn both cases, one can easily check that the level sets of γ are not convex, so elic_I(γ) = 2; alternatively Theorem 2 applies in the first case. As mentioned following Definition 6, the result for finite Ω differs from the definitions of Lambert et al. [17], where the elicitation complexity of γ is |Ω| − 1.\n\n4.4 Expected Shortfall and Other Spectral Risk Measures\n\nOne important application of our results on the elicitation complexity of the Bayes risk is the elicitability of various financial risk measures. One of the most popular financial risk measures is expected shortfall ES_α : P → R, also called conditional value at risk (CVaR) or average value at risk (AVaR), which we define as follows (cf. [20, eq. (18)], [21, eq. (3.21)]):\n\n    ES_α(p) = inf_{z∈R} { E_p[ (1/α)(z − y)1_{z≥y} − z ] } = inf_{z∈R} { E_p[ (1/α)(z − y)(1_{z≥y} − α) − y ] }.    (3)\n\nDespite the importance of elicitability to financial regulation [11, 22], ES_α is not elicitable [7]. It was recently shown by Fissler and Ziegel [15], however, that elic_I(ES_α) = 2. They also consider the broader class of spectral risk measures, which can be represented as ρ_μ(p) = ∫_{[0,1]} ES_α(p) dμ(α), where μ is a probability measure on [0, 1] (cf. [20, eq. (36)]). In the case where μ has finite support μ = ∑_{i=1}^k β_i δ_{α_i} for point distributions δ, β_i > 0, we can rewrite ρ_μ using the above as:\n\n    ρ_μ(p) = ∑_{i=1}^k β_i ES_{α_i}(p) = inf_{z∈R^k} E_p[ ∑_{i=1}^k (β_i/α_i)(z_i − y)(1_{z_i≥y} − α_i) − y ].    (4)\n\nThey conclude elic_I(ρ_μ) ≤ k + 1 unless μ({1}) = 1, in which case elic_I(ρ_μ) = 1. We show how to recover these results together with matching lower bounds. It is well-known that the infimum in eq. (4) is attained by the k quantiles q_{α_1}(p), . . . , q_{α_k}(p), so we conclude elic_I(ρ_μ) ≤ k + 1 by Theorem 1, and in particular the property {ρ_μ, q_{α_1}, . . . , q_{α_k}} is elicitable. The family of losses from Corollary 1 coincides with the characterization of Fissler and Ziegel [15] (see § D.1). For a lower bound, as elic_I({q_{α_1}, . . . , q_{α_k}}) = k whenever the α_i are distinct by Lemma 3, Theorem 2 gives us elic_I(ρ_μ) = k + 1 whenever μ({1}) < 1, and of course elic_I(ρ_μ) = 1 if μ({1}) = 1.\n\n4.5 Variantile\n\nThe τ-expectile, a type of generalized quantile introduced by Newey and Powell [23], is defined as the solution x = μ_τ to the equation E_p[|1_{x≥y} − τ|(x − y)] = 0. (This also shows μ_τ ∈ I_1.) 
Here\nwe propose the \u03c4-variantile, an asymmetric variance-like measure with respect to the \u03c4-expectile:\njust as the mean is the solution x = \u00b5 to the equation Ep[x \u2212 y] = 0, and the variance is \u03c32(p) =\nEp[(\u00b5 \u2212 y)2], we de\ufb01ne the \u03c4-variantile \u03c32\nIt is well-known that \u00b5\u03c4 can be expressed as the minimizer of a asymmetric least squares problem:\nthe loss L(x, y) = |1x\u2265y \u2212 \u03c4|(x \u2212 y)2 elicits \u00b5\u03c4 [23, 7]. Hence, just as the variance turned out to\nbe a Bayes risk for the mean, so is the \u03c4-variantile for the \u03c4-expectile:\nEp\n\n(cid:2)|1x\u2265y \u2212 \u03c4|(x \u2212 y)2(cid:3) =\u21d2 \u03c32\n\n(cid:2)|1\u00b5\u03c4\u2265y \u2212 \u03c4|(\u00b5\u03c4 \u2212 y)2(cid:3).\n\n(cid:2)|1x\u2265y \u2212 \u03c4|(x \u2212 y)2(cid:3) .\n\n\u03c4 (p) = Ep\n\n\u00b5\u03c4 = argmin\n\n\u03c4 by \u03c32\n\nEp\n\n\u03c4 = min\nx\u2208R\n\nWe now see the pair {\u00b5\u03c4 , \u03c32\n\n\u03c4} is elicitable by Corollary 1, and by Theorem 2 we have elicI(\u03c32\n\n\u03c4 ) = 2.\n\nx\u2208R\n\n4.6 Deviation and Risk Measures\nRockafellar and Uryasev [21] introduce \u201crisk quadrangles\u201d in which they relate a risk R, deviation\nD, error E, and a statistic S, all functions from random variables to the reals, as follows:\nR(X) = min\n\n{C + E(X \u2212 C)}, D(X) = min\n\n{E(X \u2212 C)}, S(X) = argmin\n\n{E(X \u2212 C)} .\n\nC\n\nC\n\nOur results provide tight bounds for many of the risk and deviation measures in their paper. The most\nimmediate case is the expectation quadrangle case, where E(X) = E[e(X)] for some e : R \u2192 R.\nIn this case, if S(X) \u2208 I1(P) Theorem 2 implies elicI(R) = elicI(D) = 2 provided S is non-\nconstant and e non-linear. This includes several of their examples, e.g. truncated mean, log-exp, and\nrate-based. 
Beyond the expectation case, the authors show a Mixing Theorem, where they consider

    D(X) = min_C min_{B_1,…,B_k} { Σ_{i=1}^k λ_i E_i(X − C − B_i) | Σ_{i=1}^k λ_i B_i = 0 } = min_{B′_1,…,B′_k} { Σ_{i=1}^k λ_i E_i(X − B′_i) }.

Once again, if the E_i are all of expectation type and S_i ∈ I_1, Theorem 1 gives elic_I(D) = elic_I(R) ≤ k + 1, with a matching lower bound from Theorem 2 provided the S_i are all independent. The Reverting Theorem for a pair E_1, E_2 can be seen as a special case of the above where one replaces E_2(X) by E_2(−X). Consequently, we have tight bounds for the elicitation complexity of several other examples, including superquantiles (the same as spectral risk measures), the quantile-radius quadrangle, and the optimized certainty equivalents of Ben-Tal and Teboulle [24].

Our results offer an explanation for the existence of regression procedures for some of these risk/deviation measures. For example, a procedure called superquantile regression was introduced in Rockafellar et al. [25], which computes spectral risk measures. In light of Theorem 1, one could interpret their procedure as simply performing regression on the k different quantiles as well as the Bayes risk. In fact, our results show that any risk/deviation generated by mixing several expectation quadrangles will have a similar procedure, in which the B′_i variables are simply computed alongside the measure of interest. Even more broadly, such regression procedures exist for any Bayes risk.

5 Discussion

We have outlined a theory of elicitation complexity which we believe is the right notion of complexity for ERM, and provided techniques and results for upper and lower bounds.
In particular, we now have tight bounds for the large class of Bayes risks, including several applications of note such as spectral risk measures. Our results also offer an explanation for why procedures like superquantile regression are possible, and extend this logic to all Bayes risks.

There are many natural open problems in elicitation complexity. Perhaps the most apparent are the characterizations of the complexity classes {Γ : elic(Γ) = k}, and in particular, determining the elicitation complexity of properties which are known to be non-elicitable, such as the mode [19] and the smallest confidence interval [18].

In this paper we have focused on elicitation complexity with respect to the class of identifiable properties I, which we denoted elic_I. This choice of notation was deliberate; one may define elic_C := min{k : ∃ Γ̂ ∈ E_k ∩ C, ∃ f, Γ = f ∘ Γ̂} to be the complexity with respect to some arbitrary class of properties C. Some examples of interest might be elic_E for expected values, of interest to the prediction market literature [8], and elic_cvx for properties elicitable by a loss which is convex in r, of interest for efficiently performing ERM.

Another interesting line of questioning follows from the notion of conditional elicitation, properties which are elicitable as long as the value of some other elicitable property is known. This notion was introduced by Emmer et al. [11], who showed that the variance and expected shortfall are both conditionally elicitable, on E_p[y] and q_α(p) respectively. Intuitively, knowing that Γ is elicitable conditional on an elicitable Γ′ would suggest that perhaps the pair {Γ, Γ′} is elicitable; Fissler and Ziegel [15] note that it is an open question whether this joint elicitability holds in general.
The Bayes risk L for Γ is elicitable conditioned on Γ, and as we saw above, the pair {Γ, L} is jointly elicitable as well. We give a counter-example in Figure 1, however, which also illustrates the subtlety of characterizing all elicitable properties.

Figure 1: Depictions of the level sets of two properties, one elicitable and the other not. The left is a Bayes risk together with its property, and thus elicitable, while the right is shown in [3] not to be elicitable. Here the planes are shown to illustrate the fact that these are both conditionally elicitable: the height of the plane (the intercept (p3, 0, 0) for example) is elicitable from the characterizations for scalar properties [9, 1], and conditioned on the plane, the properties are both linear and thus links of expected values, which are also elicitable.

References

[1] Ingo Steinwart, Chloé Pasin, Robert Williamson, and Siyu Zhang. Elicitation and Identification of Properties. In Proceedings of The 27th Conference on Learning Theory, pages 482–526, 2014.

[2] A. Agarwal and S. Agrawal. On Consistent Surrogate Risk Minimization and Property Elicitation. In COLT, 2015.

[3] Rafael Frongillo and Ian Kash. Vector-Valued Property Elicitation. In Proceedings of the 28th Conference on Learning Theory, pages 1–18, 2015.

[4] L.J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, pages 783–801, 1971.

[5] Kent Harold Osband. Providing Incentives for Better Cost Forecasting. University of California, Berkeley, 1985.

[6] T. Gneiting and A.E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.

[7] T. Gneiting. Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106(494):746–762, 2011.

[8] J. Abernethy and R. Frongillo.
A characterization of scoring rules for linear properties. In Proceedings of the 25th Conference on Learning Theory, pages 1–27, 2012.

[9] N.S. Lambert. Elicitation and Evaluation of Statistical Forecasts. Preprint, 2011.

[10] N.S. Lambert and Y. Shoham. Eliciting truthful answers to multiple-choice questions. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 109–118, 2009.

[11] Susanne Emmer, Marie Kratz, and Dirk Tasche. What is the best risk measure in practice? A comparison of standard measures. arXiv:1312.1645 [q-fin], December 2013.

[12] Fabio Bellini and Valeria Bignozzi. Elicitable risk measures. Preprint, accepted in Quantitative Finance (doi 10.1080/14697688.2014.946955), 2013.

[13] Johanna F. Ziegel. Coherence and elicitability. Mathematical Finance, 2014. arXiv:1303.1690.

[14] Ruodu Wang and Johanna F. Ziegel. Elicitable distortion risk measures: A concise proof. Statistics & Probability Letters, 100:172–175, May 2015.

[15] Tobias Fissler and Johanna F. Ziegel. Higher order elicitability and Osband's principle. arXiv:1503.08123 [math, q-fin, stat], March 2015.

[16] A. Banerjee, X. Guo, and H. Wang. On the optimality of conditional expectation as a Bregman predictor. IEEE Transactions on Information Theory, 51(7):2664–2669, July 2005.

[17] N.S. Lambert, D.M. Pennock, and Y. Shoham. Eliciting properties of probability distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 129–138, 2008.

[18] Rafael Frongillo and Ian Kash. General truthfulness characterizations via convex analysis. In Web and Internet Economics, pages 354–370. Springer, 2014.

[19] C. Heinrich. The mode functional is not elicitable. Biometrika, page ast048, 2013.

[20] Hans Föllmer and Stefan Weber.
The Axiomatic Approach to Risk Measures for Capital Determination. Annual Review of Financial Economics, 7(1), 2015.

[21] R. Tyrrell Rockafellar and Stan Uryasev. The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surveys in Operations Research and Management Science, 18(1):33–53, 2013.

[22] Tobias Fissler, Johanna F. Ziegel, and Tilmann Gneiting. Expected Shortfall is jointly elicitable with Value at Risk: Implications for backtesting. arXiv:1507.00244 [q-fin], July 2015.

[23] Whitney K. Newey and James L. Powell. Asymmetric least squares estimation and testing. Econometrica: Journal of the Econometric Society, pages 819–847, 1987.

[24] Aharon Ben-Tal and Marc Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.

[25] R. T. Rockafellar, J. O. Royset, and S. I. Miranda. Superquantile regression with applications to buffered reliability, uncertainty quantification, and conditional value-at-risk. European Journal of Operational Research, 234:140–154, 2014.