{"title": "Convex Elicitation of Continuous Properties", "book": "Advances in Neural Information Processing Systems", "page_first": 10404, "page_last": 10413, "abstract": "A property or statistic of a distribution is said to be elicitable if it can be expressed as the minimizer of some loss function in expectation. Recent work shows that continuous real-valued properties are elicitable if and only if they are identifiable, meaning the set of distributions with the same property value can be described by linear constraints. From a practical standpoint, one may ask for which such properties do there exist convex loss functions. In this paper, in a finite-outcome setting, we show that in fact every elicitable real-valued property can be elicited by a convex loss function. Our proof is constructive, and leads to convex loss functions for new properties.", "full_text": "Convex Elicitation of Continuous Real Properties\n\nJessica Finocchiaro\n\nDepartment of Computer Science\nUniversity of Colorado, Boulder\n\nRafael Frongillo\n\nDepartment of Computer Science\nUniversity of Colorado, Boulder\n\njessica.finocchiaro@colorado.edu\n\nraf@colorado.edu\n\nAbstract\n\nA property or statistic of a distribution is said to be elicitable if it can be expressed\nas the minimizer of some loss function in expectation. Recent work shows that\ncontinuous real-valued properties are elicitable if and only if they are identi\ufb01able,\nmeaning the set of distributions with the same property value can be described\nby linear constraints. From a practical standpoint, one may ask for which such\nproperties do there exist convex loss functions. In this paper, in a \ufb01nite-outcome\nsetting, we show that in fact essentially every elicitable real-valued property can be\nelicited by a convex loss function. 
Our proof is constructive, and leads to convex loss functions for new properties.\n\n1 Introduction\n\nProperty elicitation is the study of statistics, or properties, of probability distributions which one can incentivize an expected-utility-maximizing agent to reveal. In a machine learning context, this “agent” is an algorithm following the principle of empirical risk minimization (ERM), wherein a hypothesis is fit to the data by minimizing its error on a training data set, as judged by some loss function. The interest in property elicitation across the economics, statistics, and machine learning communities is reflected in the literature, with important results appearing in all three.\nA central thread of this literature, weaving between all three communities, asks which continuous real-valued properties are elicitable, and which loss functions elicit them. Building on earlier work of Osband [16] and Lambert [12], Steinwart et al. [22] show that a property is elicitable if and only if it is identifiable, a concept introduced by Osband which says that the set of distributions sharing the same property value can be described by a set of linear constraints. Moreover, these papers give characterizations of the loss functions eliciting these identifiable properties, showing that every loss can be written as the integral of a positive-weighted identification function.\nA question of practical interest remains, however: for which properties do there exist convex loss functions eliciting them? Convex losses give concrete algorithms to efficiently solve ERM problems, and are also useful more broadly in statistical and economic settings (see § 6). At first glance, the answer to this question might appear to follow immediately from the comprehensive loss function characterizations of Lambert [12] and Steinwart et al. [22]. 
Unfortunately, it is far from clear in these characterizations whether there exists a weight function rendering their construction convex.\nIn this paper, we address this question of convex elicitability in the finite-outcome setting. Surprisingly, we find that, under somewhat mild smoothness assumptions, every identifiable real-valued property is convex elicitable. Our proof proceeds by pinpointing a few key attributes of identification functions, and then solving the following abstract problem: given a set of functions F from R to R, when does there exist a weight function λ : R → R>0 making λf increasing over the report space R for all f ∈ F? We give a constructive solution to this problem under certain conditions, and show that identification functions happen to satisfy these conditions.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nAfter reviewing the relevant prior work in more detail (§ 2), we give our main result (§ 3). We then give intuition for our key technical proposition, the solution to the abstract problem mentioned above (§ 4), followed by examples illustrating the constructive nature of our approach (§ 5). We conclude with a discussion of applications to information elicitation, and future work (§ 6). See the Appendix for all omitted proofs.\nNotation. We will use the following notation throughout the paper. Let R>0 := {r ∈ R : r > 0} denote the positive reals. For an interval I, let ˚I denote the interior of I. We let Y denote the outcome space, here taken to be finite, and ∆(Y) denote the set of probability distributions over Y.\n\n2 Setting and Background\n\nIn property elicitation, we aim to learn some distributional property by minimizing a loss function. 
A property is simply a function Γ : P → R, which assigns a desired report in R ⊆ R^k to each probability distribution in a convex set P ⊆ ∆(Y). Without loss of generality, we often restrict R = Γ(P), but this does not affect the result. Common properties include moments, quantiles, and expectiles. Throughout the paper we will assume that Γ is a continuous real-valued function, implying R ⊆ R is an interval. We also restrict to the finite-outcome setting, where |Y| < ∞, and consider P ⊂ R^|Y|, meaning we identify each distribution with the corresponding vector of probabilities.\nWe are interested in when properties are elicitable, meaning they can be expressed as the minimizer of expected loss for some loss function. In the present paper, we will additionally ask when the loss function can be convex.\nDefinition 1. A loss function L : R × Y → R ∪ {∞} elicits a property Γ if for all p ∈ P,\n\n{Γ(p)} = arg min_r E_{Y∼p} L(r, Y) .   (1)\n\nIn this case, we say Γ is elicitable. If L(·, y) is convex for every y ∈ Y, we say Γ is convex elicitable.\nA central notion in property elicitation is that of identifiability, where the level sets {p : Γ(p) = r} can be expressed as a linear constraint.\nDefinition 2. Let property Γ : P → R be given, where R = Γ(P). A function V : R × Y → R identifies Γ if\n\nE_{Y∼p}[V(r, Y)] = 0 ⟺ r = Γ(p)   (2)\n\nfor all r ∈ ˚R and p ∈ P. In this case we say Γ is identifiable. We say V is oriented if we additionally have E_{Y∼p}[V(r, Y)] > 0 ⟺ r > Γ(p), for all r ∈ ˚R and p ∈ P.\nNote that by the terminology of Steinwart et al. [22], an identification satisfying eq. (2) on all of ˚R is called strong, as otherwise it need only hold almost everywhere. We can loosely think of an identification function as a derivative of a loss; if L is differentiable and elicits Γ, then roughly speaking, we expect (d/dr) E_{Y∼p} L(r, Y) = 0 ⟺ Γ(p) = r.\nFinally, we will often assume our properties to possess two important qualities: continuity, and being nowhere-locally-constant.\nDefinition 3 (Lambert [12]). A continuous property Γ : P → R is nowhere-locally-constant if there does not exist any open neighborhood U in P such that Γ(p) = r for all p ∈ U.\nIntuitively, restricting to nowhere-locally-constant properties is merely to ease bookkeeping, as one could always collapse different report values together afterwards.\nIt is known that for continuous, nowhere-locally-constant, real-valued properties, identifiability implies elicitability. In this paper, we show that under slightly stronger assumptions, identifiability implies convex elicitability. To place this result in the proper context, we now briefly tour the history of property elicitation.\n\n2.1 Relevant prior work\n\nWhile Savage [20] studied the elicitation of expected values, the literature on the elicitation of general properties began with Osband [16], who gave several important results. One of Osband’s observations is that the level sets {p : Γ(p) = r} of an elicitable property Γ must be convex [16, Proposition 2.5]. He also introduced the notion of an identification function, and the so-called Osband’s principle, which states that (under a mild regularity assumption) every loss function eliciting a given property can be written as the integral of a weighted identification function [16, Theorem 2.1]. He also gave several other results, such as the separability of loss functions jointly eliciting quantiles.\nIndependent of Osband, Lambert et al. 
[14, 15, 12] provide a geometric approach to both continuous and finite properties (that is, properties taking values in a finite set R) when the set of outcomes Y is finite. They represent the identification function as a vector, and relate finite-valued properties to power diagrams in computational geometry. They rediscover several results of Osband for the real-valued case, such as convexity of level sets and a one-dimensional version of Osband’s principle. Moreover, the proof of [12, Theorem 5] shows the following. (Steinwart et al. [22] extend this result to the case of infinite Y.)\nTheorem 1 (Lambert [12]). Let Γ : P → R be a continuous, nowhere-locally-constant property. If the level sets {p ∈ P : Γ(p) = r} are convex, then Γ is elicitable, and has a continuous, bounded, and oriented identification function.\n\nNone of the above-mentioned papers address the question of when the loss eliciting the property in question is convex. This question has arisen in the context of surrogate risk minimization, where unlike our setting, one seeks to indirectly elicit a given finite-valued property (such as the mode, or most likely label) by first eliciting a real-valued property, and then applying a link function [5, 23]. For example, support vector machines (SVMs) commonly use hinge loss and then apply the sign function, a combination which indirectly elicits the mode [21]. This literature is related to elicitation complexity, where one asks how many dimensions of the surrogate property are needed so that the desired property can be computed via the link function [7, 9, 14]. This relationship was perhaps first identified by Agarwal and Agarwal [3], who restate prior work in terms of property elicitation, and specifically focus on convex losses. 
Finally, Reid, Vernet, and Williamson [18, 24] consider losses which indirectly elicit the full distribution, and consider convexity of the composite loss.\nIn contrast to this line of work, we seek the direct elicitation of continuous properties. While convex losses are well-known for several continuous properties of interest, including the mean and other expected values (squared loss), ratios of expectations (weighted squared loss), and the median and other quantiles (pinball loss), to our knowledge, to date there have been no results on the direct convex elicitation of general continuous properties.\n\n3 Main Result\n\nWe will show that, under mild conditions, every elicitable real-valued property is also convex elicitable. Let us first give some intuition why one might suspect this statement to be true. From a geometric perspective, the level sets {p : Γ(p) = r} of continuous elicitable properties are hyperplanes intersected with P. As one changes r, the level sets may be locally parallel, in which case the property is locally a link of a linear property (expected value), or the level sets may not be parallel, in which case the property locally resembles a link of a ratio of expectations. In fact, the second case also covers the first, so we can say that, roughly speaking, every continuous property looks locally like a ratio of expectations. The following proposition states that if the property can actually be written as a finite piecewise ratio of expectations, it is convex elicitable. Hence, taking the limit as one approximates a given property better and better by ratios of expectations, one may suspect that indeed every continuous property is convex elicitable.\nProposition 1. Continuous piecewise ratio-of-expectation properties are convex elicitable.\nProof. First, we formalize the statement. Recall that Y is a finite set. Let φ_i : Y → R and ψ_i : Y → R>0 be arbitrary for i = 1, . . . , k, and let γ_i(p) = E_{Y∼p} φ_i(Y) / E_{Y∼p} ψ_i(Y). Assume that we have a_0 < · · · < a_k such that for all p ∈ P, there is a unique i ∈ {1, . . . , k} such that either γ_{i−1}(p) ∈ (a_{i−1}, a_i) or γ_{i−1}(p) = γ_i(p) = a_{i−1}. Call this i(p), and by extension i(r) where r = γ_i(p) for this i. We will show that Γ(p) := γ_{i(p)}(p) is convex elicitable with respect to the full probability simplex P = ∆(Y).\nObserve that by construction, for each i ∈ {1, . . . , k − 1} the level sets for a_i coincide: S_i = {p : Γ(p) = a_i} = {p : γ_i(p) = a_i} = {p : γ_{i−1}(p) = a_i}. Moreover, for all such i, these level sets are full-dimensional in P, i.e., they are (n − 2)-dimensional affine sets (writing n = |Y|), each the intersection of a hyperplane and P. Now let V_i(r, y) = ψ_i(y) r − φ_i(y), which identifies γ_i, and is strictly increasing in r as ψ_i(y) > 0 for all y. We now see that the hyperplane which is the span of S_i in R^n is orthogonal to the vectors V_{i−1}(a_i, ·) ∈ R^n and V_i(a_i, ·) ∈ R^n, by the definition of identifiability. We conclude that there is some coefficient α_{i−1} such that V_{i−1}(a_i, y) = α_{i−1} V_i(a_i, y) for all y ∈ Y. (In fact, α_{i−1} > 0, as the coefficient of r must be positive.) We then construct β(r) = ∏_{j=0}^{i(r)} α_j and write the identification as V(r, y) = β(r) V_{i(r)}(r, y).\nMoving now to the formal result, let I ⊆ R be an interval. Our main technical ingredient shows, given a collection F of functions f : I → R satisfying certain conditions, how to construct a multiplier λ : I → R>0 making λf strictly increasing on ˚I for all f ∈ F. 
In our proof, the family F will be the set of identification functions {V(·, y)}_{y∈Y}, and λ will play the role of the weight function in previous work ([16, Theorem 2.1], [11, Theorem 2.7], [14, Theorem 3]) showing that L(r, y) = ∫_{r_0}^{r} λ(x) V(x, y) dx elicits Γ. As λf is increasing, L will additionally be convex. Therefore, the conditions below are only mildly stronger than what Lambert shows to be true of the desired properties. We begin with our three conditions; the first we will assume, and the second and third we will prove hold for any oriented identification function.\nCondition 1. Every f ∈ F is continuous on ˚I, and continuously differentiable on ˚I except on a finite set S_f ⊊ ˚I. When f is differentiable, (d/dx) f(x) is finite. Additionally, if x ∈ ˚I and f(x) = 0, then (d/dz) f(z) ≥ 0 for all z in some open neighborhood U of x at which f is differentiable.\nCondition 2. Every f ∈ F is bounded and has at most one zero x_f ∈ ˚I, so that if x_f exists, f(x) < 0 for x < x_f and f(x) > 0 for x > x_f. If f does not have a zero on ˚I, then either f(x) < 0 or f(x) > 0 for all x ∈ ˚I. For all x ∈ ˚I, at least one function f ∈ F is nonzero at x.\nCondition 3. For all f, g ∈ F and all open subintervals I′ ⊆ ˚I such that f > 0 > g on I′, the function g/f is strictly increasing on I′.\nOur main technical tool follows; we sketch the proof in § 4 and defer the full proof to the Appendix.\nProposition 2. If F satisfies Conditions 1, 2, and 3, then there exists a function λ : I → R>0 so that λf is increasing over ˚I for every f ∈ F.\n\nWith this tool in hand, we are ready to prove our main result.\nTheorem 2. 
For P = ∆(Y), let Γ : P → R be a continuous, nowhere-locally-constant property which is identified by a bounded and oriented V : R × Y → R. If F = {V(·, y)}_{y∈Y} satisfies Condition 1, then Γ is convex elicitable.\nProof. We have assumed all f ∈ F = {V(·, y)}_{y∈Y} are bounded, oriented, and satisfy Condition 1, and thus to apply Proposition 2, we need only establish Conditions 2 and 3. A fact we use throughout is that V(r, y) = E_{Y∼δ_y} V(r, Y), where δ_y is the point distribution on y ∈ Y.\nTo establish Condition 2, we proceed in order. First, boundedness of each f ∈ F follows by assumption. Second, we show that each f has at most one zero on ˚R. As V identifies Γ, note that V(r, y) = 0 ⟺ Γ(δ_y) = r when r ∈ ˚R. As Γ is single-valued, there can be at most one such r ∈ ˚R. Third, we must show that if f has a zero on ˚R, it changes sign from negative to positive at that zero, and if not, f never changes sign on ˚R. The first case follows from the fact that Γ(δ_y) = r and that V is oriented. For the second case, V(·, y) has no zero on ˚R, and thus by continuity of V, cannot change sign on ˚R. Fourth, to see that F has at least one nonzero function for all r ∈ ˚R, note that if V(r, y) = 0 for all y ∈ Y, then E_{Y∼p} V(r, Y) = 0 for all p ∈ P. Thus, as V identifies Γ and r ∈ ˚R, we would have Γ(p) = r for all p, contradicting nowhere-locally-constancy.\nFor Condition 3, consider V(·, y_0), V(·, y_1) ∈ F and an open interval I′ = (a, b) such that V(r, y_0) > 0 > V(r, y_1) for all r ∈ I′. We define p_α = (1 − α) δ_{y_0} + α δ_{y_1} and γ(α) = Γ(p_α) for α ∈ [0, 1]. Since Γ is continuous and nowhere-locally-constant, [22, Cor. 
9] implies that Γ is quasi-monotone, which in turn implies that γ is nondecreasing on [0, 1].\nWe first show I′ ⊆ γ([0, 1]) = [γ(0), γ(1)]. By definition of I′, we know r ∈ I′ ⟹ V(r, y_1) < 0 < V(r, y_0), and the orientation of V then implies Γ(δ_{y_1}) > r > Γ(δ_{y_0}). Thus, Γ(δ_{y_1}) = γ(1) ≥ b > a ≥ Γ(δ_{y_0}) = γ(0), with the strict inequality since I′ is nonempty. We then see that r ∈ (a, b) ⟹ r ∈ [Γ(δ_{y_0}), Γ(δ_{y_1})] = γ([0, 1]), and therefore I′ ⊆ γ([0, 1]).\nNext, we show that γ is not only nondecreasing but strictly increasing on A = γ^{−1}(I′). Note that A is itself an open interval as γ is continuous. Let α, α′ ∈ A with α ≠ α′, and suppose for a contradiction that γ(α) = γ(α′) = r ∈ I′ ⊆ ˚R. Then Γ(p_α) = Γ(p_{α′}) = r, and as V identifies Γ, we have E_{Y∼p_α} V(r, Y) = E_{Y∼p_{α′}} V(r, Y) = 0. Thus, E_{Y∼p_0} V(r, Y) = (α′ E_{Y∼p_α} V(r, Y) − α E_{Y∼p_{α′}} V(r, Y)) / (α′ − α) = 0, and similarly for p_1. By identifiability again, we must now have Γ(p_0) = Γ(p_1) = r, contradicting Γ(p_0) < Γ(p_1) as observed above.\nSince V identifies Γ, we have for α ∈ A,\n\n0 = E_{Y∼p_α} V(γ(α), Y) = (1 − α) E_{Y∼δ_{y_0}}[V(γ(α), Y)] + α E_{Y∼δ_{y_1}}[V(γ(α), Y)] = (1 − α) V(γ(α), y_0) + α V(γ(α), y_1) ,\n\nfrom which we conclude that the function F(α) = V(γ(α), y_1)/V(γ(α), y_0) = (α − 1)/α = 1 − 1/α, which is strictly increasing in α. 
Observe that as γ is strictly increasing on A, its inverse is strictly increasing on I′. Thus V(r, y_1)/V(r, y_0) = F(γ^{−1}(r)) = 1 − 1/γ^{−1}(r) is strictly increasing on I′, as desired.\nAs we have now established that F satisfies Conditions 1–3, Proposition 2 gives us a weight function λ : R → R>0 such that for all y ∈ Y, the map r ↦ λ(r) V(r, y) is strictly increasing on ˚R. Thus, fixing r_0 ∈ ˚R, the loss L(r, y) = ∫_{r_0}^{r} λ(r′) V(r′, y) dr′ is convex in r for each y ∈ Y, as noted by Rockafellar [19, Theorem 24.2]. Moreover, as λ > 0, L elicits Γ by Lambert [12, Theorem 6].\nWhile we defer discussion of future work to § 6, it is worth noting here that the argument establishing Condition 3 immediately extends to infinite outcome spaces. Beginning with p_0, p_1 being arbitrary distributions, Γ(p_0) ≠ Γ(p_1), one simply observes that V(γ(α), p_0)/V(γ(α), p_1) = 1 − 1/α by the same logic. The central challenge to such an extension therefore lies in the proof of Proposition 2.\nLoosely speaking, when combining Theorem 2 with the existing literature, we conclude that every “nice” elicitable property is additionally convex elicitable. We formalize this in two corollaries, one stated as an implication, and the other given in the style of Steinwart et al. [22, Cor. 9].\nCorollary 1. Let Γ : P → R be continuous, nowhere-locally-constant, and elicited by a loss L with bounded and continuous first and second derivatives. Suppose also, for all p ∈ P, that (d/dr) E_{Y∼p} L(r, Y) = 0 for at most one r ∈ ˚R. Then Γ is convex elicitable.\n\nProof. As L(r, y) elicits Γ and is differentiable, for all r ∈ ˚R, and all p with Γ(p) = r, we must have (d/dr) E_{Y∼p} L(r, Y) = 0. By our assumption on the critical points of L, we see that V(r, δ_y) = (d/dr) L(r, δ_y) is a bounded and oriented identification function for Γ, and is continuously differentiable with bounded derivative. Thus, F = {V(r, δ_y)}_{y∈Y} satisfies Condition 1, and the result follows from Theorem 2.\nCorollary 2. Let P = ∆(Y) be the probability simplex over n outcomes, and let Γ : P → R be a nowhere-locally-constant property with a bounded and nowhere vanishing first derivative, a bounded second derivative, and a differentiable right inverse.[1] Then the following are equivalent:\n1. For all r ∈ R, the level set {p : Γ(p) = r} is convex.\n2. Γ is quasi-monotonic.\n3. Γ is identifiable and has a bounded and oriented identification function.\n4. Γ is elicitable.\n5. There exists a non-negative, measurable, locally Lipschitz continuous loss function eliciting Γ.\n6. Γ is convex elicitable.\n\n[1] We may identify P with {v ∈ R_+^{|Y|−1} : ∑_{i=1}^{|Y|−1} v_i ≤ 1} so that the derivatives are well defined. In the proof, for ease of notation, we will still write dot products in R^|Y|.\n\nProof. We essentially reduce to a similar result of Steinwart et al. [22, Corollary 9]. First, note that the definition of nowhere-locally-constant from Lambert et al. [14] coincides with the definition of Steinwart et al. [22, Definition 4] in finite dimensions. Second, as our assumptions are stronger than theirs, the equivalence of the first five conditions follows. 
As convex elicitability implies convex level sets (by the same argument of Lambert [12, Theorem 5], which holds even if L can be infinite on the boundary of R), it then suffices to show that identifiability implies convex elicitability.\nBy standard arguments, the convexity of the level sets {p : Γ(p) = r} for r ∈ ˚R implies that each level set must be a hyperplane intersected with P. (See e.g. Theorem 1 of [14].) Letting p̂ be the right inverse of Γ, so that Γ(p̂(r)) = r for all r ∈ R, we may define\n\nV(r, y) = ∇_{p̂(r)} Γ · (δ_y − p̂(r)) ,   (3)\n\na form taken from Frongillo and Kash [8, Proposition 18].\nNow for any p with Γ(p) = r, as the level set is a hyperplane intersected with P, we must have Γ(αp + (1 − α) p̂(r)) = r, and we conclude ∇_{p̂(r)} Γ · (p − p̂(r)) = 0. (Simply take the derivative with respect to α.) Thus, as ∇Γ ≠ 0, the vector ∇_{p̂(r)} Γ − (∇_{p̂(r)} Γ · p̂(r)) 1 defines the same hyperplane as {p : Γ(p) = r}, and thus V identifies Γ. (Here 1 ∈ R^|Y| denotes the all-ones vector.) That V is also bounded and oriented follows easily from our assumptions. As V has a bounded derivative everywhere by assumption, it satisfies Condition 1, and convex elicitability then follows from Theorem 2.\n\n4 Proof Sketch and Intuition\n\nWe now give a sketch of the construction of the weight function λ in Proposition 2. See the Appendix for the full proof. For the purposes of this section, let us simplify our three conditions as follows:\nCondition 1′. Every f ∈ F is continuously differentiable.\nCondition 2′. Each f ∈ F has a single zero, and moves from negative to positive.\nCondition 3′. When f > 0 > g, the ratio g/f is increasing.\n\nTwo-function case. 
To begin, let us consider two functions satisfying Conditions 1′, 2′, and 3′, such that f > 0 > g on the interval I. We wish to find some λ : I → R>0 making both λf and λg strictly increasing. By Condition 3′, we know g/f is increasing on ˚I. Let us choose λ as follows,\n\nλ(r) := (−f(r) g(r))^{−1/2} .   (4)\n\nAs −(fg)(r) > 0 for all r ∈ I, we have λ(r) > 0 as well. Moreover, one easily checks that λf = (−f/g)^{1/2} and λg = −(−g/f)^{1/2}, which are both increasing as monotonic transformations of g/f.\n\nGeneral case. More generally, we wish to find a λ such that for all x ∈ ˚R, (d/dx)(λf)(x) > 0. When f > 0, this constraint is equivalent to (d/dx) log(λf)(x) > 0, which is in turn equivalent to (d/dx) log λ(x) > −(d/dx) log f(x). Similarly, if f(x) < 0, then we need (d/dx) log λ(x) < −(d/dx) log(−f(x)). Finally, the case f(x) = 0 follows easily from Condition 2′, as (d/dx) f(x) > 0 and λ > 0. Combining these constraints, we see that for all f > 0 and all g < 0, we must have\n\n−(d/dx) log f(x) < (d/dx) log λ(x) < −(d/dx) log(−g(x)) .   (5)\n\nIn order for these constraints to be feasible, we must have (d/dx) log(−g(x)) < (d/dx) log f(x) for all f > 0 > g, which is seen to be equivalent to Condition 3′ after some manipulation.\nPerhaps the most natural way to satisfy constraint (5) is to simply take the midpoint between the maximum lower bound m : R → R and minimum upper bound m̄ : R → R defined as follows:\n\nm(x) := − inf_{f∈F : f(x)>0} (d/dx) log f(x) ,   m̄(x) := − sup_{g∈F : g(x)<0} (d/dx) log(−g(x)) .\n\nThis yields the following construction (where r_0 ∈ ˚R is arbitrary),\n\nh(x) = (1/2)(m(x) + m̄(x)) ,   λ(x) = exp(∫_{r_0}^{x} h(z) dz) ,   (6)\n\nwhere one notes h(x) = (d/dx) log λ(x). Provided our three conditions hold, we now have a positive weight function λ satisfying the constraint (5), and we conclude that λf is increasing for all f ∈ F.\nLet us observe that our general construction in eq. (6) really is a generalization of the two-function case in eq. (4). That is, we are primarily concerned with the “most decreasing” and “least increasing” functions, which allows us to focus on two functions instead of the entire set F. When we only have two functions f > 0 > g, eq. (6) reduces to h(x) = −(1/2)((d/dx) log(−g(x)) + (d/dx) log f(x)), whence λ(x) = exp(−(1/2) log(−g(x) f(x))) = 1/(−g(x) f(x))^{1/2}.\n\nHurdles and technicalities. As stated, the above construction has two issues, which we now briefly identify and describe how our proof circumvents them. First, in general our functions f will pass through 0, possibly making h and therefore λ unbounded. Recall that we only needed to satisfy eq. 
(5), and thus rather than taking the midpoint of the lower and upper bounds as in eq. (6), which will diverge whenever one of the bounds diverges, we can always choose h in a slightly more clever manner to be closer to the smallest-magnitude bound. See the Appendix for one such construction.\nThe second problem is that our actual Condition 1 allows for nondifferentiability, which arises in settings of particular interest, like Proposition 1. Fortunately, in the finite-outcome setting, it is essentially without loss of generality to consider continuous f ∈ F (see Theorem 1). We can therefore address the finitely many nondifferentiabilities using continuity arguments, allowing us to focus on the set I_c ⊆ I where every f ∈ F is continuously differentiable.\n\n5 Examples\n\nTo illustrate the constructive nature of Theorem 2, we now give two examples. The first is the Beta family scoring rule found in Buja et al. [6, § 11] and Gneiting and Raftery [11, § 3], which we use to illustrate the construction itself. The second is a simple elicitable property for which the obvious identification function does not give a convex loss; we show how to convexify it.\n1. Beta families. Consider the Beta family of loss functions discussed in Buja et al. [6], which elicit the mean over outcomes Y = {0, 1}, with R = [0, 1]. After some manipulation, one can write the loss and identification function as follows, for any α, β > −1,\n\nL(r, y) = ∫_0^r z^{α−1}(1 − z)^{β−1}(z − y) dz ,   V(r, y) = r^{α−1}(1 − r)^{β−1}(r − y) .\n\nWhile some choices of the parameters yield convex losses, such as α = β = 0 (log loss) and α = β = 1 (squared loss), not all do, e.g. 
α = 1/5, β = −1/2.\nApplying the two-function construction from Section 4, we choose λ(r) = r^{1/2−α}(1 − r)^{1/2−β}, giving the identification function V′(r, y) = r^{−1/2}(1 − r)^{−1/2}(r − y), which is itself in the Beta family with α = β = 1/2. Integrating V′ yields the following convex loss,\n\nL′(r, y) = ∫_0^r z^{−1/2}(1 − z)^{−1/2}(z − y) dz = arcsin(|y − r|^{1/2}) − (r(1 − r))^{1/2} ,   (7)\n\nalso discovered by Buja et al., which serves as an intermediary between log and squared loss.\n2. A quadratic property. Let Y = {1, 2, 3}, and Γ(p) = (1 − (1 − 4 p_1 p_2 + 2 p_2^2)^{1/2}) / (2 p_2), where Γ(p) = p_1 when p_2 = 0 for continuity (from L’Hôpital’s rule). Here, p_y denotes the probability that outcome y is observed. Some of the level sets of Γ can be seen in Figure 2. A very natural choice of identification function for Γ is V(r, 1) = r − 1, V(r, 2) = 1/2 + r − r^2, V(r, 3) = r, as one readily verifies. Yet we see in Figure 1(b) that V(·, 2) is not strictly increasing, so the loss given by integrating V will not be convex.\nThe set F = {V(·, y)}_{y∈Y} satisfies Conditions 1–3, however, and thus we may use our construction to obtain a positive function λ for which L(r, y) = ∫_{r_0}^{r} λ(x) V(x, y) dx elicits Γ and is convex in r. Unfortunately, for this particular example, the construction given in the proof of Proposition 2 produces a somewhat unwieldy function λ. Fortunately, while our constructed λ is guaranteed to make λf monotone for every function f in F, it is generally not unique, and in many cases a simpler choice of λ can be found. 
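As a quick numerical sanity check (our own sketch, not code from the paper), one can verify that a simple candidate weight such as λ(r) = exp(2r^2 − r) makes each λ·V(·, y) strictly increasing on (0, 1):

```python
import numpy as np

# Identification functions for the quadratic property over Y = {1, 2, 3}.
def V(r, y):
    return {1: r - 1.0, 2: 0.5 + r - r**2, 3: r}[y]

# Candidate weight lambda(r) = exp(2r^2 - r), i.e. h(r) = 4r - 1.
def lam(r):
    return np.exp(2.0 * r**2 - r)

r = np.linspace(0.001, 0.999, 2000)
for y in (1, 2, 3):
    # Each weighted identification function should be strictly increasing.
    assert np.all(np.diff(lam(r) * V(r, y)) > 0), y
print('lambda * V(., y) is strictly increasing for y = 1, 2, 3')
```

Such a check is only a heuristic on a finite grid, of course, but it makes it easy to experiment with simple candidate weights before verifying monotonicity analytically.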
In particular, our proof shows that any function h satisfying the criteria laid out in Claim 1 of the Appendix will lead to a suitable choice of λ; among these criteria is that h(r) = (d/dr) log λ(r) must lie between m(r) and m̄(r) for all r.

Figure 1: The functions V(·, y) are not always increasing for all y ∈ Y, but our function λ "monotonizes" them; panels (a)–(c) show V(·, 1), V(·, 2), V(·, 3), and panels (d)–(f) show λ(·)V(·, 1), λ(·)V(·, 2), λ(·)V(·, 3).

Figure 2: Level sets of Γ.

Figure 3: m(·) in solid blue, m̄(·) in orange, valid h(·) in green and dashed blue.

For practical purposes, therefore, we may use the following general technique in lieu of the construction given in the proof of Proposition 2:

1. Compute the bounds m(r) and m̄(r).
2. Search over some class of practical (e.g.
linear) functions for an h which satisfies the criteria of Claim 1.

We illustrate this more practical construction in Figure 3; for the case of our quadratic property, the choice h(r) = 4r − 1 (shown as dashed blue) suffices, giving us the simpler λ(r) = exp(2r² − r). This choice of λ gives

λ(r)V(r, 1) = exp(2r² − r)(r − 1)
λ(r)V(r, 2) = exp(2r² − r)(1/2 + r − r²)
λ(r)V(r, 3) = r exp(2r² − r) ,

which we can integrate to obtain a convex loss.

6 Conclusion and future work

We have shown that all real-valued properties over finite outcomes which are identified by a mostly-smooth continuous identification function are convex elicitable. Beyond their natural relevance to machine learning, and statistical estimation more broadly, these results bring insights into the area of information elicitation. For example, a generalization of a common prediction market framework, the Scoring Rule Market, is well-defined for any loss function [10, 14]. Yet it is not clear whether practical markets exist for every elicitable property. Among the practical considerations are axioms such as Tractable Trading (TT), which states that participants can compute their optimal trade/action under a budget [2], and Bounded Trader Budget (BTB), which states that traders with arbitrarily small budgets can still fruitfully participate in the market [10]. Our results imply that essentially every continuous real-valued elicitable property over finite outcomes has a market mechanism which satisfies these axioms.
There are likely also implications for wagering mechanisms [13] and forecasting competitions [25], among other settings in information elicitation.
There are several avenues for future work, which we outline below.
Relaxing our conditions. We believe one could allow V to be smooth almost everywhere. One may still be able to use the fact that g/f is strictly increasing to obtain an almost-everywhere defined derivative, but again, there are several challenges to this approach.
Infinite outcomes. A challenging but important extension would be to allow infinite Y, for example, Y = [0, 1] ⊆ R. As discussed following Theorem 2, many pieces of our argument extend immediately, such as the argument establishing Condition 3. We believe the key hurdle to such an extension will be in Proposition 2, as several quantities become harder to control. As one example, the definition of h in eq. (6) involves a maximum and minimum which may not be attained. Extending to infinite outcomes also requires relaxing our continuity assumption, as many properties of interest, like the median, have discontinuous identification functions in the infinite-outcome setting.
Strongly convex losses. Just as convex loss functions are useful because they make empirical risk minimization a tractable problem, strongly convex losses are more tractable still. Roughly speaking (ignoring the log transformation), if the gap in eq. (5) is bounded away from zero, λf will be increasing at least as fast as some linear function for all f, meaning its integral will be strongly convex. It is not clear what meaningful conditions on Γ suffice for this to hold, however, and a full characterization is far from clear. Similarly, characterizations for exp-concave losses would also be interesting.
Vector-valued properties. Finally, we would like to extend our construction to vector-valued properties Γ : P → R^k.
In light of our results, this question is only interesting for properties which are not vectors of elicitable properties: if the k components of Γ are themselves elicitable, we may construct a convex loss for each, and the sum will be a convex loss eliciting Γ. Unfortunately, we lack a characterization of elicitable vector-valued properties, so the question of whether all elicitable vector-valued properties are convex elicitable seems even further from reach.
Acknowledgements. We would like to thank Bo Waggoner and Arpit Agarwal for their insights and the discussion which led to this project, and we thank Krisztina Dearborn for consultation on analysis results. Additionally, we would like to thank our reviewers for their feedback and suggestions. This project was funded by National Science Foundation Grant CCF-1657598.

References

[1] S. Abbott. Understanding Analysis. Springer, 2001.

[2] J. D. Abernethy and R. M. Frongillo. A collaborative mechanism for crowdsourcing prediction problems. In Advances in Neural Information Processing Systems 24, pages 2600–2608, 2011.

[3] A. Agarwal and S. Agarwal. On consistent surrogate risk minimization and property elicitation. In JMLR Workshop and Conference Proceedings, volume 40, pages 1–19, 2015.

[4] T. M. Apostol. Mathematical Analysis. 1974.

[5] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.

[6] A. Buja, W. Stuetzle, and Y. Shen. Loss functions for binary class probability estimation and classification: Structure and applications. 2005.

[7] T. Fissler and J. F. Ziegel. Higher order elicitability and Osband's principle. The Annals of Statistics, 44(4):1680–1707, 2016.

[8] R. Frongillo and I. Kash. Vector-valued property elicitation. In Proceedings of the 28th Conference on Learning Theory, pages 1–18, 2015.

[9] R. Frongillo and I. A. Kash. On elicitation complexity. In Advances in Neural Information Processing Systems 29, 2015.

[10] R. Frongillo and B. Waggoner. An axiomatic study of scoring rule markets. Preprint, 2017.

[11] T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.

[12] N. S. Lambert. Elicitation and evaluation of statistical forecasts. 2018.

[13] N. S. Lambert, J. Langford, J. W. Vaughan, Y. Chen, D. M. Reeves, Y. Shoham, and D. M. Pennock. An axiomatic characterization of wagering mechanisms. Journal of Economic Theory, 156:389–416, 2015.

[14] N. S. Lambert, D. M. Pennock, and Y. Shoham. Eliciting properties of probability distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 129–138, 2008.

[15] N. S. Lambert and Y. Shoham. Eliciting truthful answers to multiple-choice questions. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 109–118, 2009.

[16] K. H. Osband. Providing Incentives for Better Cost Forecasting. University of California, Berkeley, 1985.

[17] R. L. Pouso. A simple proof of the fundamental theorem of calculus for the Lebesgue integral. arXiv preprint arXiv:1203.1462, 2012.

[18] M. Reid and R. Williamson. Composite binary losses. The Journal of Machine Learning Research, 11:2387–2422, 2010.

[19] R. Rockafellar. Convex Analysis, volume 28 of Princeton Mathematics Series. Princeton University Press, 1997.

[20] L. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, pages 783–801, 1971.

[21] I. Steinwart and A. Christmann. Support Vector Machines. Springer Science & Business Media, 2008.

[22] I. Steinwart, C. Pasin, R. Williamson, and S. Zhang. Elicitation and identification of properties. In Proceedings of the 27th Conference on Learning Theory, pages 482–526, 2014.

[23] A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. The Journal of Machine Learning Research, 8:1007–1025, 2007.

[24] R. C. Williamson, E. Vernet, and M. D. Reid. Composite multiclass losses. Journal of Machine Learning Research, 17(223):1–52, 2016.

[25] J. Witkowski, R. Freeman, J. W. Vaughan, D. M. Pennock, and A. Krause. Incentive-compatible forecasting competitions. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.