{"title": "Generalized Inverse Optimization through Online Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 86, "page_last": 95, "abstract": "Inverse optimization is a powerful paradigm for learning preferences and restrictions that explain the behavior of a decision maker, based on a set of external signal and the corresponding decision pairs. However, most inverse optimization algorithms are designed specifically in batch setting, where all the data is available in advance. As a consequence, there has been rare use of these methods in an online setting suitable for real-time applications. In this paper, we propose a general framework for inverse optimization through online learning. Specifically, we develop an online learning algorithm that uses an implicit update rule which can handle noisy data. Moreover, under additional regularity assumptions in terms of the data and the model, we prove that our algorithm converges at a rate of $\\mathcal{O}(1/\\sqrt{T})$ and is statistically consistent. In our experiments, we show the online learning approach can learn the parameters with great accuracy and is very robust to noises, and achieves a dramatic improvement in computational efficacy over the batch learning approach.", "full_text": "Generalized Inverse Optimization through Online\n\nLearning\n\nDepartment of Industrial Engineering\n\nDepartment of Electrical and Computer Engineering\n\nChaosheng Dong\n\nUniversity of Pittsburgh\nchaosheng@pitt.edu\n\nYiran Chen\n\nDuke University\n\nyiran.chen@duke.edu\n\nBo Zeng\n\nDepartment of Industrial Engineering\n\nUniversity of Pittsburgh\n\nbzeng@pitt.edu\n\nAbstract\n\nInverse optimization is a powerful paradigm for learning preferences and restric-\ntions that explain the behavior of a decision maker, based on a set of external\nsignal and the corresponding decision pairs. 
However, most inverse optimization algorithms are designed specifically for the batch setting, where all the data is available in advance. As a consequence, these methods have rarely been used in online settings suitable for real-time applications. In this paper, we propose a general framework for inverse optimization through online learning. Specifically, we develop an online learning algorithm that uses an implicit update rule which can handle noisy data. Moreover, under additional regularity assumptions on the data and the model, we prove that our algorithm converges at a rate of O(1/√T) and is statistically consistent. In our experiments, we show that the online learning approach learns the parameters with great accuracy, is very robust to noise, and achieves a dramatic improvement in computational efficiency over the batch learning approach.

1 Introduction

Possessing the ability to elicit customers' preferences and restrictions (PR) is crucial to the success of an organization in designing and providing services or products. Nevertheless, in most scenarios one can only observe customers' decisions or behaviors in response to external signals, but cannot directly access their decision making schemes. Indeed, decision makers probably do not have exact information regarding their own decision making process [1]. To bridge that discrepancy, inverse optimization has been proposed and has received significant research attention; its goal is to infer or learn the missing information of the underlying decision models from observed data, assuming that human decision makers are rationally making decisions [2, 3, 4, 5, 1, 6, 7, 8, 9, 10, 11]. 
Nowadays, extending from its initial form that only considers a single observation [2, 3, 4, 5] with clean data, inverse optimization has been further developed and applied to handle more realistic cases that have many observations with noisy data [1, 6, 7, 9, 10, 11].

Despite these remarkable achievements, traditional inverse optimization (typically in the batch setting) has not proven fully applicable for supporting recent attempts in AI to automate the elicitation of a human decision maker's PR in real time. Consider, for example, recommender systems (RSs) used by online retailers to increase product sales. The RSs first elicit one customer's PR from the

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: An overview of inverse optimization through batch learning versus through online learning. Left: Framework of inverse optimization in the batch setting. Right: Framework of the generalized inverse optimization in the online setting proposed in our paper.

historical sequence of her purchasing behaviors, and then make predictions about her future shopping actions. Indeed, building RSs for online retailers is challenging because of the sparsity issue. Given the large number of products available, a customer's shopping vector, each element of which represents the quantity of one product purchased, is highly sparse. Moreover, the shift of the customer's shopping behavior along with the external signal (e.g., price, season) aggravates the sparsity issue. Therefore, it is particularly important for RSs to have access to large data sets to perform accurate elicitation [12]. Considering the complexity of the inverse optimization problem (IOP), it is extremely difficult and time consuming to extract a user's PR from large, noisy data sets using conventional techniques. 
Thus, incorporating traditional inverse optimization into RSs is impractical for real-time elicitation of a user's PR.

To automate the elicitation of a human decision maker's PR, we aim to unlock the potential of inverse optimization through online learning in this paper. Specifically, we formulate such a learning problem as an IOP considering noisy data, and develop an online learning algorithm to derive unknown parameters occurring in either the objective function or the constraints. At the heart of our algorithm is taking inverse optimization with a single observation as a subroutine to define an implicit update rule. Through such an implicit rule, our algorithm can rapidly incorporate sequentially arriving observations into the model, without keeping them in memory. Indeed, we provide a general mechanism for the incremental elicitation, revision and reuse of the inference about a decision maker's PR.

Related work Our work is most related to the subject of inverse optimization with multiple observations. The goal is to find an objective function or constraints that explains the observations well. This subject naturally carries the data-driven concept and becomes more applicable as large amounts of data are generated and become readily available, especially those from digital devices and online transactions. Solution methods in the batch setting for this type of IOP include the convex optimization approach [1, 13, 10] and the non-convex optimization approach [7]. The former approach often yields incorrect inferences of the parameters [7], while the latter approach is known to lead to intractable programs [10]. 
In contrast, we perform inverse optimization in the online setting, and the proposed online learning algorithm significantly accelerates the learning process with performance guarantees, allowing us to deal with more realistic and complex PR elicitation problems.

Also related to our work is [6], which develops an online learning method to infer the utility function from sequentially arriving observations. They prove a different regret bound for that method under certain conditions, and demonstrate its applicability to handle both continuous and discrete decisions. However, their approach is only possible when the utility function is linear and the data is assumed to be noiseless. In contrast, our approach makes no such assumption and only requires the convexity of the underlying decision making problem. Besides the regret bound, we also show the statistical consistency of our algorithm by applying both the consistency result proven in [7] and the regret bound provided in this paper, which guarantees that our algorithm will asymptotically achieve the best prediction error permitted by the inverse model we consider.

Our contributions To the best of the authors' knowledge, we propose the first general framework for eliciting a decision maker's PR using inverse optimization through online learning. This framework can learn general convex utility functions and constraints from observed (signal, noisy decision) pairs. In Figure 1, we provide a comparison of inverse optimization through batch learning versus through online learning. Moreover, we prove that the online learning algorithm, which adopts an implicit update rule, has an O(√T) regret under certain regularity conditions. 
In addition, this algorithm is statistically consistent when the data satisfies some rather common conditions, which guarantees that our algorithm will asymptotically achieve the best prediction error permitted by the inverse model we consider. Numerical results show that our algorithm can learn the parameters with great accuracy, is robust to noise even if some assumptions do not hold, and achieves a dramatic improvement over the batch learning approach in computational efficiency.

2 Problem setting

2.1 Decision making problem

We consider a family of parameterized decision making problems, in which x ∈ R^n is the decision variable, u ∈ U ⊆ R^m is the external signal, and θ ∈ Θ ⊆ R^p is the parameter:

min_{x ∈ R^n} f(x, u, θ)
s.t. g(x, u, θ) ≤ 0,   (DMP)

where f : R^n × R^m × R^p → R is a real-valued function, and g : R^n × R^m × R^p → R^q is a vector-valued function. We denote by X(u, θ) = {x ∈ R^n : g(x, u, θ) ≤ 0} the feasible region of DMP, and let S(u, θ) = arg min{f(x, u, θ) : x ∈ X(u, θ)} be the optimal solution set of DMP.

2.2 Inverse optimization and online setting

Consider a learner who monitors the signal u ∈ U and the decision maker's decision x ∈ X(u, θ) in response to u. We assume that the learner does not know the decision maker's utility function or constraints in DMP. Since the observed decision might carry measurement error or be generated with a bounded rationality of the decision maker, i.e., be suboptimal, we denote by y the observed noisy decision for u ∈ U. Note that y does not necessarily belong to X(u, θ), i.e., it might be infeasible with respect to X(u, θ). 
Throughout the paper, we assume that the (signal, noisy decision) pair (u, y) is distributed according to some unknown distribution P supported on {(u, y) : u ∈ U, y ∈ Y}.

In our inverse optimization model, the learner aims to learn the decision maker's objective function or constraints from (signal, noisy decision) pairs. More precisely, the goal of the learner is to estimate the parameter θ of the DMP. In our online setting, the (signal, noisy decision) pairs become available to the learner one by one. Hence, the learning algorithm produces a sequence of hypotheses (θ_1, ..., θ_{T+1}). Here, T is the total number of rounds, θ_1 is an arbitrary initial hypothesis, and θ_t for t ≥ 2 is the hypothesis chosen after observing the (t−1)th (signal, noisy decision) pair. Let l(y_t, u_t, θ_t) denote the loss the learning algorithm suffers when it tries to predict the tth decision given u_t based on {(u_1, y_1), ..., (u_{t−1}, y_{t−1})}. The goal of the learner is to minimize the regret, which compares the cumulative loss ∑_{t∈[T]} l(y_t, u_t, θ_t) against the loss attainable when the whole batch of (signal, noisy decision) pairs is available. Formally, the regret is defined as

R_T = ∑_{t∈[T]} l(y_t, u_t, θ_t) − min_{θ∈Θ} ∑_{t∈[T]} l(y_t, u_t, θ).

In the following, we make a few assumptions to simplify the exposition; these are actually mild and frequently appear in the inverse optimization literature [1, 13, 10, 7].

Assumption 2.1. Set Θ is a convex compact set. There exists D > 0 such that ‖θ‖_2 ≤ D for all θ ∈ Θ. 
In addition, for each u ∈ U and θ ∈ Θ, both f(x, u, θ) and g(x, u, θ) are convex in x.

3 Learning the parameters

3.1 The loss function

Different loss functions that capture the mismatch between predictions and observations have been used in the inverse optimization literature. In particular, the (squared) distance between the observed decision and the predicted decision enjoys a direct physical meaning, and thus is the most widely used [14, 15, 16, 7]. Hence, we take the (squared) distance as our loss function in this paper. In the batch setting, statistical properties of inverse optimization with such a loss function have been analyzed extensively in [7]. In this paper, we focus on exploring the performance of the online setting.

Given a (signal, noisy decision) pair (u, y) and a hypothesis θ, we define the loss function as the minimum (squared) distance between y and the optimal solution set S(u, θ):

l(y, u, θ) = min_{x ∈ S(u,θ)} ‖y − x‖_2^2.   (Loss Function)

3.2 Online implicit updates

Once the tth (signal, noisy decision) pair (u_t, y_t) is received, θ_{t+1} can be obtained by solving the following optimization problem:

θ_{t+1} = arg min_{θ∈Θ} (1/2)‖θ − θ_t‖_2^2 + η_t l(y_t, u_t, θ),   (1)

where η_t is the learning rate in round t, and l(y_t, u_t, θ) is defined in (Loss Function).

The update rule (1) seeks to balance the tradeoff between "conservativeness" and "correctiveness", where the first term characterizes how conservative we are in maintaining the current estimate, and the second term indicates how strongly we would like to correct it toward the new observation. 
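To make the update rule concrete, the following toy sketch (ours, with hypothetical names; not the paper's implementation or experiments) runs update (1) for a one-dimensional DMP min_x {(x − θ)^2 : 0 ≤ x ≤ 1}, whose optimal solution set is S(θ) = {clip(θ, 0, 1)}, approximating the minimization over θ by a simple grid search:

```python
# Toy sketch of the implicit update (1); the scalar DMP below is our illustration.
# DMP: min_x (x - theta)^2  s.t. 0 <= x <= 1, so S(theta) = {clip(theta, 0, 1)}.
# Loss (squared distance to the optimal solution set): l(y, theta) = (y - clip(theta))^2.

def clip(v, lo=0.0, hi=1.0):
    return max(lo, min(hi, v))

def loss(y, theta):
    return (y - clip(theta)) ** 2

def implicit_update(theta_t, y_t, eta_t, lo=-2.0, hi=2.0, steps=4001):
    """Approximate argmin_theta 0.5*(theta - theta_t)^2 + eta_t * l(y_t, theta) by grid search."""
    best_theta, best_val = theta_t, float("inf")
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        val = 0.5 * (theta - theta_t) ** 2 + eta_t * loss(y_t, theta)
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta

theta = 0.0  # cold-start hypothesis
for t, y in enumerate([0.62, 0.58, 0.61, 0.60], start=1):
    theta = implicit_update(theta, y, eta_t=1.0 / t ** 0.5)  # eta_t proportional to 1/sqrt(t)
print(theta)
```

With η_t ∝ 1/√t the iterates drift from the cold-start hypothesis toward the parameter generating the observed decisions; in the paper's setting this scalar search is replaced by solving (1) exactly.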
As there is no closed form for θ_{t+1} in general, we call (1) an implicit update rule [17, 18].

To solve (1), we can replace x ∈ S(u, θ) by the KKT conditions (or other optimality conditions) of the DMP, and obtain a mixed integer nonlinear program. Consider, for example, a decision making problem that is a quadratic optimization problem. Namely, the DMP has the following form:

min_{x ∈ R^n} (1/2) x^T Q x + c^T x
s.t. A x ≥ b.   (QP)

Suppose that b changes over time t. That is, b is the external signal for QP and equals b_t at time t. If we seek to learn c, the optimal solution set for QP can be characterized by the KKT conditions as S(b_t) = {x : A x ≥ b_t, u ∈ R^m_+, u^T (A x − b_t) = 0, Q x + c − A^T u = 0}. Here, u is the dual variable for the constraints. Then, the single level reformulation of the update rule (1) is

min_{c ∈ Θ} (1/2)‖c − c_t‖_2^2 + η_t ‖y_t − x‖_2^2
s.t. A x ≥ b_t,
     u ≤ M z,
     A x − b_t ≤ M(1 − z),
     Q x + c − A^T u = 0,
     c ∈ R^n, x ∈ R^n, u ∈ R^m_+, z ∈ {0, 1}^m,   (IQP)

where z is the vector of binary variables used to linearize the KKT conditions, and M is an appropriately large number used to bound the dual variable u and A x − b_t. Clearly, IQP is a mixed integer second order conic program (MISOCP). More examples are given in the supplementary material.

Our application of the implicit updates to learn the parameter of DMP proceeds as in Algorithm 1.

Remark 3.1. (i) In Algorithm 1, we let θ_{t+1} = θ_t if the prediction error l(y_t, u_t, θ_t) is zero. In practice, we can instead set a threshold ε > 0 and let θ_{t+1} = θ_t once l(y_t, u_t, θ_t) < ε. (ii) Normalization of θ_{t+1} is needed in some situations, which eliminates the impact of trivial solutions. 
(iii) Mini-batches. One technique to enhance online learning is to consider multiple observations per update. In our framework, this means computing θ_{t+1} using |N_t| > 1 noisy decisions in (1).

Algorithm 1 Implicit Online Learning for Generalized Inverse Optimization
1: Input: (signal, noisy decision) pairs {(u_t, y_t)}_{t∈[T]}
2: Initialization: θ_1 can be an arbitrary hypothesis of the parameter.
3: for t = 1 to T do
4:   receive (u_t, y_t)
5:   suffer loss l(y_t, u_t, θ_t)
6:   if l(y_t, u_t, θ_t) = 0 then
7:     θ_{t+1} ← θ_t
8:   else
9:     set learning rate η_t ∝ 1/√t
10:    update θ_{t+1} = arg min_{θ∈Θ} (1/2)‖θ − θ_t‖_2^2 + η_t l(y_t, u_t, θ) (solve (1))
11:   end if
12: end for

Remark 3.2. To obtain a strong initialization of θ in Algorithm 1, we can incorporate an idea in [1], which imputes a convex objective function by minimizing the residuals of the KKT conditions incurred by the noisy data. Assume we have a historical data set T̃, which may be of poor quality for the current learning task. This leads to the following initialization problem:

min_{θ∈Θ} (1/|T̃|) ∑_{t∈[T̃]} (r_c^t + r_s^t)
s.t. |u_t^T g(y_t, u_t, θ)| ≤ r_c^t, ∀t ∈ T̃,
     ‖∇f(y_t, u_t, θ) + ∇(u_t^T g(y_t, u_t, θ))‖_2 ≤ r_s^t, ∀t ∈ T̃,
     u_t ∈ R^m_+, r_c^t ∈ R_+, r_s^t ∈ R_+, ∀t ∈ T̃,   (2)

where r_c^t and r_s^t are the residuals corresponding to the complementary slackness and stationarity conditions in the KKT system for the tth noisy decision y_t, and u_t is the dual variable corresponding to the constraints in DMP. Note that (2) is a convex program. It can be solved quite efficiently compared to solving the inverse optimization problem in the batch setting [7]. 
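To illustrate the residual idea behind (2) on a minimal example (our sketch with hypothetical names, not the paper's Julia implementation), consider the scalar DMP min_x {0.5x^2 − θx : x ≥ 0}, whose KKT conditions are x − θ − u = 0, u ≥ 0, u·x = 0; the sketch grid-searches θ and the dual u, averaging the complementary-slackness and stationarity residuals over the noisy observations:

```python
# Toy sketch of KKT-residual initialization in the spirit of (2) (our illustration).
# Scalar DMP: min_x { 0.5*x^2 - theta*x : x >= 0 }, i.e. g(x) = -x <= 0.
# KKT: stationarity x - theta - u = 0, dual feasibility u >= 0, complementarity u*x = 0.

def residual(y, theta, u_grid):
    # Minimize (complementarity residual + stationarity residual) over the dual u >= 0.
    return min(abs(u * y) + abs(y - theta - u) for u in u_grid)

def initialize_theta(observations, theta_grid, u_grid):
    best_theta, best_obj = None, float("inf")
    for theta in theta_grid:
        obj = sum(residual(y, theta, u_grid) for y in observations) / len(observations)
        if obj < best_obj:
            best_theta, best_obj = theta, obj
    return best_theta

ys = [1.05, 0.93, 1.02, 0.97, 1.08, 0.95]   # noisy optimal decisions around theta_true = 1.0
grid = [i / 100.0 for i in range(0, 201)]   # {0.00, 0.01, ..., 2.00}
theta0 = initialize_theta(ys, theta_grid=grid, u_grid=grid)
print(theta0)
```

The recovered warm start lands near the generating parameter; in the paper, (2) is solved as a single convex program rather than by enumeration.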
Other initialization approaches using similar ideas, e.g., computing a variational inequality based approximation of the inverse model [13], can also be incorporated into our algorithm.

3.3 Theoretical analysis

Note that the implicit online learning algorithm is generally applicable to learn the parameter of any convex DMP. In this section, we prove that the average regret R_T/T converges at a rate of O(1/√T) under certain regularity conditions. Furthermore, we show that the proposed algorithm is statistically consistent when the data satisfies some common regularity conditions. We begin by introducing a few assumptions that are rather common in the literature [1, 13, 10, 7].

Assumption 3.1. (a) For each u ∈ U and θ ∈ Θ, X(u, θ) is closed and has a nonempty relative interior. X(u, θ) is also uniformly bounded; that is, there exists B > 0 such that ‖x‖_2 ≤ B for all x ∈ X(u, θ).

(b) f(x, u, θ) is λ-strongly convex in x on Y for fixed u ∈ U and θ ∈ Θ. That is, ∀x, y ∈ Y,

(∇f(y, u, θ) − ∇f(x, u, θ))^T (y − x) ≥ λ‖x − y‖_2^2.

Remark 3.3. A strongly convex program has a unique optimal solution. Therefore, Assumption 3.1.(b) ensures that S(u, θ) is a single-valued set for each u ∈ U. However, S(u, θ) might be multivalued for a general convex DMP with fixed u. Consider, for example, min_{x1,x2} {x1 + x2 : x1 + x2 ≥ 1}. All points on the line x1 + x2 = 1 are optimal. Indeed, we find that such cases are quite common when there are many variables and constraints. 
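For this degenerate example the loss of Section 3.1 is still well defined (our illustration, not from the paper): S is the whole line x1 + x2 = 1, and min_{x∈S} ‖y − x‖_2^2 is the squared distance from y to that line, computed via orthogonal projection:

```python
# Our illustration: for the degenerate LP min{x1 + x2 : x1 + x2 >= 1} in Remark 3.3,
# the optimal solution set S is the whole line x1 + x2 = 1, and the loss
# l(y) = min_{x in S} ||y - x||^2 is the squared distance from y to that line.

def loss_to_optimal_set(y1, y2):
    # Orthogonal projection of (y1, y2) onto the line x1 + x2 = 1.
    shift = (y1 + y2 - 1.0) / 2.0
    p1, p2 = y1 - shift, y2 - shift
    assert abs(p1 + p2 - 1.0) < 1e-9  # projection lies in S
    return (y1 - p1) ** 2 + (y2 - p2) ** 2

print(loss_to_optimal_set(0.8, 0.8))
```

The loss remains single-valued even though S is not, which is what makes the loss usable for general convex DMPs.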
Actually, this is one of the major challenges when learning the parameters of a function that is not strongly convex using inverse optimization.

For convenience of the analysis, we assume below that we seek to learn the objective function while the constraints are known. Then, the performance of Algorithm 1 also depends on how a change of θ affects the objective values. For ∀x ∈ Y, ∀u ∈ U, ∀θ_1, θ_2 ∈ Θ, we consider the difference function

h(x, u, θ_1, θ_2) = f(x, u, θ_1) − f(x, u, θ_2).   (3)

Assumption 3.2. ∃κ > 0 such that ∀u ∈ U, ∀θ_1, θ_2 ∈ Θ, h(·, u, θ_1, θ_2) is Lipschitz continuous on Y:

|h(x, u, θ_1, θ_2) − h(y, u, θ_1, θ_2)| ≤ κ‖θ_1 − θ_2‖_2 ‖x − y‖_2, ∀x, y ∈ Y.

Basically, this assumption says that the objective functions will not change very much when either the parameter θ or the variable x is perturbed. It holds in many common situations, including linear programs and quadratic programs.

Lemma 3.1. Under Assumptions 2.1 - 3.2, the loss function l(y, u, θ) is uniformly 4(B + R)κ/λ-Lipschitz continuous in θ. That is, ∀y ∈ Y, ∀u ∈ U, ∀θ_1, θ_2 ∈ Θ, we have

|l(y, u, θ_1) − l(y, u, θ_2)| ≤ (4(B + R)κ/λ) ‖θ_1 − θ_2‖_2.

The establishment of Lemma 3.1 is based on the key observation that the perturbation of S(u, θ) due to θ is bounded by the perturbation of θ, through applying Proposition 6.1 in [19]. Details of the proof are given in the supplementary material.

Remark 3.4. 
When we seek to learn the constraints, or to jointly learn the constraints and the objective function, a similar result can be established by applying Proposition 4.47 in [20], while restricting not only the Lipschitz continuity of the difference function in (3), but also the Lipschitz continuity of the distance between the feasible sets X(u, θ_1) and X(u, θ_2) (see Remark 4.40 in [20]).

Assumption 3.3. For the DMP, ∀y ∈ Y, ∀u ∈ U, ∀θ_1, θ_2 ∈ Θ, ∀α, β ≥ 0 s.t. α + β = 1, we have

‖αS(u, θ_1) + βS(u, θ_2) − S(u, αθ_1 + βθ_2)‖_2 ≤ αβ‖S(u, θ_1) − S(u, θ_2)‖_2 / (2(B + R)).

Essentially, this assumption indicates that the distance between S(u, αθ_1 + βθ_2) and the convex combination of S(u, θ_1) and S(u, θ_2) shall be small when S(u, θ_1) and S(u, θ_2) are close. An example is provided in the supplementary material to show that this assumption can be satisfied. Yet, we note that it is probably restrictive and hard to verify in general.

Let θ* be an optimal inference to min_{θ∈Θ} ∑_{t∈[T]} l(y_t, θ), i.e., an inference derived with the whole batch of observations available. Then, the following theorem asserts that the regret R_T = ∑_{t∈[T]} (l(y_t, θ_t) − l(y_t, θ*)) of the implicit online learning algorithm is of O(√T).

Theorem 3.2 (Regret bound). Suppose Assumptions 2.1 - 3.3 hold. Then, choosing η_t = (Dλ / (2√2 (B + R)κ)) · (1/√t), we have

R_T ≤ (8√2 (B + R)Dκ / λ) √T.

Remark 3.5. We establish the above regret bound by extending Theorem 3.2 in [18]. 
Our extension involves several critical and complicated analyses of the structure of the optimal solution set S(u, θ) as well as of the loss function, which are essential to our theoretical understanding. Moreover, we relax the smoothness requirement on the loss function in that theorem to Lipschitz continuity, through an argument similar to Lemma 1 of [21] and [22].

By applying both Theorem 3 in [7] and the regret bound proved in Theorem 3.2, we show the risk consistency of the online learning algorithm, in the sense that the average cumulative loss converges in probability to the true risk in the batch setting.

Theorem 3.3 (Risk consistency). Let θ_0 = arg min_{θ∈Θ} {E[l(y, u, θ)]} be the optimal solution that minimizes the true risk in the batch setting. Suppose the conditions in Theorem 3.2 hold. If E[y^2] < ∞, then choosing η_t = (Dλ / (2√2 (B + R)κ)) · (1/√t), we have

(1/T) ∑_{t∈[T]} l(y_t, u_t, θ_t) →_p E[l(y, u, θ_0)].

Corollary 3.3.1. Suppose that the true parameter θ_true ∈ Θ, and y = x + ε, where x ∈ S(u, θ_true) for some u ∈ U, E[ε] = 0, E[ε^T ε] < ∞, and u, x are independent of ε. Let the conditions in Theorem 3.2 hold. Then, choosing η_t = (Dλ / (2√2 (B + R)κ)) · (1/√t), we have

(1/T) ∑_{t∈[T]} l(y_t, u_t, θ_t) →_p E[ε^T ε].

Remark 3.6. 
(i) Theorem 3.3 guarantees that the online learning algorithm proposed in this paper will asymptotically achieve the best prediction error permitted by the inverse model we consider. (ii) Corollary 3.3.1 suggests that a prediction error is inevitable as long as the data carries noise. In the long run, however, this prediction error is caused merely by the noisiness of the data.

4 Applications to learning problems in IOP

In this section, we provide sketches of representative applications for inferring objective functions and constraints using the proposed online learning algorithm. Our preliminary experiments have been run on the Bridges system at the Pittsburgh Supercomputing Center (PSC) [23]. The mixed integer second order conic programs, which are derived from using the KKT conditions in (1), are solved by Gurobi. All the algorithms are programmed in Julia [24].

Figure 2: Learning the utility function over T = 1000 rounds. (a) We run 100 repetitions of the experiments using Algorithm 1 with two settings. Cold-start means that we initialize r as a vector of zeros. Warm-start means that we initialize r by solving (2) with 1000 (price, noisy decision) pairs. We plot the estimation errors over round t in pink and brown for all the 100 repetitions, respectively. We also plot the average estimation errors of the 100 repetitions as a red line and a dashed brown line, respectively. (b) The dotted brown line is the error bar plot of the average running time over 10 repetitions in the batch setting. The blue line is the error bar plot of the average running time over 100 repetitions in the online setting. Here, the error bar is [mean − std, mean + std]. (c) We randomly pick one repetition. The loss over rounds is indicated by the dots. The average cumulative loss is indicated by the line. The dotted line indicates the variance of the noise. 
Here, E[ε^T ε] = 0.2083.

4.1 Learning consumer behavior

We now study the consumer's behavior problem in a market with n products. The prices of the products are denoted by p_t ∈ R^n_+, which varies over time t ∈ [T]. We assume throughout that the consumer has a rational preference relation, and we take u to be the utility function representing these preferences. The consumer's decision making problem of choosing her most preferred consumption bundle x, given the price vector p_t and budget b, can be stated as the following utility maximization problem (UMP) [25]:

max_{x ∈ R^n_+} u(x)
s.t. p_t^T x ≤ b,   (UMP)

where p_t^T x ≤ b is the budget constraint at time t.

For this application, we consider a concave quadratic representation for u(x). That is, u(x) = (1/2) x^T Q x + r^T x, where Q ∈ S^n_− (the set of symmetric negative semidefinite matrices) and r ∈ R^n. We consider a problem with n = 10 products and budget b = 40. Q and r are randomly generated and are given in the supplementary material. Suppose prices change over T rounds. In each round, the learner receives one (price, noisy decision) pair (p_t, y_t). Her goal is to learn the utility function or the budget of the consumer. The (price, noisy decision) pair in each round is generated as follows. In round t, we generate the prices from a uniform distribution, i.e., p_t^i ∼ U[p_min, p_max], with p_min = 5 and p_max = 25. Then, we solve UMP and get the optimal decision x_t. Next, the noisy decision y_t is obtained by corrupting x_t with jointly uniform noise; namely, y_t = x_t + ε_t, where each element of ε_t ∼ U(−0.25, 0.25).

Learning the utility function In the first set of experiments, the learner seeks to learn r given {(p_t, y_t)}_{t∈[T]} that arrive sequentially over T = 1000 rounds. 
We assume that r is within [0, 5]^10. The learning rate is set to η_t = 5/√t. Then, we implement Algorithm 1 with two settings. We report our results in Figure 2. As can be seen in Figure 2a, solving the initialization problem provides quite good initial estimates of r, and Algorithm 1 with Warm-start converges faster than with Cold-start. Note that (2) is a convex program and the time to solve it is negligible in Algorithm 1. Thus, the running times with and without Warm-start are roughly the same. This suggests that one might prefer to use Algorithm 1 with Warm-start if she wants to get a relatively good estimate of the parameters in few iterations. However, as shown in the figure, both settings return very similar estimates of r in the long run. For consistency, we use Algorithm 1 with Cold-start in the remaining experiments. We can also see that the estimation errors over rounds for different repetitions concentrate around the average, indicating that our algorithm is quite robust to noise. Moreover, Figure 2b shows that inverse optimization in the online setting is drastically faster

Figure 3: Learning the budget over T = 1000 rounds. (a) We run 100 repetitions of the experiments. We plot the estimation error over round t for all the 100 repetitions in pink. We also plot the average estimation error of the 100 repetitions in red. (b) The dotted brown line is the error bar plot of the average running time over 10 repetitions in the batch setting. The blue line is the error bar plot of the average running time over 100 repetitions in the online setting. 
(c) We randomly pick one repetition. The loss over rounds is indicated by the dots. The average cumulative loss is indicated by the line. The dotted line is the reference line indicating the variance of the noise. Here, E[ε^T ε] = 0.2083.

than in the batch setting. This also suggests that a windowing approach for inverse optimization might be practically infeasible, since it fails even with a small subset of data, such as a window size of 10. We then randomly pick one repetition and plot the loss over rounds and the average cumulative loss in Figure 2c. We see clearly that the average cumulative loss asymptotically converges to the variance of the noise. This makes sense because the loss merely reflects the noise in the data once the estimate converges to the true value, as stated in Remark 3.6.

Learning the budget In the second set of experiments, the learner seeks to learn the budget b over T = 1000 rounds. We assume that b is within [0, 100]. The learning rate is set to η_t = 100/√t. Then, we apply Algorithm 1 with Cold-start. We show the results in Figure 3. All the analysis for the results of learning the utility function applies here. One thing to emphasize is that learning the budget is much faster than learning the utility function, as shown in Figures 2b and 3b. The main reason is that the budget b is one dimensional, while the utility vector r is ten dimensional, making it drastically more complex to solve (1).

4.2 Learning the transportation cost

We now consider the transshipment network G = (V_s ∪ V_d, E), where nodes V_s are producers and the remaining nodes V_d are consumers. The production level is y_v for node v ∈ V_s and has a maximum capacity of w_v. The demand level is d_v^t for node v ∈ V_d and varies over time t ∈ [T]. 
The demand level is dt\nWe assume that producing yv incurs a cost of C v(yv) for node v \u2208 Vs; furthermore, we also assume\nthat there is a transportation cost cexe associated with edge e \u2208 E, and the \ufb02ow xe has a maximum\ncapacity of ue. The transshipment problem can be formulated in the following:\n\nC v(yv) + (cid:80)\nmin (cid:80)\nxe \u2212 (cid:80)\ns.t. (cid:80)\n(cid:80)\nxe \u2212 (cid:80)\n\ne\u2208\u03b4+(v)\n\nv\u2208Vs\n\ne\u2208E\n\ne\u2208\u03b4\u2212(v)\n\ncexe\nxe = yv, \u2200v \u2208 Vs,\nv, \u2200v \u2208 Vd,\n\nxe = dt\ne\u2208\u03b4+(v)\n0 \u2264 xe \u2264 ue, 0 \u2264 yv \u2264 wv,\n\ne\u2208\u03b4\u2212(v)\n\n\u2200e \u2208 E,\u2200v \u2208 Vs,\n\nTP\n\nwhere we want to learn the transportation cost ce for e \u2208 E. For this application, we will consider a\nconvex quadratic cost for C v(yv). That is, C v(yv) = 1\nWe create instances of the problem based on the network in Figure 4a. \u03bb1, \u03bb2, {ue}e\u2208E, {wv}v\u2208Vs\nand the randomly generated {ce}e\u2208E are given in supplementary material. In each round, the learner\nwould receive the demands {dt\nv}v\u2208Vd, the production levels {yv}v\u2208Vs and the \ufb02ows {xe}e\u2208E, where\nv for v \u2208 Vd from a uniform\nthe later two are corrupted by noises. In round t, we generate the dt\nv \u223c U [\u22121.25, 0]. Then, we solve TP and get the optimal production levels and\ndistribution, i.e. dt\n\ufb02ows. Next, the noisy production levels and \ufb02ows are obtained by corrupting the optimal ones with\nnoise that has a jointly uniform distribution with support [\u22120.25, 0.25]8.\n\nv, where \u03bbv \u2265 0.\n\n2 \u03bbvy2\n\n8\n\n02004006008001000010203040Estimation error per roundAverage estimation errorT= 50T= 100T= 2500500100015002000Batch settingOnline setting0200400600800100000.20.40.60.811.2LossAverage cumulative lossLoss per roundE[T]\f(a)\n\n(b)\n\n(c)\n\nFigure 4: Learning the transportation cost over T = 1000 rounds. 
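The reference lines E[ε^T ε] quoted for the loss plots follow directly from this uniform noise model: each coordinate of ε ∼ U[−a, a]^n contributes E[ε_i^2] = a²/3, so E[ε^T ε] = n·a²/3. A quick standalone check of the two quoted values (not the authors' code; function names are ours, and attributing 0.2083 to noise on [−0.25, 0.25]^10 is our inference from the stated dimensions):

```python
import random

def expected_sq_norm(dim, half_width):
    """Closed form E[eps^T eps] for eps uniform on [-half_width, half_width]^dim:
    each coordinate contributes half_width**2 / 3."""
    return dim * half_width ** 2 / 3

def mc_sq_norm(dim, half_width, n_samples=100_000, seed=0):
    """Monte-Carlo estimate of E[eps^T eps] under the same noise model."""
    rng = random.Random(seed)
    return sum(
        sum(rng.uniform(-half_width, half_width) ** 2 for _ in range(dim))
        for _ in range(n_samples)
    ) / n_samples

# Transportation-cost experiment: noise support [-0.25, 0.25]^8.
print(round(expected_sq_norm(8, 0.25), 4))    # 0.1667
# Utility/budget experiments: value consistent with [-0.25, 0.25]^10 noise.
print(round(expected_sq_norm(10, 0.25), 4))   # 0.2083
```

This is exactly the level to which the average cumulative loss converges in the plots, consistent with the claim that, once the estimate is accurate, the loss reflects only the noise.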
(a) We plot the five-node network used in our experiment. (b) Denote by c ∈ R^|E| the vector of transportation costs. We run 100 repetitions of the experiment. We plot the estimation error at each round t for all 100 repetitions. We also plot the average estimation error of the 100 repetitions. (c) We randomly pick one repetition. The loss over rounds is indicated by the dots. The average cumulative loss is indicated by the line. The dotted line is the reference line indicating the variance of the noise. Here, E[ε^T ε] = 0.1667.

Suppose the transportation costs on edges (2, 3) and (2, 5) are unknown, and the learner seeks to learn them from the (demand, noisy decision) pairs that arrive sequentially over T = 1000 rounds. We assume that c_e for e ∈ E is within [1, 10]. The learning rate is set to η_t = 2/√t. Then, we implement Algorithm 1 with Cold-start. Figure 4b shows the estimation error of c at each round over the 100 repetitions. We also plot the average estimation error of the 100 repetitions. As shown in this figure, c^t asymptotically converges to the true transportation cost c_true quite fast. Also, the estimation errors over rounds for different repetitions concentrate around the average, indicating that our algorithm is quite robust to noise. We then randomly pick one repetition and plot the loss over rounds and the average cumulative loss in Figure 4c. Note that the variance of the noise is E[ε^T ε] = 0.1667. We can see that the average cumulative loss asymptotically converges to the variance of the noise.

5 Conclusions and final remarks

In this paper, an online learning method to infer preferences or restrictions from noisy observations is developed and implemented.
We prove a regret bound for the implicit online learning algorithm under certain regularity conditions, and show that the algorithm is statistically consistent, which guarantees that it asymptotically achieves the best prediction error permitted by the inverse model. Experimental results show that our algorithm learns the parameters with great accuracy, is robust to noise even when some assumptions are not satisfied or are difficult to verify, and achieves a dramatic improvement in computational efficiency over the batch learning approach. Future research directions include developing algorithms with more sophisticated online learning techniques for stronger performance, and theoretical investigation under less restrictive assumptions with broader applicability.

Acknowledgments

This work was partially supported by CMMI-1642514 from the National Science Foundation. This work used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).

References

[1] Arezou Keshavarz, Yang Wang, and Stephen Boyd. Imputing a convex objective function. In Intelligent Control (ISIC), 2011 IEEE International Symposium on, pages 613–619. IEEE, 2011.

[2] Ravindra K. Ahuja and James B. Orlin. Inverse optimization. Operations Research, 49(5):771–783, 2001.

[3] Garud Iyengar and Wanmo Kang. Inverse conic programming with applications. Operations Research Letters, 33(3):319–330, 2005.

[4] Andrew J. Schaefer. Inverse integer programming. Optimization Letters, 3(4):483–489, 2009.

[5] Lizhi Wang. Cutting plane algorithms for the inverse mixed integer linear programming problem.
Operations Research Letters, 37(2):114–116, 2009.

[6] Andreas Bärmann, Sebastian Pokutta, and Oskar Schneider. Emulating the expert: Inverse optimization through online learning. In ICML, 2017.

[7] Anil Aswani, Zuo-Jun Shen, and Auyon Siddiq. Inverse optimization with noisy data. Operations Research, 2018.

[8] Timothy C. Y. Chan, Tim Craig, Taewoo Lee, and Michael B. Sharpe. Generalized inverse multiobjective optimization with application to cancer therapy. Operations Research, 62(3):680–695, 2014.

[9] Dimitris Bertsimas, Vishal Gupta, and Ioannis Ch. Paschalidis. Inverse optimization: A new perspective on the Black-Litterman model. Operations Research, 60(6):1389–1403, 2012.

[10] Peyman Mohajerin Esfahani, Soroosh Shafieezadeh-Abadeh, Grani A. Hanasusanto, and Daniel Kuhn. Data-driven inverse optimization with imperfect information. Mathematical Programming, pages 1–44, 2017.

[11] Chaosheng Dong and Bo Zeng. Inferring parameters through inverse multiobjective optimization. arXiv preprint arXiv:1808.00935, 2018.

[12] Charu C. Aggarwal. Recommender Systems: The Textbook. Springer, 2016.

[13] Dimitris Bertsimas, Vishal Gupta, and Ioannis Ch. Paschalidis. Data-driven estimation in equilibrium using inverse optimization. Mathematical Programming, 153(2):595–633, 2015.

[14] Hai Yang, Tsuna Sasaki, Yasunori Iida, and Yasuo Asakura. Estimation of origin-destination matrices from link traffic counts on congested networks. Transportation Research Part B: Methodological, 26(6):417–434, 1992.

[15] Stephan Dempe and Sebastian Lohse. Inverse linear programming. In Recent Advances in Optimization, pages 19–28. Springer, 2006.

[16] Timothy C. Y. Chan, Taewoo Lee, and Daria Terekhov. Inverse optimization: Closed-form solutions, geometry, and goodness of fit.
Management Science, 2018.

[17] Li Cheng, Dale Schuurmans, Shaojun Wang, Terry Caelli, and S. V. N. Vishwanathan. Implicit online learning with kernels. In NIPS, 2007.

[18] Brian Kulis and Peter L. Bartlett. Implicit online learning. In ICML, 2010.

[19] J. Frédéric Bonnans and Alexander Shapiro. Optimization problems with perturbations: A guided tour. SIAM Review, 40(2):228–264, 1998.

[20] J. Frédéric Bonnans and Alexander Shapiro. Perturbation Analysis of Optimization Problems. Springer Science & Business Media, 2013.

[21] Jialei Wang, Weiran Wang, and Nathan Srebro. Memory and communication efficient distributed stochastic optimization with minibatch-prox. Proceedings of Machine Learning Research, 65:1–37, 2017.

[22] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.

[23] Nicholas A. Nystrom, Michael J. Levine, Ralph Z. Roskies, and J. Ray Scott. Bridges: A uniquely flexible HPC resource for new communities and data analytics. In Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, XSEDE '15, pages 30:1–30:8, New York, NY, USA, 2015. ACM.

[24] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.

[25] Andreu Mas-Colell, Michael D. Whinston, and Jerry R. Green. Microeconomic Theory. 1995.