{"title": "Large-Scale Price Optimization via Network Flow", "book": "Advances in Neural Information Processing Systems", "page_first": 3855, "page_last": 3863, "abstract": "This paper deals with price optimization, which is to find the best pricing strategy that maximizes revenue or profit, on the basis of demand forecasting models. Though recent advances in regression technologies have made it possible to reveal price-demand relationship of a number of multiple products, most existing price optimization methods, such as mixed integer programming formulation, cannot handle tens or hundreds of products because of their high computational costs. To cope with this problem, this paper proposes a novel approach based on network flow algorithms. We reveal a connection between supermodularity of the revenue and cross elasticity of demand. On the basis of this connection, we propose an efficient algorithm that employs network flow algorithms. The proposed algorithm can handle hundreds or thousands of products, and returns an exact optimal solution under an assumption regarding cross elasticity of demand. Even in case in which the assumption does not hold, the proposed algorithm can efficiently find approximate solutions as good as can other state-of-the-art methods, as empirical results show.", "full_text": "Large-Scale Price Optimization via Network Flow\n\nShinji Ito\n\nNEC Corporation\n\ns-ito@me.jp.nec.com\n\nRyohei Fujimaki\nNEC Corporation\n\nrfujimaki@nec-labs.com\n\nAbstract\n\nThis paper deals with price optimization, which is to \ufb01nd the best pricing strategy\nthat maximizes revenue or pro\ufb01t, on the basis of demand forecasting models.\nThough recent advances in regression technologies have made it possible to reveal\nprice-demand relationship of a large number of products, most existing price\noptimization methods, such as mixed integer programming formulation, cannot\nhandle tens or hundreds of products because of their high computational costs. 
To\ncope with this problem, this paper proposes a novel approach based on network\n\ufb02ow algorithms. We reveal a connection between supermodularity of the revenue\nand cross elasticity of demand. On the basis of this connection, we propose an\nef\ufb01cient algorithm that employs network \ufb02ow algorithms. The proposed algorithm\ncan handle hundreds or thousands of products, and returns an exact optimal solution\nunder an assumption regarding cross elasticity of demand. Even if the assumption\ndoes not hold, the proposed algorithm can ef\ufb01ciently \ufb01nd approximate solutions as\ngood as other state-of-the-art methods, as empirical results show.\n\n1\n\nIntroduction\n\nPrice optimization is a central research topic with respect to revenue management in marketing\nscience [10, 16, 18]. The goal is to \ufb01nd the best price strategy (a set of prices for multiple products)\nthat maximizes revenue or pro\ufb01t. There is a lot of literature regarding price optimization [1, 5, 10,\n13, 17, 18, 20], and signi\ufb01cant success has been achieved in industries such as online retail [7],\nfast-fashion [5], hotels [13, 14], and airlines [16]. One key component in price optimization is\ndemand modeling, which reveals relationships between price and demand. Though traditional studies\nhave focused more on a single price-demand relationship, such as price elasticity of demand [13, 14]\nand the law of diminishing marginal utility [16], multi-product relationships such as cross price\nelasticity of demand [15] have recently received increased attention [5, 17]. Recent advances in\nregression technologies (non-linear, sparse, etc.) 
make demand modeling over tens or even hundreds\nof products possible, and data-oriented demand modeling has become more and more important.\nGiven demand models of multiple products, the role of optimization is to find the best price strategy.\nMost existing studies for multi-product price optimization employ mixed-integer programming [5, 13,\n14] due to the discrete nature of individual prices, but their methods cannot be applied to large scale\nproblems with tens or hundreds of products since their computational costs increase exponentially\nwith the number of products. Though restricting demand models might make optimization\nproblems tractable [5, 7], such approaches cannot capture complicated price-demand relationships\nand often result in poor performance. Ito and Fujimaki [9] have recently proposed a prescriptive\nprice optimization framework to efficiently solve multi-product price optimization with non-linear\ndemand models. In this prescriptive price optimization, the problem is transformed into a sort of\nbinary quadratic programming problem, and they have proposed an efficient relaxation method\nbased on semi-definite programming (SDP). Although their approach has significantly improved\ncomputational efficiency over that of mixed-integer approaches, the computational complexity of\ntheir SDP formulation is O(M^6) in theory, where M is the number of products, and it is not\nsufficiently scalable for large scale problems with hundreds of products, as our empirical evaluation\nshows in Section 5.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nThe goal of this paper is to develop an efficient algorithm for large scale multi-product price optimization\nproblems that can handle hundreds of products as well as flexible demand models. Our main\ntechnical contributions are two-fold. 
First, we reveal the connection between supermodularity of the\nrevenue and cross elasticity of demand. More specifically, we show that the gross profit function of\nthe prescriptive price optimization is supermodular (i.e., the maximization of the gross profit function\nis equivalent to submodular minimization) under the assumption regarding cross elasticity of\ndemand that there are no pairs of complementary goods (we refer to this property as the substitute-goods\nproperty).\u00b9 On the basis of this supermodularity, we propose a practical, efficient algorithm that employs\nnetwork flow algorithms for minimum cut problems and returns exact solutions for problems with the\nsubstitute-goods property. Further, even in cases in which the property does not hold, it can efficiently\nfind approximate solutions by iteratively improving submodular lower bounds. Our empirical results\nshow that the proposed algorithm can successfully handle hundreds of products and derive solutions\nas good as other state-of-the-art methods, while its computational cost is much lower, regardless of\nwhether the substitute-goods property holds or not.\n\n2 Literature review\n\nOur price optimization problems are reduced to binary quadratic problems such as (4). It is well\nknown that submodular binary quadratic programming problems can be reduced to minimum cut\nproblems [12], and hence they can be solved by maximum flow algorithms. Also for unconstrained\nnon-submodular binary quadratic programming problems, there is a lot of literature regarding optimization\nalgorithms using minimum cut, especially in the context of Markov random field inference or energy\nminimization in computer vision [2, 3, 4, 8, 11, 22]. Above all, the QPBO method [2, 11] and its\nextensions such as the QPBOI method [19] are known to be state-of-the-art methods in terms of scalability\nand theoretical properties. 
QPBO/QPBOI and our method are similar in that they all employ\nnetwork flow algorithms and derive not only partial/approximate solutions but also lower bounds of\nthe exact optimal (minimum) value. Our method, however, differs from QPBO and its extensions in\nnetwork structures, accuracy and scalability, as is shown in Section 5.\n\n3 Price optimization and submodularity in cross elasticity of demand\n\nSuppose we have M products and a product index is denoted by i \u2208 {1, . . . , M}. In prescriptive price\noptimization [9], for a price strategy p = [p_1, . . . , p_M]^\u22a4, where p_i is the price of the i-th product,\nand for external variables r = [r_1, . . . , r_D]^\u22a4 such as weather, temperature and days of the week, the\nsales quantity (demand) for the i-th product is modeled by the following regression formula:\n\nq_i(p, r) = \u2211_{j=1}^{M} f_ij(p_j) + \u2211_{t=1}^{D} g_it(r_t),    (1)\n\nwhere f_ii expresses the effect of price elasticity of demand, f_ij (i \u2260 j) reflects the effect of cross\nelasticity, and g_it represents how the t-th external variable affects the sales quantity. Note that f_ij for\nall (i, j) can be arbitrary functions, and Eq. (1) covers various regression (demand) models, such\nas linear regression, additive models [21], linear regression models with univariate basis functions,\netc. This paper assumes that the regression models are given using existing methods and focuses its\ndiscussion on optimization.\nGiven q_i(p) for all i, a cost vector c = [c_1, . . . , c_M]^\u22a4, and fixed external variables r, the gross\nprofit can be represented as\n\n\u2113(p) = \u2211_{i=1}^{M} (p_i \u2212 c_i) q_i(p) = \u2211_{i=1}^{M} (p_i \u2212 c_i) ( \u2211_{j=1}^{M} f_ij(p_j) + \u2211_{t=1}^{D} g_it(r_t) ).    (2)\n\n\u00b9 \"Complementary goods\" and \"substitute goods\" are terms in economics. A good example of complementary\ngoods might be wine and cheese, i.e., if we discount wine, the sales of cheese will increase. 
An example of\nsubstitute goods might be products of different brands in the same product category. If we discount one product,\nsales of the other products will decrease.\n\nThe goal of price optimization is to find p maximizing \u2113(p). In practice, p_i is often chosen from\nthe finite set P_i = {P_{i1}, . . . , P_{iK}} \u2286 R of K price candidates, where P_{iK} might be a list price and\nP_{ik} (k < K) might be discounted prices such as 10%-off, 5%-off, 3%-off. Then, the problem of\nmaximizing the gross profit can be formulated as the following combinatorial optimization problem:\n\nMaximize \u2113(p) subject to p_i \u2208 P_i.    (3)\n\nIt is trivial to show that (3) is NP-hard in general.\nLet us formally define the \"substitute-goods property\" as follows.\nDefinition 1 (Substitute-Goods Property). The demand model defined by (1) of the i-th product is\nsaid to satisfy the substitute-goods property if f_ij is monotone non-decreasing for all j \u2260 i.\nThe concept of the substitute-goods property is practical and important because retailers often deal with\nsubstitute goods. Consider a retailer that decides a price strategy for different brands in the\nsame product category. For example, supermarkets sell milk of different brands and car dealerships\nsell various types of cars. These products are usually substitute goods. This kind of cross elasticity\neffect is one of the advanced topics in revenue management and is practically important [13, 14, 17].\nOur key observation is the connection between the substitute-goods property in marketing science\nand the supermodularity of the gross profit function, which is formally described in the following\nproposition.\nProposition 2. 
The gross profit function \u2113 : P_1 \u00d7 \u00b7\u00b7\u00b7 \u00d7 P_M \u2192 R is supermodular\u00b2 if the demand models\ndefined by (1) for all products satisfy the substitute-goods property.\n\nThe above proposition implies that, under the assumption of the substitute-goods property, problem\n(3) can be solved exactly using submodular minimization algorithms, whose time complexity is\npolynomial in M and K. This fact, however, does not necessarily imply that there exists a practical,\nefficient algorithm for problem (3). Indeed, general submodular minimization algorithms are slow in\npractice even though their time complexities are polynomial. Further, actual models do not always\nsatisfy the substitute-goods property. We propose solutions to these problems in the next section.\n\n4 Network flow-based algorithm for revenue maximization\n\n4.1 Binary quadratic programming formulation\n\nThis section shows that problem (3) can be reduced to the following binary quadratic programming\nproblem (notations are explained in the latter part of this section):\n\nMinimize x^\u22a4 A x + b^\u22a4 x\nsubject to x = [x_1, . . . , x_n]^\u22a4 \u2208 {0, 1}^n,\nx_u \u2264 x_v ((u, v) \u2208 C).    (4)\n\nEach variable p_i takes P_{ik} if and only if the binary vector x_i = [x_{i,1}, . . . , x_{i,K\u22121}]^\u22a4 \u2208 {0, 1}^{K\u22121}\nsatisfies\n\nx_i = c_k := [1, . . . , 1, 0, . . . , 0]^\u22a4 (k = 1, . . . , K),    (5)\n\nwhere c_k consists of k \u2212 1 ones followed by K \u2212 k zeros.\nAlso we define x = [x_1^\u22a4, . . . , x_M^\u22a4]^\u22a4 \u2208 {0, 1}^{(K\u22121)M} and redefine the indices of the entries of x as\nx = [x_1, x_2, . . . , x_{(K\u22121)M}], i.e., x_{i,k} = x_{(i\u22121)(K\u22121)+k}, for notational simplicity.\nDefining \u2113_ij : P_i \u00d7 P_j \u2192 R by \u2113_ij(p_i, p_j) = (p_i \u2212 c_i) f_ij(p_j) for i \u2260 j and \u2113_i : P_i \u2192 R by\n\u2113_i(p_i) = (p_i \u2212 c_i)(f_ii(p_i) + \u2211_{t=1}^{D} g_it(r_t)), we can express \u2113 as\n\n\u2113(p) = \u2211_{1\u2264i,j\u2264M, i\u2260j} \u2113_ij(p_i, p_j) + \u2211_{i=1}^{M} \u2113_i(p_i).    (6)\n\n\u00b2 We say that a function f : D_1 \u00d7 \u00b7\u00b7\u00b7 \u00d7 D_n \u2192 R (D_j \u2286 R) is submodular if f(x) + f(y) \u2265 f(x \u2228 y) +\nf(x \u2227 y) for all x, y, where x \u2228 y and x \u2227 y denote the coordinate-wise maximum and minimum, respectively.\nWe say a function f is supermodular if \u2212f is submodular.\n\nAlgorithm 1 s-t cut for price optimization with the substitute-goods property\nInput: Problem instance (A, b, C) of (4), where all entries of A are non-positive.\nOutput: An optimal solution x^\u2217 to (4).\n1: Construct a weighted directed graph G = (V, E, w) satisfying (9).\n2: Add edges C with weight \u221e to G, i.e., set E \u2190 E \u222a C and w(u, v) \u2190 \u221e for all (u, v) \u2208 C.\n3: Compute a minimum s-t cut U^\u2217 of G, define x^\u2217 by (10) and return x^\u2217.\n\nUsing x_i, we can construct matrices A_ij \u2208 R^{(K\u22121)\u00d7(K\u22121)} for which it holds that\n\n\u2113_ij(p_i, p_j) = \u2212x_i^\u22a4 A_ij x_j + const.    (7)\n\nIndeed, matrices A_ij = [a^{ij}_{uv}]_{1\u2264u,v\u2264K\u22121} \u2208 R^{(K\u22121)\u00d7(K\u22121)} defined by\n\na^{ij}_{uv} = \u2212\u2113_ij(P_{i,u+1}, P_{j,v+1}) + \u2113_ij(P_{i,u}, P_{j,v+1}) + \u2113_ij(P_{i,u+1}, P_{j,v}) \u2212 \u2113_ij(P_{i,u}, P_{j,v})    (8)\n\nsatisfy (7). In a similar way, we can construct b_i \u2208 R^{K\u22121} such that \u2113_i(p_i) = \u2212b_i^\u22a4 x_i + const.\nAccordingly, the objective function \u2113 of problem (3) satisfies \u2113(p) = \u2212(x^\u22a4 A x + b^\u22a4 x) + const,\nwhere we define A = [A_ij]_{1\u2264i,j\u2264M} \u2208 R^{(K\u22121)M\u00d7(K\u22121)M} and b = [b_i]_{1\u2264i\u2264M} \u2208 R^{(K\u22121)M}. The\nconditions x_i \u2208 {c_1, . . . , c_K} (i = 1, . . . , M) can be expressed as x_u \u2264 x_v ((u, v) \u2208 C),\nwhere we define C := {((K \u2212 1)(i \u2212 1) + k + 1, (K \u2212 1)(i \u2212 1) + k) | 1 \u2264 i \u2264 M, 1 \u2264 k \u2264\nK \u2212 2}. Consequently, problem (3) is reduced to problem (4). Although [9] also gives another\nBQP formulation for the problem (3) and relaxes it to a semi-definite programming problem, our\nBQP formulation can be solved much more efficiently, as is explained in the next\nsection.\n\n4.2 Minimum cut for problems with substitute goods property\n\nAs is easily seen from (8), if the problem satisfies the substitute-goods property, matrix A has only\nnon-positive entries. It is well known that unconstrained binary quadratic programming problems\nsuch as (4) with non-positive A \u2208 R^{n\u00d7n} and C = \u2205 can be efficiently solved\u00b3 by algorithms\nfor minimum cut [6]. Indeed, we can construct a positive weighted directed graph, G = (V =\n{s, t, 1, 2, . . . , n}, E \u2286 V \u00d7 V, w : E \u2192 R_{>0} \u222a {\u221e}),\u2074 for which\n\nx^\u22a4 A x + b^\u22a4 x = c_G({s} \u222a {u | x_u = 1}) + const    (9)\n\nholds for all x \u2208 {0, 1}^n, where c_G is the cut function of graph G.\u2075 Hence, once we can compute a\nminimum s-t cut U, that is, a vertex set U \u2286 V minimizing c_G(U) subject to s \u2208 U and t \u2209 U, we\ncan construct an optimal solution x = [x_1, . . . , x_n]^\u22a4 to the problem (4) by setting\n\nx_u = 1 if u \u2208 U, and x_u = 0 if u \u2209 U (u = 1, . . . , n).    (10)\n\nFor constrained problems such as (4) with C \u2260 \u2205, the constraint x_u \u2264 x_v is equivalent to x_u =\n1 \u21d2 x_v = 1. This condition can be expressed, in the minimum cut problem, as u \u2208 U \u21d2 v \u2208 U.\nBy adding a directed edge (u, v) with weight \u221e, we can forbid the minimum cut from violating the\nconstraints. 
In fact, if both u \u2208 U and v \u2209 U hold, the value of the cut function is \u221e, and hence such\na U cannot be a minimum cut. We summarize this in Algorithm 1.\n\n4.3 Submodular relaxation for problems without the substitute-goods property\n\nFor problems without the substitute-goods property, we first decompose the matrix A into A^+ and\nA^\u2212 so that A^+ + A^\u2212 = A, where A^+ = [a^+_{uv}] and A^\u2212 = [a^\u2212_{uv}] \u2208 R^{n\u00d7n} are given by\n\na^+_{uv} = a_uv if a_uv \u2265 0, and 0 if a_uv < 0;    a^\u2212_{uv} = 0 if a_uv \u2265 0, and a_uv if a_uv < 0    (u, v \u2208 N).    (11)\n\n\u00b3 The computational cost of the minimum cut depends on the choice of algorithms. For example, if we use\nDinic\u2019s method, the time complexity is O(n^3 log n) = O((KM)^3 log(KM)).\n\u2074 s, t are auxiliary vertices different from 1, . . . , n corresponding to source, sink in maximum flow problems.\n\u2075 For details about the construction of G, see, e.g., [4, 12].\n\nThis leads to a decomposition of the objective function of Problem (4) into supermodular and\nsubmodular terms:\n\nx^\u22a4 A x + b^\u22a4 x = x^\u22a4 A^+ x + x^\u22a4 A^\u2212 x + b^\u22a4 x,    (12)\n\nwhere x^\u22a4 A^+ x is supermodular and x^\u22a4 A^\u2212 x + b^\u22a4 x is submodular. Our approach is to replace the\nsupermodular term x^\u22a4 A^+ x by an affine function to construct a submodular function approximating\nx^\u22a4 A x + b^\u22a4 x, which can be minimized by Algorithm 1. Similar approaches can be found in the literature,\ne.g. [8, 22], but ours differs in a significant way: our method constructs approximating functions\nthat bound the objective from below, which provides information about the degree of accuracy.\nConsider an affine function h(x) such that h(x) \u2264 x^\u22a4 A^+ x for all x \u2208 {0, 1}^n. Such an h can be\nconstructed as follows. 
Since\n\n\u03b3_uv (x_u + x_v \u2212 1) \u2264 x_u x_v    (x_u, x_v \u2208 {0, 1})    (13)\n\nholds for all \u03b3_uv \u2208 [0, 1], an arbitrary matrix \u0393 \u2208 [0, 1]^{n\u00d7n} satisfies\n\nx^\u22a4 A^+ x \u2265 x^\u22a4 (A^+ \u25e6 \u0393) 1 + 1^\u22a4 (A^+ \u25e6 \u0393) x \u2212 1^\u22a4 (A^+ \u25e6 \u0393) 1 =: h_\u0393(x),    (14)\n\nwhere A^+ \u25e6 \u0393 denotes the Hadamard product, i.e., (A^+ \u25e6 \u0393)_uv = a^+_{uv} \u00b7 \u03b3_uv. From inequality (14),\nthe optimal value of the following problem,\n\nMinimize x^\u22a4 A^\u2212 x + b^\u22a4 x + h_\u0393(x)\nsubject to x = [x_1, . . . , x_n]^\u22a4 \u2208 {0, 1}^n,\nx_u \u2264 x_v ((u, v) \u2208 C),    (15)\n\nis a lower bound for that of problem (4). Since A^\u2212 has non-positive entries and b^\u22a4 x + h_\u0393(x) is\naffine, we can solve (15) using Algorithm 1 to obtain an approximate solution for (4) and a lower\nbound for the optimal value of (4).\n\n4.4 Proximal gradient method with sequential submodular relaxation\n\nAn essential problem in submodular relaxation is how to choose \u0393 \u2208 [0, 1]^{n\u00d7n} and to optimize x\ngiven \u0393. Let \u03c8(\u0393) denote the optimal value of (15), i.e., define \u03c8(\u0393) by \u03c8(\u0393) = min_{x\u2208R} {x^\u22a4 A^\u2212 x +\nb^\u22a4 x + h_\u0393(x)}, where R is the feasible region of (15). Then, for simultaneous optimization of x and\n\u0393, we consider the following problem:\n\nMaximize \u03c8(\u0393) subject to \u0393 \u2208 [0, 1]^{n\u00d7n},    (16)\n\nwhich can be rewritten as follows:\u2076\n\nMinimize \u2212\u03c8(\u0393) + \u2126(\u0393) subject to \u0393 \u2208 R^{n\u00d7n},    (17)\n\nwhere we define \u2126 : R^{n\u00d7n} \u2192 R \u222a {\u221e} by\n\n\u2126(\u0393) = 0 if \u0393 \u2208 [0, 1]^{n\u00d7n}, and \u221e if \u0393 \u2209 [0, 1]^{n\u00d7n}.    (18)\n\nThen, \u2212\u03c8(\u0393) is convex and (17) can be solved using a proximal gradient method.\nLet \u0393_t \u2208 R^{n\u00d7n} denote the solution on the t-th step. 
Let x_t be an optimal solution of (15) with\n\u0393 = \u0393_t, i.e.,\n\nx_t \u2208 arg min_{x\u2208R} {x^\u22a4 A^\u2212 x + b^\u22a4 x + h_{\u0393_t}(x)}.    (19)\n\nThe partial derivative of \u2212h_\u0393(x) w.r.t. \u0393 at (\u0393_t, x_t), denoted by S_t, is then a subgradient of \u2212\u03c8(\u0393)\nat \u0393_t, which can be computed as follows:\n\nS_t = A^+ \u25e6 (11^\u22a4 \u2212 x_t 1^\u22a4 \u2212 1 x_t^\u22a4).    (20)\n\n\u2076 Problem (16) can also be solved using the ellipsoid method, which guarantees polynomial time-complexity\nin the input size. However, it is known that the order of its polynomial is large and that the performance of\nthe algorithm can be poor in practice, especially for large size problems. To try to achieve more practical\nperformance, this paper proposes a proximal gradient algorithm.\n\nAlgorithm 2 An iterative relaxation algorithm for (4)\nInput: Problem instance (A, b, C) of (4).\nOutput: An approximate solution x\u0302 to (4) satisfying (25), a lower bound \u03c8 of the optimal value of (4).\n1: Set \u0393_1 = 11^\u22a4/2, t = 1, min_value = \u221e, \u03c8 = \u2212\u221e.\n2: while not converged do\n3:   Compute x_t satisfying (19) by using Algorithm 1, and compute\n     value_t = x_t^\u22a4 A x_t + b^\u22a4 x_t, \u03c8_t = x_t^\u22a4 A^\u2212 x_t + b^\u22a4 x_t + h_{\u0393_t}(x_t), \u03c8 = max{\u03c8, \u03c8_t}.\n4:   if value_t < min_value then\n5:     Update min_value and x\u0302 by min_value = value_t, x\u0302 = x_t.    (24)\n6:   end if\n7:   Compute \u0393_{t+1} by (22) and (23), and set t \u2190 t + 1.\n8: end while\n9: Return x\u0302, min_value and \u03c8.\n\nBy using S_t and a decreasing sequence {\u03b7_t} of positive real numbers, we can express the update\nscheme for the proximal gradient method as follows:\n\n\u0393_{t+1} \u2208 arg min_{\u0393\u2208R^{n\u00d7n}} {S_t \u00b7 \u0393 + (1/(2\u03b7_t)) \u2016\u0393 \u2212 \u0393_t\u2016^2 + \u2126(\u0393)}.    (21)\n\nWe can compute \u0393_{t+1} satisfying (21) by\n\n\u0393_{t+1} = Proj_{[0,1]^{n\u00d7n}}(\u0393_t \u2212 \u03b7_t S_t),    (22)\n\nwhere Proj_{[0,1]^{n\u00d7n}}(X) is defined by\n\n(Proj_{[0,1]^{n\u00d7n}}(X))_uv = 0 if (X)_uv < 0, 1 if (X)_uv > 1, and (X)_uv otherwise.    (23)\n\nThe proposed algorithm can be summarized as Algorithm 2.\nThe choice of {\u03b7_t} has a major impact on the rate of convergence of the algorithm. From a\nconvergence analysis of the proximal gradient method, when we set \u03b7_t = \u0398(1/\u221at), it is guaranteed\nthat \u03c8_t converges to the optimal value \u03c8^\u2217 of (16) and |\u03c8_t \u2212 \u03c8^\u2217| = O(1/\u221at). Because \u03c8(\u0393) is\nnon-smooth and not strongly concave, there is no better guarantee of convergence rate, to the best of\nour knowledge. In practice, however, we observe convergence in about 10 iterations.\n\n4.5 Initialization of \u0393\n\nLet x\u0303_\u0393 denote an optimal solution to (15). We employ \u0393_1 = 11^\u22a4/2 for the initialization of\n\u0393 because (x_u + x_v \u2212 1)/2 is the tightest lower bound of x_u x_v in the max-norm sense, i.e.,\nh(x_u, x_v) = (x_u + x_v \u2212 1)/2 is the unique minimizer of max_{x_u, x_v \u2208 {0,1}} {|x_u x_v \u2212 h(x_u, x_v)|},\nsubject to the constraints that h(x_u, x_v) is affine and bounded from above by x_u x_v. In this case, x\u0303_\u0393\nis an approximate solution satisfying the following performance guarantee.\nProposition 3. 
If \u0393 = 11^\u22a4/2, then x\u0303_\u0393 satisfies\n\nx\u0303_\u0393^\u22a4 A x\u0303_\u0393 + b^\u22a4 x\u0303_\u0393 \u2264 (x^\u2217)^\u22a4 A x^\u2217 + b^\u22a4 x^\u2217 + (1/2) 1^\u22a4 A^+ 1,    (25)\n\nwhere x^\u2217 is an optimal solution to (4).\n\n5 Experiments\n\n5.1 Simulations\n\nThis section investigates the behavior of Algorithm 2 on the basis of the simulation model used in [9],\nand we compare the proposed method with state-of-the-art methods: the SDP relaxation method [9]\nand the QPBO and QPBOI methods [11]. We use SDPA 7.3.8 to solve SDP problems\u2077 and use the\nimplementation of QPBO and QPBOI written by Kolmogorov.\u2078 The QPBO method computes a partial\nlabeling, i.e., there might remain unlabeled variables, and we set unlabeled variables to 0 in our\nexperiments. For computing a minimum s-t cut, we use Dinic\u2019s algorithm [6]. All experiments were\nconducted on a machine equipped with Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 768GB RAM.\nWe limited all processes to a single CPU core.\n\nTable 1: Ranges of parameters in regression models. (i) is supermodular, (ii) is supermodular + submodular, and (iii) is submodular.\n\n      | \u03b2_ij (i \u2260 j) | \u03b2_ii | \u03b1_i\n(i)   | [0, 2] | [\u22122M, \u2212M] | [M, 3M]\n(ii)  | [\u221225, 25] | [\u22122M, 0] | [M, 3M]\n(iii) | [\u22122, 0] | [M \u2212 3, M \u2212 1] | [1, 3]\n\nTable 2: Results on real retail data. (a) is computational time, (b) is estimated gross profit, (c) is upper bound.\n\n    | actual | proposed | QPBO\n(a) | - | 36 [s] | 964 [s]\n(b) | 1403700 | 1883252 | 1245568\n(c) | - | 1897393 | 1894555\n\nRevenue simulation model [9] The sales quantity q_i of the i-th product was generated from the\nregression model q_i = \u03b1_i + \u2211_{j=1}^{M} \u03b2_ij p_j, where {\u03b1_i} and {\u03b2_ij} were generated by uniform\ndistributions. 
We considered three types of uniform distributions to investigate the effect of submodularity,\nas shown in Table 1, which correspond to three different situations: (i) all pairs of products are\nsubstitute goods, i.e., the gross profit function is supermodular, (ii) half of the pairs are substitute goods\nand the others are complementary goods, i.e., the gross profit function contains submodular terms\nand supermodular terms, and (iii) all pairs are complementary goods, i.e., the gross profit function is\nsubmodular. Price candidates P_i and cost c_i for each product are fixed to P_i = {0.6, 0.7, . . . , 1.0}\nand c_i = 0, respectively.\n\nScalability and accuracy comparison We evaluated four methods in terms of computational\ntime (sec) and optimization accuracy (i.e., optimal values calculated by the four methods). In addition\nto calculating approximate optimal solutions and values, all four algorithms derive upper bounds of\nthe exact optimal value, which provide information about how accurate the calculated solution is.\u2079 Fig. 1\nshows the results with M = 30, 60, . . . , 300 for situations (i), (ii) and (iii). The plotted values are\narithmetic means of 5 random problem instances. We can observe that the proposed, QPBO and QPBOI\nmethods derived exact solutions in case (i), which can be confirmed from the computed upper\nbounds coinciding with the values of the objective function. For situations (ii) and (iii), on the other\nhand, the upper bound and the objective value did not coincide and the solutions by QPBO were\nworse than the others. The solutions by QPBOI and SDPrelax are as good as those of the proposed method,\nbut their computational costs are significantly higher, especially for situations (ii) and (iii). 
For\nall situations, the proposed method successfully derived solutions as good as the best of the four\nmethods did, and its computational cost was the lowest.\n\nFigure 1: Comparisons of proposed, QPBO, QPBOI, and SDPrelax methods on revenue simulation\ndata, with one panel for each situation: (i) supermodular, (ii) supermodular + submodular, (iii) submodular. The horizontal axis represents the number M of products. The vertical axes represent computational\ntime (top) and optimal values of problem (3) obtained by the four methods (bottom). For the bottom, circle markers\nwith dashed lines represent the computed upper bounds of the optimal values, and optimal values and\nupper bounds are normalized so that upper bounds with the proposed method are equal to 1.\n\n5.2 Real-world retail data\n\nData and settings We applied the proposed method to actual retail data from a middle-size\nsupermarket located in Tokyo [23].\u00b9\u2070 We selected 50 regularly-sold beer products. The data range is\napproximately three years from 2012/01 to 2014/12, and we used the first 35 months (1065 samples)\nfor training regression models and simulated the best price strategy for the next 20 days. Therefore,\nthe problem here was to determine 1000 prices (50 products \u00d7 20 days).\nFor forecasting the sales quantity q_i^{(d)} of the i-th product on the d-th day, we use price features\n{p_j^{(d\u2032)}}_{1\u2264j\u226450, d\u221219\u2264d\u2032\u2264d} of 50 products for the 20 days up to the d-th day. In addition to these\n1000 linear price features, we employed \u201cday of the week\u201d and \u201cmonth\u201d features (both binary), as\nwell as temperature forecasting features (continuous), as external features. 
The price candidates\n{P_{ik}^{(d)}}_{k=1}^{5} were generated by splitting equally the range [P_{i1}, P_{i5}], where P_{i1} and P_{i5} are the lowest\nand highest prices of the i-th product in the historical data, respectively. We assumed that the cost c_i^{(d)} was\n0.3 P_{i5} (30% of the list prices). Our objective was to obtain a price strategy for 50 products over\nthe 20 days, from the 1066-th to 1085-th, which involves 1000-dimensional variables, in order\nto maximize the sum of the gross profit for the 20 days. We estimated parameters in regression\nmodels, using the ridge regression method. The estimated model contained 310293 pairs with the\nsubstitute-goods property and 189207 pairs with the complementary-goods property.\n\n\u2077 http://sdpa.sourceforge.net/\n\u2078 http://pub.ist.ac.at/~vnk/software.html\n\u2079 For example, the coincidence of the upper bound and the calculated optimal value implies that the algorithm\ncomputed the exact optimal solution.\n\u00b9\u2070 The data have been provided by KSP-SP Co., LTD, http://www.ksp-sp.com.\n\nThe results are summarized in Table 2, where \u201cactual\u201d means the gross profit computed on the basis\nof the historical data regarding sales quantities and prices over the 20 days, from the 1066-th to\n1085-th, and costs c_i^{(d)} = 0.3 P_{i5}. Thus, the target is to find a strategy that is expected to achieve better\ngross profit than \u201cactual\u201d. We have omitted results for QPBOI and SDPrelax here because they did\nnot terminate after running over 8 hours. We observe that the proposed method successfully derived\na price strategy for the 1000 price variables, which can be expected to increase gross profit significantly\nat a low computational cost, in contrast to QPBO, which failed with more expensive\ncomputation. Although Table 2 shows results using a single CPU core for fair comparison, the\nalgorithm can be easily parallelized so that optimization can finish in a few seconds. 
This makes it\npossible to dynamically change prices in real time or enables price managers to flexibly explore a\nbetter price strategy (changing a price range, target products, domain constraints, etc.).\n\n6 Conclusion\n\nIn this paper we dealt with price optimization based on large-scale demand forecasting models. We\nhave shown that the gross profit function is supermodular under the assumption of the substitute-goods\nproperty. On the basis of this supermodularity, we have proposed an efficient algorithm that employs\nnetwork flow algorithms and that returns exact solutions for problems with the substitute-goods\nproperty. Even in cases in which the property does not hold, the proposed algorithm can efficiently\nfind approximate solutions. Our empirical results have shown that the proposed algorithm can handle\nhundreds or thousands of products with much lower computational cost than other existing methods.\n\nReferences\n[1] G. Bitran and R. Caldentey. An overview of pricing models for revenue management. Manufacturing &\nService Operations Management, 5(3):203\u2013229, 2003.\n[2] E. Boros and P. L. Hammer. Pseudo-boolean optimization. Discrete applied mathematics, 123(1):155\u2013225,\n2002.\n[3] Y. Boykov and V. Kolmogorov. 
An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1124\u20131137, 2004.\n[4] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222\u20131239, 2001.\n[5] F. Caro and J. Gallien. Clearance pricing optimization for a fast-fashion retailer. Operations Research, 60(6):1404\u20131422, 2012.\n[6] T. H. Cormen. Introduction to Algorithms. MIT Press, 2009.\n[7] K. J. Ferreira, B. H. A. Lee, and D. Simchi-Levi. Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing & Service Operations Management, pages 69\u201388, 2015.\n[8] L. Gorelick, Y. Boykov, O. Veksler, I. Ayed, and A. Delong. Submodularization for binary pairwise energies. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1154\u20131161, 2014.\n[9] S. Ito and R. Fujimaki. Optimization beyond prediction: Prescriptive price optimization. ArXiv e-prints, http://arxiv.org/abs/1605.05422, 2016.\n[10] R. Klein. Revenue Management. Springer, 2008.\n[11] V. Kolmogorov and C. Rother. Minimizing nonsubmodular functions with graph cuts - a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1274\u20131279, 2007.\n[12] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):147\u2013159, 2004.\n[13] D. Koushik, J. A. Higbie, and C. Eister. Retail price optimization at InterContinental Hotels Group. Interfaces, 42(1):45\u201357, 2012.\n[14] S. Lee. Study of demand models and price optimization performance. PhD thesis, Georgia Institute of Technology, 2011.\n[15] A. Marshall. Principles of Economics. Library of Economics and Liberty, 1920.\n[16] J. I. 
McGill and G. J. Van Ryzin. Revenue management: Research overview and prospects. Transportation Science, 33(2):233\u2013256, 1999.\n[17] M. Natter, T. Reutterer, and A. Mild. Dynamic pricing support systems for DIY retailers - a case study from Austria. Marketing Intelligence Review, 1:17\u201323, 2009.\n[18] R. L. Phillips. Pricing and Revenue Optimization. Stanford University Press, 2005.\n[19] C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. Optimizing binary MRFs via extended roof duality. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201907), pages 1\u20138. IEEE, 2007.\n[20] P. Rusmevichientong, B. Van Roy, and P. W. Glynn. A nonparametric approach to multiproduct pricing. Operations Research, 54(1):82\u201398, 2006.\n[21] C. J. Stone. Additive regression and other nonparametric models. The Annals of Statistics, pages 689\u2013705, 1985.\n[22] M. Tang, I. B. Ayed, and Y. Boykov. Pseudo-bound optimization for binary energies. In Computer Vision\u2013ECCV 2014, pages 691\u2013707. Springer, 2014.\n[23] J. Wang, R. Fujimaki, and Y. Motohashi. Trading interpretability for accuracy: Oblique treed sparse additive models. In KDD, pages 1245\u20131254, 2015.\n", "award": [], "sourceid": 1917, "authors": [{"given_name": "Shinji", "family_name": "Ito", "institution": "NEC Corporation"}, {"given_name": "Ryohei", "family_name": "Fujimaki", "institution": "NEC Data Science Research Laboratories"}]}