{"title": "Restricted Boltzmann machines modeling human choice", "book": "Advances in Neural Information Processing Systems", "page_first": 73, "page_last": 81, "abstract": "We extend the multinomial logit model to represent some of the empirical phenomena that are frequently observed in the choices made by humans. These phenomena include the similarity effect, the attraction effect, and the compromise effect. We formally quantify the strength of these phenomena that can be represented by our choice model, which illuminates the flexibility of our choice model. We then show that our choice model can be represented as a restricted Boltzmann machine and that its parameters can be learned effectively from data. Our numerical experiments with real data of human choices suggest that we can train our choice model in such a way that it represents the typical phenomena of choice.", "full_text": "Restricted Boltzmann machines modeling human\n\nchoice\n\nTakayuki Osogami\nIBM Research - Tokyo\n\nosogami@jp.ibm.com\n\nMakoto Otsuka\n\nIBM Research - Tokyo\nmotsuka@ucla.edu\n\nAbstract\n\nWe extend the multinomial logit model to represent some of the empirical phe-\nnomena that are frequently observed in the choices made by humans. These phe-\nnomena include the similarity effect, the attraction effect, and the compromise\neffect. We formally quantify the strength of these phenomena that can be repre-\nsented by our choice model, which illuminates the \ufb02exibility of our choice model.\nWe then show that our choice model can be represented as a restricted Boltzmann\nmachine and that its parameters can be learned effectively from data. Our numer-\nical experiments with real data of human choices suggest that we can train our\nchoice model in such a way that it represents the typical phenomena of choice.\n\n1\n\nIntroduction\n\nChoice is a fundamental behavior of humans and has been studied extensively in Arti\ufb01cial Intelli-\ngence and related areas. 
The prior work suggests that the choices made by humans can significantly depend on the available alternatives, or the choice set, in rather complex but systematic ways [13]. The empirical phenomena that result from such dependency on the choice set include the similarity effect, the attraction effect, and the compromise effect. Informally, the similarity effect refers to the phenomenon that a new product, S, reduces the share of a similar product, A, more than that of a dissimilar product, B (see Figure 1 (a)). With the attraction effect, a new dominated product, D, increases the share of the dominant product, A (see Figure 1 (b)). With the compromise effect, a product, C, has a relatively larger share when two extreme products, A and B, are in the market than when only one of A and B is in the market (see Figure 1 (c)). We refer to these three empirical phenomena as the typical choice phenomena.

However, the standard choice model of the multinomial logit model (MLM) and its variants cannot represent at least one of the typical choice phenomena [13]. More descriptive models have been proposed to represent the typical choice phenomena in some representative cases [14, 19]. However, it is unclear when and to what degree the typical choice phenomena can be represented. Also, no algorithms have been proposed for training these descriptive models from data.

(a) Similarity  (b) Attraction  (c) Compromise

Figure 1: Choice sets that cause typical choice phenomena.

We extend the MLM to represent the typical choice phenomena, which is our first contribution. We show that our choice model can be represented as a restricted Boltzmann machine (RBM). Our choice model is thus called the RBM choice model. An advantage of this representation as an RBM is that training algorithms for RBMs are readily available. 
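Before the model is formalized, the three phenomena can be stated as inequality patterns over choice probabilities p(item | choice set). The snippet below encodes those patterns with purely hypothetical numbers invented for illustration; none of the values come from this paper or its dataset:

```python
# Hypothetical choice probabilities p[choice_set][item]; each inner dict
# is a distribution over its choice set and the numbers are invented
# solely so that each defining inequality holds.
p = {
    frozenset("AB"):  {"A": 0.60, "B": 0.40},
    frozenset("ABS"): {"A": 0.25, "B": 0.45, "S": 0.30},  # S plays the "similar to A" role
    frozenset("ABD"): {"A": 0.70, "B": 0.25, "D": 0.05},  # D plays the "dominated by A" role
    frozenset("AC"):  {"A": 0.55, "C": 0.45},
    frozenset("BC"):  {"B": 0.55, "C": 0.45},
    frozenset("ABC"): {"A": 0.30, "B": 0.30, "C": 0.40},  # C plays the "compromise" role
}

def prob(item, items):
    """Look up p(item | choice set)."""
    return p[frozenset(items)][item]

# Similarity effect: adding S reverses the order of A and B.
assert prob("A", "AB") > prob("B", "AB")
assert prob("A", "ABS") < prob("B", "ABS")

# Attraction effect: adding the dominated D *increases* A's share,
# violating the regularity principle p(A|{A,B}) >= p(A|{A,B,D}).
assert prob("A", "ABD") > prob("A", "AB")

# Compromise effect: conditional on {A, C} (resp. {B, C}) being chosen,
# C gains share when the other extreme item is also on offer.
assert prob("C", "ABC") / (prob("A", "ABC") + prob("C", "ABC")) > prob("C", "AC")
assert prob("C", "ABC") / (prob("B", "ABC") + prob("C", "ABC")) > prob("C", "BC")
```

These are exactly the patterns that Sections 2 and 3 quantify; the MLM can satisfy none of the three reversals.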
See Section 2.
We then formally define a measure of the strength of each typical choice phenomenon and quantify the strength of each typical choice phenomenon that the RBM choice model can represent. Our analysis not only gives a guarantee on the flexibility of the RBM choice model but also illuminates why the RBM choice model can represent the typical choice phenomena. These definitions and analysis constitute our second contribution and are presented in Section 3.
Our experiments suggest that we can train the RBM choice model in such a way that it represents the typical choice phenomena. We show that the trained RBM choice model can then adequately predict real human choices on the means of transportation [2]. These experimental results constitute our third contribution and are presented in Section 4.

2 Choice model with restricted Boltzmann machine

We extend the MLM to represent the typical choice phenomena. Let I be the set of items. For A ∈ 𝒳 ⊆ I, we study the probability that an item, A, is selected from a choice set, 𝒳. This probability is called the choice probability. The model of choice, equipped with the choice probability, is called a choice model. We use A, B, C, D, S, or X to denote an item and 𝒳, 𝒴, or a set such as {A, B} to denote a choice set.

For the MLM, the choice probability of A from 𝒳 can be represented by

p(A|𝒳) = λ(A|𝒳) / Σ_{X∈𝒳} λ(X|𝒳),  (1)

where we refer to λ(X|𝒳) as the choice rate of X from 𝒳. The choice rate of the MLM is given by

λ_MLM(X|𝒳) = exp(b_X),  (2)

where b_X can be interpreted as the attractiveness of X. One could define b_X through u_X, the vector of the utilities of the attributes of X, and α, the vector of the weight on each attribute (i.e., b_X ≡ α · u_X). Observe that λ_MLM(X|𝒳) is independent of 𝒳 as long as X ∈ 𝒳. This independence is what makes the MLM incapable of representing the typical choice phenomena.

We extend the choice rate of (2) but keep the choice probability in the form of (1). Specifically, we consider the following choice rate:

λ(X|𝒳) ≡ exp(b_X) ∏_{k∈K} (1 + exp(T^k_𝒳 + U^k_X)),  (3)

where we define

T^k_𝒳 ≡ Σ_{Y∈𝒳} T^k_Y.  (4)

Our choice model has parameters, b_X, T^k_X, U^k_X for X ∈ 𝒳, k ∈ K, that take values in (−∞, ∞). Equation (3) modifies exp(b_X) by multiplying factors. Each factor is associated with an index, k, and has parameters, T^k_X and U^k_X, that depend on k. The set of these indices is denoted by K.

We now show that our choice model can be represented as a restricted Boltzmann machine (RBM). This means that we can use existing algorithms for RBMs to learn the parameters of the RBM choice model (see Appendix A.1).

An RBM consists of a layer of visible units, i ∈ V, and a layer of hidden units, k ∈ H. A visible unit, i, and a hidden unit, k, are connected with weight, W^k_i. The units within each layer are disconnected from each other. Each unit is associated with a bias. The bias of a visible unit, i, is denoted by b^vis_i. The bias of a hidden unit, k, is denoted by b^hid_k. A visible unit, i, is associated with a binary variable, z_i, and a hidden unit, k, is associated with a binary variable, h_k; each binary variable takes a value in {0, 1}.

For a given configuration of the binary variables, the energy of the RBM is defined as

E_θ(z, h) ≡ − Σ_{i∈V} Σ_{k∈H} z_i W^k_i h_k − Σ_{i∈V} b^vis_i z_i − Σ_{k∈H} b^hid_k h_k,  (5)

where θ ≡ {W, b^vis, b^hid} denotes the parameters of the RBM. The probability of realizing a particular configuration of (z, h) is given by

P_θ(z, h) ≡ exp(−E_θ(z, h)) / Σ_{z′} Σ_{h′} exp(−E_θ(z′, h′)).  (6)

The summation with respect to a binary vector (i.e., Σ_{z′} or Σ_{h′}) denotes the summation over all of the possible binary vectors of a given length. The length of z′ is |V|, and the length of h′ is |H|.

Figure 2: RBM choice model

The RBM choice model can be represented as an RBM having the structure in Figure 2. Here, the layer of visible units is split into two parts: one for the choice set and the other for the selected item. The corresponding binary vector is denoted by z = (v, w). Here, v is a binary vector associated with the part for the choice set. Specifically, v has length |I|, and v_X = 1 denotes that X is in the choice set. Analogously, w has length |I|, and w_A = 1 denotes that A is selected. We use T^k_X to denote the weight between a hidden unit, k, and a visible unit, X, for the choice set. We use U^k_A to denote the weight between a hidden unit, k, and a visible unit, A, for the selected item. The bias is zero for all of the hidden units and for all of the visible units for the choice set. The bias for a visible unit, A, for the selected item is denoted by b_A. Finally, let H = K.

The choice rate (3) of the RBM choice model can then be represented by

λ(A|𝒳) = Σ_h exp(−E_θ((v^𝒳, w^A), h)),  (7)

where we define the binary vectors, v^𝒳 and w^A, such that v^𝒳_i = 1 iff i ∈ 𝒳 and w^A_j = 1 iff j = A. Observe that the right-hand side of (7) is

Σ_h exp(−E_θ((v^𝒳, w^A), h)) = Σ_h exp( Σ_k ( Σ_{X∈𝒳} T^k_X h_k + U^k_A h_k ) + b_A )  (8)
= exp(b_A) Σ_h ∏_k exp((T^k_𝒳 + U^k_A) h_k)  (9)
= exp(b_A) ∏_k Σ_{h_k∈{0,1}} exp((T^k_𝒳 + U^k_A) h_k),  (10)

which is equivalent to (3).

The RBM choice model assumes that one item from a choice set is selected. In the context of the RBM, this means that w_A = 1 for only one A ∈ 𝒳 ⊆ I. Using (6), our choice probability (1) can be represented by

p(A|𝒳) = Σ_h P_θ((v^𝒳, w^A), h) / Σ_{X∈𝒳} Σ_h P_θ((v^𝒳, w^X), h).  (11)

This is the conditional probability of realizing the configuration, (v^𝒳, w^A), given that the realized configuration is one of the (v^𝒳, w^X) for X ∈ 𝒳. See Appendix A.2 for an extension of the RBM choice model.

3 Flexibility of the RBM choice model

In this section, we formally study the flexibility of the RBM choice model. Recall that λ(X|𝒳) in (3) is modified from λ_MLM(X|𝒳) in (2) by a factor,

1 + exp(T^k_𝒳 + U^k_X),  (12)

for each k in K, so that λ(X|𝒳) can depend on 𝒳 through T^k_𝒳. 
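The choice rate (3) and choice probability (1) are straightforward to compute directly. The following Python sketch does so for a toy instance; the dictionaries `b`, `T`, `U` mirror the parameters b_X, T^k_Y, U^k_X, and all numeric values are hand-picked for illustration rather than taken from the paper:

```python
import math

def choice_rate(x, choice_set, b, T, U):
    """Choice rate (3): exp(b_x) * prod_k (1 + exp(T^k_CS + U^k_x)),
    where T^k_CS = sum over items Y in the choice set of T^k_Y, as in (4)."""
    rate = math.exp(b[x])
    for k in range(len(U[x])):
        T_k_set = sum(T[y][k] for y in choice_set)   # aggregate (4)
        rate *= 1.0 + math.exp(T_k_set + U[x][k])    # one factor (12) per hidden unit
    return rate

def choice_prob(x, choice_set, b, T, U):
    """Choice probability (1): choice rates normalized over the choice set."""
    rates = {y: choice_rate(y, choice_set, b, T, U) for y in choice_set}
    return rates[x] / sum(rates.values())

# Illustrative (hand-picked, not learned) parameters with one hidden unit, |K| = 1.
# The hidden unit turns on only when S is in the choice set (large T_S) and then
# boosts B's rate (moderate U_B) while leaving A and S untouched (very negative U_A, U_S).
b = {"A": 0.5, "B": 0.0, "S": 0.0}
T = {"A": [0.0], "B": [0.0], "S": [10.0]}
U = {"A": [-100.0], "B": [-5.0], "S": [-100.0]}

# Similarity-effect pattern (13): A beats B head-to-head, but adding S reverses
# the ratio, which is impossible under the MLM because of the IIA (16).
assert choice_prob("A", ("A", "B"), b, T, U) > choice_prob("B", ("A", "B"), b, T, U)
assert choice_prob("A", ("A", "B", "S"), b, T, U) < choice_prob("B", ("A", "B", "S"), b, T, U)
```

With these parameters the single hidden unit is effectively switched on only when S is present, which suffices to reverse the ratio between A and B; this is exactly the kind of choice-set dependence that the factor (12) introduces on top of the MLM rate (2).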
We will see how this modification allows the RBM choice model to represent each of the typical choice phenomena.

The similarity effect refers to the following phenomenon [14]:

p(A|{A, B}) > p(B|{A, B})  and  p(A|{A, B, S}) < p(B|{A, B, S}).  (13)

Motivated by (13), we define the strength of the similarity effect as follows:

Definition 1. For A, B ∈ 𝒳, the strength of the similarity effect of S on A relative to B with 𝒳 is defined as follows:

ψ^(sim)_{A,B,S,𝒳} ≡ [p(A|𝒳) / p(B|𝒳)] · [p(B|𝒳 ∪ {S}) / p(A|𝒳 ∪ {S})].  (14)

When ψ^(sim)_{A,B,S,𝒳} = 1, adding S into 𝒳 does not change the ratio between p(A|𝒳) and p(B|𝒳). Namely, there is no similarity effect. When ψ^(sim)_{A,B,S,𝒳} > 1, the ratio p(B|𝒳)/p(A|𝒳) increases by a factor of ψ^(sim)_{A,B,S,𝒳} upon the addition of S into 𝒳. This corresponds to the similarity effect of (13). When ψ^(sim)_{A,B,S,𝒳} < 1, this ratio decreases by an analogous factor. We will study the strength of this (rather general) similarity effect without the restriction that S is "similar" to A (see Figure 1 (a)).

Because p(X|𝒳) has a common denominator for X = A and X = B, we have

ψ^(sim)_{A,B,S,𝒳} = [λ(A|𝒳) / λ(B|𝒳)] · [λ(B|𝒳 ∪ {S}) / λ(A|𝒳 ∪ {S})].  (15)

The MLM cannot represent the similarity effect, because the λ_MLM(X|𝒳) in (2) is independent of 𝒳. For any choice sets, 𝒳 and 𝒴, we must have

λ_MLM(A|𝒳) / λ_MLM(B|𝒳) = λ_MLM(A|𝒴) / λ_MLM(B|𝒴).  (16)

The equality (16) is known as the independence from irrelevant alternatives (IIA).

The RBM choice model can represent an arbitrary strength of the similarity effect. Specifically, by adding an element, k̂, into K of (3), we can set λ(A|𝒳 ∪ {S})/λ(A|𝒳) at an arbitrary value without affecting the value of λ(B|𝒴), ∀B ≠ A, for any 𝒴. We prove the following theorem in Appendix C:

Theorem 1. Consider an RBM choice model where the choice rate of X from 𝒳 is given by (3). Let λ̂(X|𝒳) be the corresponding choice rate after adding k̂ into K. Namely,

λ̂(X|𝒳) = λ(X|𝒳) (1 + exp(T^k̂_𝒳 + U^k̂_X)).  (17)

Consider an item A ∈ 𝒳 and an item S ∉ 𝒳. For any c ∈ (0, ∞) and ε > 0, we can then choose T^k̂_· and U^k̂_· such that

c = λ̂(A|𝒳 ∪ {S}) / λ̂(A|𝒳);   ε > |λ̂(B|𝒴)/λ(B|𝒴) − 1|, ∀𝒴, B s.t. B ≠ A.  (18)

By (15) and Theorem 1, the strength of the similarity effect after adding k̂ into K is

ψ̂^(sim)_{A,B,S,𝒳} = [λ̂(A|𝒳) / λ̂(A|𝒳 ∪ {S})] · [λ̂(B|𝒳 ∪ {S}) / λ̂(B|𝒳)] ≈ (1/c) · [λ(B|𝒳 ∪ {S}) / λ(B|𝒳)].  (19)

Because c can take an arbitrary value in (0, ∞), the additional factor, (12) with k = k̂, indeed allows ψ̂^(sim)_{A,B,S,𝒳} to take any positive value without affecting the value of λ(B|𝒴), ∀B ≠ A, for any 𝒴. The first part of (18) guarantees that this additional factor does not change p(X|𝒴) for any X if A ∉ 𝒴. Note that what we have shown is not limited to the similarity effect of (13). The RBM choice model can represent an arbitrary phenomenon in which the choice set affects the ratio of choice rates.

According to [14], the attraction effect is represented by

p(A|{A, B}) < p(A|{A, B, D}).  (20)

The MLM cannot represent the attraction effect, because the λ_MLM(X|𝒴) in (2) is independent of 𝒴, and we must have Σ_{X∈𝒳} λ_MLM(X|𝒳) ≤ Σ_{X∈𝒴} λ_MLM(X|𝒴) for 𝒳 ⊂ 𝒴, which in turn implies the regularity principle: p(X|𝒳) ≥ p(X|𝒴) for 𝒳 ⊂ 𝒴.

Motivated by (20), we define the strength of the attraction effect as the magnitude of the change in the choice probability of an item when another item is added into the choice set. Formally,

Definition 2. For A ∈ 𝒳, the strength of the attraction effect of D on A with 𝒳 is defined as follows:

ψ^(att)_{A,D,𝒳} ≡ p(A|𝒳 ∪ {D}) / p(A|𝒳).  (21)

When there is no attraction effect, adding D into 𝒳 can only decrease p(A|𝒳); hence, ψ^(att)_{A,D,𝒳} ≤ 1. The standard definition of the attraction effect (20) implies ψ^(att)_{A,D,𝒳} > 1. We study the strength of this attraction effect without the restriction that A "dominates" D (see Figure 1 (b)).

We prove the following theorem in Appendix C:

Theorem 2. Consider the two RBM choice models in Theorem 1. The first RBM choice model has the choice rate given by (3), and the second RBM choice model has the choice rate given by (17). Let p(·|·) denote the choice probability for the first RBM choice model and p̂(·|·) denote the choice probability for the second RBM choice model. Consider an item A ∈ 𝒳 and an item D ∉ 𝒳. For any r ∈ (p(A|𝒳 ∪ {D}), 1/p(A|𝒳)) and ε > 0, we can choose T^k̂_·, U^k̂_· such that

r = p̂(A|𝒳 ∪ {D}) / p̂(A|𝒳);   ε > |λ̂(B|𝒴)/λ(B|𝒴) − 1|, ∀𝒴, B s.t. B ≠ A.  (22)

We expect that the range, (p(A|𝒳 ∪ {D}), 1/p(A|𝒳)), of r in the theorem covers the attraction effect in practice. Also, this range is the widest possible in the following sense. The factor (12) can only increase λ(X|𝒴) for any X, 𝒴. The form of (1) then implies that, to decrease p(A|𝒴), we must increase λ(X|𝒴) for X ≠ A. However, increasing λ(X|𝒴) for X ≠ A is not allowed due to the second part of (22) with ε → 0. Namely, the additional factor, (12) with k = k̂, can only increase p(A|𝒴) for any 𝒴 under the condition of the second part of (22). The lower limit, p(A|𝒳 ∪ {D}), is achieved when p̂(A|𝒳) → 1, while keeping p̂(A|𝒳 ∪ {D}) ≈ p(A|𝒳 ∪ {D}). The upper limit, 1/p(A|𝒳), is achieved when p̂(A|𝒳 ∪ {D}) → 1, while keeping p̂(A|𝒳) ≈ p(A|𝒳).

According to [18], the compromise effect is formally represented by

p(C|{A, B, C}) / Σ_{X∈{A,C}} p(X|{A, B, C}) > p(C|{A, C})  and  p(C|{A, B, C}) / Σ_{X∈{B,C}} p(X|{A, B, C}) > p(C|{B, C}).  (23)

The MLM cannot represent the compromise effect, because the λ_MLM(X|𝒴) in (2) is independent of 𝒴, which in turn makes the inequalities in (23) equalities.

Motivated by (23), we define the strength of the compromise effect as the magnitude of the change in the conditional probability of selecting an item, C, given that either C or another item, A, is selected, when yet another item, B, is added into the choice set. More precisely, we also exchange the roles of A and B, and study the minimum magnitude of those changes:

Definition 3. For a choice set, 𝒳, and items, A, B, C, such that A, B, C ∈ 𝒳, let

φ_{A,B,C,𝒳} ≡ q_AC(C|𝒳) / q_AC(C|𝒳 \ {B}),  (24)

where, for 𝒴 such that A, C ∈ 𝒴, we define

q_AC(C|𝒴) ≡ p(C|𝒴) / Σ_{X∈{A,C}} p(X|𝒴).  (25)

The strength of the compromise effect of A and B on C with 𝒳 is then defined as

ψ^(com)_{A,B,C,𝒳} ≡ min{φ_{A,B,C,𝒳}, φ_{B,A,C,𝒳}}.  (26)

Here, we do not have the restriction that C is a "compromise" between A and B (see Figure 1 (c)). In Appendix C, we prove the following theorem:

Theorem 3. Consider a choice set, 𝒳, and three items, A, B, C ∈ 𝒳. Consider the two RBM choice models in Theorem 2. Let ψ̂^(com)_{A,B,C,𝒳} be defined analogously to (26) but with p̂(·|·). Let

q̄ ≡ max{q_AC(C|𝒳 \ {B}), q_BC(C|𝒳 \ {A})}  (27)
q ≡ min{q_AC(C|𝒳), q_BC(C|𝒳)}.  (28)

Then, for any r ∈ (q, 1/q̄) and ε > 0, we can choose T^k̂_·, U^k̂_· such that

r = ψ̂^(com)_{A,B,C,𝒳};   ε > |λ̂(X|𝒴)/λ(X|𝒴) − 1|, ∀𝒴, X s.t. X ≠ C.  (29)

We expect that the range of r in the theorem covers the compromise effect in practice. Also, this range is the widest possible, in a sense analogous to what we have discussed for the range in Theorem 2. Because the additional factor, (12) with k = k̂, can only increase p(C|𝒴) for any 𝒴 under the condition of the second part of (29), it can only increase q_XC(C|𝒴) for X ∈ {A, B}. The lower limit, q, is achieved when q_XC(C|𝒳 \ {Y}) → 1, while keeping q_XC(C|𝒳) approximately unchanged, for {X, Y} = {A, B}. 
The upper limit, 1/q̄, is achieved when q_XC(C|𝒳) → 1, while keeping q_XC(C|𝒳 \ {Y}) approximately unchanged, for {X, Y} = {A, B}.

4 Numerical experiments

We now validate the effectiveness of the RBM choice model in predicting the choices made by humans. Here we use the dataset from [2], which is based on a survey conducted in Switzerland in which people were asked to choose a means of transportation from given options. A subset of the dataset is used to train the RBM choice model, which is then used to predict the choices in the remaining dataset. In Appendix B.2, we also conduct an experiment with an artificial dataset and show that the RBM choice model can indeed be trained to represent each of the typical choice phenomena. This flexibility in representation is the basis of the predictive accuracy of the RBM choice model presented in this section. All of our experiments are run on a single core of a Windows PC with 8 GB of main memory and a 2.6 GHz Core i5 CPU.

The dataset [2] consists of 10,728 choices that 1,192 people have made from a varying choice set. For those who own a car, the choice set has three items: a train, a maglev, and a car. For those who do not own a car, the choice set consists of a train and a maglev. The train can operate at an interval of 30, 60, or 120 minutes. The maglev can operate at an interval of 10, 20, or 30 minutes. The trains (or maglevs) with different intervals are considered to be distinct items in our experiment.

Figure 3 (a) shows the empirical choice probability for each choice set. Each choice set consists of a train with a particular interval (blue, shaded) and a maglev with a particular interval (red, mesh), possibly with a car (yellow, circles). The interval of the maglev varies as indicated at the bottom of the figure. The interval of the train is indicated at the left side of the figure. 
For each combination of the intervals of the train and the maglev, there are two choice sets, with or without a car.

We evaluate the accuracy of the RBM choice model in predicting the choice probability for an arbitrary choice set, when the RBM choice model is trained with the data of the choices for the remaining 17 choice sets (i.e., we have 18 test cases). We train the RBM choice model (or the MLM) by discriminative training with stochastic gradient descent, using mini-batches of size 50 and a learning rate of η = 0.1 (see Appendix A.1). Each run of the evaluation uses the entire training dataset 50 times for training, and the evaluation is repeated five times by varying the initial values of the parameters. The elements of T and U are initialized independently with samples from the uniform distribution on [−10/√max(|I|, |K|), 10/√max(|I|, |K|)], where |I| = 7 is the number of items under consideration and |K| is the number of hidden units. The elements of b are initialized with samples from the uniform distribution on [−1, 1].

(a) Dataset  (b) Error  (c) RBM  (d) MLM

Figure 3: Dataset (a), the predictive error of the RBM choice model against the number of hidden units (b), and the choice probabilities learned by the RBM choice model (c) and the MLM (d).

Figure 3 (b) shows the Kullback-Leibler (KL) divergence between the predicted distribution of the choice and the corresponding true distribution. The dots connected with a solid line show the average KL divergence over all of the 18 test cases and five runs with varying initialization. The average KL divergence is also evaluated for training data and is shown with a dashed line. The confidence interval represents the corresponding standard deviation. 
The wide confidence interval is largely due to the variance between test instances (see Figure 4 in the appendix). The horizontal axis shows the number of hidden units in the RBM choice model, where zero hidden units corresponds to the MLM. The average KL divergence is reduced from 0.12 for the MLM to 0.02 for the RBM choice model with 16 hidden units, an improvement by a factor of six.

Figure 3 (c)-(d) shows the choice probabilities given by (c) the RBM choice model with 16 hidden units and (d) the MLM, after these models are trained for the test case where the choice set consists of the train with 30-minute interval (Train30) and the maglev with 20-minute interval (Maglev20). Observe that the RBM choice model gives choice probabilities that are close to the true choice probabilities shown in Figure 3 (a), while the MLM has difficulty in fitting these choice probabilities. Taking a closer look at Figure 3 (a), we can observe that the MLM is fundamentally incapable of learning this dataset. For example, Train30 is more popular than Maglev20 for people who do not own cars, while the preference is reversed for car owners (i.e., the attraction effect). The attraction effect can also be seen for the combination of Maglev30 and Train60. As we have discussed in Section 3, the MLM cannot represent such attraction effects, but the RBM choice model can.

5 Related work

We now review the prior work related to our contributions. We will see that all of the existing choice models either cannot represent at least one of the typical choice phenomena or do not have systematic training algorithms. We will also see that the prior work has analyzed choice models with respect to whether those choice models can represent the typical choice phenomena or others, but only in specific cases of specific strength. 
In contrast, our analysis shows that the RBM choice model can represent the typical choice phenomena for all cases of the specified strength.

A majority of the prior work on choice models is about the MLM and its variants such as the hierarchical MLM [5], the multinomial probit model [6], and, more generally, random utility models [17]. In particular, the attraction effect cannot be represented by these variants of the MLM [13]. In general, when the choice probability depends only on values that are determined independently for each item (e.g., the models of [3, 7]), none of the typical choice phenomena can be represented [18]. Recently, Hruschka has proposed a choice model based on an RBM [9], but his choice model cannot represent any of the typical choice phenomena, because the corresponding choice rate is independent of the choice set. It is thus nontrivial how to use the RBM as a choice model in such a way that the typical choice phenomena can be represented. In [11], a hierarchical Bayesian choice model is shown to represent the attraction effect in a specific case.

There also exist choice models that have been numerically shown to represent all of the typical choice phenomena for some specific cases. 
For example, sequential sampling models, including the decision field theory [4] and the leaky competing accumulator model [19], are meant to directly mimic the cognitive process of a human making a choice [12]. However, no paper has shown an algorithm that can train a sequential sampling model in such a way that the trained model exhibits the typical choice phenomena. Shenoy and Yu propose a hierarchical Bayesian model to represent the three typical choice phenomena [16]. Although they perform inference on the posterior distributions that are needed to compute the choice probabilities with their model, they do not show how to train their model to fit the choice probabilities to given data. Their experiments show that their model represents the typical choice phenomena in particular cases, where the parameters of the model are set manually. Rieskamp et al. classify choice models according to whether a choice model can never represent a certain phenomenon or can do so in some cases to some degree [13]. The phenomena studied in [13] are not limited to the typical choice phenomena, but they list the typical choice phenomena as the ones that are robust and significant. Also, Otter et al. exclusively study all of the typical choice phenomena [12].

Luce pioneered the formal analysis of choice models, although his analysis is largely qualitative [10]. For example, Lemma 3 of [10] can tell us whether a given choice model satisfies the IIA in (16) for all cases or violates the IIA in some cases to some degree. We address the new question of to what degree a choice model can represent each of the typical choice phenomena (e.g., to what degree the RBM choice model can violate the IIA).

Finally, our theorems can be contrasted with the universal approximation theorem of RBMs, which states that an arbitrary distribution can be approximated arbitrarily closely with a sufficient number of hidden units [15, 8]. 
This is in contrast to our theorems, which show that a single hidden unit suffices to represent the typical choice phenomena of the strength that is specified in the theorems.

6 Conclusion

The RBM choice model is developed to represent the typical choice phenomena that have been reported frequently in the literature of cognitive psychology and related areas. Our work motivates a new direction of research on using RBMs to model such complex behavior of humans. Particularly interesting behavior includes behavior that is considered to be irrational or that results from cognitive biases (see e.g. [1]). The advantages of the RBM choice model that are demonstrated in this paper include its flexibility in representing complex behavior and the availability of effective training algorithms.

The RBM choice model can incorporate the attributes of the items in its parameters. Specifically, one can represent the parameters of the RBM choice model as functions of u_X, the attributes of X ∈ I, analogously to the MLM, where b_X can be represented as b_X = α · u_X, as we have discussed after (2). The focus of this paper is on designing the fundamental structure of the RBM choice model and analyzing its fundamental properties, and the study of the RBM choice model with attributes will be reported elsewhere. Although attributes are important for generalizing the RBM choice model to unseen items, our experiments suggest that the RBM choice model, without attributes, can learn the typical choice phenomena from given choice sets and generalize them to unseen choice sets.

Acknowledgements

A part of this research is supported by JST, CREST.

References

[1] D. Ariely. Predictably Irrational: The Hidden Forces That Shape Our Decisions. Harper Perennial, revised and expanded edition, 2010.

[2] M. Bierlaire, K. Axhausen, and G. Abay. The acceptance of modal innovation: The case of Swissmetro. 
In Proceedings of the First Swiss Transportation Research Conference, March\n2001.\n\n[3] E. Bonilla, S. Guo, and S. Sanner. Gaussian process preference elicitation.\n\nIn J. Lafferty,\nC. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural\nInformation Processing Systems 23, pages 262\u2013270. 2010.\n\n[4] J. R. Busemeyer and J. T. Townsend. Decision \ufb01eld theory: A dynamic cognition approach to\n\ndecision making. Psychological Review, 100:432\u2013459, 1993.\n\n[5] O. Chapelle and Z. Harchaoui. A machine learning approach to conjoint analysis. In L. K.\nSaul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17,\npages 257\u2013264. 2005.\n\n[6] B. Eric, N. de Freitas, and A. Ghosh. Active preference learning with discrete choice data.\nIn J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information\nProcessing Systems 20, pages 409\u2013416. 2008.\n\n[7] V. F. Farias, S. Jagabathula, and D. Shah. A nonparametric approach to modeling choice with\n\nlimited data. Management Science, 59(2):305\u2013322, 2013.\n\n[8] Y. Freund and D. Haussler. Unsupervised learning of distributions on binary vectors using two\nlayer networks. Technical Report UCSC-CRL-94-25, University of California, Santa Cruz,\nJune 1994.\n\n[9] H. Hruschka. Analyzing market baskets by restricted Boltzmann machines. OR Spectrum,\n\npages 1\u201322, 2012.\n\n[10] R. D. Luce. Individual choice behavior: A theoretical analysis. John Wiley and Sons, New\n\nYork, NY, 1959.\n\n[11] T. Osogami and T. Katsuki. A hierarchical Bayesian choice model with visibility.\n\nIn Pro-\nceedings of the 22nd International Conference on Pattern Recognition (ICPR 2014), pages\n3618\u20133623, August 2014.\n\n[12] T. Otter, J. Johnson, J. Rieskamp, G. M. Allenby, J. D. Brazell, A. Diederich, J. W. Hutchinson,\nS. MacEachern, S. Ruan, and J. Townsend. Sequential sampling models of choice: Some recent\nadvances. 
Marketing Letters, 19(3-4):255–267, 2008.

[13] J. Rieskamp, J. R. Busemeyer, and B. A. Mellers. Extending the bounds of rationality: Evidence and theories of preferential choice. Journal of Economic Literature, 44:631–661, 2006.

[14] R. M. Roe, J. R. Busemeyer, and J. T. Townsend. Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108(2):370–392, 2001.

[15] N. L. Roux and Y. Bengio. Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6):1631–1649, 2008.

[16] P. Shenoy and A. J. Yu. Rational preference shifts in multi-attribute choice: What is fair? In Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci 2013), pages 1300–1305, 2013.

[17] K. Train. Discrete Choice Methods with Simulation. Cambridge University Press, second edition, 2009.

[18] A. Tversky and I. Simonson. Context-dependent preferences. Management Science, 39(10):1179–1189, 1993.

[19] M. Usher and J. L. McClelland. Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111(3):757–769, 2004.
", "award": [], "sourceid": 66, "authors": [{"given_name": "Takayuki", "family_name": "Osogami", "institution": "IBM Research - Tokyo"}, {"given_name": "Makoto", "family_name": "Otsuka", "institution": "IBM Research - Tokyo"}]}