{"title": "Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness", "book": "Advances in Neural Information Processing Systems", "page_first": 15269, "page_last": 15278, "abstract": "Machine Learning (ML) models trained on data from multiple demographic groups can inherit representation disparity (Hashimoto et al., 2018) that may exist in the data: the model may be less favorable to groups contributing less to the training process; this in turn can degrade population retention in these groups over time, and exacerbate representation disparity in the long run. In this study, we seek to understand the interplay between ML decisions and the underlying group representation, how they evolve in a sequential framework, and how the use of fairness criteria plays a role in this process. We show that the representation disparity can easily worsen over time under a natural user dynamics (arrival and departure) model when decisions are made based on a commonly used objective and fairness criteria, resulting in some groups diminishing entirely from the sample pool in the long run. It highlights the fact that fairness criteria have to be defined while taking into consideration the impact of decisions on user dynamics. 
Toward this end, we explain how a proper fairness criterion can be selected based on a general user dynamics model.", "full_text": "Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness\n\nXueru Zhang* (University of Michigan, Ann Arbor, USA; xueru@umich.edu)\nMohammad Mahdi Khalili* (University of Michigan, Ann Arbor, USA; khalili@umich.edu)\nCem Tekin (Bilkent University, Ankara, Turkey; cemtekin@ee.bilkent.edu.tr)\nMingyan Liu (University of Michigan, Ann Arbor, USA; mingyan@umich.edu)\n\nAbstract\n\nMachine Learning (ML) models trained on data from multiple demographic groups can inherit representation disparity [7] that may exist in the data: the model may be less favorable to groups contributing less to the training process; this in turn can degrade population retention in these groups over time, and exacerbate representation disparity in the long run. In this study, we seek to understand the interplay between ML decisions and the underlying group representation, how they evolve in a sequential framework, and how the use of fairness criteria plays a role in this process. We show that the representation disparity can easily worsen over time under a natural user dynamics (arrival and departure) model when decisions are made based on a commonly used objective and fairness criteria, resulting in some groups diminishing entirely from the sample pool in the long run. This highlights the fact that fairness criteria have to be defined while taking into consideration the impact of decisions on user dynamics. 
Toward this end, we explain how a proper fairness criterion can be selected based on a general user dynamics model.\n\n1 Introduction\n\nMachine learning models developed from real-world data can inherit pre-existing bias in the dataset. When these models are used to inform decisions involving humans, they may exhibit similar discrimination against sensitive attributes (e.g., gender and race) [6, 14, 15]. Moreover, these decisions can influence human actions, such that bias in the decision is then captured in the dataset used to train future models. This closed feedback loop becomes self-reinforcing and can lead to highly undesirable outcomes over time by allowing biases to perpetuate. For example, speech recognition products such as Amazon's Alexa and Google Home are shown to have accent bias against non-native speakers [6], with native speakers experiencing much higher quality than non-native speakers. If this difference leads to more native speakers using such products while driving away non-native speakers, then over time the data used to train the model may become even more skewed toward native speakers, with fewer and fewer non-native samples. Without intervention, the resulting model becomes even more accurate for the former and less so for the latter, which then reinforces their respective user experience [7].\n\nTo address these fairness issues, one commonly used approach is to impose fairness criteria such that certain statistical measures (e.g., positive classification rate, false positive rate, etc.) across different demographic groups are (approximately) equalized [1]. However, their effectiveness is studied mostly in a static framework, where only the immediate impact of the learning algorithm is assessed but not its long-term consequences. \n\n*Equal contribution\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n
Consider an example where a lender decides whether or not to approve a loan application based on the applicant's credit score. Decisions satisfying an identical true positive rate (equal opportunity) across different racial groups can make the outcome seem fairer [5]. However, this can potentially result in more loans issued to less qualified applicants in the group whose score distribution skews toward higher default risk. The lower repayment among these individuals causes their future credit scores to drop, which moves the score distribution of that group further toward high default risk [13]. This shows that intervention by imposing seemingly fair decisions in the short term can lead to undesirable results in the long run.\n\nIn this paper we are particularly interested in understanding what happens to group representation over time when models with fairness guarantees are used, and how it is affected when the underlying feature distributions are also affected/reshaped by the decisions. Toward this end, we introduce a user retention model to capture users' reaction (stay or leave) to the decision. We show that under relatively mild and benign conditions, group representation disparity is exacerbated over time, and eventually the disadvantaged groups may diminish entirely from the system. This condition unfortunately can be easily satisfied when decisions are made based on a typical algorithm (e.g., taking the objective to be minimizing the total loss) under some commonly used fairness criteria (e.g., statistical parity, equal opportunity, etc.). Moreover, this exacerbation continues to hold and can accelerate when feature distributions are affected and change over time. A key observation is that if the factors equalized by the fairness criterion do not match what drives user retention, then the difference in (perceived) treatment will exacerbate representation disparity over time. 
Therefore, fairness has to be defined with a good understanding of how users are affected by the decisions, which can be challenging in practice as we typically have only incomplete/imperfect information. However, we show that if a model for the user dynamics is available, then it is possible to find the proper fairness criterion that mitigates representation disparity.\n\nThe impact of fairness intervention on both individuals and society has been studied in [7, 9, 10, 12, 13], and [7, 9, 13] are the most relevant to the present study. Specifically, [9, 13] focus on the impact on reshaping features over two time steps, while we study the impact on group representation over an infinite horizon. [7] studies group representation disparity in a sequential framework but without inspecting the impact of fairness criteria or considering feature distributions reshaped by decisions. More on related work can be found in Appendix B.\n\nThe remainder of this paper is organized as follows. Section 2 formulates the problem. The impact of various fairness criteria on group representation disparity is analyzed and presented in Section 3, as well as potential mitigation. Experiments are presented in Section 4. Section 5 concludes the paper. All proofs and a table of notations can be found in the appendices.\n\n2 Problem Formulation\n\nConsider two demographic groups Ga, Gb distinguished based on some sensitive attribute K ∈ {a, b} (e.g., gender, race). An individual from either group has feature X ∈ R^d and label Y ∈ {0, 1}, both of which can be time varying. Denote by G^j_k ⊂ Gk the subgroup with label j, j ∈ {0, 1}, k ∈ {a, b}; by f^j_{k,t}(x) its feature distribution; and by α^j_k(t) the size of G^j_k as a fraction of the entire population at time t. 
Then α_k(t) := α^0_k(t) + α^1_k(t) is the size of Gk as a fraction of the population, and the difference between α_a(t) and α_b(t) measures the representation disparity between the two groups at time step t. Denote by g^j_{k,t} = α^j_k(t)/α_k(t) the fraction of label j ∈ {0, 1} in group k at time t; then the distribution of X over Gk is given by f_{k,t}(x) = g^1_{k,t} f^1_{k,t}(x) + g^0_{k,t} f^0_{k,t}(x), and f_{a,t} ≠ f_{b,t}.\n\nConsider a sequential setting where the decision maker at each time makes a decision on each individual based on feature x. Let h_θ(x) be the decision rule parameterized by θ ∈ R^d, and let θ_k(t) be the decision parameter for Gk at time t, k ∈ {a, b}. The goal of the decision maker at time t is to find the best parameters θ_a(t), θ_b(t) such that the corresponding decisions about individuals from Ga, Gb maximize its utility (or minimize its loss) at the current time. Within this context, the commonly studied fair machine learning problem is the one-shot problem stated as follows, at time step t:\n\nmin_{θ_a, θ_b} O_t(θ_a, θ_b; α_a(t), α_b(t)) = α_a(t) O_{a,t}(θ_a) + α_b(t) O_{b,t}(θ_b)  s.t.  Γ_{C,t}(θ_a, θ_b) = 0,   (1)\n\nwhere O_t(θ_a, θ_b; α_a(t), α_b(t)) is the overall objective of the decision maker at time t, which consists of sub-objectives from the two groups weighted by their group proportions.² Γ_{C,t}(θ_a, θ_b) = 0 characterizes fairness constraint C, which requires the parity of a certain statistical measure (e.g., positive classification rate, false positive rate, etc.) across different demographic groups. Some commonly used criteria will be elaborated in Section 3.1. 
Both O_{k,t}(θ_k) and Γ_{C,t}(θ_a, θ_b) = 0 depend on f_{k,t}(x). The resulting solution (θ_a(t), θ_b(t)) will be referred to as the one-shot fair decision under fairness C, where the optimality only holds for a single time step t.\n\nIn this study, we seek to understand how the group representation evolves in a sequential setting over the long run when different fairness criteria are imposed. To do so, the impact of the current decision on the size of the underlying population is modeled by the following discrete-time retention/attrition dynamics. Denote by N_k(t) ∈ R+ the expected number of users in group k at time t:\n\nN_k(t + 1) = N_k(t) · π_{k,t}(θ_k(t)) + β_k, ∀k ∈ {a, b},   (2)\n\nwhere π_{k,t}(θ_k(t)) is the retention rate, i.e., the probability that a user from Gk who was in the system at time t remains in the system at time t + 1. This is assumed to be a function of the user experience, which could be the actual accuracy of the algorithm or the user's perceived (mis)treatment. This experience is determined by the application and differs across contexts. For instance, in domains such as speaker verification and medical diagnosis, it can be taken as the average loss, i.e., a user stays if he/she is classified correctly; in loan/job application scenarios, it can be the rejection rate, i.e., a user stays if he/she gets approval. β_k is the expected number of exogenous arrivals to Gk and is treated as a constant in our analysis, though our main conclusion holds when this is modeled as a random variable. Accordingly, the relative group representation for time step t + 1 is updated as α_k(t + 1) = N_k(t+1) / (N_a(t+1) + N_b(t+1)), ∀k ∈ {a, b}.\n\nFor the remainder of this paper, α_a(t)/α_b(t) is used to measure the group representation disparity at time t. As α_k(t) and f_{k,t}(x) change over time, the one-shot problem (1) is also time varying. 
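The dynamics (2) are straightforward to simulate. The following sketch is our own illustration (not code from the paper): it fixes the retention rates π_k at hypothetical constants, standing in for π_{k,t}(θ_k(t)), and iterates (2) toward its fixed point N_k = β_k/(1 − π_k). With equal arrivals β_a = β_b, the group with the higher retention rate ends up with the larger representation α_k:

```python
def step(N, pi, beta):
    # dynamics (2): N_k(t+1) = N_k(t) * pi_k + beta_k
    return {k: N[k] * pi[k] + beta[k] for k in N}

def proportions(N):
    # alpha_k(t+1) = N_k(t+1) / (N_a(t+1) + N_b(t+1))
    total = sum(N.values())
    return {k: N[k] / total for k in N}

N = {"a": 100.0, "b": 100.0}
pi = {"a": 0.9, "b": 0.6}      # hypothetical constant retention rates
beta = {"a": 10.0, "b": 10.0}  # equal exogenous arrivals

for _ in range(200):           # converges geometrically to N_k = beta_k / (1 - pi_k)
    N = step(N, pi, beta)

alpha = proportions(N)
```

Here N_a approaches 100 and N_b approaches 25, so α_a approaches 0.8 even though the arrival rates are equal; the retention gap alone drives the representation disparity.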
In the next section, we examine what happens to α_a(t)/α_b(t) when one-shot fair decisions are applied in each step.\n\n3 Analysis of Group Representation Disparity in the Sequential Setting\n\nBelow we present results on the monotonic change of α_a(t)/α_b(t) when applying one-shot fair decisions in each step. They show that the group representation disparity can worsen over time and may lead to the extinction of one group under a monotonicity condition stated as follows.\n\nMonotonicity Condition. Consider two one-shot problems defined in (1) with objectives Ô(θ_a, θ_b; α̂_a, α̂_b) and Õ(θ_a, θ_b; α̃_a, α̃_b) over distributions f̂_k(x), f̃_k(x) respectively. Let (θ̂_a, θ̂_b), (θ̃_a, θ̃_b) be the corresponding fair decisions. We say that the two problems Ô and Õ satisfy the monotonicity condition given a dynamic model if for any α̂_a + α̂_b = 1 and α̃_a + α̃_b = 1 such that α̂_a/α̂_b < α̃_a/α̃_b, the resulting retention rates satisfy π̂_a(θ̂_a) < π̃_a(θ̃_a) and π̂_b(θ̂_b) > π̃_b(θ̃_b).\n\nNote that this condition is defined over two one-shot problems and a given dynamic model. It is not limited to specific families of objective or constraint functions; nor is it limited to one-dimensional features. The only things that matter are the group proportions within the system and the retention rates determined by the decisions and the dynamics. It characterizes a situation where, when one group's representation increases, the decision becomes more favorable to this group and less favorable to the other, so that the retention rate is higher for the favored group and lower for the other.\n\nTheorem 1. 
[Exacerbation of representation disparity] Consider a sequence of one-shot problems (1) with objective O_t(θ_a, θ_b; α_a(t), α_b(t)) at each time t. Let (θ_a(t), θ_b(t)) be the corresponding solution and π_{k,t}(θ_k(t)) be the resulting retention rate of Gk, k ∈ {a, b}, under a dynamic model (2). If the initial states satisfy N_a(1)/N_b(1) = β_a/β_b and N_k(2) > N_k(1),³ and the one-shot problems in any two consecutive time steps, i.e., O_t, O_{t+1}, satisfy the monotonicity condition under the given dynamic model, then the following holds. Let ⋄ denote either “<” or “=” or “>”: if π_{a,1}(θ_a(1)) ⋄ π_{b,1}(θ_b(1)), then α_a(t+1)/α_b(t+1) ⋄ α_a(t)/α_b(t) and π_{a,t+1}(θ_a(t+1)) ⋄ π_{a,t}(θ_a(t)) ⋄ π_{b,t}(θ_b(t)) ⋄ π_{b,t+1}(θ_b(t+1)), ∀t.\n\n²This is a typical formulation if the objective O_t measures the average performance of decisions over all samples, i.e., O_t = (1/(|Ga|+|Gb|)) (Σ_{i∈Ga} O^i_t + Σ_{i∈Gb} O^i_t) = (1/(|Ga|+|Gb|)) (|Ga| O_{a,t} + |Gb| O_{b,t}), where O^i_t measures the performance of each sample i and O_{k,t} = (1/|Gk|) Σ_{i∈Gk} O^i_t is the average performance of Gk.\n\n³This condition will always be satisfied when the system starts from a near-empty state.\n\nTheorem 1 says that once a group's proportion starts to change (increase or decrease), it will continue to change in the same direction. This is because under the monotonicity condition, there is a feedback loop between representation disparity and the one-shot decisions: the former drives the latter, which results in different user retention rates in the two groups, which then drives future representation. The monotonicity condition can be satisfied under some commonly used objectives, dynamics and fairness criteria. 
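This feedback loop can be reproduced in a small simulation. The sketch below is our own toy instance, not the paper's model: a single shared decision θ minimizes a proportion-weighted squared loss, each group prefers a different parameter (hypothetical values μ_a, μ_b), and retention decreases with the group's loss, so the monotonicity condition holds. Starting from N_a(1)/N_b(1) = β_a/β_b with π_{a,1} > π_{b,1}, the ratio α_a(t)/α_b(t) increases monotonically, as Theorem 1 predicts:

```python
import math

MU = {"a": 0.0, "b": 1.0}            # each group's preferred decision parameter (hypothetical)

def one_shot_theta(alpha):
    # minimizer of sum_k alpha_k * (theta - mu_k)^2 with one shared theta
    return sum(alpha[k] * MU[k] for k in MU)

def retention(loss):
    # departure driven by loss: retention strictly decreasing in the group's loss
    return 0.9 * math.exp(-loss)

N = {"a": 12.0, "b": 8.0}            # initial sizes proportional to the arrival rates
beta = {"a": 12.0, "b": 8.0}
ratios = []
for _ in range(50):
    total = N["a"] + N["b"]
    alpha = {k: N[k] / total for k in N}
    ratios.append(alpha["a"] / alpha["b"])
    theta = one_shot_theta(alpha)
    for k in N:                      # dynamics (2): N_k <- N_k * pi_k + beta_k
        N[k] = N[k] * retention((theta - MU[k]) ** 2) + beta[k]
```

The larger group pulls θ toward its preferred value, lowers its own loss, retains more users, and grows further; the ratio climbs from 1.5 toward a much larger fixed point.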
This is characterized in the following theorem.\n\nTheorem 2. [A case satisfying the monotonicity condition] Consider two one-shot problems defined in (1) with objectives Ô(θ_a, θ_b; α̂_a, α̂_b) = α̂_a O_a(θ_a) + α̂_b O_b(θ_b) and Õ(θ_a, θ_b; α̃_a, α̃_b) = α̃_a O_a(θ_a) + α̃_b O_b(θ_b) over the same distribution f_k(x), with α̂_a + α̂_b = 1 and α̃_a + α̃_b = 1. Let (θ̂_a, θ̂_b), (θ̃_a, θ̃_b) be the corresponding solutions. Under the condition that O_k(θ̂_k) ≠ O_k(θ̃_k) for all possible α̂_k ≠ α̃_k, if the dynamics satisfy π_k(θ_k) = h_k(O_k(θ_k)) for some decreasing function h_k(·), then Ô and Õ satisfy the monotonicity condition.\n\nThe above theorem identifies a class of cases satisfying the monotonicity condition; these are cases where, whenever the group proportion changes, the decision causes the sub-objective function value to change as well, and the sub-objective function value drives user departure.\n\nFor the rest of the paper we will focus on the one-dimensional setting. Some of the cases we consider are special cases of Theorem 2 (Sec. 3.2). Others, such as the time-varying feature distribution f_{k,t}(x) considered in Sec. 3.3, also satisfy the monotonicity condition but are not captured by Theorem 2.\n\n3.1 The one-shot problem\n\nConsider a binary classification problem based on feature X ∈ R. 
Let the decision rule h_θ(x) = 1(x ≥ θ) be a threshold policy parameterized by θ ∈ R, and let L(y, h_θ(x)) = 1(y ≠ h_θ(x)) be the 0-1 loss incurred by applying decision θ to an individual with data (x, y).\n\nThe goal of the decision maker at each time is to find a pair (θ_a(t), θ_b(t)) subject to criterion C such that the total expected loss is minimized, i.e., O_t(θ_a, θ_b; α_a(t), α_b(t)) = α_a(t) L_{a,t}(θ_a) + α_b(t) L_{b,t}(θ_b), where L_{k,t}(θ_k) = g^1_{k,t} ∫_{-∞}^{θ_k} f^1_{k,t}(x) dx + g^0_{k,t} ∫_{θ_k}^{∞} f^0_{k,t}(x) dx is the expected loss Gk experiences at time t. Some examples of Γ_{C,t}(θ_a, θ_b) are as follows and are illustrated in Fig. 1.\n\n1. Simple fair (Simple): Γ_{Simple,t} = θ_a − θ_b. Imposing this criterion simply means we ensure the same decision parameter is used for both groups.\n\n2. Equal opportunity (EqOpt): Γ_{EqOpt,t} = ∫_{θ_a}^{∞} f^0_{a,t}(x) dx − ∫_{θ_b}^{∞} f^0_{b,t}(x) dx. This requires the false positive rate (FPR) to be the same for different groups (Fig. 1(c)),⁴ i.e., Pr(h_{θ_a}(X) = 1 | Y = 0, K = a) = Pr(h_{θ_b}(X) = 1 | Y = 0, K = b).\n\n3. Statistical parity (StatPar): Γ_{StatPar,t} = ∫_{θ_a}^{∞} f_{a,t}(x) dx − ∫_{θ_b}^{∞} f_{b,t}(x) dx. This requires different groups to be given equal probability of being labelled 1 (Fig. 1(b)), i.e., Pr(h_{θ_a}(X) = 1 | K = a) = Pr(h_{θ_b}(X) = 1 | K = b).\n\n4. Equalized loss (EqLos): Γ_{EqLos,t} = L_{a,t}(θ_a) − L_{b,t}(θ_b). This requires that the expected loss across different groups be equal (Fig. 
1(d)).\n\nNotice that for the Simple, EqOpt and StatPar criteria, the following holds: ∀t, for any (θ_a, θ_b) and (θ'_a, θ'_b) that satisfy Γ_{C,t}(θ_a, θ_b) = Γ_{C,t}(θ'_a, θ'_b) = 0, we have θ_a ≥ θ'_a if and only if θ_b ≥ θ'_b.\n\nSome technical assumptions on the feature distributions are in order. We assume f^0_{a,t}(x), f^1_{a,t}(x), f^0_{b,t}(x), f^1_{b,t}(x) have bounded supports [a^0_t, ā^0_t], [a^1_t, ā^1_t], [b^0_t, b̄^0_t] and [b^1_t, b̄^1_t] respectively, and that f^1_{k,t}(x) and f^0_{k,t}(x) overlap, i.e., a^0_t < a^1_t < ā^0_t < ā^1_t and b^0_t < b^1_t < b̄^0_t < b̄^1_t. The main technical assumption is stated as follows.\n\nFig. 2: f^j_{k,t}(x), k ∈ {a, b}.\n\n⁴Depending on the context, this criterion can also refer to equal false negative rate (FNR), true positive rate (TPR), or true negative rate (TNR), but the analysis is essentially the same.\n\nFig. 1: (a) each f^j_k(x) for G^j_k; (b) Statistical parity; (c) Equal opportunity; (d) Equalized Loss. For Ga, Gb with group proportions α^1_a = 0.55, α^0_a = 0.15, α^1_b = 0.1, α^0_b = 0.2, a pair (θ_a, θ_b) is fair under each criterion stated in Fig. 1(b)-1(d) if the corresponding colored areas are equal.\n\nAssumption 1. Let T_{a,t} = [a^1_t, ā^0_t] (resp. T_{b,t} = [b^1_t, b̄^0_t]) be the overlapping interval between f^0_{a,t}(x) and f^1_{a,t}(x) (resp. f^0_{b,t}(x) and f^1_{b,t}(x)). Distribution f^1_{k,t}(x) is strictly increasing and f^0_{k,t}(x) is strictly decreasing over T_{k,t}, ∀k ∈ {a, b}.\n\nFor bell-shaped feature distributions (e.g., Normal, Cauchy, etc.), Assumption 1 implies that f^1_{k,t}(x) and f^0_{k,t}(x) are sufficiently separated. An example is shown in Fig. 2. 
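The role of this separation can be checked numerically: when the two class-conditional densities overlap only in a region where g^1_k f^1_k is increasing and g^0_k f^0_k is decreasing, the expected loss L_k(θ) is unimodal in θ, with its minimum at the crossing point of the two weighted densities. The sketch below uses illustrative Gaussian densities of our own choosing (Gaussians only approximately satisfy the bounded-support assumption):

```python
import math

g1, g0 = 0.6, 0.4                 # label fractions g^1_k, g^0_k (hypothetical)
mu0, mu1, s = 0.0, 2.0, 0.5       # f^0 and f^1 well separated

def norm_cdf(x, mu):
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

def loss(theta):
    # L_k(theta) = g1 * P(X <= theta | Y=1) + g0 * P(X > theta | Y=0)
    return g1 * norm_cdf(theta, mu1) + g0 * (1.0 - norm_cdf(theta, mu0))

grid = [i / 100.0 for i in range(-200, 400)]
values = [loss(t) for t in grid]
i_star = min(range(len(grid)), key=values.__getitem__)
theta_star = grid[i_star]

# crossing point g1*f1(delta) = g0*f0(delta): closed form for equal-variance Gaussians
delta = (mu0 + mu1) / 2.0 + s * s * math.log(g0 / g1) / (mu1 - mu0)
```

The grid minimum lands on the analytic crossing point (about 0.95 here), and the loss decreases before it and increases after it, which is the unimodal shape used in the analysis below.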
As we show later, this assumption helps us establish the monotonic convergence of the decisions (θ_a(t), θ_b(t)) but is not necessary for the convergence of group representation. We next find the one-shot decision to this problem under the Simple, EqOpt, and StatPar fairness criteria.\n\nLemma 1. Under Assumption 1, ∀k ∈ {a, b}, the optimal decision at time t for Gk without considering fairness is θ*_k(t) = argmin_{θ_k} L_{k,t}(θ_k), given by: k^1_t if g^1_{k,t} f^1_{k,t}(k^1_t) ≥ g^0_{k,t} f^0_{k,t}(k^1_t); δ_{k,t} if g^1_{k,t} f^1_{k,t}(k^1_t) < g^0_{k,t} f^0_{k,t}(k^1_t) and g^1_{k,t} f^1_{k,t}(k̄^0_t) > g^0_{k,t} f^0_{k,t}(k̄^0_t); and k̄^0_t if g^1_{k,t} f^1_{k,t}(k̄^0_t) ≤ g^0_{k,t} f^0_{k,t}(k̄^0_t), where δ_{k,t} ∈ T_{k,t} is defined such that g^1_{k,t} f^1_{k,t}(δ_{k,t}) = g^0_{k,t} f^0_{k,t}(δ_{k,t}). Moreover, L_{k,t}(θ_k) is decreasing in θ_k over [k^0_t, θ*_k(t)] and increasing over [θ*_k(t), k̄^1_t].\n\nBelow we will focus on the case where θ*_a(t) = δ_{a,t} and θ*_b(t) = δ_{b,t}; the analysis for the other cases is essentially the same. For Simple, StatPar and EqOpt fairness, there exists a strictly increasing function φ_{C,t} such that Γ_{C,t}(φ_{C,t}(θ_b), θ_b) = 0. Denote by φ^{-1}_{C,t} the inverse of φ_{C,t}. Without loss of generality, we will assign the group labels a and b such that φ_{C,t}(δ_{b,t}) < δ_{a,t} and φ^{-1}_{C,t}(δ_{a,t}) > δ_{b,t}, ∀t.⁵\n\nLemma 2. 
Under the Simple, EqOpt, StatPar fairness criteria, the one-shot fair decision at time t satisfies (θ*_a(t), θ*_b(t)) = argmin_{θ_a, θ_b} α_a(t) L_{a,t}(θ_a) + α_b(t) L_{b,t}(θ_b) ∈ {(θ_a, θ_b) | θ_a ∈ [φ_{C,t}(δ_{b,t}), δ_{a,t}], θ_b ∈ [δ_{b,t}, φ^{-1}_{C,t}(δ_{a,t})], Γ_{C,t}(θ_a, θ_b) = 0} ≠ ∅, regardless of the group proportions α_a(t), α_b(t).\n\nLemma 2 shows that given feature distributions f_{a,t}(x), f_{b,t}(x), although the one-shot fair decisions can differ under different group proportions α_a(t), α_b(t), these solutions are all bounded by the same compact intervals (Fig. 3). Theorem 3 below describes the more specific relationship between the group representation α_a(t)/α_b(t) and the corresponding one-shot decision (θ_a(t), θ_b(t)).\n\nTheorem 3. [Impact of group representation disparity on the one-shot decision] Consider the one-shot problem with group proportions α_a(t), α_b(t) at time step t, and let (θ_a(t), θ_b(t)) be the corresponding one-shot decision under either the Simple, EqOpt or StatPar criterion. Under Assumption 1, (θ_a(t), θ_b(t)) is unique and satisfies the following:\n\nΨ_{C,t}(θ_a(t), θ_b(t)) = α_a(t)/α_b(t),   (3)\n\nwhere Ψ_{C,t} is some function increasing in θ_a(t) and θ_b(t), with details illustrated in Table 1.\n\n⁵If the change of f_{a,t}(x) and f_{b,t}(x) w.r.t. 
the decisions follows the same rule (e.g., the examples given in Section 3.3), then this relationship holds ∀t.\n\nTable 1: The form of Ψ_{C,t}(θ_a, θ_b) for C = EqOpt, StatPar, Simple, with entries expressed in terms of g^j_{k,t} and the density ratios f^1_{k,t}(θ_k)/f^0_{k,t}(θ_k) over the relevant ranges of (θ_a, θ_b).⁶\n\nNote that under Assumption 1, both g^1_{k,t} f^1_{k,t}(θ_k)/(g^0_{k,t} f^0_{k,t}(θ_k)) and g^1_{k,t} f^1_{k,t}(θ_k) − g^0_{k,t} f^0_{k,t}(θ_k) are strictly increasing in θ_k ∈ T_{k,t}, k ∈ {a, b}, and θ_a(t) = φ_{C,t}(θ_b(t)) for some strictly increasing function. According to Ψ_{C,t}(θ_a, θ_b) given in Table 1, the larger α_a(t)/α_b(t), the larger g^1_{k,t} f^1_{k,t}(θ_k)/(g^0_{k,t} f^0_{k,t}(θ_k)) and g^1_{k,t} f^1_{k,t}(θ_k) − g^0_{k,t} f^0_{k,t}(θ_k), and thus the larger θ_a(t) and θ_b(t). The above theorem characterizes the impact of the underlying population on the one-shot decisions. 
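The monotone relationship in Theorem 3 is easy to see numerically under the Simple criterion (θ_a = θ_b = θ). The sketch below is an illustrative setup of our own (Gaussian class-conditional densities, equal label fractions, not from the paper): it grid-searches the one-shot decision for several group proportions and confirms that a larger α_a/α_b yields a larger decision threshold, moving it toward group a's unconstrained optimum δ_a:

```python
import math

def norm_cdf(x, mu, s=0.5):
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

def group_loss(theta, mu0, mu1, g1=0.5, g0=0.5):
    # expected 0-1 loss of threshold theta for one group
    return g1 * norm_cdf(theta, mu1) + g0 * (1.0 - norm_cdf(theta, mu0))

GRID = [i / 1000.0 for i in range(0, 3001)]

def one_shot_simple(alpha_a):
    # Simple fairness: one shared threshold minimizing the weighted loss in (1);
    # group a prefers delta_a = 1.5, group b prefers delta_b = 1.0 here
    alpha_b = 1.0 - alpha_a
    return min(GRID, key=lambda t: alpha_a * group_loss(t, 0.5, 2.5)
                                   + alpha_b * group_loss(t, 0.0, 2.0))

thetas = [one_shot_simple(a) for a in (0.2, 0.5, 0.8)]
```

As α_a grows from 0.2 to 0.8 the shared threshold moves monotonically from near δ_b = 1.0 toward δ_a = 1.5, matching the increasing Ψ in (3).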
Next we investigate how the one-shot decision impacts the underlying population.\n\n3.2 Participation dynamics\n\nHow a user reacts to the decision is captured by the retention dynamics (2), which are fully characterized by the retention rate. Below we introduce two types of (perceived) mistreatment as examples in which the monotonicity condition is satisfied.\n\n(1) User departure driven by model accuracy: Examples include discontinuing the use of products viewed as error-prone, e.g., speech recognition software or medical diagnostic tools. In these cases, the determining factor is the classification error, i.e., users who experience low accuracy have a higher probability of leaving the system. The retention rate at time t can be modeled as π_{k,t}(θ_k) = ν(L_{k,t}(θ_k)) for some strictly decreasing function ν(·) : [0, 1] → [0, 1].\n\n(2) User departure driven by intra-group disparity: Participation can also be affected by intra-group disparity, i.e., that between users from the same demographic group but with different labels, G^j_k for j ∈ {0, 1}. An example is in making financial assistance decisions, where one expects to see more awards given to those qualified than to those unqualified. Denote by D_{k,t}(θ_k) = Pr(Y = 1, h_{θ_k}(X) = 1 | K = k) − Pr(Y = 0, h_{θ_k}(X) = 1 | K = k) = ∫_{θ_k}^{∞} (g^1_{k,t} f^1_{k,t}(x) − g^0_{k,t} f^0_{k,t}(x)) dx the intra-group disparity of Gk at time t; then the retention rate can be modeled as π_{k,t}(θ_k) = w(D_{k,t}(θ_k)) for some strictly increasing function w(·) mapping to [0, 1].\n\nTheorem 4. Consider the one-shot problem (1) defined in Sec. 3.1 under either the Simple, EqOpt or StatPar criterion, and assume the distributions f_{k,t}(x) = f_k(x) are fixed over time. 
Then the one-shot problems in any two consecutive time steps, i.e., O_t, O_{t+1}, satisfy the monotonicity condition under dynamics (2) with π_k(·) being either ν(L_k(·)) or w(D_k(·)).⁷ This implies that Theorem 1 holds and (θ_a(t), θ_b(t)) converges monotonically to a constant decision (θ^∞_a, θ^∞_b). Furthermore, lim_{t→∞} α_a(t)/α_b(t) = (β_a/β_b) · (1 − π_b(θ^∞_b))/(1 − π_a(θ^∞_a)).\n\nWhen the distributions are fixed, the discrepancy between π_a(θ_a(t)) and π_b(θ_b(t)) increases over time as (θ_a(t), θ_b(t)) changes. The process is illustrated in Fig. 3, where θ_a(t) ∈ [φ_C(δ_b), δ_a] and θ_b(t) ∈ [δ_b, φ^{-1}_C(δ_a)] are constrained to the same intervals ∀t. The left and right plots illustrate the cases π_k(θ_k) = ν(L_k(θ_k)) and π_k(θ_k) = w(D_k(θ_k)) respectively.\n\nNote that the case considered in Theorem 4 is a special case of Theorem 2, with distributions f_{k,t}(x) = f_k(x) fixed, O_k(θ_k) = L_k(θ_k), and both dynamics π_k(·) = ν(L_k(·)) and π_k(·) = w(D_k(·)) some\n\n⁶The cases represented by blank cells cannot happen. When C = Simple, the table only illustrates the result when δ_{a,t}, δ_{b,t} ∈ T_{a,t} ∩ T_{b,t} ≠ ∅.\n\n⁷When f_{k,t}(x) = f_k(x), ∀t, the subscript t is omitted from some notations (φ_{C,t}, δ_{k,t}, π_{k,t}, etc.) 
for simplicity.\n\ndecreasing functions of L_k(·).⁸ In this special case we obtain the additional result of monotonic convergence of the decisions, which holds due to Assumption 1.\n\nOnce α_a(t)/α_b(t) starts to increase, the corresponding one-shot solution (θ_a(t), θ_b(t)) also increases (Theorem 3), meaning that θ_a(t) moves closer to θ*_a = δ_a and θ_b(t) moves further away from θ*_b = δ_b (solid arrows in Fig. 3). Consequently, L_a(θ_a(t)) and D_b(θ_b(t)) decrease while L_b(θ_b(t)) and D_a(θ_a(t)) increase. Under both dynamics, π_a(θ_a(t)) increases and π_b(θ_b(t)) decreases, resulting in the increase of α_a(t+1)/α_b(t+1); the feedback loop becomes self-reinforcing and the representation disparity worsens.\n\nFig. 3: Illustration of L_k(θ_k) and D_k(θ_k) w.r.t. θ_k. Each black triangle represents the one-shot decision θ_k; the size of the colored area represents the value of L_k(θ_k) (left) or D_k(θ_k) (right). Note that in the right plot there are two gray regions, with the darker one compensating the lighter one so that they are of the same size; the smaller gray regions result in the larger D_a(θ_a).\n\n3.3 Impact of decisions on reshaping feature distributions\n\nFig. 4: Visualization of decisions reshaping feature distributions.\n\nOur results so far show the potential adverse impact on group representation when imposing a certain fairness criterion, while the underlying feature distributions are assumed fixed. 
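Before relaxing that assumption, the fixed-distribution limit in Theorem 4 can be checked end-to-end by coupling the one-shot decision with dynamics (2). The sketch below is an illustrative setup of our own (Gaussian densities, Simple fairness, and a hypothetical affine retention function standing in for ν); the simulated long-run ratio matches (β_a/β_b)(1 − π_b(θ^∞_b))/(1 − π_a(θ^∞_a)):

```python
import math

def norm_cdf(x, mu, s=0.5):
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

def group_loss(theta, mu0, mu1):
    # expected 0-1 loss with equal label fractions g^1 = g^0 = 0.5
    return 0.5 * norm_cdf(theta, mu1) + 0.5 * (1.0 - norm_cdf(theta, mu0))

GRID = [i / 1000.0 for i in range(0, 3001)]
MUS = {"a": (0.5, 2.5), "b": (0.0, 2.0)}     # fixed feature distributions

def retention(loss):
    return 0.95 - 0.5 * loss                 # hypothetical decreasing nu(L)

N, beta = {"a": 12.0, "b": 8.0}, {"a": 12.0, "b": 8.0}
for _ in range(300):
    aa = N["a"] / (N["a"] + N["b"])
    theta = min(GRID, key=lambda t: aa * group_loss(t, *MUS["a"])
                                    + (1 - aa) * group_loss(t, *MUS["b"]))
    pi = {k: retention(group_loss(theta, *MUS[k])) for k in N}
    N = {k: N[k] * pi[k] + beta[k] for k in N}

ratio = N["a"] / N["b"]
predicted = (beta["a"] / beta["b"]) * (1 - pi["b"]) / (1 - pi["a"])
```

The long-run disparity exceeds the arrival ratio β_a/β_b: the larger group's lower loss translates into higher retention, exactly the closed-form limit above.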
Below we examine what happens when decisions also affect the feature distributions over time, i.e., f_{k,t}(x) = g^1_{k,t} f^1_{k,t}(x) + g^0_{k,t} f^0_{k,t}(x), which is not captured by Theorem 2. We will focus on the dynamics π_{k,t}(θ_k) = ν(L_{k,t}(θ_k)). Since G^0_k, G^1_k may react differently to the same θ_k, we consider two scenarios, as illustrated in Fig. 4, which shows the change in distribution from t to t + 1 when G^1_k (resp. G^0_k) experiences a higher (resp. lower) loss at t than at t − 1 (see Appendix I for more detail): ∀j ∈ {0, 1},\n\nCase (i): f^j_{k,t}(x) = f^j_k(x) remains fixed but g^j_{k,t} changes over time, with G^j_k's retention determined by its perceived loss L^j_{k,t}(θ_k(t)).⁹ In other words, for i ∈ {0, 1} and t ≥ 2 such that L^i_{k,t}(θ_k(t)) < L^i_{k,t−1}(θ_k(t − 1)), we have g^i_{k,t+1} > g^i_{k,t} and g^{−i}_{k,t+1} < g^{−i}_{k,t}, where −i := {0, 1} \\ {i}.\n\nCase (ii): g^j_{k,t} = g^j_k remains fixed, but for the subgroup G^i_k that is less favored by the decision over time, its members make extra effort such that f^i_{k,t}(x) skews toward the direction of lowering their losses.¹⁰ In other words, for i ∈ {0, 1} and t ≥ 2 such that L^i_{k,t}(θ_k(t)) > L^i_{k,t−1}(θ_k(t − 1)), we have f^i_{k,t+1}(x) < f^i_{k,t}(x), ∀x ∈ T_k, while f^{−i}_{k,t+1}(x) = f^{−i}_{k,t}(x), ∀x.\n\nIn both cases, under the condition that f_{k,t}(x) is relatively insensitive to changes in the one-shot decisions, representation disparity can worsen and the deterioration accelerates. The precise conditions are formally given in Conditions 1 and 2 in Appendix I, which describe the case where the change from f_{k,t}(x) to f_{k,t+1}(x) is sufficiently small while the change from α_a(t)/α_b(t) to α_a(t+1)/α_b(t+1) and the resulting change in decisions from θ_k(t) to θ_k(t + 1) are sufficiently large. These conditions hold in scenarios where the change in feature distributions induced by the one-shot decisions is a slow process.\n\nTheorem 5. [Exacerbation in representation disparity can accelerate] Consider the one-shot problem defined in (1) under either the Simple, EqOpt or StatPar fairness criterion. Let the one-shot decision, representation disparity and retention rate at time t be given by θ^f_k(t), α^f_a(t)/α^f_b(t), and π^f_{k,t}(θ^f_k(t)) when the distribution f_k(x) is fixed ∀t. Let the same be denoted by θ^r_k(t), α^r_a(t)/α^r_b(t), and π^r_{k,t}(θ^r_k(t)) when f_{k,t}(x) changes according to either case (i) or (ii) defined above. 
Assume we start from the\n\nk,t and g\u2212i\nk that is less favored by the decision over time, its members\nk,t(x) skews toward the direction of lowering their losses.10 In other\nk,t+1(x) <\n\nk (t), \u03b1f\na(t)\n, and \u03c0f\n\u03b1f\nb (t)\nk(t), \u03b1r\na(t)\nb (t) , and \u03c0r\n\nk,t(\u03b8f\nk,t(\u03b8r\n\nk,t(x), \u2200x, where \u2212i := {0, 1} \\ {i}.\n\nk\u2019s retention determined\nk,t(\u03b8k(t)) <\n\nk,t\u22121(\u03b8k(t \u2212 1)), we have f i\n\nk,t+1(x) = f\u2212i\n\n\u03b1b(t) to \u03b1a(t+1)\n\nk,t+1 < g\u2212i\n\nk,t(\u03b8k(t)) > Li\n\nk,t = gj\n\n\u03b1r\n\nk,t(\u03b8k) =(cid:82) \u03b8k\u2212\u221e f 1\n\nk \u2212 Lk(\u03b8).\n8By Fig. 3, we have Dk(\u03b8) = g1\n9Here L1\nk,t(x)dx and L0\n10Suppose Assumption 1 holds for all f j\n\nk,t(\u03b8k) =(cid:82) \u221e\n\n\u03b8k\n\nk,t(x)dx.\nf 0\n\noverlap over Tk = [k1, k\n\n0\n\n], \u2200t.\n\nk,t(x) and their support does not change, then f 1\n\nk,t(x) and f 0\n\nk,t(x)\n\n7\n\nC(b)a0.000.020.04densityag0af0a(x)g1af1a(x)b1C(a)0.000.020.04densitybg0bf0b(x)g1bf1b(x)C(b)a0.000.020.04densityag0af0a(x)g1af1a(x)b1C(a)0.000.020.04densitybg0bf0b(x)g1bf1b(x)k0tk1tk0tk1t0.000.030.05densityg0k,tf0k(x)\u2191g1k,tf1k(x)\u2193g0k,t+1f0k(x)g1k,t+1f1k(x)Case(i)fjk(x)gjk,tfjk,t(x)gjk,t+1fjk,t+1(x)k0tk1tk0tk1t0.000.030.05densityg0kf0k,t(x)=g0kf1k,t(x)\u2192g0kf0k,t+1(x)g1kf1k,t+1(x)Case(ii)\fsame distribution fk,1(x) = fk(x). 
Under Conditions 1 and 2 in Appendix I, if \u03c0f\n\u03c0r\na,1(\u03b8r\n\u03b1f\na(t+1)\n\u03b1f\nb (t+1)\n\na(1)) (cid:5) \u03c0f\n(accelerates), \u2200t, where (cid:5) represents either \u201c < \u201dor \u201c > \u201d.\n\na,1(\u03b8f\nb (t) (disparity worsens) and \u03b1r\n\nb (t+1) (cid:5) \u03b1r\n\nb (1)), then \u03b1r\n\nb (1)) = \u03c0r\n\nb,1(\u03b8f\n\nb,1(\u03b8r\n\na(t+1)\n\na(t)\n\n\u03b1r\n\n\u03b1r\n\n\u03b1r\n\na (1)) =\na(t+1)\nb (t+1) (cid:5)\n\n3.4 Potential mitigation & \ufb01nding the proper fairness criterion from participation dynamics\n\n\u03b2b\n\n\u03b1a(t)\n\n\u03b1b(t) = \u03b2a\n\nThe above results show that when the objective is to minimize the average loss over the entire\npopulation, applying commonly used and seemingly fair decisions at each time can exacerbate\nrepresentation disparity over time under reasonable participation dynamics. It highlights the fact\nthat fairness has to be de\ufb01ned with a good understanding of how users are affected by the algorithm,\nand how they may react to it. For instance, consider the dynamics with \u03c0k,t(\u03b8k) = \u03bd(Lk,t(\u03b8k)),\nthen imposing EqLos fairness (Fig. 1(d)) at each time step would sustain group representations, i.e.,\n, as we are essentially equalizing departure when equalizing loss. In contrast, under\nlim\nt\u2192\u221e\nother fairness criteria the factors that are equalized do not match what drives departure, and different\nlosses incurred to different groups cause signi\ufb01cant change in group representation over time.\nIn reality the true dynamics is likely a function of a mixture of factors given the application context,\nand a proper fairness constraint C should be adopted accordingly. Below we illustrate a method for\n\ufb01nding the proper criterion from a general dynamics model de\ufb01ned below when fk,t(x) = fk(x),\u2200t:\n(4)\nm=1 (e.g. accuracy, true\nwhere user retention in Gk is driven by M different factors {\u03c0m\npositives, etc.) 
and each of them depends on the decision θ_k(t). The constant β_k is the intrinsic growth rate, while the actual arrivals may depend on π^m_k(θ_k(t)). The expected number of users at time t+1 depends on the users at time t and on new users; both may be affected by π^m_k(θ_k(t)). This relationship is characterized by a general function Λ. Let Θ be the set of all possible decisions.

Assumption 2. ∃(θ_a, θ_b) ∈ Θ × Θ such that ∀k ∈ {a, b}, N̂_k = Λ(N̂_k, {π^m_k(θ_k)}^M_{m=1}, β_k) and |Λ′(N̂_k, {π^m_k(θ_k)}^M_{m=1}, β_k)| < 1 hold for some N̂_k, where Λ′ denotes the derivative of Λ with respect to N_k; i.e., dynamics (4) under some decision pairs (θ_a, θ_b) have stable fixed points.

To find the proper fairness constraint, let C be the set of decisions (θ_a, θ_b) that can sustain group representation. It can be found via the following optimization problem, whose set of feasible solutions is guaranteed to be non-empty under Assumption 2:

C = arg min_{(θ_a, θ_b)} |Ñ_a/Ñ_b − β_a/β_b|  s.t.  Ñ_k = Λ(Ñ_k, {π^m_k(θ_k)}^M_{m=1}, β_k) ∈ R_+, θ_k ∈ Θ, ∀k ∈ {a, b}.

The idea is to first select the decision pairs whose corresponding dynamics lead to stable fixed points (Ñ_a, Ñ_b), and then among them select those that best sustain group representation, which may or may not be unique.
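A minimal numerical sketch of this selection procedure follows, under assumed concrete forms (none of this is the paper's implementation): Λ(N, ·) = N·π² + β·π¹ with π²_k(θ_k) = ν(L_k(θ_k)), π¹_k = 1, ν(x) = 1 − x, a Gaussian mass as the loss, and made-up parameter values. It grid-searches decision pairs, computes each pair's stable fixed point, and keeps the pairs whose objective |Ñ_a/Ñ_b − β_a/β_b| is within Δ of the minimum.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed forms: L_k(theta) is the Gaussian mass below theta (features of
# group k ~ N(mu_k, 1)), and Lambda(N) = N * nu(L_k) + beta_k with
# nu(x) = 1 - x.  Then Lambda'(N) = nu(L_k) < 1, so Assumption 2 holds and
# the stable fixed point is N_tilde = beta_k / L_k(theta).
MU = {"a": 2.0, "b": 0.0}
BETA = {"a": 100.0, "b": 100.0}

def loss(k, theta):
    return norm_cdf(theta - MU[k])          # L_k(theta) in (0, 1)

def fixed_point(k, theta):
    return BETA[k] / loss(k, theta)         # N_tilde under the assumed Lambda

def gap(theta_a, theta_b):
    """Objective |N_a/N_b - beta_a/beta_b| of the optimization defining C."""
    return abs(fixed_point("a", theta_a) / fixed_point("b", theta_b)
               - BETA["a"] / BETA["b"])

# Grid search: the Delta-fair set keeps every pair within Delta of the best.
grid = [-2.0 + 0.05 * i for i in range(121)]   # candidate thetas in [-2, 4]
gaps = {(ta, tb): gap(ta, tb) for ta in grid for tb in grid}
best = min(gaps.values())
DELTA = 0.05
fair_set = [pair for pair, g in gaps.items() if g <= best + DELTA]
```

Under these assumed forms the gap vanishes exactly when L_a(θ_a) = L_b(θ_b), so the perfectly fair pairs recovered by the search are the loss-equalizing (EqLos) decisions, consistent with the observation above that equalizing loss sustains representation under loss-driven retention.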
Sometimes guaranteeing perfect fairness can be unrealistic and a relaxed version is preferred, in which case all pairs (θ_a, θ_b) satisfying |Ñ_a/Ñ_b − β_a/β_b| ≤ min{|Ñ_a/Ñ_b − β_a/β_b|} + Δ constitute the Δ-fair set. An example under the dynamics N_k(t+1) = N_k(t)π^2_k(θ_k(t)) + β_kπ^1_k(θ_k(t)) is illustrated in Fig. 5, where all curves with ε ≤ Δβ_b/β_a constitute the Δ-fair set (the perfect fairness set is given by the deepest red curve, with ε = 0). See Appendix K for more details.

Fig. 5: Left plot: π^2_k(θ_k) = ν(L_k(θ_k)), π^1_k(θ_k) = 1; right plot: π^2_k(θ_k) = ν(∫_{θ_k}^{∞} f_k(x)dx), π^1_k(θ_k) = ν(L_k(θ_k)); in both, ν(x) = 1 − x. The value of each pair (θ_a, θ_b) corresponds to |Ñ_a/Ñ_b − β_a/β_b|, measuring how well it can sustain the group representation. All points (θ_a, θ_b) with the same value |Ñ_a/Ñ_b − β_a/β_b| = (β_a/β_b)ε form a curve of the same color, with ε ∈ [0, 1] shown in the color bar.

4 Experiments

We first performed a set of experiments on synthetic data where every G^j_k, k ∈ {a, b}, j ∈ {0, 1}, follows the truncated normal distributions of Fig. 2. A sequence of one-shot fair decisions is used, and group representation changes over time according to dynamics (2) with π_k(θ_k) = ν(L_k(θ_k)). Parameter settings and more experimental results (e.g., sample paths, results under other dynamics, and results when feature distributions are learned from data) are presented in Appendix L.

Fig. 6 ((a) Simple fair; (b) StatPar fair; (c) EqOpt fair; (d) EqLos fair): Each dot in Fig. 6(a)-6(d) represents the final group proportion lim_{t→∞} α_a(t) of one sample path under a pair of arrival rates (β_a, β_b). If the group representation is sustained, then lim_{t→∞} α_a(t) = 1/(1 + β_b/β_a) for each pair (β_a, β_b), as shown in Fig. 6(d) under EqLos fairness. However, under Simple, StatPar and EqOpt fairness, lim_{t→∞} α_a(t) = 1/(1 + β_b(1 − ν(L_a(θ^∞_a)))/(β_a(1 − ν(L_b(θ^∞_b))))).

Fig. 6 illustrates the final group proportion (the converged state) lim_{t→∞} α_a(t) as a function of the exogenous arrival sizes β_a and β_b under different fairness criteria. With the exception of EqLos fairness, group representation is severely skewed in the long run, with the system consisting mostly of G_b, even in scenarios where G_a has the larger arrival rate, i.e., β_a > β_b. Moreover, decisions under an inappropriate fairness criterion (Simple, EqOpt or StatPar) can result in poor robustness, where a minor change in β_a and β_b can result in a very different representation in the long run (Fig. 6(b)).

We also consider the dynamics presented in Fig. 5 and show the effect on α_a(t) of the Δ-fair decisions, Δ = ε·β_a/β_b, found with the method in Sec. 3.4. Each curve in Fig. 7 represents a sample path under a different ε, where (θ_a(t), θ_b(t)) is drawn from a small, randomly selected subset of the Δ-fair set, ∀t (to model the situation where perfect fairness is not feasible), and β_a = β_b. We observe that in the lower plot fairness is always violated at the beginning, even with small ε.
This is because the fairness set is found based on stable fixed points, which only concern fairness in the long run.

We also trained binary classifiers on the Adult dataset [4] by minimizing empirical loss, where the features are individual attributes such as sex, race, and nationality, and the labels are annual income (≥ 50k or < 50k). Since the dataset does not reflect dynamics, we employ (2) with π_k(θ_k) = ν(L_k(θ_k)) and β_a = β_b. We examine the monotonic convergence of representation disparity under Simple, EqOpt (equalized false positive/negative cost (FPC/FNC)) and EqLos fairness, and consider cases where G_a, G_b are distinguished by each of the three features mentioned above. These results are shown in Fig. 8.

Fig. 7: Effect of Δ-fair decisions found with the proposed method.

Fig. 8: Illustration of group representation disparity using the Adult dataset (race: White (G_a) vs. Non-white (G_b); sex: Female (G_a) vs. Male (G_b); nationality: US (G_a) vs. Non-US (G_b)).

5 Conclusion

This paper characterizes the impact of fairness interventions on group representation in a sequential setting. We show that representation disparity can easily get exacerbated over time under relatively mild conditions. Our results suggest that fairness has to be defined with a good understanding of participation dynamics. Toward this end, we develop a method for selecting a proper fairness criterion based on prior knowledge of participation dynamics.
Note that we do not always have full knowledge of participation dynamics; modeling dynamics from real-world measurements and finding a proper fairness criterion based on the obtained model is a potential direction for future work.

Acknowledgments

This work is supported by the NSF under grants CNS-1616575, CNS-1646019, CNS-1739517. The work of Cem Tekin was supported by the BAGEP 2019 Award of the Science Academy.

References

[1] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2019. http://www.fairmlbook.org.

[2] Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, and Nati Srebro. On preserving non-discrimination when combining expert advice. In Advances in Neural Information Processing Systems, pages 8386–8397, 2018.

[3] Christos Dimitrakakis, Yang Liu, David C. Parkes, and Goran Radanovic. Bayesian fairness. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 509–516, 2019.

[4] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. http://archive.ics.uci.edu/ml.

[5] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning.
In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[6] Drew Harwell. Amazon's Alexa and Google Home show accent bias, with Chinese and Spanish hardest to understand. 2018. http://bit.ly/2QFA1MR.

[7] Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. In Proceedings of the 35th International Conference on Machine Learning, pages 1929–1938, 2018.

[8] Hoda Heidari and Andreas Krause. Preventing disparate treatment in sequential decision making. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 2248–2254, 2018.

[9] Hoda Heidari, Vedant Nanda, and Krishna P. Gummadi. On the long-term impact of algorithmic decision policies: Effort unfairness and feature segregation through social learning. In Proceedings of the 36th International Conference on Machine Learning, pages 2692–2701, 2019.

[10] Lily Hu and Yiling Chen. A short-term intervention for long-term fairness in the labor market. In Proceedings of the 2018 World Wide Web Conference, pages 1389–1398, 2018.

[11] Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth. Fairness in reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 1617–1626, 2017.

[12] Sampath Kannan, Aaron Roth, and Juba Ziani. Downstream effects of affirmative action. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 240–248, 2019.

[13] Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In Proceedings of the 35th International Conference on Machine Learning, pages 3150–3158, 2018.

[14] Lauren Rhue. Emotion-reading tech fails the racial bias test. 2019. http://bit.ly/2Ty3aLG.

[15] Abhishek Tiwari.
Bias and fairness in machine learning. 2017. http://bit.ly/2RIr89A.

[16] Isabel Valera, Adish Singla, and Manuel Gomez Rodriguez. Enhancing the accuracy and fairness of human decision making. In Advances in Neural Information Processing Systems, pages 1769–1778, 2018.

[17] Chongjie Zhang and Julie A. Shah. Fairness in multi-agent sequential decision-making. In Advances in Neural Information Processing Systems, pages 2636–2644, 2014.