{"title": "A rational model of causal inference with continuous causes", "book": "Advances in Neural Information Processing Systems", "page_first": 2384, "page_last": 2392, "abstract": "Rational models of causal induction have been successful in accounting for people's judgments about the existence of causal relationships. However, these models have focused on explaining inferences from discrete data of the kind that can be summarized in a 2 \u2715 2 contingency table. This severely limits the scope of these models, since the world often provides non-binary data. We develop a new rational model of causal induction using continuous dimensions, which aims to diminish the gap between empirical and theoretical approaches and real-world causal induction. This model successfully predicts human judgments from previous studies better than models of discrete causal inference, and outperforms several other plausible models of causal induction with continuous causes in accounting for people's inferences in a new experiment.", "full_text": "A rational model of causal induction with continuous causes\n\nMichael D. Pacer\n\nDepartment of Psychology\n\nUniversity of California, Berkeley\n\nBerkeley, CA 94720\n\nmpacer@berkeley.edu\n\nThomas L. Griffiths\n\nDepartment of Psychology\n\nUniversity of California, Berkeley\n\nBerkeley, CA 94720\n\nTom_Griffiths@berkeley.edu\n\nAbstract\n\nRational models of causal induction have been successful in accounting for people\u2019s\njudgments about causal relationships. However, these models have focused\non explaining inferences from discrete data of the kind that can be summarized in\na 2\u00d72 contingency table. This severely limits the scope of these models, since the\nworld often provides non-binary data. We develop a new rational model of causal\ninduction using continuous dimensions, which aims to diminish the gap between\nempirical and theoretical approaches and real-world causal induction. 
This model\nsuccessfully predicts human judgments from previous studies better than models\nof discrete causal inference, and outperforms several other plausible models of\ncausal induction with continuous causes in accounting for people\u2019s inferences in\na new experiment.\n\n1 Introduction\n\nThe problem of causal induction is central to science, and is something at which people are\nremarkably skilled, especially given its apparent difficulty. Understanding how people identify causal\nrelationships has consequently become a challenge taken up by many research programs in\ncognitive science. One of the most successful of these programs has used rational solutions to the abstract\nproblem of causal induction (in the spirit of [1, 2]) as a source of explanations for people\u2019s\ninferences [3, 4, 5, 6]. However, nearly all this research has assumed people have access to categorical\ninformation about whether or not a cause or effect is present on a given trial \u2013 the sort of information\nthat appears in a 2\u00d72 contingency table (see Figure 1(a)). Such an assumption may not be valid for\nmany of the causal relationships that we see in the world.\n\nFor a simple example of a situation in which a continuous cause is relevant, consider the case of\ndrinking coffee and wakefulness. Clearly, someone who drinks a beverage made by placing a single\ndrop of coffee in a gallon of water will experience no effects of wakefulness, as an insufficient\namount of the cause was present. Meanwhile, the diligent graduate student who imbibes upwards\nof 10 pots of coffee a day will experience a great deal of wakefulness. How much coffee one drinks\nis closely linked to whether wakefulness occurs \u2013 merely knowing that some amount of coffee was\ndrunk is insufficient. 
And this problem is not relegated to those who wish to titrate their caffeination;\nmany causes exist along continuous dimensions, even if their effects do not (e.g., medicine dosage\nand recovery, smoking and related death from cancer).1\n\n1 We will focus on the case of continuous causes with binary outcomes. Learning the mapping between\ncontinuous variables is known as function learning (e.g., [7, 8]).\n\nFigure 1: Causal induction. (a) A 2 \u00d7 2 contingency table. C is the cause, E the effect, with c+ and\nc\u2212 indicating the presence and absence of the cause, similarly for e+ and e\u2212. (b) Graphical models\n(Graph 0 and Graph 1) showing possible causal relationships between cause C, effect E, and background B.\n\nThe primary strategy that has been explored in previous work on causal induction from continuous\ncauses is one in which ambiguous examples are immediately categorized as indicating either the\npresence or the absence of the cause. This approach, taken by Marsh and Ahn [9], provides a way to\nreduce continuous causes to the familiar binary case. In this paper, however, we argue that another\napproach can be fruitful \u2013 developing models that work directly with continuous values. We extend\nthe causal support model [4], originally defined for binary causes, to work with continuous-valued\ncauses. We then re-analyze the results of Marsh and Ahn [9], comparing people\u2019s causal judgments\nto predictions made by a number of rational models of causal induction with both discrete and\ncontinuous causes. 
The continuous models all predict these results well,\nbut their predictions are extremely similar, which led us to conduct a new experiment using stimuli that discriminate\namong the different models. We show that continuous causal support provides a better account of\nthese data than the other models we consider.\n\n2 Background\n\nIn this section we review previous work on rational models of causal induction, and summarize the\nresults of Marsh and Ahn [9] that we will use to evaluate different models later in the paper.\n\n2.1 Rational models of causal induction\n\nRational models of causal induction have focused on the problem of determining the nature of the\nrelationship between a cause C and an effect E. These models can be divided into two groups.\nOne group focuses on estimating causal strength, such as \u2206P [10], causal power [3] and pCI [11],\nwhich attempt to identify the degree of relationship between two variables. The other group focuses\non causal structure, such as causal support [4], which attempts to identify how certain one can be\nthat a causal relationship exists at all. The causal support model has proven effective in predicting\nhuman judgments in several studies [4, 5, 6], and we use it as the starting point for our model of\ncausal induction with continuous causes. The causal support model can be most easily described\nin the context of causal graphical models [12] (see Figure 1(b)). In particular, we consider two\ngraphical models, Graph 0 (G0) and Graph 1 (G1), and we want to determine the log posterior odds\nof the models given some data D (i.e. $\\log \\frac{P(G_1 \\mid D)}{P(G_0 \\mid D)}$). If we assume that both graphs are equally\nlikely a priori (i.e. P (G0) = P (G1)), then this is equivalent to calculating the log Bayes factor\n($\\log \\frac{P(D \\mid G_1)}{P(D \\mid G_0)}$). In its most general form causal support is this calculation, described less technically\nas identifying the evidence that D provides in favor of G1 over G0 [4].\n\nIn the particular case of causal inference over binary variables, we have three random variables\nrepresenting the unknown background causes assumed to be always present (B), the possible cause\n(C) and the effect (E) in question. In Graph 0 (G0) only B causes E, and how often it does so\nis described by the weight parameter, w0. Thus the probability of the event occurring under G0\nis P (e+|b+, w0; G0) = w0 (see footnote 2). Graph 1 (G1) allows C to potentially influence the probability of\nE. In particular we say C also has an associated weight parameter w1. How we parameterize\nthe relationship between B, C, and E determines the type of causal relationship we are\nconsidering. In order to capture generative causal relationships we use a noisy-OR parameterization for\nP (e|b+, c, w0, w1; G1). That is, under G1 the probability of E occurring (assuming b+) is\n\n$$P(e^+ \\mid b^+, c, w_0, w_1; G_1) = 1 - (1 - w_0)(1 - w_1)^c$$ (1)\n\nA similar noisy-AND-NOT parameterization can be used for preventive causes [4], but we focus on\ngenerative causes in this paper.\n\n2 Following [4], a superscript + indicates the presence of a variable, and a \u2212 indicates its absence. We also\nuse c+ and c\u2212 to indicate that C takes the values 1 and 0 respectively.\n\nHaving defined these graphical models, we can compute the corresponding likelihoods. The\ndata consist of the values of all n observed occurrences of cause and effect (i.e. D =\n{(e1, c1), (e2, c2), . . . , (en, cn)}). Assuming trials are conditionally independent, we have\n\n$$P(D \\mid G_k) = \\prod_{i=1}^{n} P(e_i \\mid c_i, b^+, w_0, w_1; G_k)$$ (2)\n\nwhere the noisy-OR parameterization is used, as in Equation 1. 
If we were concerned with estimating\ncausal strength, we could use this likelihood to determine the estimates of w0 and w1 under G1\nand G0. However, if we want to compute a measure of causal structure we need to integrate over\nall possible values of w0 and w1, assuming prior distributions on w0 and w1. In the original causal\nsupport model [4], a uniform prior was used on w0 and w1 (for a more complex prior, see [6]).\n\nDespite its success in modeling human judgments, this measure of causal support only works in a\nlimited set of cases \u2013 those cases where data can be summarized in a 2\u00d72 contingency table. In order\nto address more complicated data sets (e.g. continuous-valued causes), significant modifications are\nneeded. These modifications can be made to the model or the data. We propose a modification to\nthe model, while others (e.g., [9]) have attempted to solve this problem by collapsing continuous data into\nbinary form. We discuss the consequences of the latter strategy in the next section.\n\n2.2 Previous work on continuous-valued causal induction\n\nMarsh and Ahn [9] note the insufficiencies of current models of causal induction that result from\nconsidering only binary variables. Assuming that the data must be coerced into binary form, they\nproposed two potential solutions to this problem, and ruled out one of these options. The first\nsolution is that people simply ignore ambiguous information, and only deal with instances that can\neasily be categorized into \u201ccause\u201d and \u201cnot cause\u201d. They reject this solution and instead opt for\nthe idea that learners \u201cspontaneously categorize ambiguous evidence into one of the four types of\nevidence [used in contingency tables].\u201d [9] (p. 
4)\n\nTo test these claims, Marsh and Ahn conducted a series of experiments in which participants observed\nvisual stimuli (e.g., Figure 2(a)) representing a particular value along a continuous dimension paired\nwith a (binary) event either occurring or not occurring. Participants were asked to use these images\nto do two things. First, they were asked to estimate how many examples of each type of data they\nhad seen. Then, participants were asked \u201cto judge the strength between C and E on a scale from\n0 (not a cause) to 100 (strongly causes)\u201d. Marsh and Ahn used this second measure to show that\nparticipants use ambiguous evidence when making causal judgments, refuting the idea that people\nignore the instances that cannot be easily categorized. Furthermore, they discovered that engaging\nin causal inference changes participants\u2019 judgments of how many instances of each category they\nsaw. For example, when the \u201cambiguous\u201d stimuli were paired with the effect (e.g., condition AE of\nExperiment 1, see Table 1), they found that participants claimed to have seen more examples of the\nC category. This evidence that people\u2019s frequency ratings were altered based on whether or not the\neffect was paired with the ambiguous stimuli was used to dismiss the possibility that participants\nwere learning a continuous causal relationship.\n\nWhile Marsh and Ahn demonstrate that causal induction altered how people assigned ambiguous\nstimuli to categories, this does not necessarily mean that people were spontaneously categorizing\nthese stimuli and using that categorization information to make causal judgments. An alternative\naccount is that the boundary between the categories was ambiguous, and the evidence about the\nrelationship between cause and effect influenced where people placed this boundary. 
Previous\nresearch suggests that category structures should not always be thought of as fixed [13] and that causal\ninformation can be used when learning category structures and meanings [14]. Our focus here is on\ninvestigating how people might induce causal relationships that involve continuous variables, rather\nthan understanding their influence on categorization. However, the existence of a plausible\nalternative account of Marsh and Ahn\u2019s results raises the possibility that we can understand their data\nwithout assuming that people spontaneously categorize ambiguous stimuli in order to make causal\njudgments. We will explore this possibility after introducing our rational model of causal induction.\n\nFigure 2: Examples of continuous-valued stimuli. (a) Two sets of stimuli (Set 1 and Set 2) used by Marsh and Ahn\n[9]. The extreme stimuli indicated the presence and absence of a cause, while the intermediate\nstimulus was deemed \u201cambiguous\u201d. (b) A stimulus used in our experiments.\n\n3 Defining causal support for continuous causes\n\nOur goal in this section is to extend the rational analysis used to define the causal support model\n[4] to causes with continuous values. Following the original model, we take causal support to be\nthe log likelihood ratio in favor of G1 over G0, and assume that the causes combine in a noisy-OR.\nHowever, rather than assuming that the influence of C is described by a single parameter w1, we\ninstead define a function f that maps c, the value of C \u2208 R, into [0, 1]. For any such function\nf\u03bb(\u00b7) : R \u2192 [0, 1], with parameters \u03bb, we then have the parameterization\n\n$$P(e^+ \\mid b^+, c, w_0, \\lambda; G_1) = 1 - (1 - w_0)(1 - f_\\lambda(c))$$ (3)\n\nwhere c is the (continuous) value of the cause C. The function f\u03bb(\u00b7) thus plays a role very similar\nto that of the link function in generalized linear models.\n\nWe use a specific choice for f\u03bb(\u00b7): the function used in probit regression, namely the cumulative distribution function (CDF)\nof the standard Normal distribution [15], denoted \u03a6(\u00b7). The influence of C is encoded in two\nparameters, a bias parameter \u03b8 and a gain parameter \u03b3. This gives the full parameterization\n\n$$P(e^+ \\mid b^+, c, w_0, \\theta, \\gamma; G_1) = 1 - (1 - w_0)\\left(1 - \\Phi\\left(\\frac{c - \\theta}{\\gamma}\\right)\\right)$$\n\nwhere \u03b8 indicates the point where the effective strength of C will be 0.5, and \u03b3 determines the\nsharpness of the transition in strength around this threshold. It is straightforward to show that the\noriginal causal support model corresponds to a special case of this model when C only takes on a\nsingle value when it is present.3 Under the assumption that there is no background rate of occurrence\n(i.e., w0 = 0), this model is nearly equivalent to probit regression, which provides an excellent\ncomparison case for identifying the role that the noisy-OR plays in explaining people\u2019s judgments.\n\nTo complete the specification of the model, we need to define prior distributions on the parameters.\nFor the results we report here w0 \u223c U (0, 1), as in [4], and we use the observed values c(n) to\nproduce the priors over \u03b8 and \u03b3. We take \u03b8 \u223c U (cmin, cmax), where cmin is the minimum of\nc(n), and cmax is the maximum. This allows the prior on \u03b8 to be as uninformative as possible while\nonly sampling from the range of values over which inference could be reasonably made. 
The prior\non \u03b3 is a mixture distribution, where we draw a variable z from an inverse Wishart distribution with\none degree of freedom and a mean corresponding to the sample variance, and then set \u03b3 to either \u221az\nor \u2212\u221az with equal probability. Initial investigations suggest the model is relatively robust to prior\nchoice (e.g. varying the degrees of freedom in the inverse Wishart does not substantially change\nmodel predictions). Because of the complexity of analytically determining the joint likelihood, we\nuse Monte Carlo simulation to approximate the integral over these parameters.\n\n3 In our continuous model, we assume the cause is always present but with varying strength. If we allow\nfor the possibility that the cause is absent, and that it has no influence on the effect in such a situation, then we\nobtain P (e+|b+, c\u2212, w0, \u03b8, \u03b3; G1) = w0, as required. We then observe that $\\Phi\\left(\\frac{c-\\theta}{\\gamma}\\right)$ plays an analogous role\nin Equation 3 to w1 in (1). To show equivalence, we need to show that it is possible for this quantity to have\na uniform prior when c = 1. Take \u03b3 = 1, and define a Gaussian prior on \u03b8 with mean 1 and unit variance.\n$\\frac{c-\\theta}{\\gamma}$ then follows a Gaussian distribution with mean 0 and unit variance. Since \u03a6(\u00b7) is the CDF of the standard\nNormal, the distribution of $\\Phi\\left(\\frac{c-\\theta}{\\gamma}\\right)$ is uniform on [0, 1].\n\nTable 1: Contingencies and mean causal ratings from Marsh and Ahn [9]\n\nContingencies    Ex1:AE  Ex1:A\u0112  Ex2:Perfect  Ex2:Moderate  Ex2:Zero  Ex2:Weak\nN(e+, c+)        38      18      40|40|40     36|32|32      10|10|10  33|26|26\nN(e\u2212, c\u2212)        18      38      20|20|20     16|16|16      10|10|10  13|13|13\nN(e\u2212, c+)        2       2       0|0|0        4|4|4         10|10|10  7|7|7\nN(e+, c\u2212)        2       2       0|0|0        4|8|4         10|10|10  7|14|7\nCausal Ratings   79.2    78.5    81.0         60.6          28.3      36.2\n\nNote: Ex1 and Ex2 refer to Experiments 1 and 2. Vertical bars in Ex2 contingencies separate the\nthree possible strategies (1|2|3) proposed in [9] for assimilating ambiguous stimuli.\n\nWe developed this rational model in order to be able to investigate how people engage in causal\ninference in the case of continuous causes. We proceeded with this investigation in two ways. First,\nin order to demonstrate the usefulness of considering any model of continuous causal inference,\nwe reanalyzed the causal ratings provided by participants in Marsh and Ahn\u2019s [9] Experiments 1\nand 2. Second, in order to better identify which model best predicts human judgments among the\ncontinuous causal models, we conducted a new experiment designed to distinguish between the\nvarious rational models.\n\n4 Reanalyzing the results of Marsh and Ahn\n\nWe applied the continuous causal support model, together with several models of causal induction\nfrom discrete data and alternative statistical models for causal induction from continuous data, to two\ndata sets from Marsh and Ahn [9]: the two conditions of Experiment 1 that contained ambiguous\nstimuli (AE and A\u0112), and the four conditions of Experiment 2. Contingencies and mean ratings for\nthese experiments are shown in Table 1.\n\n4.1 Models\n\nDiscrete models. Following [4], we evaluated five models of causal induction from discrete data:\n\u2206P [10], causal power [3], pCI [11], (discrete) causal support [4], and the \u03c72 statistic. 
These\nmodels were applied to contingencies derived by discretizing the continuous stimuli in three different\nways, following the strategies suggested by Marsh and Ahn: (1) if people believe in a generative\ncausal relationship, all ambiguous information is incorporated into the cause count (i.e.\ne+, c+); (2) people classify ambiguous information as being an example of e+, c+ or e+, c\u2212 in a\nway that is proportional to the relationship they infer from the non-\u201cambiguous\u201d examples; and (3)\npeople increase e+, c+ by the same number of \u201cambiguous\u201d cases as they would under (2), but\nthey do not similarly increase e+, c\u2212. Because there are three potential sets of true event counts\nunder the assimilation hypothesis for Experiment 2, in order to analyze the assimilation hypothesis\nunder the best possible case, we ran the discrete models under all three possible methods of\nassimilation. These three possible ways of assimilating the ambiguous cases are represented in\nTable 1, as contingencies separated by vertical bars (\u201c|\u201d).\n\nContinuous models. We also evaluated several models that consider the causal variable to be\ncontinuously valued. These include the causal support model described in the previous section, as well\nas several traditional models for statistical inference in cases where there is a relationship between\ncontinuous and binary variables. Because they are usually used for hypothesis tests about whether\nor not there is a relationship between a continuous and a binary variable, the two tests we use are\nprobit regression and a two-sample Student t-test. 
The former tests for a relationship\nin which a continuous-valued variable is mapped to a binary variable, while the latter tests for\na relationship in which a binary variable is mapped to a continuous variable.\n\nTable 2: Correlations of Model Predictions to Human Data and \u03b1 values\n\nDiscrete Model Predictions\n          Possibility 1        Possibility 2        Possibility 3\nModel     r       \u03b1            r       \u03b1            r       \u03b1\n\u2206P        -0.250  2\u00d710^\u22124      -0.250  2\u00d710^\u22124      -0.250  2\u00d710^\u22124\nPower     -0.250  2\u00d710^\u22124      -0.250  2\u00d710^\u22124      -0.250  2\u00d710^\u22124\npCI       -0.035  1.100        -0.035  1.100        0.239   16.142\nSupport   0.679   154.950      0.240   2\u00d710^\u22124      0.679   77.350\n\u03c72        0.679   1\u00d710^\u22125      0.679   1\u00d710^\u22125      0.679   1\u00d710^\u22125\n\nContinuous Model Predictions\nModel         r      \u03b1\nC-Support     0.966  0.475\nProbit, |t|   0.984  2\u00d710^\u22124\nProbit, |\u03b2|   0.876  0.320\nt-test, |t|   0.976  1.132\nt-test, |\u03b2|   0.976  1.132\n\nBoth continuous causal support and the discrete models have the property that the more evidence\nthere is for a cause, the larger the positive score produced by the model. We want a similar property\nto hold for the statistics we obtain from the alternative continuous models. If we treat the\ntwo-sample t-test as a case of linear regression (with an indicator variable for whether or not the effect\noccurred as the regressor), we obtain \u03b2 values for both the probit model and the t-test model. We\ncan treat these \u03b2 values as estimates of the strength of the relationship between the two variables.\nBoth methods also produce a t statistic, indicating the evidence that \u03b2 is different from zero. We\ncan treat these t values as alternative measures of causal structure. 
However, the sign of the \u03b2 and t\nstatistics is highly dependent on the particular way the data are represented, so we will use |\u03b2| and |t|\ninstead.\n\nIn their studies, Marsh and Ahn used four types of continuously varying stimuli that differed slightly\nin the parameters used to create them. We have designed our models such that they are invariant\nacross specification of the dimension, as long as the specification accurately reflects the variance as\nobserved by participants. The parameters used to generate their stimuli, along with the frequencies\nwith which each of these values occurred and the associated effects, can be directly plugged into the\nmodels to produce predictions. We ran the model over each set of stimulus values, and averaged\nthese four predictions to obtain the final predictions, the means of which were compared to\nthe mean human judgments.\n\n4.2 Results\n\nFollowing [4], model predictions underwent a nonlinear transformation to accommodate\nnonlinearities in the response scale. This was the transformation $y = \\mathrm{sign}(x)\\,|x|^{\\alpha}$, where \u03b1 was chosen\nto maximize the correlation (r) between the mean human ratings and mean model predictions across\nthe conditions. The results are shown in Table 2.\n\nThe re-analysis supports the idea that people were using continuous values in their causal judgments.\nThe best correlation achieved by any discrete model was r = .679, for discrete causal support and \u03c72;\nthis is substantially worse than any of the continuous model correlations. On the other\nhand, the models of continuous causal inference successfully captured much of the variation in\nresponses, with all the continuous models performing well (all r > .85). 
The probit |t| model had\nthe best performance, r = .984, with continuous causal support and the t-test models not far behind,\nwith r = .966 and r = .976, respectively.\n\n5 Distinguishing between the continuous models\n\nIn the previous section, all of the models for continuous causal induction performed well. However,\nthe continuous models all made very similar predictions to one another. As a result, it is difficult to\ndistinguish which model of continuous causal induction people might be using. In order to better\ndetermine which of these models most accurately captures human causal induction over continuous\ndimensions, we need to construct data sets that will result in divergent predictions across the various\nmodels.\n\nBecause of the noisy-OR parameterization of the generative model, (discrete) causal support\npredictions are sensitive to the base rate of occurrence while standard statistical tests (e.g., \u03c72) lack this\nsensitivity despite being otherwise good approximations for the rational model [4]. 
The continuous\ncausal support model also uses a noisy-OR parameterization, meaning that it will also be sensitive\nto base rates in ways that standard statistical models will not. More generally, the assumption of a\nparticular form for generative causal relationships means that, for some data sets, flipping the values\nof the effect (replacing a 0 with a 1 and vice versa) can result in different continuous causal support\nvalues, though it leaves unchanged the predictions made by the standard methods.\n\nFigure 3: Datasets 1 - 9 for the current experiment. The horizontal axis denotes the value of the\ncause, while the vertical axis denotes whether or not the event occurred. (Nine panels, Data Set 1 through Data Set 9; horizontal axis: Cause Value, from 0 to 1; vertical axis: Effect Value, 0 or 1.)\n\nWe designed nine data sets to produce such differential predictions. Each data set consisted of a\nseries of fifty (e, c) pairs, where c \u2208 {.02, .04, . . . , 1} and e \u2208 {0, 1}. The only differences between\nthe data sets were the functions defining the relationship between c and e. 
The first four data sets\n(Figure 3, 1-4) were designed as follows: (1) e \u223c Bern(.6) for c < .6 and e = 1 for c \u2265 .6,\n(2) flipping the e from (1), (3) e \u223c Bern(.6) for c < .6 and e = 0 for c \u2265 .6, and\n(4) flipping the e from (3). The next five data sets (Figure 3, 5-9) were meant to be analogous to base\nrate effects studied in [4]. There was no relationship between the value of c and e, but the rate at\nwhich e = 1 differed between data sets, sampled from Bern(p) with p = .1, .25, .5, .75, .9, for data\nsets 5-9, respectively. These datasets were then used as the basis for a new behavioral experiment.\n\n5.1 Method\n\nParticipants. A total of 147 participants were recruited through the Amazon Mechanical Turk\nweb service and were paid $0.25 for their participation. Each participant was asked for only one\njudgment, and was randomly sorted into one of the nine data set conditions we described above.\nIn order to account for any participants who did not read the instructions and consider the data, we\neliminated any participants who took less than sixty seconds to complete the study.4 Because of this\nconstraint, twelve participants were removed, leaving 135 participants for analysis. After removing\nthese participants, we were left with fifteen participants in each condition.\n\nProcedure. Participants were told that they would be assisting a scientist in identifying \u201cwhether\nor not different levels of a chemical cause a type of bacteria to produce a protein\u201d. They were told\nthat they would see an array of fifty images like the one in Figure 2(b), each of which denoted the\noutcome of one batch of bacteria. 
Each of the images consisted of three elements: (1) a black bar\nwhose size, relative to (2) a constant gray line, denoted how much of the chemical was in that batch of bacteria\n(a larger bar relative to the line indicated that more of the\nchemical was present), and (3) either a green checkmark or a red cross, which denoted whether or\nnot the protein was found. Which images were included in the array was determined by the data\ncondition, and the images were sorted into a random order for each participant before being placed\nin the array. Participants were told to take their time in analyzing the data, and then were asked to\nrate \u201cwhether they think the chemical causes the protein to be produced\u201d on a 0-100 scale, where 0\nmeant extremely unlikely and 100 meant extremely likely. This scale was designed to obtain scalar\nestimates of degrees of belief in causal structure [6].\n\n4 Though we eliminated these subjects from the analysis here, not eliminating them does not change any of\nthe r scores by more than \u00b1.02. In fact, including these participants increases the performance of our model\nand decreases the performance of the alternative models.\n\nFigure 4: Experimental results, showing human judgments (error bars are one standard error),\ntogether with unscaled model predictions and corresponding correlations. (Panels plot Scaled Values, from 0 to 100, against Data Set, 1 through 9: Human Responses with Error Bars; Continuous Causal Support, r = .74; Probit Regression: Abs(T), r = .03; Probit Regression: Abs(Beta), r = .06; Independent T-test: Abs(T), r = .06; Independent T-test: Abs(Beta), r = .01.)\n\n5.2 Results\n\nAs above, we use a power-law transformation to accommodate nonlinearities in response scale.\nBefore discussing the results, we should note that Figure 4 does not reflect the maximal correlation\nbetween the transformed values of the probit and t-test models. The optimized correlation between\nthe mean human responses and mean model predictions was r = .060 for both the probit |\u03b2| model and the t-test |t|\nmodel (with \u03b1 = 408.6 and \u03b1 = 164.15, respectively). The optimized correlation\nfor the probit |t| model was r = 0.028 with \u03b1 = 2 \u00d7 10^\u22124. The optimized correlation for the t-test\n|\u03b2| model was r = 0.012, with \u03b1 = 12.2. We did not include the optimized graphs because the optimized\nmean values for all models save the continuous causal support essentially became binary\npredictions, and as such they did not convey information about how the probit and t-test model predictions\ndiffered from those made by continuous causal support. The values in Figure 4 reflect the case\nwhere no scaling occurred (i.e. where \u03b1 = 1).\n\nThe results are striking in that, though all the models performed well at predicting people\u2019s\njudgments in the Marsh and Ahn studies, all but the continuous causal support model perform poorly\nhere. Continuous causal support outperforms every other model of continuous causal inference\n(r = .744, with \u03b1 = 0.92). 
Still, it does seem to underestimate human causal ratings in data sets 8\nand 9 (see Figure 4), which suggests further investigation of this phenomenon is needed.\n\n6 Conclusion\n\nWe have proposed a new rational model of causal induction using continuous dimensions,\ncontinuous causal support, which aims to be a first step towards filling the gap between existing models\nof causal induction and real-world cases of causal learning. This model successfully predicts\nhuman judgments found in previous work, and outperforms several other plausible models of causal\ninduction with continuous causes. Future work will hopefully continue to bring our models of\ncausal induction ever closer to addressing the problem of real-world causal induction, for example\nby looking at how temporal information is used in conjunction with traditional statistical\ninformation. Consistent with a continuous view of causal induction, we suspect that more work in each of\nthese directions is likely to produce positive results.\n\nAcknowledgements: This work was supported by a Berkeley Graduate Fellowship given to MP and grants IIS-0845410\nfrom the National Science Foundation and FA-9550-10-1-0232 from the Air Force Office of Scientific\nResearch to TLG.\n\nReferences\n[1] J. R. Anderson. The adaptive character of thought. Erlbaum, Hillsdale, NJ, 1990.\n[2] D. Marr. Vision. W. H. Freeman, San Francisco, CA, 1982.\n[3] P. Cheng. From covariation to causation: A causal power theory. Psychological Review, 104:367\u2013405, 1997.\n[4] T. L. Griffiths and J. B. Tenenbaum. Structure and strength in causal induction. Cognitive Psychology, 51:354\u2013384, 2005.\n[5] T. L. Griffiths and J. B. Tenenbaum. Theory-based causal induction. Psychological Review, 116(4):661, 2009.\n[6] H. Lu, A. L. Yuille, M. Liljeholm, P. W. Cheng, and K. J. Holyoak. Bayesian generic priors for causal learning. Psychological Review, 115(4):955, 2008.\n[7] J. R. 
Busemeyer, E. Byun, E. L. DeLosh, and M. A. McDaniel. Learning functional relations based on experience with input-output pairs by humans and artificial neural networks. In K. Lamberts and D. Shanks, editors, Concepts and Categories, pages 405\u2013437. MIT Press, Cambridge, 1997.\n[8] T. L. Griffiths, C. G. Lucas, J. J. Williams, and M. L. Kalish. Modeling human function learning with Gaussian processes. In D. Koller, Y. Bengio, D. Schuurmans, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 21, Cambridge, MA, 2009. MIT Press.\n[9] J. K. Marsh and W. Ahn. Spontaneous assimilation of continuous values and temporal information in causal induction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(2):334, 2009.\n[10] P. W. Cheng and L. R. Novick. A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58:545\u2013567, 1990.\n[11] P. A. White. Making causal judgments from the proportion of confirming instances: the pCI rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:710\u2013727, 2003.\n[12] J. Pearl. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Francisco, CA, 1988.\n[13] M. R. Waldmann and Y. Hagmayer. Categories and causality: The neglected direction. Cognitive Psychology, 53(1):27\u201358, 2006.\n[14] M. R. Waldmann, K. J. Holyoak, and A. Fratianne. Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124:181\u2013206, 1995.\n[15] C. I. Bliss. The calculation of the dosage-mortality curve. Annals of Applied Biology, 22(1):134\u2013167, 1935.\n", "award": [], "sourceid": 1273, "authors": [{"given_name": "Thomas", "family_name": "Griffiths", "institution": null}, {"given_name": "Michael", "family_name": "James", "institution": null}]}