{"title": "Analyzing human feature learning as nonparametric Bayesian inference", "book": "Advances in Neural Information Processing Systems", "page_first": 97, "page_last": 104, "abstract": "Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features.", "full_text": "Analyzing human feature learning as\nnonparametric Bayesian inference\n\nJoseph L. Austerweil\nDepartment of Psychology\n\nUniversity of California, Berkeley\n\nBerkeley, CA 94720\n\nJoseph.Austerweil@gmail.com\n\nThomas L. Grif\ufb01ths\n\nDepartment of Psychology\n\nUniversity of California, Berkeley\nTom Griffiths@berkeley.edu\n\nBerkeley, CA 94720\n\nAbstract\n\nAlmost all successful machine learning algorithms and cognitive models require\npowerful representations capturing the features that are relevant to a particular\nproblem. We draw on recent work in nonparametric Bayesian statistics to de\ufb01ne a\nrational model of human feature learning that forms a featural representation from\nraw sensory data without pre-specifying the number of features. 
By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features.\n\n1 Introduction\n\nMost accounts of the processes underlying human learning, decision-making, and perception assume that stimuli have fixed sets of features. For example, traditional accounts of category learning start with a set of features (e.g., is furry and barks), which are used to learn categories (e.g., dogs). In a sense, features are the basic atoms for these processes. Although the model\u2019s features may be combined in particular ways to create new features, the basic primitives are assumed to be fixed. While this assumption has been useful in investigating many cognitive functions, it has been attacked on empirical [1] and theoretical [2] grounds. Experts identify parts of objects in their domain of expertise vastly differently than novices (e.g., [3]), and evidence for flexible feature sets has been found in many laboratory experiments (see [2] for a review). In this paper, we present an account of how flexible feature sets could be induced from raw sensory data without requiring the number of features to be prespecified.\n\nFrom early work demonstrating XOR is only learnable by a linear classifier with the right representation [4] to the so-called \u201ckernel trick\u201d popular in support vector machines [5], forming an appropriate representation is a fundamental issue for applying machine learning algorithms. We draw on the convergence of interest from cognitive psychologists and machine learning researchers to provide a rational analysis of feature learning in the spirit of [6], defining an \u201cideal\u201d feature learner using ideas from nonparametric Bayesian statistics. 
Comparing the features identified by this ideal learner to those learned by people provides a way to understand how distributional and category information contribute to feature learning.\n\nWe approach the problem of feature learning as one of inferring hidden structure from observed data \u2013 a problem that can be solved by applying Bayesian inference. By using methods from nonparametric Bayesian statistics, we can allow an unbounded amount of structure to be expressed in the observed data. For example, nonparametric Bayesian clustering models allow observations to be assigned to a potentially infinite number of clusters, of which only a finite number are represented at any time. When such a model is presented with a new object that it cannot currently explain, it increases the complexity of its representation to accommodate the object. This flexibility gives nonparametric Bayesian models the potential to explain how people infer rich latent structure from the world, and such models have recently been applied to a variety of aspects of human cognition (e.g., [6, 7]). While nonparametric Bayesian models have traditionally been used to solve problems related to clustering, recent work has resulted in new models that can infer a set of features to represent a set of objects without limiting the number of possible features [8]. These models are based on the Indian Buffet Process (IBP), a stochastic process that can be used to define a prior on the features of objects. We use the IBP as the basis for a rational model of human perceptual feature learning.\n\nThe plan of the paper is as follows. Section 2 summarizes previous empirical findings from the human perceptual feature learning literature. 
Motivated by these results, Section 3 presents a rational analysis of feature learning, focusing on the IBP as one component of a nonparametric Bayesian solution to the problem of finding an optimal representation for some set of observed objects. Section 4 compares human learning and the predictions of the rational model. Section 5 concludes the paper.\n\n2 Human perceptual feature learning\n\nOne main line of investigation of human feature learning concerns the perceptual learning phenomena of unitization and differentiation. Unitization occurs when two or more features that were previously perceived as distinct merge into one feature. In a visual search experiment by Shiffrin and Lightfoot [9], after learning that the features that generated the observed objects co-vary in particular ways, participants represented each object as its own feature instead of as three separate features. In contrast, differentiation occurs when a fused feature splits into new features. For example, color novices cannot distinguish between a color\u2019s saturation and brightness; however, people can be trained to make these distinctions [10]. Although general conditions for when differentiation or unitization occur have been outlined, there is no formal account for why and when these processes take place.\n\nIn Shiffrin and Lightfoot\u2019s visual search experiment [9], participants were trained to find one of the objects shown in Figure 1(a) in a scene where the other three objects were present as distractors. Each object is composed of three features (single line segments) inside a rectangle. The objects can thus be represented by the feature ownership matrix shown in Figure 1(a), with Zik = 1 if object i has feature k. After prolonged practice, human performance drastically and suddenly improved, and this advantage did not transfer to other objects created from the same feature set. 
They concluded that the human perceptual system had come to represent each object holistically, rather than as being composed of its more primitive features. In this case, the fact that the features tended to co-occur only in the configurations corresponding to the four objects provides a strong cue that they may not be the best way to represent these stimuli.\n\nThe distribution of potential features over objects provides one cue for inferring a feature representation; however, there can be cases where multiple feature representations are equally good. For example, Pevtzow and Goldstone [11] demonstrated that human perceptual feature learning is affected by category information. In the first part of their experiment, they trained participants to categorize eight \u201cdistorted\u201d objects into one of three groups using one of two categorization schemes. The objects were distorted by the addition of a random line segment. The category membership of four of the objects, A-D, depended on the training condition, as shown in Figure 1(b). Participants in the horizontal categorization condition had objects A and B categorized into one group and objects C and D into the other. Those in the vertical categorization condition learned that objects A and C were categorized into one group and objects B and D into the other. The nature of this categorization affected the features learned by participants, providing a basis for selecting one of the two featural representations for these stimuli that would otherwise be equally well-justified based on distributional information.\n\nRecent work has supplemented these empirical results with computational models of human feature learning. One such model is a neural network that incorporates categorization information as it learns to segment objects [2]. Although the inputs to the model are the raw pixel values of the stimuli, the number of features must be specified in advance. 
This is a serious issue for an analysis of human feature learning because it does not allow us to directly compare different feature set sizes \u2013 a critical factor in capturing unitization and differentiation phenomena. Other work has investigated how the human perceptual system learns to group objects that seem to arise from a common cause [12].\n\nFigure 1: Inferring representations for objects. (a) Stimuli and feature ownership matrix from Shiffrin and Lightfoot [9], with objects x1-x4 as rows and features as columns:\nx1: 1 1 1 0 0 0\nx2: 0 1 0 1 0 1\nx3: 0 0 1 1 1 0\nx4: 1 0 0 0 1 1\n(b) Four objects (A-D) and inferred features depending on categorization scheme from Pevtzow and Goldstone [11].\n\nThis work uses a Bayesian model that can vary the number of causes it identifies, but assumes indifference to the spatial position of the objects and that the basic objects themselves are already known, with a binary variable representing the presence of an object in each scene being given to the model as the observed data. This model is thus given the basic primitives rather than extracting them from raw sensory data, and does not provide an account of how the human perceptual system identifies these primitives. In the remainder of the paper, we develop a rational model of human feature learning that applies to raw sensory data and does not assume a fixed number of features in advance.\n\n3 A Rational Analysis of Feature Learning\n\nRational analysis is a technique for understanding a cognitive process by comparing it to the optimal solution to an underlying computational problem [6], with the goal of understanding how the structure of this problem influences human behavior. 
By formally analyzing the problem of inferring featural representations from raw sensory data of objects, we can determine how distributional and category information should influence the features used to represent a set of objects.\n\n3.1 Inferring Features from Percepts\n\nOur goal is to form the most probable feature representation for a set of objects given the set of objects we see. Formally, we can represent the features of a set of objects with a feature ownership matrix Z like that shown in Figure 1, where rows correspond to objects, columns correspond to features, and Zik = 1 indicates that object i possesses feature k. We can then seek to identify the most likely feature ownership matrix Z given the observed properties of a set of objects X by a simple application of Bayes\u2019 theorem:\n\n\u02c6Z = arg max_Z P(Z|X) = arg max_Z P(X|Z)P(Z) / \u03a3_{Z'} P(X|Z')P(Z') = arg max_Z P(X|Z)P(Z)   (1)\n\nThis separates the problem of finding the best featural representation given a set of data into two subproblems: finding a representation that is in general probable, as expressed by the prior P(Z), and finding a representation that generates the observed properties of the objects with high probability, as captured by the likelihood P(X|Z). We consider how these distributions are defined in turn.\n\n3.2 A Prior on Feature Ownership Matrices\n\nAlthough in principle any distribution on binary matrices P(Z) could be used as a prior, we use one particular nonparametric Bayesian prior, the Indian Buffet Process (IBP) [8]. 
The IBP has several nice properties: it allows for multiple features per object, possessing one feature does not make possessing another feature less likely, and it generates binary matrices of unbounded dimensionality. This allows the IBP to use an appropriate, possibly different, number of features for each object and makes it possible for the size of the feature set to be learned from the objects.\n\nThe IBP defines a distribution over binary matrices with a fixed number of rows and an infinite number of columns, of which only a finite number are expected to have non-zero elements. The distribution thus permits tractable inference of feature ownership matrices without specifying the number of features ahead of time. The probability of a feature ownership matrix under the IBP is typically described via an elaborate metaphor in which objects are customers and features are dishes in an Indian buffet, with the choice of dishes determining the features of the object, but reduces to\n\nP(Z) = [\u03b1^{K+} / \u03a0_{h=1}^{2^N\u22121} K_h!] exp{\u2212\u03b1 H_N} \u03a0_{k=1}^{K+} [(N \u2212 m_k)!(m_k \u2212 1)! / N!]   (2)\n\nwhere N is the number of objects, K_h is the number of features with history h (the history is the column of the feature interpreted as a binary number), K+ is the number of columns with non-zero entries, H_N is the N-th harmonic number, \u03b1 affects the number of features objects own, and m_k is the number of objects that have feature k.\n\n3.3 Two Likelihood Functions for Perceptual Data\n\nTo define the likelihood, we assume N objects with D observed dimensions (e.g., pixels in an image) are grouped in a matrix X (X = [x_1^T, . . . , x_N^T]^T, where x_i \u2208 R^D). The feature ownership matrix Z marks the commonalities and contrasts between these objects, and the likelihood P(X|Z) expresses how these relationships influence their observed properties. 
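The IBP probability in Eq. (2) above can be evaluated directly from a candidate matrix. The following is a minimal sketch (our own illustration, not code from the paper; the function name `ibp_log_prob` is ours), computing the log of Eq. (2) by grouping non-zero columns by their history:

```python
import math
from collections import Counter

def ibp_log_prob(Z, alpha):
    """Log of Eq. (2): the IBP probability of a binary feature ownership
    matrix Z (a list of N rows), with columns grouped by 'history'."""
    N = len(Z)
    cols = list(zip(*Z))
    nonzero = [c for c in cols if any(c)]        # the K+ non-zero columns
    K_plus = len(nonzero)
    H_N = sum(1.0 / n for n in range(1, N + 1))  # N-th harmonic number
    log_p = K_plus * math.log(alpha) - alpha * H_N
    # divide by the product of K_h! over distinct column histories
    for count in Counter(nonzero).values():
        log_p -= math.lgamma(count + 1)
    # product over features of (N - m_k)! (m_k - 1)! / N!
    for col in nonzero:
        m_k = sum(col)
        log_p += math.lgamma(N - m_k + 1) + math.lgamma(m_k) - math.lgamma(N + 1)
    return log_p
```

Note the exchangeability this expression implies: matrices that differ only in which rows own a singleton feature receive the same probability.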
Although in principle many forms are possible for the likelihood, two have been used successfully with the IBP in the past: the linear-Gaussian [8] and noisy-OR [13] models.\n\nThe linear-Gaussian model assumes that x_i is drawn from a Gaussian distribution with mean z_i A and covariance matrix \u03a3_X = \u03c3_X^2 I, where z_i is the binary vector defining the features of object x_i and A is a matrix containing a weight for each of the D raw data dimensions for each feature k:\n\np(X|Z, A, \u03c3_X) = [1 / (2\u03c0\u03c3_X^2)^{ND/2}] exp{\u2212(1/(2\u03c3_X^2)) tr((X \u2212 ZA)^T (X \u2212 ZA))}   (3)\n\nAlthough A actually represents the weights of each feature (which combine with each other to determine the raw pixel values of each object), it is integrated out, so that the conditional probability of X given Z depends only on Z and hyperparameters corresponding to the variance in X and A (see [8] for details). The result of using this model is a set of images representing the perceptual features corresponding to the matrix Z, expressed in terms of the posterior distribution over the weights A.\n\nFor the noisy-OR model [13], the raw visual data is reduced to binary pixel values. This model assumes that the pixel values X are generated from a noisy-OR distribution where Z defines the features that each object has and Y defines which pixels should be one for each feature:\n\np(x_{i,d} = 1 | Z, Y, \u03bb, \u03b5) = 1 \u2212 (1 \u2212 \u03bb)^{z_{i,:} y_{:,d}} (1 \u2212 \u03b5)   (4)\n\nwhere hyperparameters \u03b5 and \u03bb represent the probability a pixel is turned on without a cause and the probability a feature fails to turn on a pixel, respectively. 
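The noisy-OR likelihood of Eq. (4) is straightforward to compute for binary images. A small sketch (our illustration; the function names are ours, with \u03b5 and \u03bb written as `eps` and `lam`):

```python
import math

def noisy_or_pixel_on(z_i, y_d, lam, eps):
    """Eq. (4): probability that pixel d of object i is on. Each owned
    feature containing the pixel fails to turn it on with probability
    (1 - lam); eps is the probability the pixel turns on with no cause."""
    active = sum(zk * yk for zk, yk in zip(z_i, y_d))  # z_{i,:} y_{:,d}
    return 1.0 - (1.0 - lam) ** active * (1.0 - eps)

def noisy_or_log_likelihood(X, Z, Y, lam, eps):
    """Log-likelihood of binary images X (N x D) given feature ownership
    Z (N x K) and feature images Y (K x D)."""
    ll = 0.0
    for x_i, z_i in zip(X, Z):
        for d, x_id in enumerate(x_i):
            p_on = noisy_or_pixel_on(z_i, [y_k[d] for y_k in Y], lam, eps)
            ll += math.log(p_on) if x_id else math.log(1.0 - p_on)
    return ll
```

With lam = 1 and eps = 0 this reduces to a deterministic OR: a pixel is on exactly when some owned feature includes it.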
Additionally, Y is assumed to have a Bernoulli prior with hyperparameter p representing the probability that an entry of Y is one, with p(Y) = \u03a0_{k,d} p^{y_{k,d}} (1 \u2212 p)^{1 \u2212 y_{k,d}}. The result of using this model is a distribution over binary arrays indicating the pixels associated with the features identified by Z, expressed via the posterior distribution on Y.\n\n3.4 Summary\n\nThe prior and likelihood defined in the preceding sections provide the ingredients necessary to use Bayesian inference to identify the features of a set of objects from raw sensory data. The result is a posterior distribution on feature ownership matrices Z, indicating how a set of objects could be represented, as well as an indication of how the features identified by this representation are expressed in the sensory data. While computing this posterior distribution exactly is intractable, we can use existing algorithms developed for probabilistic inference in these models. Although we used Gibbs sampling \u2013 a form of Markov chain Monte Carlo that produces samples from the posterior distribution on Z \u2013 for all of our simulations, Reversible Jump MCMC and particle filtering inference algorithms have also been derived for these models [8, 13, 14].\n\nFigure 2: Inferring feature representations using distributional information from Shiffrin and Lightfoot [9]. On the left, bias features and on the right, the four objects as learned features. The rational model justifies the human perceptual system\u2019s unitization of the objects as features.\n\n4 Comparison with Human Feature Learning\n\nThe nonparametric Bayesian model outlined in the previous section provides an answer to the question of how an ideal learner should represent a set of objects in terms of features. 
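A single Gibbs update for an entry of Z combines the IBP prior odds for an already-represented feature, m_{-i,k} / (N \u2212 m_{-i,k}), with the likelihood ratio. The sketch below is schematic rather than the full sampler used in the paper: it omits proposals for brand-new features and hyperparameter moves, and treats the likelihood as a black-box function of Z:

```python
import math
import random

def gibbs_sweep_Z(Z, log_lik, rng=None):
    """One Gibbs sweep over entries of Z for already-represented features.
    The conditional odds of z_ik = 1 are the IBP prior odds
    m_{-i,k} / (N - m_{-i,k}) times the likelihood ratio. Simplification:
    features owned by no other object are dropped, whereas the full
    sampler would propose brand-new features at that point."""
    rng = rng or random.Random(0)
    N, K = len(Z), len(Z[0])
    for i in range(N):
        for k in range(K):
            m = sum(Z[j][k] for j in range(N) if j != i)  # m_{-i,k}
            if m == 0:
                Z[i][k] = 0
                continue
            log_odds = math.log(m) - math.log(N - m)
            Z[i][k] = 1
            ll1 = log_lik(Z)
            Z[i][k] = 0
            ll0 = log_lik(Z)
            d = log_odds + ll1 - ll0
            p1 = 1.0 / (1.0 + math.exp(-d)) if d > -700 else 0.0
            Z[i][k] = 1 if rng.random() < p1 else 0
    return Z
```

In practice `log_lik` would be one of the two likelihoods of Section 3.3; only the ratio between the two states of z_ik matters, so shared terms can be cached.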
In this section we compare the representations discovered by this ideal model to human inferences. First, we demonstrate that the representation discovered by participants in Shiffrin and Lightfoot\u2019s experiment [9] is optimal under this model. Second, we illustrate that both the IBP and the human perceptual system incorporate category information appropriately. Finally, we present simulations that show the flexibility of the IBP to learn different featural representations depending on the distributional information of the actual features used to generate the objects, and discuss how this relates to the phenomena of unitization and differentiation more generally.\n\n4.1 Using Distributional Information\n\nWhen should whole objects or line segments be learned as features? It is clear which features should be learned when all of the line segments occur independently and when the line segments in each object always occur together (the line segments and the objects, respectively). However, in the intermediate cases of non-perfect co-occurrence, what should be learned? Without a formal account of feature learning, there is no basis for determining when object \u201cwholes\u201d or \u201cparts\u201d should be learned as features. Our rational model provides an answer \u2013 when there is enough statistical evidence for the individual line segments to be features, then each line segment should be differentiated into features. Otherwise, the collection of line segments should be learned as one unitized feature.\n\nThe stimuli constructed by Shiffrin and Lightfoot [9] constitute one of the intermediate cases between the extremes of total independence and perfect correlation, and are thus a context in which formal modeling can be informative. Figure 2 presents the features learned by applying the model with a noisy-OR likelihood to this object set. 
The features on the left are the bias and the four features on the right are the four objects from their study. The learned features match the representation formed by people in the experiment. Although there is imperfect co-occurrence between the features in each object, there is not enough statistical evidence to warrant representing each object as a combination of features. These results were obtained with an object set consisting of five copies of each of the four objects with added noise that flips a pixel\u2019s value with probability 1/75. The results were obtained by running the Gibbs sampler with initialization p = 0.2, \u03b1 = 1.0, \u03b5 = 0.025, and \u03bb = 0.975. Inference is robust to different initializations as long as they are near these values.\n\nFigure 3: Inferring feature representations using category information from Pevtzow and Goldstone [11]. (a)-(b) Features learned using the rational model with the noisy-OR likelihood where 10 distorted copies of objects A-D comprise the object set with the (a) horizontal and (b) vertical categorization schemes (c = 35), respectively. The features inferred by the model match those learned by participants in the experiment. (c)-(d) Features learned using the same model with the full object set with 10 distorted copies of each object, under the (c) horizontal and (d) vertical categorization schemes (c = 75), respectively. The first two features learned by the model match those learned by participants in the experiment. The third feature represents the intersection of the third category (Pevtzow and Goldstone did not test if participants learned this feature).\n\n4.2 Using Category Information\n\nTo model the results of Pevtzow and Goldstone [11], we applied the rational model with the noisy-OR likelihood to the stimuli used in their experiment. 
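Category information can be injected into a purely distributional model by appending label-indicator bits to each binarized image vector, so that category membership enters the model as extra \u201cpixels\u201d. This is a hypothetical sketch: the paper specifies only that c bits per category are appended, and the one-block-per-category layout and names below are our assumption:

```python
def append_category_bits(image_bits, category, n_categories, c):
    """Append c indicator bits per category to a binarized image vector:
    the c bits for the object's own category are set to 1, all others
    to 0. Layout and names are illustrative assumptions, not the paper's
    exact encoding."""
    label = []
    for cat in range(n_categories):
        label.extend([1 if cat == category else 0] * c)
    return list(image_bits) + label
```

Because the label bits co-occur perfectly within a category, a feature that spans them and the shared image pixels becomes probable under the model, which is how the category scheme can break ties between otherwise equally good representations.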
Although this model does not incorporate category information directly, we included it indirectly by postpending c bits per category to the end of each image. Figure 3(a) and (b) show the features learned by the model when trained on distorted objects A-D using both categorization schemes. The categorization information is used appropriately by the model and mirrors the different feature representations inferred by the two participant groups. Figure 3(c) and (d) show the features learned by the model when given ten distorted copies of all eight objects. Like the human perceptual system, the model infers different, otherwise indistinguishable, feature sets using categorization information appropriately. Although the neural network model of feature learning presented in [2] also inferred correct representations with the four-object set, this model did not produce correct results for the eight-object set. Inference is susceptible to local minima given poor initializations of the hyperparameters. The features shown in Figure 3 used the following initialization: p = 0.125, \u03b1 = 1.5, \u03bb = 0.99, and \u03b5 = 0.01.1\n\n4.3 Unitization and Differentiation\n\nThe results presented in this section show that our rational model reproduces human inferences for particular datasets, suggesting that the model might be useful more generally in identifying conditions under which the human perceptual system should unitize or differentiate sensory primitives. The Shiffrin and Lightfoot results demonstrated one case where whole objects should be learned as features even though each object was created from features that did not perfectly co-occur. 
The IBP confirms the intuitive explanation that there is not enough statistical evidence to break (differentiate) the objects into individual features, and thus the unitization behavior of the participants is justified. However, this leaves open how the model behaves when, for the same underlying feature set, statistical evidence does warrant differentiation, so that the individual features should be learned as features.\n\nTo illustrate the importance of distributional information on the inferred featural representation, we designed a simulation to show cases where the objects and the actual features used to generate the objects should each be learned as the features. Figure 4(a) shows the bias (on the left) and the set of six features used in the simulations. Figure 4(b) is an artificially generated set of observed objects for which there is not enough statistical evidence to warrant differentiation. This is the same underlying feature membership matrix as the Shiffrin and Lightfoot result (unitization set).\n\n1The features inferred by the model in each figure have the highest probability given the images it observed.\n\nFigure 4: Inferring different feature representations depending on the distributional information. (a) The bias (on the left) and the six features used to generate both object sets. (b)-(c) The feature membership matrices for the (b) unitization and (c) differentiation sets, respectively. (d)-(e) The feature representations inferred by the model for the (d) unitization and (e) differentiation sets, respectively. 
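Under our reading of the construction described here, the differentiation set enumerates all (6 choose 3) = 20 three-feature combinations of the six features, and each presented object is perturbed with the 1/75 pixel-flip noise. A self-contained sketch, using stand-in one-hot feature images in place of the actual feature images of Figure 4(a):

```python
import itertools
import random

# Stand-in one-hot "feature images" so the sketch is self-contained; the
# actual six feature images are those shown in Figure 4(a).
FEATURES = [[1 if d == k else 0 for d in range(6)] for k in range(6)]

def differentiation_set():
    """One object per 3-of-6 feature combination: the features occur
    independently across objects, giving all (6 choose 3) = 20 objects
    (our reconstruction of the paper's construction)."""
    objects = []
    for combo in itertools.combinations(range(6), 3):
        objects.append([max(FEATURES[k][d] for k in combo) for d in range(6)])
    return objects

def add_flip_noise(obj, rng, flip_prob=1.0 / 75):
    """Perturb an object by flipping each pixel with probability 1/75."""
    return [1 - b if rng.random() < flip_prob else b for b in obj]
```

The unitization set instead reuses only the four Shiffrin-and-Lightfoot-style rows, so the same six features appear in just four fixed configurations.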
Figure 4(c) is an artificially generated object set in which the observed objects should be differentiated. Here, the features used to generate the objects occur independently of each other, and thus the underlying feature membership matrix used to generate the observed objects contains all possible combinations of three of the six features (differentiation set).\n\nFigure 4(d) and (e) show the results of applying the rational model with a noisy-OR likelihood to these two object sets. When the underlying features occur independently of each other, the model represents the objects in terms of these features. When the features often co-occur, the model forms a representation which consists simply of the objects themselves. For each simulation, 40 objects from the appropriate set (repeating as necessary) were presented to the model. Each object was perturbed by added noise that flipped a pixel\u2019s value with probability 1/75. The hyperparameters were inferred with Metropolis-Hastings steps during Gibbs sampling and were initialized to: \u03b1 = 1, \u03c3_X^2 = 2.25, and \u03c3_A^2 = 0.5. These simulations demonstrate that even when the same underlying features create two object sets, different representations should be inferred depending on the distributional information, suggesting that this kind of information can be a powerful driving force behind unitization and differentiation.\n\n5 Discussion and Future Directions\n\nThe flexibility of human featural representations and the power of representation in machine learning make a formal account of how people derive representations from raw sensory information tremendously important. We have outlined one approach to this problem, drawing on ideas from nonparametric Bayesian statistics to provide a rational account of how the human perceptual system uses distributional and category information to infer representations. 
First, we showed that in one circumstance where it is ambiguous whether parts or objects should form the featural representation of the objects, this model performs similarly to the human perceptual system (they both learn the objects themselves as the basic features). Second, we demonstrated that the IBP and the human perceptual system both use categorization information to make the same inductions as appropriate for the given categorization scheme. Third, we further investigated how the distributional information of the features that create the object set affects the inferred representation. These results begin to sketch a picture of human feature learning as a rational combination of different sources of information about the structure of a set of objects.\n\nThere are two main future directions for our work. First, we intend to perform further analysis of how the human perceptual system uses statistical cues. Specifically, we plan to investigate whether the feature sets identified by the perceptual system are affected by the distributional information it is given (as our simulations would suggest). Second, we hope to use hierarchical nonparametric Bayesian models to investigate the interplay between knowledge effects and perceptual input. Recent work has identified a connection between the IBP and the Beta process [15], making it possible to define hierarchical Bayesian models in which the IBP appears as a component. Such models would provide a more natural way to capture the influence of category information on feature learning, extending the analyses that we have performed here.\n\nAcknowledgements We thank Rob Goldstone, Karen Schloss, Stephen Palmer, and the Computational Cognitive Science Lab at Berkeley for discussions and the Air Force Office of Scientific Research for support.\n\nReferences\n\n[1] P. G. Schyns, R. L. Goldstone, and J. Thibaut. Development of features in object concepts. 
Behavioral and Brain Sciences, 21:1\u201354, 1998.\n\n[2] R. L. Goldstone. Learning to perceive while perceiving to learn. In Perceptual organization in vision: Behavioral and neural perspectives, pages 233\u2013278. 2003.\n\n[3] I. Biederman and M. M. Shiffrar. Sexing day-old chicks: A case study and expert systems analysis of a difficult perceptual-learning task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13:640\u2013645, 1987.\n\n[4] M. L. Minsky and S. A. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.\n\n[5] B. Scholkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2001.\n\n[6] J. R. Anderson. Is human cognition adaptive? Behavioral and Brain Sciences, 14:471\u2013517, 1991.\n\n[7] A. N. Sanborn, T. L. Griffiths, and D. J. Navarro. A more rational model of categorization. In Proceedings of the 28th Annual Conference of the Cognitive Science Society, 2006.\n\n[8] T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems 18, 2006.\n\n[9] R. M. Shiffrin and N. Lightfoot. Perceptual learning of alphanumeric-like characters. In The psychology of learning and motivation, volume 36, pages 45\u201382. Academic Press, San Diego, 1997.\n\n[10] R. L. Goldstone. Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General, 123:178\u2013200, 1994.\n\n[11] R. Pevtzow and R. L. Goldstone. Categorization and the parsing of objects. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 712\u2013722, Hillsdale, NJ, 1994. Lawrence Erlbaum Associates.\n\n[12] G. Orban, J. Fiser, R. N. Aslin, and M. Lengyel. Bayesian model learning in human visual perception. In Advances in Neural Information Processing Systems 18, 2006.\n\n[13] F. Wood, T. L. Griffiths, and Z. Ghahramani. 
A non-parametric Bayesian method for inferring hidden causes. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006.\n\n[14] F. Wood and T. L. Griffiths. Particle filtering for nonparametric Bayesian matrix factorization. In Advances in Neural Information Processing Systems 19, 2007.\n\n[15] R. Thibaux and M. I. Jordan. Hierarchical Beta processes and the Indian buffet process. Technical Report 719, Department of Statistics, University of California, Berkeley, 2006.", "award": [], "sourceid": 841, "authors": [{"given_name": "Thomas", "family_name": "Griffiths", "institution": null}, {"given_name": "Joseph", "family_name": "Austerweil", "institution": null}]}