{"title": "ALCOVE: A Connectionist Model of Human Category Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 649, "page_last": 655, "abstract": null, "full_text": "ALCOVE: A Connectionist Model of Human Category Learning\n\nJohn K. Kruschke\nDepartment of Psychology and Cognitive Science Program\nIndiana University, Bloomington IN 47405-4201 USA\ne-mail: kruschke@ucs.indiana.edu\n\nAbstract\n\nALCOVE is a connectionist model of human category learning that fits a broad spectrum of human learning data. Its architecture is based on well-established psychological theory, and is related to networks using radial basis functions. From the perspective of cognitive psychology, ALCOVE can be construed as a combination of exemplar-based representation and error-driven learning. From the perspective of connectionism, it can be seen as incorporating constraints into back-propagation networks appropriate for modelling human learning.\n\n1 INTRODUCTION\n\nALCOVE is intended to accurately model human, perhaps non-optimal, performance in category learning. While it is a feed-forward network that learns by gradient descent on error, it is unlike standard back-propagation (Rumelhart, Hinton & Williams, 1986) in its architecture, its behavior, and its goals. Unlike the standard back-propagation network, which was motivated by generalizing neuron-like perceptrons, the architecture of ALCOVE was motivated by a molar-level psychological theory, Nosofsky's (1986) generalized context model (GCM). The psychologically constrained architecture results in behavior that captures the detailed course of human category learning in many situations where standard back-propagation fares less well.
And, unlike most applications of standard back-propagation, the goal of ALCOVE is not to discover new (hidden-layer) representations after lengthy training, but rather to model the course of learning itself (Kruschke, 1990c), by determining which dimensions of the given representation are most relevant to the task, and how strongly to associate exemplars with categories.\n\n[Figure 1 appears here: a feed-forward network with, from top to bottom, category nodes, learned association weights, exemplar nodes, learned attention strengths, and stimulus dimension nodes.]\n\nFigure 1: The architecture of ALCOVE (Attention Learning covEring map). Exemplar nodes show their activation profile when r = q = 1 in Eqn. 1.\n\n2 THE MODEL\n\nLike the GCM, ALCOVE assumes that input patterns can be represented as points in a multi-dimensional psychological space, as determined by multi-dimensional scaling algorithms (e.g., Shepard, 1962). Each input node encodes a single psychological dimension, with the activation of the node indicating the value of the stimulus on that dimension. Figure 1 shows the architecture of ALCOVE, illustrating the case of just two input dimensions.\n\nEach input node is gated by a dimensional attention strength α_i. The attention strength on a dimension reflects the relevance of that dimension for the particular categorization task at hand, and the model learns to allocate more attention to relevant dimensions and less to irrelevant dimensions.\n\nEach hidden node corresponds to a position in the multi-dimensional stimulus space, with one hidden node placed at the position of every training exemplar. Each hidden node is activated according to the psychological similarity of the stimulus to the exemplar represented by the hidden node. The similarity function comes from the GCM and the work of Shepard (1962; 1987): Let the position of the jth hidden node be denoted as (h_{j1}, h_{j2}, ...
), and let the activation of the jth hidden node be denoted as a_j^hid. Then\n\n$$a_j^{hid} = \exp\left[-c\left(\sum_i \alpha_i \, |h_{ji} - a_i^{in}|^r\right)^{q/r}\right] \quad (1)$$\n\nwhere c is a positive constant called the specificity of the node, where the sum is taken over all input dimensions, and where r and q are constants determining the similarity metric and similarity gradient, respectively. For separable psychological dimensions, the city-block metric (r = 1) is used, while integral dimensions might call for a Euclidean metric (r = 2). An exponential similarity gradient (q = 1) is used here (Shepard, 1987; this volume), but a Gaussian similarity gradient (q = 2) can sometimes be appropriate.\n\n[Figure 2 appears here.]\n\nFigure 2: (a) Increasing attention on the horizontal axis and decreasing attention on the vertical axis causes exemplars of the two categories (denoted by dots and +'s) to have greater between-category dissimilarity and greater within-category similarity. (b) ALCOVE cannot differentially attend to diagonal axes. (After Nosofsky, 1986, Fig. 2.)\n\nThe dimensional attention strengths adjust themselves so that exemplars from different categories become less similar, and exemplars within categories become more similar. Consider a simple case of four stimuli that form the corners of a square in input space, as in Figure 2(a). The two left stimuli are mapped to one category (indicated by dots) and the two right stimuli are mapped to another category (indicated by +'s). ALCOVE learns to increase the attention strength on the horizontal axis, and to decrease the attention strength on the vertical axis. On the other hand, ALCOVE cannot stretch or shrink diagonally, as suggested in Figure 2(b).
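As a reader's sketch (not part of the original paper), the exemplar-node activation of Eqn. 1 can be written in a few lines of Python; the coordinates below are illustrative stand-ins for the square of stimuli in Figure 2(a), with r = q = 1 (city-block metric, exponential gradient):

```python
import numpy as np

def exemplar_activation(stimulus, exemplar, attention, c=1.0, r=1, q=1):
    """Eqn. 1: a_hid = exp(-c * (sum_i alpha_i * |h_ji - a_i|^r)^(q/r))."""
    dist = np.sum(attention * np.abs(exemplar - stimulus) ** r) ** (q / r)
    return np.exp(-c * dist)

# Four stimuli at the corners of a unit square (cf. Figure 2a):
# the left pair belongs to one category, the right pair to the other.
left, left_hi = np.array([0.0, 0.0]), np.array([0.0, 1.0])
right = np.array([1.0, 0.0])

equal = np.array([0.5, 0.5])   # equal attention to both dimensions
tuned = np.array([0.9, 0.1])   # attention shifted to the horizontal axis

# Shifting attention to the category-relevant horizontal dimension raises
# within-category similarity and lowers between-category similarity.
print(exemplar_activation(left, left_hi, equal), exemplar_activation(left, left_hi, tuned))
print(exemplar_activation(left, right, equal), exemplar_activation(left, right, tuned))
```

Running the sketch shows the stretching of Figure 2(a) numerically: the within-category pair (left column) becomes more similar under the tuned attention weights, while the between-category pair (bottom row) becomes less similar.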
This constraint is an accurate reflection of human performance, in that categories separated by a diagonal boundary tend to take longer to learn than categories separated by a boundary orthogonal to one dimension.\n\nEach hidden node is connected to output nodes that correspond to response categories. The connection from the jth hidden node to the kth category node has a connection weight denoted w_kj, called the association weight between the exemplar and the category. The output (category) nodes are activated by the linear rule used in the GCM and the network models of Gluck and Bower (1988a,b):\n\n$$a_k^{out} = \sum_{j \in hid} w_{kj} \, a_j^{hid} \quad (2)$$\n\nIn ALCOVE, unlike the GCM, the association weights are learned and can take on any real value, including negative values. Category activations are mapped to response probabilities using the same choice rule as was used in the GCM and network models. Thus,\n\n$$\Pr(K) = \exp(\phi \, a_K^{out}) \Big/ \sum_{k \in out} \exp(\phi \, a_k^{out}) \quad (3)$$\n\nwhere