{"title": "Estimating analogical similarity by dot-products of Holographic Reduced Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 1109, "page_last": 1116, "abstract": null, "full_text": "Estimating analogical similarity by dot-products of Holographic Reduced Representations. \n\nTony A. Plate \n\nDepartment of Computer Science, University of Toronto \n\nToronto, Ontario, Canada M5S 1A4 \n\nemail: tap@ai.utoronto.ca \n\nAbstract \n\nModels of analog retrieval require a computationally cheap method of \nestimating similarity between a probe and the candidates in a large pool \nof memory items. The vector dot-product operation would be ideal for \nthis purpose if it were possible to encode complex structures as vector \nrepresentations in such a way that the superficial similarity of vector \nrepresentations reflected underlying structural similarity. This paper \ndescribes how such an encoding is provided by Holographic Reduced \nRepresentations (HRRs), which are a method for encoding nested relational \nstructures as fixed-width distributed representations. The conditions \nunder which structural similarity is reflected in the dot-product rankings of \nHRRs are discussed. \n\n1 INTRODUCTION \n\nGentner and Markman (1992) suggested that the ability to deal with analogy will be a \n\"Watershed or Waterloo\" for connectionist models. They identified \"structural alignment\" \nas the central aspect of analogy making. They noted the apparent ease with which people \ncan perform structural alignment in a wide variety of tasks and were pessimistic about the \nprospects for the development of a distributed connectionist model that could be useful in \nperforming structural alignment. 
\n\nIn this paper I describe how Holographic Reduced Representations (HRRs) (Plate, 1991; \nPlate, 1994), a fixed-width distributed representation for nested structures, can be used \nto obtain fast estimates of analogical similarity. An HRR is a high-dimensional vector, \nand the vector dot-product of two HRRs is an efficiently computable estimate of the \noverall similarity between the two structures represented. This estimate reflects both \nsurface similarity and some aspects of structural similarity (see footnote 1), even though alignments are \nnot explicitly calculated. I also describe contextualization, an enrichment of HRRs designed \nto make dot-product comparisons of HRRs more sensitive to structural similarity. \n\n2 STRUCTURAL ALIGNMENT & ANALOGICAL REMINDING \n\nPeople appear to perform structural alignment in a wide variety of tasks, including \nperception, problem solving, and memory recall (Gentner and Markman, 1992; Markman, \nGentner and Wisniewski, 1993). One task many researchers have investigated is analog \nrecall. A subject is shown a number of stories and later is shown a probe story. The task \nis to recall stories that are similar to the probe story (and sometimes evaluate the degree of \nsimilarity and perform analogical reasoning). \n\nMAC/FAC, a computer model of this process, has two stages (Gentner and Forbus, 1991). \nThe first stage selects a few likely analogs from a large number of potential analogs. The \nsecond stage searches for an optimal (or at least good) mapping between each selected story \nand the probe story and outputs those with the best mappings. Two stages are necessary \nbecause it is too computationally expensive to search for an optimal mapping between the \nprobe and all stories in memory. An important requirement for a first stage is that its \nperformance scale well with both the size and number of episodes in long-term memory. 
\nThis prevents the first stage of MAC/FAC from considering any structural features. \n\n[Figure 1 diagram: a probe and a large pool of items in memory feed a cheap filtering process \nbased on surface features; the analogies it passes go to an expensive selection process \nbased on structural features, which outputs good analogies.] \n\nFigure 1: General architecture of a two-stage retrieval model. \n\nWhile it is indisputable that people take structural correspondences into account when \nevaluating and using analogies (Gentner, Rattermann and Forbus, 1993), it is less certain \nwhether structural similarity influences access to long term memory (i.e., the first-stage \nreminding process). Some studies have found little effect of analogical similarity on \nreminding (Gentner and Forbus, 1991; Gentner, Rattermann and Forbus, 1993), while \nothers have found some effect (Wharton et al., 1994). \n\nIn any case, surface features appear to influence the likelihood of a reminding far more than \ndo structural features. Studies that have found an effect of structural similarity on reminding \nseem to indicate the effect only exists, or is greater, in the presence of surface similarity \n(Gentner and Forbus, 1991; Gentner, Rattermann and Forbus, 1993; Thagard et al., 1990). \n\nFootnote 1: \"Surface features\" of stories are the features of the entities and relations involved, and \"structural \nfeatures\" are the relationships among the relations and entities. \n\n2.1 EXAMPLES OF ANALOGY BETWEEN NESTED STRUCTURES \n\nTo test how well the HRR dot-product works as an estimate of analogical similarity between \nnested relational structures I used the following set of simple episodes (see Plate (1993) \nfor the full set). The memorized episodes are similar in different ways to the probe. 
These \nexamples are adapted from Thagard et al. (1990). \n\nProbe: Spot bit Jane, causing Jane to flee from Spot. \n\nEpisodes in long-term memory: \nE1 (LS)    Fido bit John, causing John to flee from Fido. \nE2 (ANcm)  Fred bit Rover, causing Rover to flee from Fred. \nE3 (AN)    Felix bit Mort, causing Mort to flee from Felix. \nE6 (SS)    John fled from Fido, causing Fido to bite John. \nE7 (FA)    Mort bit Felix, causing Mort to flee from Felix. \n\nIn these episodes Jane, John, and Fred are people; Spot, Fido and Rover are dogs; Felix is \na cat; and Mort is a mouse. All of these are objects, represented by token vectors. Tokens \nof the same type are considered to be similar to each other, but not to tokens of other types. \nBite, flee, and cause are relations. The argument structure of the cause relation and the \npatterns in which objects fill multiple roles constitute the higher-order structure. \n\nThe second column classifies the relationship between each episode and the probe using \nGentner et al.'s types of similarity: LS (Literal Similarity) shares relations, object features, \nand higher-order structure; AN (Analogy, also called True Analogy) shares relations and \nhigher-order structure, but not object features; SS (Surface Similarity, also called Mere \nAppearance) shares relations and object features, but not higher-order structure; FA (False \nAnalogy) shares relations only. ANcm denotes a cross-mapped analogy: it involves the \nsame types of objects as the probe, but the types of corresponding objects are swapped. \n\n2.2 MAC/FAC PERFORMANCE ON TEST EXAMPLES \n\nThe first stage of MAC/FAC (the \"Many Are Called\" stage) only inspects object features \nand relations. It uses a vector representation of surface features. Each location in the vector \ncorresponds to a surface feature of an object, relation or function, and the value in the \nlocation is the number of times the feature occurs in the structure. 
The first-stage estimate \nof the similarity between two structures is the dot-product of their feature-count vectors. A \nthreshold is used to select likely analogies. The first stage would give E1 (LS), E2 (ANcm), and E6 \n(SS) equal and highest scores, i.e., (LS, ANcm, SS) > (AN, FA). \n\nThe Structure Mapping Engine (SME) (Falkenhainer, Forbus and Gentner, 1989) is used \nas the second stage of MAC/FAC (the \"Few Are Chosen\" stage). The rules of SME are \nthat mapped relations must match, all the arguments of mapped relations must be mapped \nconsistently, and mapping of objects must be one-to-one. SME would detect structural \ncorrespondences between each episode and the probe and give the literally similar and \nanalogous episodes the highest rankings, i.e., LS > AN > (SS, FA). \n\nA simplified view of the overall similarity scores from MAC and the full MAC/FAC is \nshown in Table 1. There are four conditions: the two structures being compared can be \nsimilar in structure and/or in object attributes. In all four conditions, the structures are \nassumed to involve similar relations; only structural and object attribute similarities are \nvaried. Ideally, the responses to the mixed conditions should be flexible, and controlled by \nwhich aspects of similarity are currently considered important. Only the relative values of \nthe scores are important; the absolute values do not matter. \n\n(a) Scores from MAC. \n\nStructural    Object Attribute Similarity \nSimilarity    YES          NO \nYES           (LS) High    (AN) Low \nNO            (SS) High    (FA) Low \n\n(b) Ideal similarity scores. \n\nStructural    Object Attribute Similarity \nSimilarity    YES              NO \nYES           (LS) High        (AN) \u2020Med-High \nNO            (SS) \u2021Med-Low    (FA) Low \n\nTable 1: (a) Scores from the fast (MAC) similarity estimator in MAC/FAC. (b) Scores from \nan ideal structure-sensitive similarity estimator, e.g., SME as used in MAC/FAC. 
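As a minimal sketch of the MAC-style first stage described in Section 2.2, the following Python computes dot-products of feature-count vectors for the example episodes. The feature inventory (one feature per relation, one type feature per object) and the names TYPES, EPISODES, surface_features and mac_score are assumptions made for illustration; they are not MAC/FAC's actual representation.

```python
from collections import Counter

# Assumed object types, taken from the description of the episodes.
TYPES = {'jane': 'person', 'john': 'person', 'fred': 'person',
         'spot': 'dog', 'fido': 'dog', 'rover': 'dog',
         'felix': 'cat', 'mort': 'mouse'}

# Objects appearing in each episode; every episode uses bite, flee and cause.
EPISODES = {
    'probe': ['spot', 'jane'],
    'E1 LS': ['fido', 'john'],
    'E2 ANcm': ['fred', 'rover'],
    'E3 AN': ['felix', 'mort'],
    'E6 SS': ['john', 'fido'],
    'E7 FA': ['mort', 'felix'],
}

def surface_features(objects):
    # Count each relation once and each object's type once
    # (an assumed counting scheme; structure is deliberately ignored).
    return Counter(['bite', 'flee', 'cause'] + [TYPES[o] for o in objects])

def mac_score(a, b):
    # Dot-product of two sparse feature-count vectors.
    return sum(count * b[f] for f, count in a.items())

probe = surface_features(EPISODES['probe'])
scores = {name: mac_score(probe, surface_features(objs))
          for name, objs in EPISODES.items() if name != 'probe'}
print(scores)
```

Because role structure never enters the feature counts, the literally similar, cross-mapped and surface-similar episodes tie under this scheme, which is the pattern summarized in Table 1a.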
\n\nIn the remainder of this paper I describe how HRRs can be used to compute fast similarity \nestimates that are more like the ratings in Table 1b, i.e., estimates that are flexible and sensitive \nto structure. \n\n3 HOLOGRAPHIC REDUCED REPRESENTATIONS \n\nA distributed representation for nested relational structures requires a solution to the binding \nproblem. The representation of a relation such as bite(spot, jane) (\"Spot bit Jane.\") \nmust bind 'Spot' to the agent role and 'Jane' to the object role. In order to represent nested \nstructures it must also be possible to bind a relation to a role, e.g., bite(spot, jane) and \nthe antecedent role of the cause relation. \n\n(a) z_j = sum_{k=0}^{n-1} x_k y_{j-k}    (subscripts are modulo n) \n\n(b) z_0 = x_0 y_0 + x_2 y_1 + x_1 y_2 \n    z_1 = x_1 y_0 + x_0 y_1 + x_2 y_2 \n    z_2 = x_2 y_0 + x_1 y_1 + x_0 y_2 \n\nFigure 2: (a) Circular convolution. (b) Circular convolution illustrated as a compressed \nouter product for n = 3. Each of the small circles represents an element of the outer product \nof x and y, e.g., the middle bottom one is x_2 y_1. The elements of the circular convolution \nof x and y are the sums of the outer product elements along the wrapped diagonal lines. \n\nHolographic Reduced Representations (HRRs) (Plate, 1994) use circular convolution to \nsolve the binding problem. Circular convolution (Figure 2a) is an operation that maps two \nn-dimensional vectors onto one n-dimensional vector. It can be viewed as a compressed \nouter product, as shown in Figure 2b. Algebraically, circular convolution behaves like \nmultiplication: it is commutative, associative, and distributes over addition. Circular \nconvolution (written \u00ae here) is similarity preserving: if a ~= a' then a \u00ae b ~= a' \u00ae b. 
Associations can be \ndecoded using a stable approximate inverse: a* \u00ae (a \u00ae b) ~= b (provided that the vector \nelements are normally distributed with mean zero and variance 1/n). The approximate \ninverse is a permutation of vector elements: a*_i = a_{n-i} (subscripts modulo n). The dot-product of two vectors, \na similarity measure, is: a . b = sum_{i=0}^{n-1} a_i b_i. High-dimensional vectors (n in the low \nthousands) must be used to ensure reliable encoding and decoding. \n\nThe HRR for bite(spot, jane) is: F = < bite + bite_agt \u00ae spot + bite_obj \u00ae jane >, \nwhere < . > is a normalization operation (< a > = a / sqrt(a . a)). Multiple associations are \nsuperimposed in one vector and the representations for the objects (spot and jane) can \nalso be added into the HRR in order to make it similar to other HRRs involving Spot and \nJane. The HRR for a relation is the same size as the representation for an object and can be \nused as the filler for a role in another relation. \n\n4 EXPT. 1: HRR DOT-PRODUCT SIMILARITY ESTIMATES \n\nExperiment 1 illustrates the ways in which the dot-products of ordinary HRRs reflect, and \nfail to reflect, the similarity of the underlying structure of the episodes. \n\nBase vectors \n\nperson, dog, cat, mouse \nbite, flee, cause \nbite_agt, flee_agt, cause_antc \nbite_obj, flee_from, cause_cnsq \n\nToken vectors \n\njane = < person + id_jane >    spot = < dog + id_spot > \njohn = < person + id_john >    fido = < dog + id_fido > \nfred = < person + id_fred >    rover = < dog + id_rover > \nmort = < mouse + id_mort >     felix = < cat + id_felix > \n\nThe set of base and token vectors used in Experiments 1, 2 and 3 is shown above. All base \nand id vectors had elements independently chosen from a zero-mean normal distribution \nwith variance 1/n. The HRR for the probe is constructed as follows, and the HRRs for the \nother episodes are constructed in the same manner: 
\nP_bite = < bite + bite_agt \u00ae spot + bite_obj \u00ae jane > \nP_flee = < flee + flee_agt \u00ae jane + flee_from \u00ae spot > \nP_objects = < jane + spot > \nP = < cause + P_objects + P_bite + P_flee + cause_antc \u00ae P_bite + cause_cnsq \u00ae P_flee > \n\nExperiment 1 was run 100 times, each time with a new choice of random base vectors. The \nvector dimension was 2048. The means and standard deviations of the HRR dot-products \nof the probe and each episode are shown in Table 2. \n\nProbe: Spot bit Jane, causing Jane to flee from Spot. \n\nEpisodes in long-term memory, with dot-product with probe: \n\nEpisode                                                        Expt1 Avg  Expt1 Sd  Expt2  Expt3 \nE1 LS    Fido bit John, causing John to flee from Fido.        0.70       0.016     0.63   0.81 \nE2 ANcm  Fred bit Rover, causing Rover to flee from Fred.      0.47       0.022     0.47   0.69 \nE3 AN    Felix bit Mort, causing Mort to flee from Felix.      0.39       0.024     0.39   0.61 \nE6 SS    John fled from Fido, causing Fido to bite John.       0.47       0.018     0.44   0.53 \nE7 FA    Mort bit Felix, causing Mort to flee from Felix.      0.39       0.024     0.39   0.39 \n\nTable 2: Results of Experiments 1, 2 and 3. \n\nIn 94 out of 100 runs, the ranking of the HRR dot-products was consistent with \n\nLS > (ANcm, SS) > (FA, AN) \n\n(where the ordering within the parentheses varies). The order violations are due to \"random\" \nfluctuations of dot-products, whose variance decreases as the vector dimension increases. \nWhen the experiment was rerun with vector dimension 4096 there was only one violation \nof this order out of 100 runs. \n\nThese results represent an improvement over the first stage of MAC/FAC: the HRR \ndot-product distinguishes between literal and surface similarity. However, when the episodes \ndo not share object attributes, the HRR dot-product is not affected by structural similarity \nand the scores do not distinguish analogy from false analogy or superficial similarity. 
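The encoding and comparison used in Experiment 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the software used for the reported results: the function names are hypothetical, the convolution is the direct O(n^2) sum rather than an FFT, and the default dimension of 512 is smaller than the 2048 used in the experiment, so individual scores will be noisier than those in Table 2.

```python
import math
import random

def cconv(x, y):
    # Circular convolution: z[j] = sum_k x[k] * y[(j - k) mod n].
    n = len(x)
    return [sum(x[k] * y[(j - k) % n] for k in range(n)) for j in range(n)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def add(*vs):
    # Elementwise sum (superposition) of any number of vectors.
    return [sum(t) for t in zip(*vs)]

def norm(a):
    # The normalization operation: scale to unit Euclidean length.
    s = math.sqrt(dot(a, a))
    return [v / s for v in a]

def rand_vec(rng, n):
    # Elements drawn independently from a normal distribution, mean 0, variance 1/n.
    sd = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sd) for _ in range(n)]

BASE = ['person', 'dog', 'cat', 'mouse', 'bite', 'flee', 'cause',
        'bite_agt', 'bite_obj', 'flee_agt', 'flee_from',
        'cause_antc', 'cause_cnsq']
TOKENS = {'jane': 'person', 'john': 'person', 'fred': 'person',
          'spot': 'dog', 'fido': 'dog', 'rover': 'dog',
          'felix': 'cat', 'mort': 'mouse'}

def make_vocab(rng, n):
    v = {name: rand_vec(rng, n) for name in BASE}
    for tok, typ in TOKENS.items():
        # token = normalized (type vector + random id vector)
        v[tok] = norm(add(v[typ], rand_vec(rng, n)))
    return v

def episode(v, biter, bitten, fleer, fled_from, antecedent='bite'):
    p_bite = norm(add(v['bite'], cconv(v['bite_agt'], v[biter]),
                      cconv(v['bite_obj'], v[bitten])))
    p_flee = norm(add(v['flee'], cconv(v['flee_agt'], v[fleer]),
                      cconv(v['flee_from'], v[fled_from])))
    p_objects = norm(add(v[biter], v[bitten]))
    antc, cnsq = (p_bite, p_flee) if antecedent == 'bite' else (p_flee, p_bite)
    return norm(add(v['cause'], p_objects, p_bite, p_flee,
                    cconv(v['cause_antc'], antc), cconv(v['cause_cnsq'], cnsq)))

def experiment1(n=512, seed=0):
    rng = random.Random(seed)
    v = make_vocab(rng, n)
    probe = episode(v, 'spot', 'jane', 'jane', 'spot')
    memory = {
        'E1 LS': episode(v, 'fido', 'john', 'john', 'fido'),
        'E2 ANcm': episode(v, 'fred', 'rover', 'rover', 'fred'),
        'E3 AN': episode(v, 'felix', 'mort', 'mort', 'felix'),
        'E6 SS': episode(v, 'fido', 'john', 'john', 'fido', antecedent='flee'),
        'E7 FA': episode(v, 'mort', 'felix', 'mort', 'felix'),
    }
    return {name: dot(probe, e) for name, e in memory.items()}
```

Calling experiment1() returns a dictionary mapping each episode label to its dot-product with the probe. The direct convolution keeps the sketch dependency-free; an FFT-based convolution would be the natural choice at the dimensions used in the experiments.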
\n\n5 EXPERIMENTS 2 AND 3: CONTEXTUALIZED HRRS \n\nDot-product comparisons of HRRs are not sensitive to structural similarity in the absence of \nsimilar objects. This is because the way in which objects fill multiple roles is not expressed \nas a surface feature in HRRs. Consequently, the analogous episodes E2 (ANcm) and E3 \n(AN) do not receive higher scores than the non-analogous episodes E6 (SS) and E7 (FA). \n\nWe can force role structure to become a surface feature by \"contextualizing\" the \nrepresentations of fillers. Contextualization involves incorporating, in the representation of a \nfiller, information about what other roles the object fills. This is like thinking of Spot (in the \nprobe) as an entity that bites (a biter) and an entity that is fled from (a \"fled-from\"). \n\nIn ordinary HRRs the filler alone is convolved with the role. In contextualized HRRs a blend \nof the filler and its context is convolved with the role. The representation for the context of \nan object in a role is the typical fillers of the other roles the object fills. The context for Spot in \nthe flee relation is represented by typ^bite_agt and the context in the bite relation is represented \nby typ^flee_from (where typ^bite_agt = bite \u00ae bite_agt* and typ^flee_from = flee \u00ae flee_from*). The \ndegree of contextualization is governed by the mixing proportions kappa_o (object) and kappa_c \n(context). The contextualized HRR for the probe is constructed as follows: \n\nP_bite = < bite + bite_agt \u00ae (kappa_o spot + kappa_c typ^flee_from) + bite_obj \u00ae (kappa_o jane + kappa_c typ^flee_agt) > \nP_flee = < flee + flee_agt \u00ae (kappa_o jane + kappa_c typ^bite_obj) + flee_from \u00ae (kappa_o spot + kappa_c typ^bite_agt) > \nP_objects = < jane + spot > \nP = < cause + P_objects + P_bite + P_flee + cause_antc \u00ae P_bite + cause_cnsq \u00ae P_flee > \n\nA useful similarity estimator must be flexible and able to adjust the salience of different aspects \nof similarity according to context or command. 
The degree to which role-alignment affects \nthe HRR dot-product can be adjusted by changing the degree of contextualization in just \none episode of a pair. Hence, the items in memory can be encoded with fixed kappa values \n(kappa_o^m and kappa_c^m) and the salience of role alignment can be changed by altering the degree of \ncontextualization in the probe (kappa_o^p and kappa_c^p). This is fortunate as it would be impractical to \nrecode all items in memory in order to alter the salience of role alignment in a particular \ncomparison. The same technique can be used to adjust the importance of other features. \n\nTwo experiments were performed with contextualized HRRs, with the same episodes as \nused in Experiment 1. In Experiment 2 the probe was non-contextualized (kappa_o^p = 1, kappa_c^p = 0), \nand in Experiment 3 the probe was contextualized (kappa_o^p = 1/sqrt(2), kappa_c^p = 1/sqrt(2)). For both \nExperiments 2 and 3 the episodes in memory were encoded with the same degree of \ncontextualization (kappa_o^m = 1/sqrt(2), kappa_c^m = 1/sqrt(2)). As before, each set of comparisons was run \n100 times, and the vector dimension was 2048. The results are shown in Table 2. \n\nThe scores in Experiment 2 (non-contextualized probe) were consistent (in 95 out of 100 \nruns) with the same order as given for Experiment 1: \n\nLS > (ANcm, SS) > (FA, AN) \n\nThe scores in Experiment 3 (contextualized probe) were consistent (in all 100 runs) with an \nordering that ranks analogous episodes as strictly more similar than non-analogous ones: \n\nLS > ANcm > AN > SS > FA \n\n6 DISCUSSION \n\nThe dot-product of HRRs provides a fast estimate of the degree of analogical match and \nis sensitive to various structural aspects of the match. It is not intended to be a model \nof complex or creative analogy making, but it could be a useful first stage in a model of \nanalogical reminding. 
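The contextualized encoding of Section 5 can be sketched in the same plain-Python style. Again this is a hedged illustration rather than the original software: the helper names are hypothetical, the typical-filler vectors are left unnormalized, equal mixing proportions of 1/sqrt(2) are assumed for the episodes in memory, and the reduced dimension of 512 makes the scores noisier than those in Table 2.

```python
import math
import random

def cconv(x, y):
    # Circular convolution: z[j] = sum_k x[k] * y[(j - k) mod n].
    n = len(x)
    return [sum(x[k] * y[(j - k) % n] for k in range(n)) for j in range(n)]

def involution(a):
    # Approximate inverse: a*[i] = a[(-i) mod n].
    n = len(a)
    return [a[(-i) % n] for i in range(n)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def add(*vs):
    return [sum(t) for t in zip(*vs)]

def norm(a):
    s = math.sqrt(dot(a, a))
    return [v / s for v in a]

def rand_vec(rng, n):
    sd = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sd) for _ in range(n)]

BASE = ['person', 'dog', 'cat', 'mouse', 'bite', 'flee', 'cause',
        'bite_agt', 'bite_obj', 'flee_agt', 'flee_from',
        'cause_antc', 'cause_cnsq']
TOKENS = {'jane': 'person', 'john': 'person', 'fred': 'person',
          'spot': 'dog', 'fido': 'dog', 'rover': 'dog',
          'felix': 'cat', 'mort': 'mouse'}
ROLES = [('bite', 'agt'), ('bite', 'obj'), ('flee', 'agt'), ('flee', 'from')]

def make_vocab(rng, n):
    v = {name: rand_vec(rng, n) for name in BASE}
    for tok, typ in TOKENS.items():
        v[tok] = norm(add(v[typ], rand_vec(rng, n)))
    for rel, role in ROLES:
        # Typical filler of a role, e.g. typ_bite_agt = bite (*) bite_agt*.
        key = rel + '_' + role
        v['typ_' + key] = cconv(v[rel], involution(v[key]))
    return v

def blend(v, obj, ctx, k_o, k_c):
    # Weighted mix of an object vector and its context vector.
    return [k_o * p + k_c * q for p, q in zip(v[obj], v[ctx])]

def episode(v, biter, bitten, fleer, fled, k_o, k_c, antecedent='bite'):
    # The context of a filler in one relation is the typical filler of the
    # role that the same object fills in the other relation.
    in_bite = {biter: 'typ_bite_agt', bitten: 'typ_bite_obj'}
    in_flee = {fleer: 'typ_flee_agt', fled: 'typ_flee_from'}
    p_bite = norm(add(
        v['bite'],
        cconv(v['bite_agt'], blend(v, biter, in_flee[biter], k_o, k_c)),
        cconv(v['bite_obj'], blend(v, bitten, in_flee[bitten], k_o, k_c))))
    p_flee = norm(add(
        v['flee'],
        cconv(v['flee_agt'], blend(v, fleer, in_bite[fleer], k_o, k_c)),
        cconv(v['flee_from'], blend(v, fled, in_bite[fled], k_o, k_c))))
    p_objects = norm(add(v[biter], v[bitten]))
    antc, cnsq = (p_bite, p_flee) if antecedent == 'bite' else (p_flee, p_bite)
    return norm(add(v['cause'], p_objects, p_bite, p_flee,
                    cconv(v['cause_antc'], antc), cconv(v['cause_cnsq'], cnsq)))

def experiment(k_o_probe, k_c_probe, n=512, seed=0):
    rng = random.Random(seed)
    v = make_vocab(rng, n)
    k = 1.0 / math.sqrt(2.0)  # assumed mixing proportion for memory items
    probe = episode(v, 'spot', 'jane', 'jane', 'spot', k_o_probe, k_c_probe)
    memory = {
        'E1 LS': ('fido', 'john', 'john', 'fido', 'bite'),
        'E2 ANcm': ('fred', 'rover', 'rover', 'fred', 'bite'),
        'E3 AN': ('felix', 'mort', 'mort', 'felix', 'bite'),
        'E6 SS': ('fido', 'john', 'john', 'fido', 'flee'),
        'E7 FA': ('mort', 'felix', 'mort', 'felix', 'bite'),
    }
    return {name: dot(probe, episode(v, b, o, f, d, k, k, antecedent=a))
            for name, (b, o, f, d, a) in memory.items()}
```

In this sketch experiment(1.0, 0.0) corresponds to a non-contextualized probe and experiment(k, k) with k = 1/sqrt(2) to a contextualized one. Only the probe weights change between the two calls, which mirrors the point that memory items need not be recoded to change the salience of role alignment.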
\n\n(a) Ordinary-HRR dot-products. \n\nStructural    Object Attribute Similarity \nSimilarity    YES          NO \nYES           (LS) High    (AN) Low \nNO            (SS) Med     (FA) Low \n\n(b) Contextualized-HRR dot-products. \n\nStructural    Object Attribute Similarity \nSimilarity    YES              NO \nYES           (LS) High        (AN) \u2020Med-High \nNO            (SS) \u2020Med-Low    (FA) Low \n\nTable 3: Similarity scores from ordinary and contextualized HRR dot-product comparisons. \nThe flexibility comes from adjusting the weights of various components in the probe. \n\nThe dot-product of ordinary HRRs is sensitive to some aspects of structural similarity. It \nimproves on the existing fast similarity matcher in MAC/FAC in that it discriminates within the \nfirst column of Table 3: it ranks literally similar (LS) episodes higher than superficially \nsimilar (SS) episodes. However, it is insensitive to structural similarity when corresponding \nobjects are not similar. Consequently, it ranks both analogies (AN) and false analogies (FA) \nlower than superficially similar (SS) episodes. \n\nThe dot-product of contextualized HRRs is sensitive to structural similarity even when \ncorresponding objects are not similar. It ranks the given examples in the same order as \nwould the full MAC/FAC or ARCS system. \n\nContextualization does not cause all relational structure to be expressed as surface features \nin the HRR vector. It only suffices to distinguish analogous from non-analogous structures \nwhen no two entities fill the same set of roles. Sometimes, the distinguishing context for \nan object is more than the other roles that the object fills. Consider the situation where \ntwo boys are bitten by two dogs, and each flees from the dog that did not bite him. With \ncontextualization as described above it is impossible to distinguish this from the situation \nwhere each boy flees from the dog that did bite him. 
\n\nHRR dot-products are flexible: the salience of various aspects of similarity can be adjusted \nby changing the weights of various components in the probe. This is true for both ordinary \nand contextualized HRRs. \n\nHRRs retain many of the advantages of ordinary distributed representations: (a) There is a \nsimple and computationally efficient measure of similarity between two representations, \nnamely the vector dot-product. Similar items can be represented by similar vectors. (b) Items are \nrepresented in a continuous space. (c) Information is distributed and redundant. \n\nHummel and Biederman (1992) discussed the binding problem and identified two main \nproblems faced by conjunctive coding approaches such as Tensor Products (Smolensky, \n1990). These are exponential growth of the size of the representation with the number \nof associated objects (or attributes), and insensitivity to attribute structure. HRRs have \nmuch in common with conjunctive coding approaches (they can be viewed as a compressed \nconjunctive code), but do not suffer from these problems. The size of HRRs remains \nconstant with increasing numbers of associated objects, and sensitivity to attribute structure \nhas been demonstrated in this paper. \n\nThe HRR dot-product is not without its drawbacks. Firstly, examples for which it will \nproduce counter-intuitive rankings can be constructed. Secondly, the scaling with the size \nof episodes could be a problem: the sum of structural-feature matches becomes a less \nappropriate measure of similarity as the episodes get larger. A possible solution to this \nproblem is to construct a spreading activation network of HRRs in which each episode is \nrepresented as a number of chunks, and each chunk is represented by a node in the network. \n\nThe software used for the HRR calculations is available from the author. \n\nReferences \n\nFalkenhainer, B., Forbus, K. D., and Gentner, D. (1989). 
The Structure-Mapping Engine: Algorithm \nand examples. Artificial Intelligence, 41:1-63. \nGentner, D. and Forbus, K. D. (1991). MAC/FAC: A model of similarity-based retrieval. In \nProceedings of the Thirteenth Annual Cognitive Science Society Conference, pages 504-509, Hillsdale, NJ. \nErlbaum. \nGentner, D. and Markman, A. B. (1992). Analogy - Watershed or Waterloo? Structural alignment \nand the development of connectionist models of analogy. In Giles, C. L., Hanson, S. J., and Cowan, \nJ. D., editors, Advances in Neural Information Processing Systems 5 (NIPS*92), pages 855-862, San \nMateo, CA. Morgan Kaufmann. \nGentner, D., Rattermann, M. J., and Forbus, K. D. (1993). The roles of similarity in transfer: \nSeparating retrievability from inferential soundness. Cognitive Psychology, 25:431-467. \nHummel, J. E. and Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. \nPsychological Review, 99(3):480-517. \nMarkman, A. B., Gentner, D., and Wisniewski, E. J. (1993). Comparison and cognition: Implications \nof structure-sensitive processing for connectionist models. Unpublished manuscript. \nPlate, T. A. (1991). Holographic Reduced Representations: Convolution algebra for compositional \ndistributed representations. In Mylopoulos, J. and Reiter, R., editors, Proceedings of the 12th \nInternational Joint Conference on Artificial Intelligence, pages 30-35, San Mateo, CA. Morgan Kaufmann. \nPlate, T. A. (1993). Estimating analogical similarity by vector dot-products of Holographic Reduced \nRepresentations. Unpublished manuscript. \nPlate, T. A. (1994). Holographic reduced representations. IEEE Transactions on Neural Networks. \nTo appear. \nSmolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures \nin connectionist systems. Artificial Intelligence, 46(1-2):159-216. \nThagard, P., Holyoak, K. J., Nelson, G., and Gochfeld, D. (1990). 
Analog Retrieval by Constraint \nSatisfaction. Artificial Intelligence, 46:259-310. \nWharton, C. M., Holyoak, K. J., Downing, P. E., Lange, T. E., Wickens, T. D., and Melz, E. R. \n(1994). Below the surface: Analogical similarity and retrieval competition in reminding. Cognitive \nPsychology. To appear. \n\n\f", "award": [], "sourceid": 740, "authors": [{"given_name": "Tony", "family_name": "Plate", "institution": null}]}