{"title": "Implications of Recursive Distributed Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 527, "page_last": 536, "abstract": null, "full_text": "RECURSIVE DISTRIBUTED REPRESENTATIONS \n\nIMPLICATIONS OF \n\n527 \n\nJordan B. Pollack \n\nLaboratory for A I Research \n\nOhio State University \nColumbus, OH -'3210 \n\nABSTRACT \n\nI will describe my recent results on the automatic development of fixed(cid:173)\nwidth recursive distributed representations of variable-sized hierarchal data \nstructures. One implication of this wolk is that certain types of AI-style \ndata-structures can now be represented in fixed-width analog vectors. Simple \ninferences can be perfonned using the type of pattern associations that \nneural networks excel at Another implication arises from noting that these \nrepresentations become self-similar in the limit Once this door to chaos is \nopened. many interesting new questions about the representational basis of \nintelligence emerge, and can (and will) be discussed. \n\nINTRODUCTION \n\nA major problem for any cognitive system is the capacity for, and the induction of the \npotentially infinite structures implicated in faculties such as human language and \nmemory. \nClassical cognitive architectures handle this problem through finite but recursive sets of \nrules, such as fonnal grammars (Chomsky, 1957). Connectionist architectures, while \nyielding intriguing insights into fault-tolerance and machine leaming, have, thus far, not \nhandled such productive systems in an adequate fashion. \nSo, it is not surprising that one of the main attacks on connectionism, especially on its \napplication to language processing models, has been on the adequacy of such systems to \ndeal with apparently rule-based behaviors (Pinker & Prince, 1988) and systematicity \n(Fodor & Pylyshyn, 1988). 
\nI had earlier discussed precisely these challenges for connectionism, calling them the generative capacity problem for language, and the representational adequacy problem for data structures (Pollack, 1987b). These problems are actually intimately related, as the capacity to recognize or generate novel language relies on the ability to represent the underlying concept. \nRecently, I have developed an approach to the representation problem, at least for recursive structures like sequences and trees. Recursive auto-associative memory (RAAM) (Pollack, 1988a) automatically develops recursive distributed representations of finite training sets of such structures, using Back-Propagation (Rumelhart et al., 1986). These representations appear to occupy a novel position in the space of both classical and connectionist symbolic representations. \nA fixed-width representation of variable-sized symbolic trees leads immediately to the implication that simple forms of neural-network associative memories may be able to perform inferences of a type that are thought to require complex machinery such as variable binding and unification. \nBut when we take seriously the infinite part of the representational adequacy problem, we are led into a strange intellectual area, to which the second part of this paper is addressed. \n\nBACKGROUND \n\nRECURSIVE AUTO-ASSOCIATIVE MEMORY \n\nA RAAM is composed of two mechanisms, a compressor and a reconstructor, which are simultaneously trained. The job of the compressor is to encode a small set of fixed-width patterns into a single pattern of the same width. This compression can be recursively applied, from the bottom up, to a fixed-valence tree with distinguished labeled terminals (leaves), resulting in a fixed-width pattern representing the entire structure. 
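The bottom-up compression and top-down reconstruction described here can be sketched in Python. This is a hypothetical illustration with random, untrained weights chosen only to show the shape bookkeeping; in an actual RAAM, `W_c` and `W_r` are learned by back-propagation on the auto-associative task, and the lexicon entries are pre-encoded k-bit category vectors rather than random values:

```python
import numpy as np

np.random.seed(0)
k = 8  # width of every pattern; leaves and internal codes are all k wide

# Hypothetical untrained weights, for illustration only.
W_c = 0.1 * np.random.randn(k, 2 * k)   # compressor: 2k inputs -> k outputs
W_r = 0.1 * np.random.randn(2 * k, k)   # reconstructor: k inputs -> 2k outputs

def compress(left, right):
    """Encode two k-wide patterns into a single k-wide pattern."""
    return np.tanh(W_c @ np.concatenate([left, right]))

def reconstruct(code):
    """Decode a k-wide pattern into approximations of its two parts."""
    out = np.tanh(W_r @ code)
    return out[:k], out[k:]

def encode_tree(tree, lexicon):
    """Apply the compressor recursively, from the bottom up."""
    if isinstance(tree, str):            # terminal: look up its pattern
        return lexicon[tree]
    left, right = tree
    return compress(encode_tree(left, lexicon),
                    encode_tree(right, lexicon))

# Stand-in lexicon of pre-encoded lexical categories.
lexicon = {c: np.random.rand(k) for c in ["D", "A", "N", "V", "P"]}

# The paper's example tree ((D (A N)) (V (P (D N)))).
tree = (("D", ("A", "N")), ("V", ("P", ("D", "N"))))
code = encode_tree(tree, lexicon)
print(code.shape)   # a fixed-width vector, regardless of tree size
```

Whatever the size of the tree, `encode_tree` returns a single k-wide vector, which is the property the surrounding text turns on; a trained reconstructor would then recover the parts from that vector until terminals are reached.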
The job of the reconstructor is to accurately decode this pattern into its parts, and then to further decode the parts as necessary, until the terminal patterns are found, resulting in a reconstruction of the original tree. \nFor binary trees with k-bit binary patterns as the leaves, the compressor could be a single-layer feedforward network with 2k inputs and k outputs, along with additional control machinery. The reconstructor could be a single-layer feedforward network with k inputs and 2k outputs, along with a mechanism for testing whether a pattern is a terminal. We simultaneously train these two networks in an auto-associative framework as follows. \nConsider the tree ((D (A N)) (V (P (D N)))), as one member of a training set of such trees, where the lexical categories are pre-encoded as k-bit vectors. If the 2k-k-2k network is successfully trained (defined below) with the following patterns (among other such patterns in the training environment), the resultant compressor and reconstructor can reliably form representations for these binary trees. \n\ninput pattern -> hidden pattern -> output pattern \n(A N) -> R_AN(t) -> (A' N') \n(D R_AN) -> R_DAN(t) -> (D' R_AN') \n(D N) -> R_DN(t) -> (D' N') \n(P R_DN) -> R_PDN(t) -> (P' R_DN') \n(V R_PDN) -> R_VPDN(t) -> (V' R_PDN') \n(R_DAN R_VPDN) -> R_DANVPDN(t)