Constructing Distributed Representations Using Additive Clustering

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)



Wheeler Ruml


If the promise of computational modeling is to be fully realized in higher-level cognitive domains such as language processing, principled methods must be developed to construct the semantic representations used in such models. In this paper, we propose the use of an established formalism from mathematical psychology, additive clustering, as a means of automatically constructing binary representations for objects using only pairwise similarity data. However, existing methods for the unsupervised learning of additive clustering models do not scale well to large problems. We present a new algorithm for additive clustering, based on a novel heuristic technique for combinatorial optimization. The algorithm is simpler than previous formulations and makes fewer independence assumptions. Extensive empirical tests on both human and synthetic data suggest that it is more effective than previous methods and that it also scales better to larger problems. By making additive clustering practical, we take a significant step toward scaling connectionist models beyond hand-coded examples.

1 Introduction

Many cognitive models posit mental representations based on discrete substructures. Even connectionist models whose processing involves manipulation of real-valued activations typically represent objects as patterns of 0s and 1s across a set of units (Noelle, Cottrell, and Wilms, 1997). Often, individual units are taken to represent specific features of the objects, and two representations will share features to the degree to which the two objects are similar. While this arrangement is intuitively appealing, it can be difficult to construct the features to be used in such a model. Using random feature assignments clouds the relationship between the model and the objects it is intended to represent, diminishing the model's value. As Clouse and Cottrell (1996) point out, hand-crafted representations are tedious to construct and it can be difficult to precisely justify (or even articulate) the principles that guided their design. These difficulties effectively limit the number of objects that can be encoded, constraining modeling efforts to small examples. In this paper, we investigate methods for automatically synthesizing feature-based representations directly from the pairwise object similarities that the model is intended to respect. This automatic approach eliminates the manual burden of selecting and assigning features while providing an explicit design criterion that objectively connects the representations to empirical data.

Table 1: An 8-feature model derived from consonant confusability data. With c = 0.024, the model accounts for 91.8% of the variance in the data.

  Wt.    Objects with feature    Interpretation
  .350   f θ                     front unvoiced fricatives
  .243   d g                     back voiced stops
  .197   p k                     unvoiced stops (without t)
  .182   b v ð                   front voiced
  .162   p t k                   unvoiced stops
  .127   m n                     nasals
  .075   d g v ð z ʒ             voiced (without b)
  .049   p t k f θ s ʃ           unvoiced

After formalizing the problem, we will review existing algorithms that have been proposed for solving it. We will then investigate a new approach, based on combinatorial optimization. When using a novel heuristic search technique, we find that the new approach, despite its simplicity, performs better than previous algorithms and that, perhaps more importantly, it maintains its effectiveness on large problems.

1.1 Additive Clustering
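As a concrete illustration before the formal definition: under additive clustering, the predicted similarity of two objects is a constant c plus the summed weights of the features both objects carry. The following minimal Python sketch encodes the 8-feature consonant model of Table 1 (the weights and feature memberships are transcribed from the table; the phoneme spellings and all names are illustrative, not from the original paper):

```python
# Additive clustering (ADCLUS): predicted similarity of objects i and j is
# the constant c plus the summed weights of the features they share.
# Weights and memberships transcribed from Table 1 (consonant confusability).

c = 0.024
features = [
    (0.350, {"f", "θ"}),                           # front unvoiced fricatives
    (0.243, {"d", "g"}),                           # back voiced stops
    (0.197, {"p", "k"}),                           # unvoiced stops (without t)
    (0.182, {"b", "v", "ð"}),                      # front voiced
    (0.162, {"p", "t", "k"}),                      # unvoiced stops
    (0.127, {"m", "n"}),                           # nasals
    (0.075, {"d", "g", "v", "ð", "z", "ʒ"}),       # voiced (without b)
    (0.049, {"p", "t", "k", "f", "θ", "s", "ʃ"}),  # unvoiced
]

def similarity(i, j):
    """Sum of the weights of the features shared by i and j, plus c."""
    return c + sum(w for w, members in features if i in members and j in members)

# p and k share three features: 0.024 + 0.197 + 0.162 + 0.049 = 0.432
print(round(similarity("p", "k"), 3))
# m and n share only the nasal feature: 0.024 + 0.127 = 0.151
print(round(similarity("m", "n"), 3))
```

Objects with many shared features (like p and k) receive high predicted similarity, while objects sharing no feature fall back to the baseline constant c.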

We will formalize the problem of constructing discrete features from similarity information using the additive clustering model of Shepard and Arabie (1979). In this framework, abbreviated ADCLUS, clusters represent arbitrarily overlapping discrete features. Each of the k features has a non-negative real-valued weight w_k, and the similarity between two objects i and j is just the sum of the weights of the features they share. If f_ik is 1 if object

i has feature k and 0 otherwise, and c is a real-valued constant, then the similarity of i and