{"title": "Neural Representation of Multi-Dimensional Stimuli", "book": "Advances in Neural Information Processing Systems", "page_first": 115, "page_last": 121, "abstract": null, "full_text": "Effects of Spatial and Temporal Contiguity on \n\nthe Acquisition of Spatial Information \n\nThea B. Ghiselli-Crippa and Paul W. Munro \n\nDepartment of Information Science and Telecommunications \n\nUniversity of Pittsburgh \nPittsburgh, PA 15260 \n\ntbgst@sis.pitt.edu, munro@sis.pitt.edu \n\nAbstract \n\nSpatial information comes in two forms: direct spatial information (for \nexample, retinal position) and indirect temporal contiguity information, \nsince objects encountered sequentially are in general spatially close. The \nacquisition of spatial information by a neural network is investigated \nhere. Given a spatial layout of several objects, networks are trained on a \nprediction task. Networks using temporal sequences with no direct spa(cid:173)\ntial information are found to develop internal representations that show \ndistances correlated with distances in the external layout. The influence \nof spatial information is analyzed by providing direct spatial information \nto the system during training that is either consistent with the layout or \ninconsistent with it. This approach allows examination of the relative \ncontributions of spatial and temporal contiguity. \n\n1 \n\nIntroduction \n\nSpatial information is acquired by a process of exploration that is fundamentally tempo(cid:173)\nral, whether it be on a small scale, such as scanning a picture, or on a larger one, such as \nphysically navigating through a building, a neighborhood, or a city. Continuous scanning \nof an environment causes locations that are spatially close to have a tendency to occur in \ntemporal proximity to one another. Thus, a temporal associative mechanism (such as a \nHebb rule) can be used in conjunction with continuous exploration to capture the spatial \nstructure of the environment [1]. However, the actual process of building a cognitive map \nneed not rely solely on temporal associations, since some spatial information is encoded in \nthe sensory array (position on the retina and proprioceptive feedback). Laboratory studies \nshow different types of interaction between the relative contributions of temporal and spa(cid:173)\ntial contiguities to the formation of an internal representation of space. While Clayton and \nHabibi's [2] series of recognition priming experiments indicates that priming is controlled \nonly by temporal associations, in the work of McNamara et al. [3] priming in recogni(cid:173)\ntion is observed only when space and time are both contiguous. In addition, Curiel and \nRadvansky's [4] work shows that the effects of spatial and temporal contiguity depend on \nwhether location or identity information is emphasized during learning. Moreover, other \nexperiments ([3]) also show how the effects clearly depend on the task and can be quite \ndifferent if an explicitly spatial task is used (e.g., additive effects in location judgments). \n\n\f18 \n\nT. B. Ghiselli-Crippa and P W. Munro \n\nlabels \n\nlabels \n\nlabels \n\n(A coeff.) \n\ncoordinates \n(B coeff.) \n\nlabels \n\nlabels \n\ncoordinates \n\nlabels \n\nFigure 1: Network architectures: temporal-only network (left); spatio-temporal network \nwith spatial units part of the input representation (center); spatio-temporal network with \nspatial units part of the output representation (right). 
2 Network architectures

The goal of the work presented in this paper is to study the structure of the internal representations that emerge from the integration of temporal and spatial associations. An encoder-like network architecture is used (see Figure 1), with a set of N input units and a set of N output units representing N nodes on a 2-dimensional graph. A set of H units is used for the hidden layer. To include space in the learning process, additional spatial units are included in the network architecture. These units provide a representation of the spatial information directly available during the learning/scanning process. In the simulations described in this paper, two units are used and are chosen to represent the (x, y) coordinates of the nodes in the graph. The spatial units can be included as part of the input representation or as part of the output representation (see Figure 1, center and right panels): both choices are used in the experiments, to investigate whether the spatial information could better benefit training as an input or as an output [5]. In the second case, the relative contribution of the spatial information can be directly manipulated by introducing weighting factors in the cost function being minimized. A two-term cost function is used, with a cross-entropy term for the N label units and a squared error term for the 2 coordinate units:

E = −A Σ_{i=1..N} [t_i log r_i + (1 − t_i) log(1 − r_i)] + B Σ_{i=N+1..N+2} (t_i − r_i)²,   (1)

where r_i indicates the actual output of unit i and t_i its desired output. The relative influence of the spatial information is controlled by the coefficients A and B.

3 Learning tasks

The left panel of Figure 2 shows an example of the type of layout used; the effective layout used in the study consists of N = 28 nodes. For each node, a set of neighboring nodes is defined, chosen on the basis of how an observer might scan the layout to learn the node labels and their (spatial) relationships; in Figure 2, the neighborhood relationships are represented by lines connecting neighboring nodes. From any node in the layout, the only allowed transitions are those to a neighbor, thus defining the set of node pairs used to train the network (66 pairs out of C(28, 2) = 378 possible pairs). In addition, the probability of occurrence of a particular transition is computed as a function of the distance to the corresponding neighbor. It is then possible to generate a sequence of visits to the network nodes, aimed at replicating the scanning process of a human observer studying the layout; a minimal generator is sketched below.

Figure 2: Example of a layout (left) and its permuted version (right). Links represent allowed transitions. Node labels are everyday objects such as cup, knife, coin, eraser, and button. A larger layout of 28 units was used in the simulations.
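The scanning procedure can be made concrete with a short sketch. The node names below come from Figure 2, but the coordinates, the neighbor sets, and the inverse-distance weighting are illustrative assumptions; the paper states only that transition probabilities are computed as a function of distance, without giving the exact form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy three-node layout (coordinates and neighbor sets are made up).
coords = {"cup": (0.1, 0.8), "knife": (0.3, 0.9), "coin": (0.2, 0.5)}
neighbors = {"cup": ["knife", "coin"], "knife": ["cup"], "coin": ["cup"]}

def transition_probs(node):
    """Weight each allowed transition by inverse distance, so that
    nearer neighbors are visited more often (assumed rule)."""
    d = np.array([np.hypot(coords[node][0] - coords[n][0],
                           coords[node][1] - coords[n][1])
                  for n in neighbors[node]])
    w = 1.0 / d
    return w / w.sum()

def scan(start, steps):
    """Random walk over the layout graph, mimicking an observer's scan."""
    seq, node = [start], start
    for _ in range(steps):
        node = str(rng.choice(neighbors[node], p=transition_probs(node)))
        seq.append(node)
    return seq

print(scan("cup", 10))
```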
The basic learning task is similar to the grammar learning task of Servan-Schreiber et al. [6] and to the neighborhood mapping task described in [1]; it is used to associate each of the N nodes on the graph and its (x, y) coordinates with the probability distribution of the transitions to its neighboring nodes. The mapping can be learned directly, by associating each node with the probability distribution of the transitions to all its neighbors; in this case, batch learning is the method of choice. Alternatively, the mapping can be learned indirectly, by associating each node with itself and one of its neighbors, with online learning being the method of choice; the neighbor chosen at each iteration is defined by the sequence of visits generated on the basis of the transition probabilities. Batch learning was chosen for this study because it generally converges more smoothly and more quickly than online learning and gives qualitatively similar results. While the task and network architecture described in [1] allowed only for temporal association learning, in this study both temporal and spatial associations are learned simultaneously, thanks to the presence of the spatial units. However, the temporal-only (T-only) case, which has no spatial units, is included in the simulations performed for this study, to provide a benchmark for the evaluation of the results obtained with the spatio-temporal (S-T) networks.

The task described above allows the network to learn neighborhood relationships for which spatial and temporal associations provide consistent information; that is, nodes experienced contiguously in time (as defined by the sequence) are also contiguous in space (being spatial neighbors). To tease apart the relative contributions of space and time, the task is kept the same, but the data employed for training the network is modified: the same layout is used to generate the temporal sequence, but the (x, y) coordinates of the nodes are randomly permuted (see right panel of Figure 2). If the permuted layout is then scanned following the same sequence of node visits used in the original version, the net effect is that the temporal associations remain the same, but the spatial associations change, so that temporally neighboring nodes can now be spatially close or distant: the spatial associations are no longer consistent with the temporal associations. As Figure 4 illustrates, the training pairs (filled circles) all correspond to short distances in the original layout, but can have a distance anywhere in the allowable range in the permuted layout. Since the temporal and spatial distances were consistent in the original layout, the original spatial distance can be used as an indicator of temporal distance, and Figure 4 can be interpreted as a plot of temporal distance vs. spatial distance for the permuted layout.

The simulations described in the following include three experimental conditions: temporal only (no direct spatial information available); space and time consistent (the spatial coordinates and the temporal sequence are from the same layout); space and time inconsistent (the spatial coordinates and the temporal sequence are from different layouts).

Hidden unit representations are compared using Euclidean distance (cosine and inner product measures give consistent results); the internal representation distances are also used to compute their correlation with Euclidean distances between nodes in the layout (original and permuted), as sketched below. The correlations increase with the number of hidden units for values of H between 5 and 10 and then gradually taper off for values greater than 10.
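Evaluating the internal representations thus amounts to correlating two sets of pairwise distances. A minimal sketch of this analysis step, with random arrays standing in for the trained hidden-layer vectors and the true node coordinates:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

N, H = 28, 20                      # nodes and hidden units, as in the paper
hidden = np.random.rand(N, H)      # placeholder: hidden vector per node
layout = np.random.rand(N, 2)      # placeholder: (x, y) per node

# pdist yields the C(28, 2) = 378 pairwise Euclidean distances.
r, _ = pearsonr(pdist(hidden), pdist(layout))
print("correlation between representation and layout distances:", r)
```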
The results presented in the remainder of the paper all pertain to networks trained with H = 20 and with hidden units using a tanh transfer function; all the results pertaining to S-T networks refer to networks with 2 spatial output units and cost function coefficients A = 0.625 and B = 6.25.

4 Results

Figure 3 provides a combined view of the results from all three experiments. The left panel illustrates the evolution of the correlation between internal representation distances and layout (original and permuted) distances. The right panel shows the distributions of the correlations at the end of training (1000 epochs). The first general result is that, when spatial information is available and consistent with the temporal information (original layout), the correlation between hidden unit distances and layout distances is consistently better than the correlation obtained in the case of temporal associations alone. The second general result is that, when spatial information is available but not consistent with the temporal information (permuted layout), the correlation between hidden unit distances and original layout distances (which represent temporal distances) is similar to that obtained in the case of temporal associations alone, except for the initial transient. When the correlation is computed with respect to the permuted layout distances, its value peaks early during training and then decreases rapidly, to reach an asymptotic value well below the other three cases. This behavior is illustrated in the box plots in the right panel of Figure 3, which report the distribution of correlation values at the end of training.

Figure 3: Evolution of correlation during training (0-1000 epochs) for the four conditions: S and T consistent; T-only; S and T inconsistent, correlation with temporal (original layout) distance; S and T inconsistent, correlation with spatial (permuted layout) distance (left). Distributions of the correlations at the end of training (1000 epochs) (right).

4.1 Temporal-only vs. spatio-temporal

As a first step in this study, the effects of adding spatial information to the basic temporal associations used to train the network can be examined. Since the learning task is the same for both the T-only and the S-T networks except for the absence or presence of spatial information during training, the differences observed can be attributed to the additional spatial information available to the S-T networks. The higher correlation between internal representation distances and original layout distances obtained when spatial information is available (see Figure 3) is apparent also when the evolution of the internal representations is examined. As Figure 6 illustrates, the presence of spatial information results in better generalization for the pattern pairs outside the training set. While the distances between training pairs are mapped to similar distances in hidden unit space for both the T-only and the S-T networks, the T-only network tends to cluster the non-training pairs into a narrow band of distances in hidden unit space. In the case of the S-T network, instead, the hidden unit distances between non-training pairs are spread out over a wider range and tend to reflect the original layout distances.

Figure 4: Distances in the original layout (x) vs. distances in the permuted layout (y). The 66 training pairs are identified by filled circles.

Figure 5: Similarities (Euclidean distances) between internal representations developed by a S-T network (after 300 epochs). Figure 4 projects the data points onto the x, y plane.

4.2 Permuted layout

As described above, with the permuted layout it is possible to decouple the spatial and temporal contributions and therefore study the effects of each. A comprehensive view of the results at a particular point during training (300 epochs) is presented in Figure 5, where the x, y plane represents temporal distance vs. spatial distance (see also Figure 4) and the z axis represents the similarity between hidden unit representations. The figure also includes a quadratic regression surface fitted to the data points. The coefficients in the equation of the surface provide a quantitative measure of the relative contributions of spatial (d_S) and temporal (d_T) distances to the similarity between hidden unit representations (d_HU):

d_HU = 0.6 + 3.4 d_T + 0.3 d_S − 2.1 d_T² + 0.4 d_S² − 0.4 d_T d_S.   (2)

In general, after the transient observed in early training (see Figure 3), the largest and most significant coefficients are found for d_T and d_T², indicating a stronger dependence of d_HU on temporal distance than on spatial distance. A fit of this kind can be computed as sketched below.
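The quadratic surface of Equation (2) is an ordinary least-squares fit with six terms. A sketch of the computation, with placeholder distance vectors standing in for the 378 measured node pairs:

```python
import numpy as np

d_T = np.random.rand(378)     # placeholder temporal distances
d_S = np.random.rand(378)     # placeholder spatial distances
d_HU = 0.6 + 3.4 * d_T + 0.3 * d_S - 2.1 * d_T**2   # toy stand-in response

# Design matrix for the quadratic surface of Eq. (2).
X = np.column_stack([np.ones_like(d_T), d_T, d_S, d_T**2, d_S**2, d_T * d_S])
coef, *_ = np.linalg.lstsq(X, d_HU, rcond=None)
print(coef)   # intercept, d_T, d_S, d_T^2, d_S^2, d_T*d_S coefficients
```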
The results illustrated in Figure 5 represent the situation at a particular point during training (300 epochs). Similar plots can be generated for different points during training, to study the evolution of the internal representations. A different view of the evolution process is provided by Figure 7, in which the data points are projected onto the x, z plane (top panel) and the y, z plane (bottom panel) at four different times during training. In the top panel, …
Neural Representation of Multi-Dimensional Stimuli

C. W. Eurich, S. D. Wilke and H. Schwegler

The tuning function of neuron k, for a D-dimensional stimulus x = (x_1, ..., x_D), is assumed to be of the form

f^(k)(x) = F φ( Σ_{i=1..D} (x_i − c_i^(k))² / σ_i^(k)² ) = F φ(ξ^(k)²),   (2)

where c^(k) = (c_1^(k), ..., c_D^(k)) is the center of the tuning curve of neuron k, σ_i^(k) is its tuning width in the i-th dimension, ξ_i^(k)² := (x_i − c_i^(k))² / σ_i^(k)² for i = 1, ..., D, and ξ^(k)² := ξ_1^(k)² + ... + ξ_D^(k)². F > 0 denotes the maximal firing rate of the neurons, which requires that max_{z≥0} φ(z) = 1.

We assume that the tuning widths σ_1^(k), ..., σ_D^(k) of each neuron k are drawn from a distribution P_σ(σ_1, ..., σ_D). For a population of tuning functions with centers c^(1), ..., c^(N), a density η(x) is introduced according to η(x) := Σ_{k=1..N} δ(x − c^(k)).

The encoding accuracy can be quantified by the Fisher information matrix, J, which is defined as

J_ij(x) = E[ −(∂²/∂x_i ∂x_j) ln P(n; x) ],   (3)

where E[...] denotes the expectation value over the probability distribution P(n; x) [2]. The Fisher information yields a lower bound on the expected error of an unbiased estimator that retrieves the stimulus x from the noisy neural activity (Cramér-Rao inequality) [2]. The minimal estimation error for the i-th feature x_i, ε_{i,min}, is given by ε²_{i,min} = (J⁻¹)_ii, which reduces to ε²_{i,min} = 1/J_ii(x) if J is diagonal.

We shall now derive a general expression for the population Fisher information. In the next section, several cases and their consequences for neural encoding strategies will be discussed.

For model neuron k, the Fisher information (3) reduces to

J_ij^(k)(x; σ_1^(k), ..., σ_D^(k)) = (1 / (σ_i^(k) σ_j^(k))) A_φ(ξ^(k)², F, τ) ξ_i^(k) ξ_j^(k),   (4)

where the dependence on the tuning widths is indicated by the list of arguments. The function A_φ depends on the shape of the tuning function and is given in [13]. The independence assumption (1) implies that the population Fisher information is the sum of the contributions of the individual neurons, Σ_{k=1..N} J_ij^(k)(x; σ_1^(k), ..., σ_D^(k)). We now define a population Fisher information which is averaged over the distribution of tuning widths P_σ(σ_1, ..., σ_D):

⟨J_ij(x)⟩_σ = Σ_{k=1..N} ∫ dσ_1 ... dσ_D P_σ(σ_1, ..., σ_D) J_ij^(k)(x; σ_1, ..., σ_D).   (5)

Introducing the density of tuning curves, η(x), into (5) and assuming a constant distribution, η(x) ≡ η ≡ const., one obtains the result that the population Fisher information becomes independent of x and that the off-diagonal elements of J vanish [13]. The average population Fisher information then becomes

⟨J_ij⟩_σ = η D K_φ(F, τ, D) ⟨ (Π_{l=1..D} σ_l) / σ_i² ⟩_σ δ_ij,   (6)

where K_φ depends on the geometry of the tuning curves and is defined in [13].

3 Results

In this section, we consider different distributions of tuning widths in (6) and discuss advantageous and disadvantageous strategies for obtaining a high representational accuracy in the neural population.

Radially symmetric tuning curves. For radially symmetric tuning curves of width σ, the tuning-width distribution reads

P_σ(σ_1, ..., σ_D) = Π_{i=1..D} δ(σ_i − σ);

see Fig. 1a for a schematic visualization of the arrangement of the tuning widths for the case D = 2. The average population Fisher information (6) for i = j becomes

⟨J_ii⟩_σ = η D K_φ(F, τ, D) σ^(D−2),   (7)

a result already obtained by Zhang and Sejnowski [13]. Equation (7) basically shows that the minimal estimation error increases with σ for D = 1, that it does not depend on σ for D = 2, and that it decreases as σ increases for D ≥ 3. We shall discuss the relevance of this case below; a numerical check of the scaling in (7) is sketched next.
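The σ^(D−2) dependence in (7) is easy to probe numerically. The sketch below assumes Gaussian tuning, φ(z) = exp(−z/2), centers on a regular grid, and independent Poisson spike counts in a window T, for which J_ii(x) = T Σ_k (∂f^(k)(x)/∂x_i)² / f^(k)(x); the grid spacing, extent, F, and T are arbitrary choices.

```python
import itertools
import numpy as np

def fisher_info_dim0(sigma, D, spacing=0.1, extent=2.0, F=50.0, T=1.0):
    """J_00 at x = 0 for radially symmetric Gaussian tuning curves on a
    D-dimensional grid, assuming Poisson spike counts in a window T."""
    axis = np.arange(-extent, extent + spacing / 2, spacing)
    centers = np.array(list(itertools.product(axis, repeat=D)))
    f = F * np.exp(-np.sum(centers**2, axis=1) / (2 * sigma**2))  # rates at 0
    df = f * centers[:, 0] / sigma**2     # derivative w.r.t. x_0 at x = 0
    return T * np.sum(df**2 / f)

for D in (1, 2, 3):
    ratio = fisher_info_dim0(0.6, D) / fisher_info_dim0(0.3, D)
    print(D, round(ratio, 3), "expected:", 2.0 ** (D - 2))  # (0.6/0.3)^(D-2)
```

Doubling σ should leave J_ii unchanged for D = 2, halve it for D = 1, and double it for D = 3, as the printed ratios confirm.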
Identical tuning curves without radial symmetry. Next we discuss tuning curves which are identical but not radially symmetric; the tuning-width distribution for this case is

P_σ(σ_1, ..., σ_D) = Π_{i=1..D} δ(σ_i − σ̄_i),

where σ̄_i denotes the fixed width in dimension i. For i = j, the average population Fisher information (6) reduces to [11, 4]

⟨J_ii⟩_σ = η D K_φ(F, τ, D) (Π_{l=1..D} σ̄_l) / σ̄_i².   (8)

Figure 1: Visualization of different distributions of tuning widths for D = 2. (a) Radially symmetric tuning curves. The dot indicates a fixed σ, while the diagonal line symbolizes a variation in σ discussed in [13]. (b) Identical tuning curves which are not radially symmetric. (c) Tuning widths uniformly distributed within a small rectangle. (d) Two subpopulations, each of which is narrowly tuned in one dimension and broadly tuned in the other dimension.

Equation (8) contains (7) as a special case. From (8) it becomes immediately clear that the expected minimal square encoding error for the i-th stimulus feature, ε²_{i,min} = 1/⟨J_ii(x)⟩_σ, depends on i, i.e., the population specializes in certain features. The error obtained in dimension i thereby depends on the tuning widths in all dimensions.

Which encoding strategy is optimal for a population whose task it is to encode a single feature, say feature i, with high accuracy while not caring about the other dimensions? In order to answer this question, we re-write (8) in terms of receptive field overlap.

For the tuning functions f^(k)(x) encountered empirically, large values of the single-neuron Fisher information (4) are typically restricted to a region around the center of the tuning function, c^(k). The fraction p(β) of the Fisher information that falls into a region E_β := {x | ξ^(k) ≤ β} around c^(k) is given by

p(β) := ∫_{E_β} d^D x Σ_{i=1..D} J_ii^(k)(x) / ∫ d^D x Σ_{i=1..D} J_ii^(k)(x) = ∫_0^β dξ ξ^(D+1) A_φ(ξ², F, τ) / ∫_0^∞ dξ ξ^(D+1) A_φ(ξ², F, τ),   (9)

where the index (k) was dropped because the tuning curves are assumed to have identical shapes. Equation (9) allows the definition of an effective receptive field, RF_eff^(k), inside of which neuron k conveys a major fraction p_0 of its Fisher information: RF_eff^(k) := {x | ξ^(k) ≤ β_0}, where β_0 is chosen such that p(β_0) = p_0. The Fisher information a neuron k carries is small unless x ∈ RF_eff^(k). This has the consequence that a fixed stimulus x is actually encoded only by a subpopulation of neurons. The point x in stimulus space is covered by

N_code := η (2 π^(D/2) β_0^D / (D Γ(D/2))) Π_{l=1..D} σ_l   (10)

receptive fields. With the help of (10), the average population Fisher information (8) can be re-written as

⟨J_ii⟩_σ = (D² Γ(D/2) K_φ(F, τ, D) / (2 π^(D/2) β_0^D)) N_code / σ_i².   (11)

Equation (11) can be interpreted as follows: we assume that the population of neurons encodes stimulus dimension i accurately, while all other dimensions are of secondary importance. The average population Fisher information for dimension i, ⟨J_ii⟩_σ, is determined by the tuning width in dimension i, σ_i, and by the size of the active subpopulation, N_code. There is a tradeoff between these quantities. On the one hand, the encoding error can be decreased by decreasing σ_i, which enhances the Fisher information carried by each single neuron. Decreasing σ_i, on the other hand, will also shrink the active subpopulation via (10). This impairs the encoding accuracy, because the stimulus position is evaluated from the activity of fewer neurons. If (11) is valid due to a sufficient receptive field overlap, N_code can be increased by increasing the tuning widths σ_j in all other dimensions j ≠ i. This effect is illustrated in Fig. 2 for D = 2.

Figure 2: Encoding strategy for a stimulus characterized by parameters x_{1,s} and x_{2,s}. Feature x_1 is to be encoded accurately. Effective receptive field shapes are indicated for both populations. If neurons are narrowly tuned in x_2 (left), the active population (solid) is small (here: N_code = 3). Broadly tuned receptive fields for x_2 (right) yield a much larger population (here: N_code = 27), thus increasing the encoding accuracy.

It shall be noted that although a narrow tuning width σ_i is advantageous, the limit σ_i → 0 yields a bad representation. For narrowly tuned cells, gaps appear between the receptive fields: the condition η(x) ≡ const. breaks down, and (6) is no longer valid. A more detailed calculation shows that the encoding error diverges as σ_i → 0 [4]. Since the encoding error thus grows both in the narrow-tuning limit and, via (11), in the broad-tuning limit, an optimal tuning width must exist. An example is given in Fig. 3a, and a numerical sketch follows below.

Figure 3: (a) Example of the encoding behavior with narrow tuning curves arranged on a regular lattice of dimension D = 1 (grid spacing Δ). Tuning curves are Gaussian, and neural firing is modeled as a Poisson process. Dots indicate the minimal square encoding error averaged over a uniform distribution of stimuli, ⟨ε²_min⟩, as a function of σ. The minimum is clearly visible. The dotted line shows the corresponding approximation according to (8). The inset shows Gaussian tuning curves of optimal width, σ_opt ≈ 0.4Δ. (b) g_D(λ) as a function of λ for different values of D.
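The experiment of Fig. 3a can be sketched in a few lines under assumed parameters (F, T, lattice extent): Gaussian tuning curves on a 1-D lattice, Poisson firing, and the minimal square error ⟨1/J(x)⟩ averaged over one lattice period. The divergence for small σ and the growth for large σ bracket the optimum.

```python
import numpy as np

DELTA, F, T = 1.0, 100.0, 1.0
centers = np.arange(-10, 11) * DELTA
xs = np.linspace(0.0, DELTA, 201)     # one lattice period suffices

def mean_min_sq_error(sigma):
    """<eps^2_min> = <1/J(x)> for Gaussian tuning with Poisson noise."""
    errs = []
    for x in xs:
        f = F * np.exp(-(x - centers)**2 / (2 * sigma**2))
        df = -f * (x - centers) / sigma**2
        J = T * np.sum(df**2 / np.maximum(f, 1e-300))  # guard vs. underflow
        errs.append(1.0 / J)
    return np.mean(errs)

sigmas = np.linspace(0.1, 2.0, 100)
errors = [mean_min_sq_error(s) for s in sigmas]
print("optimal width:", sigmas[int(np.argmin(errors))], "* DELTA")
```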
Narrow distribution of tuning curves. In order to study the effects of encoding the stimulus with distributed tuning widths instead of identical tuning widths as in the previous cases, we now consider the distribution

P_σ(σ_1, ..., σ_D) = Π_{i=1..D} (1/b_i) Θ[σ_i − (σ̄_i − b_i/2)] Θ[(σ̄_i + b_i/2) − σ_i],   (12)

where Θ denotes the Heaviside step function. Equation (12) describes a uniform distribution in a D-dimensional cuboid of size b_1, ..., b_D around (σ̄_1, ..., σ̄_D); cf. Fig. 1c. A straightforward calculation shows that in this case, the average population Fisher information (6) for i = j becomes

⟨J_ii⟩_σ = η D K_φ(F, τ, D) ((Π_{l=1..D} σ̄_l) / σ̄_i²) { 1 + (1/12)(b_i/σ̄_i)² + O[(b_i/σ̄_i)⁴] }.   (13)

A comparison with (8) yields the astonishing result that an increase in b_i results in an increase in the i-th diagonal element of the average population Fisher information matrix, and thus in an improvement in the encoding of the i-th stimulus feature, while the encoding in dimensions j ≠ i is not affected. Correspondingly, the total encoding error can be decreased by increasing an arbitrary number of edge lengths of the cuboid. The encoding by a population with a variability in the tuning curve geometries as described is more precise than that by a uniform population. This is true for arbitrary D; the gain factor in (13) can be verified numerically, as sketched below. Zhang and Sejnowski [13] consider the more artificial situation of a correlated variability of the tuning widths: tuning curves are always assumed to be radially symmetric. This is indicated by the diagonal line in Fig. 1a. A distribution of tuning widths restricted to this subset yields an average population Fisher information ∝ ⟨σ^(D−2)⟩ and does not improve the encoding for D = 2 or D = 3.
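The correction factor in (13) can be traced to ⟨1/σ_i⟩ over the uniform interval: with independent widths, ⟨Π_l σ_l / σ_i²⟩ = (Π_{l≠i} σ̄_l) ⟨1/σ_i⟩, and ⟨1/σ_i⟩ = (1/σ̄_i)(1 + (1/12)(b_i/σ̄_i)² + ...). A Monte Carlo check with arbitrary test values σ̄ = 1 and b = 0.8:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_bar, b = 1.0, 0.8
s = rng.uniform(sigma_bar - b / 2, sigma_bar + b / 2, size=1_000_000)

gain_mc = sigma_bar * np.mean(1.0 / s)        # <1/sigma_i> * sigma_bar
gain_eq13 = 1.0 + (b / sigma_bar)**2 / 12     # second-order term of (13)
print(gain_mc, gain_eq13)  # ~1.059 vs. 1.053; higher orders explain the gap
```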
Fragmentation into D subpopulations. Finally, we study a family of distributions of tuning widths which also yields a lower minimal encoding error than the uniform population. Let the density of tuning curves be given by

P_σ(σ_1, ..., σ_D) = (1/D) Σ_{i=1..D} δ(σ_i − λσ̄) Π_{j≠i} δ(σ_j − σ̄),   (14)

where λ > 0. For λ = 1, the population is uniform as in (7). For λ ≠ 1, the population is split up into D subpopulations; in subpopulation i, σ_i is modified while σ_j ≡ σ̄ for j ≠ i. See Fig. 1d for an example. The diagonal elements of the average population Fisher information are

⟨J_ii⟩_σ = η D K_φ(F, τ, D) σ̄^(D−2) (1 + (D − 1)λ²) / (Dλ),   (15)

where the λ-dependent factor will be abbreviated as g_D(λ). ⟨J_ii⟩_σ does not depend on i in this case because of the symmetry of the subpopulations. Equation (15) and the uniform case (7) differ by g_D(λ), which will now be discussed. Figure 3b shows g_D(λ) for different values of D. For λ = 1, g_D(λ) = 1 and (7) is recovered as expected. g_D(λ) = 1 also holds for λ = 1/(D − 1) < 1: narrowing one tuning width in each subpopulation will at first decrease the resolution provided D ≥ 3; this is due to the fact that N_code is decreased. For λ < 1/(D − 1), however, g_D(λ) > 1, and the resolution exceeds that of the uniform case (7), because each neuron in the i-th subpopulation carries a high Fisher information in the i-th dimension. D = 2 is a special case where no impairment of encoding occurs, because the effect of a decrease of N_code is less pronounced. Interestingly, an increase in λ also yields an improvement in the encoding accuracy. This is a combined effect resulting from an increase in N_code on the one hand and the existence of D subpopulations, D − 1 of which maintain their tuning widths in each dimension, on the other hand. The discussion of g_D(λ) leads to the following encoding strategy: for small λ, ⟨J_ii⟩_σ increases rapidly, which suggests a fragmentation of the population into D subpopulations each of which encodes one feature with high accuracy, i.e., one tuning width in each subpopulation is small whereas the remaining tuning widths are broad. As in the case discussed above, the theoretical limit of this method is a breakdown of the approximation η ≡ const. and of the validity of (6) due to insufficient receptive field overlap. The behavior of g_D(λ) is easy to explore numerically, as sketched below.
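A short sketch for evaluating g_D(λ) = (1 + (D − 1)λ²)/(Dλ) from (15) reproduces the special values discussed above.

```python
def g(D, lam):
    """Gain of the fragmented population relative to the uniform case (7)."""
    return (1.0 + (D - 1) * lam**2) / (D * lam)

for D in (2, 3, 5):
    print(D, g(D, 1.0), g(D, 1.0 / (D - 1)))  # both 1: no gain at these points
    print(D, g(D, 0.1), g(D, 3.0))            # narrowing or broadening helps
```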
4 Discussion and Outlook

We have discussed the effects of a variation of the tuning widths on the encoding accuracy obtained by a population of stochastically spiking neurons. The question of an optimal tuning strategy has turned out to be more complicated than previously assumed. More specifically, the case which has attracted most attention in the literature, radially symmetric receptive fields [5, 1, 9, 3, 13], yields a worse encoding accuracy than most other cases we have studied: uniform populations with tuning curves which are not radially symmetric; distributions of tuning curves around some symmetric or non-symmetric tuning curve; and the fragmentation of the population into D subpopulations each of which is specialized in one stimulus feature.

In a next step, the theoretical results will be compared to empirical data on encoding properties of neural populations. One aspect is the existence of sensory maps which consist of neural subpopulations with characteristic tuning properties for the features which are represented. For example, receptive fields of auditory neurons in the midbrain of the barn owl have elongated shapes [6]. A second aspect concerns the short-term dynamics of receptive fields. Using single-unit recordings in anaesthetized cats, Wörgötter et al. [12] observed changes in receptive field size taking place in 50-100 ms. Our findings suggest that these dynamics alter the resolution obtained for the corresponding stimulus features. The observed effect may therefore realize a mechanism of adaptable selective signal processing.

References

[1] Baldi, P. & Heiligenberg, W. (1988) Biol. Cybern. 59:313-318.
[2] Deco, G. & Obradovic, D. (1997) An Information-Theoretic Approach to Neural Computing. New York: Springer.
[3] Eurich, C. W. & Schwegler, H. (1997) Biol. Cybern. 76:357-363.
[4] Eurich, C. W. & Wilke, S. D. (2000) Neural Comp. (in press).
[5] Hinton, G. E., McClelland, J. L. & Rumelhart, D. E. (1986) In Rumelhart, D. E. & McClelland, J. L. (eds.), Parallel Distributed Processing, Vol. 1, pp. 77-109. Cambridge, MA: MIT Press.
[6] Knudsen, E. I. & Konishi, M. (1978) Science 200:795-797.
[7] Kuffler, S. W. (1953) J. Neurophysiol. 16:37-68.
[8] Lettvin, J. Y., Maturana, H. R., McCulloch, W. S. & Pitts, W. H. (1959) Proc. Inst. Radio Eng. NY 47:1940-1951.
[9] Snippe, H. P. & Koenderink, J. J. (1992) Biol. Cybern. 66:543-551.
[10] Wiggers, W., Roth, G., Eurich, C. W. & Straub, A. (1995) J. Comp. Physiol. A 176:365-377.
[11] Wilke, S. D. & Eurich, C. W. (1999) In Verleysen, M. (ed.), ESANN 99, European Symposium on Artificial Neural Networks, pp. 435-440. Brussels: D-Facto.
[12] Wörgötter, F., Suder, K., Zhao, Y., Kerscher, N., Eysel, U. T. & Funke, K. (1998) Nature 396:165-168.
[13] Zhang, K. & Sejnowski, T. J. (1999) Neural Comp. 11:75-84.
", "award": [], "sourceid": 1760, "authors": [{"given_name": "Christian", "family_name": "Eurich", "institution": null}, {"given_name": "Stefan", "family_name": "Wilke", "institution": null}, {"given_name": "Helmut", "family_name": "Schwegler", "institution": null}]}