{"title": "Neural Network Visualization", "book": "Advances in Neural Information Processing Systems", "page_first": 465, "page_last": 472, "abstract": null, "full_text": "Neural Network Visualization \n\n465 \n\nNEURAL NETWORK VISUALIZATION \n\nJakub Wejchert \nGerald Tesauro \n\nIBM Research \nT.J. Watson Research Center \nYorktown Heights, NY 10598 \n\nABSTRACT \n\nWe have developed graphics to visualize static and dynamic information in layered neural network learning systems. Emphasis was placed on creating new visuals that make use of spatial arrangements, size information, animation and color. We applied these tools to the study of back-propagation learning of simple Boolean predicates, and have obtained new insights into the dynamics of the learning process. \n\n1 INTRODUCTION \n\nAlthough neural network learning systems are being widely investigated by many researchers via computer simulations, the graphical display of information in these simulations has received relatively little attention. In other fields such as fluid dynamics and chaos theory, the development of \"scientific visualization\" techniques (1,3) has proven to be a tremendously useful aid to research, development, and education. Similar benefits should result from the application of these techniques to neural networks research. \n\nIn this article, several visualization methods are introduced to investigate learning in neural networks which use the back-propagation algorithm. A multi-window environment is used that allows different aspects of the simulation to be displayed simultaneously in each window. \n\nAs an application, the toolkit is used to study small networks learning Boolean functions. The animations are used to observe the emerging structure of connection strengths, to study the temporal behaviour, and to understand the relationships and effects of parameters. 
The simulations and graphics can run at real-time speeds. \n\n2 VISUAL REPRESENTATIONS \n\nFirst, we introduce our techniques for representing both the instantaneous dynamics of the learning process, and the full temporal trajectory of the network during the course of one or more learning runs. \n\n2.1 The Bond Diagram \n\nIn the first of these diagrams, the geometrical structure of a connected network is used as a basis for the representation. As it is of interest to see how the internal configuration of weights relates to the problem the network is learning, it is clearly worthwhile to have a graphical representation that explicitly integrates weight information with network topology. This differs from \"Hinton diagrams\" (2), in which data may only be indirectly related to the network structure. In our representation nodes are represented by circles, the areas of which are proportional to the threshold values. Triangles or lines are used to represent the weights or their rate of change. The triangles or line segments emanate from the nodes and point toward the connecting nodes. Their lengths indicate the magnitude of the weight or weight derivative. We call this the \"bond diagram\". \n\nIn this diagram, one can look at any node and clearly see the magnitude of the weights feeding into and out of it. Also, a sense of direction is built into the picture, since the bonds point to the node that they are connected to. Further, the collection of weights forms distinct patterns that can be easily perceived, so that one can also infer global information from the overall patterns formed. \n\n2.2 The Trajectory Diagram \n\nA further limitation of Hinton diagrams is that they provide a relatively poor representation of dynamic information. Therefore, to understand more about the dynamics of learning we introduce another visual tool that gives a two-dimensional projection of the weight space of the network. 
This represents the learning process as a trajectory in a reduced dimensional space. By representing the value of the error function as the color of the point in weight space, one obtains a sense of the contours of the error hypersurface, and of the dynamics of the gradient-descent evolution on this hypersurface. We call this the \"trajectory diagram\". \n\nThe scheme is based on the premise that the human user has a good visual notion of vector addition. To represent an n-dimensional point, its axial components are defined as vectors and plotted radially in the plane; the vector sum of these is then calculated to yield the point representing the n-dimensional position. For n > 2 the resultant point is not unique; however, the method does allow one to infer information about families of similar trajectories, make comparisons between trajectories, and notice important deviations in behaviour. \n\n2.3 Implementation \n\nThe graphics software was written in C using X-Windows v. 11. The C code was interfaced to a FORTRAN neural network simulator. The whole package ran under UNIX on an RT workstation. Using the portability of X-Windows, the graphics could be run remotely on different machines over a local area network. Execution time was too slow for real-time interaction except for very small networks (typically up to 30 weights). For larger networks, the Stellar graphics workstation was used, whereby the simulator code could be vectorized and parallelized. \n\n3 APPLICATION EXAMPLES \n\nWith the graphics we investigated networks learning Boolean functions: binary input vectors were presented to the network through the input nodes, and the teacher signal was set to either 1 or 0. Here, we show networks learning the majority and symmetry functions. 
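The three Boolean predicates used as teacher signals can be sketched as follows. This is a minimal illustrative sketch in Python rather than the paper's C/FORTRAN code; the function names and the tuple encoding of input patterns are our assumptions. \n\n

```python
def majority(x):
    """Output 1 only if more than half of the binary inputs are on."""
    return 1 if sum(x) > len(x) / 2 else 0

def general_symmetry(x):
    """Output 1 only for inputs perfectly symmetric about the central axis."""
    return 1 if tuple(x) == tuple(reversed(x)) else 0

def simple_symmetry(x):
    """Same predicate as general symmetry; the difference is that during
    training only perfectly symmetric or perfectly anti-symmetric inputs
    are ever presented to the network."""
    return general_symmetry(x)
```

Note that for six inputs, majority((1, 1, 1, 1, 0, 0)) is 1 while majority((1, 1, 1, 0, 0, 0)) is 0, since exactly half of the inputs on is not more than half. \n\n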
The output of the majority function is 1 only if more than half of the input nodes are on; simple symmetry distinguishes between input vectors that are symmetric or anti-symmetric about a central axis; general symmetry identifies perfectly symmetric patterns out of all other permutations. Using the graphics, one can watch how solutions to a particular problem are obtained, how different parameters affect these solutions, and observe the stages at which learning decisions are made. \n\nAt the start of the simulations the weights are set to small random values. During learning, many example patterns are presented to the input of the network and the weights are adjusted accordingly. Initially the rate of change of the weights is small; as the simulation gets under way the weights change rapidly; finally only small changes are made as the system moves toward the final solution. Distinct patterns of triangles show the configuration of weights in their final form. \n\n3.1 The Majority Function \n\nFigure 1 shows a bond diagram for a network that has learnt the majority function. During the run, many input patterns were presented to the network, during which time the weights were changed. The weights evolve from small random values through to an almost uniform set corresponding to the solution of the problem. Towards the end, a large output node is displayed and the magnitudes of all the weights are roughly uniform, indicating that a large bias (or threshold) is required to offset the sum of the weights. Majority is quite a simple problem for the network to learn; more complicated functions require hidden units. \n\n3.2 The Simple Symmetry Function \n\nIn this case only symmetric or perfectly anti-symmetric patterns are presented and the network is taught to distinguish between these. In solving this problem, the \n\nFigure 1: A near-final configuration of weights for the majority function. 
All the weights are positive. The disc corresponds to the threshold of the output unit. \n\nnetwork chose (correctly) that it needs only two units to make the decision whether the input is totally symmetric or totally anti-symmetric. (In fact, any symmetrically separated input pair will work.) It was found that the simple pattern created by the bond representation carries over into the more general symmetry function, where the network must identify perfectly symmetric inputs from all the other permutations. \n\n3.3 The General Symmetry Function \n\nHere, the network is required to detect symmetry out of all the possible input patterns. As can be seen from the bond diagram (figure 2), the network has chosen a hierarchical structure of weights to solve the problem, using the basic weight pattern of simple symmetry. The major decision is made on the outer pair, and additional decisions are made on the remaining pairs with decreasing strength. As before, the choice of pairs in the hierarchy depends on the initial random weights. \n\nBy watching the animations, we could make some observations about the stages of learning. We found that the early behaviour was the most critical, as it was at this stage that the signs of the weights feeding to the hidden units were determined. At the later stages the relative magnitudes of the weights were adapted. \n\n3.4 The Visualization Environment \n\nFigure 3 shows the visualization environment with most of the windows active. The upper window shows the total error, and the lower window the state of the output unit. Typically, the error initially stays high, then decreases rapidly, and then levels off to zero as final adjustments are made to the weights. Spikes in this curve are due to the method of presenting patterns at random. The state of the output unit initially oscillates and then bifurcates into the two required output states. 
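The radial projection scheme behind the trajectory diagram (Section 2.2) can be sketched as follows; a minimal Python sketch, where spacing the n axes equally around the circle is our assumption (the paper says only that the axial components are plotted radially): \n\n

```python
import math

def trajectory_point(w):
    """Project an n-dimensional weight vector onto the plane: each axial
    component is drawn as a vector along its own radial direction, and the
    vector sum of all components gives the plotted point.
    Equal angular spacing of the axes is an assumption of this sketch."""
    n = len(w)
    x = sum(wi * math.cos(2 * math.pi * i / n) for i, wi in enumerate(w))
    y = sum(wi * math.sin(2 * math.pi * i / n) for i, wi in enumerate(w))
    return (x, y)
```

As noted above, for n > 2 the resultant point is not unique: with four axes, the weight vectors (1, 0, 1, 0) and (0, 0, 0, 0) both project to the origin, since components on opposite axes cancel. \n\n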
\n\nThe two extra windows on the right show the trajectory diagrams for the two hidden units. These diagrams are generalizations of phase diagrams: the components of a point in a high-dimensional space are plotted radially in the plane and treated as vectors whose sum yields a point in the two-dimensional representation. We have found these diagrams useful in observing the trajectories of the two hidden units, in which case they are representations of paths in a six-dimensional weight space. In cases where the network does converge to a correct solution, the paths of the two hidden units either try to match each other (in which case the configurations of the units were identical) or move in opposite directions (in which case the units were opposites). \n\nBy contrast, for learning runs which do not converge to global optima we found that usually one of the hidden units followed a normal trajectory whereas the other unit was not able to achieve the appropriate match or anti-match. This is because the signs of the weights to the second hidden unit were not correct and the learning algorithm could not make the necessary adjustments. At a certain point early in learning the unit would travel off on a completely different trajectory. These observations suggest a heuristic that could improve learning by setting initial trajectories in the \"correct\" directions. \n\nFigure 2: The bond diagram for a network that has learnt the symmetry function. There are six input units, two hidden and one output. Weights are shown by bonds emanating from nodes. In the graphics positive and negative weights are colored red and blue respectively. In this grey-scale photo the negative weights are marked with diagonal lines to distinguish them from positive weights. 
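The geometry of such a bond diagram (Section 2.1) can be sketched as follows. This is a hypothetical Python sketch, not the paper's X-Windows C code; the dictionary-based layout, the scale factor, and the color names are our assumptions, with red for positive and blue for negative weights as in Figure 2. \n\n

```python
import math

def bond_segments(positions, weights, scale=0.1):
    """Turn a weight assignment into drawable bond segments: each bond
    emanates from its source node, points toward the node it connects to,
    and has length proportional to the weight magnitude.

    positions: {node: (x, y)} layout of the nodes in the plane.
    weights: {(from_node, to_node): w} connection strengths.
    Returns a list of (start, end, color) segments."""
    segments = []
    for (a, b), w in weights.items():
        (xa, ya), (xb, yb) = positions[a], positions[b]
        dx, dy = xb - xa, yb - ya
        dist = math.hypot(dx, dy)
        ux, uy = dx / dist, dy / dist     # unit vector toward the far node
        length = scale * abs(w)           # bond length encodes |w|
        end = (xa + ux * length, ya + uy * length)
        color = "red" if w >= 0 else "blue"
        segments.append(((xa, ya), end, color))
    return segments
```

For example, with two nodes at (0, 0) and (1, 0), a weight of 2.0 from the first to the second yields a red bond from (0, 0) reaching 0.2 of the way across, while a weight of -1.0 in the other direction yields a shorter blue bond pointing back. \n\n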
\n\nFigure 3: An example of the graphics with most of the windows active; the command line appears on the bottom. The central window shows the bond diagram of the General Symmetry function. The upper left window shows the total error, and the lower left window the state of the output unit. The two windows on the right show the trajectory diagrams for the two hidden units. The \"spokes\" in this diagram correspond to the magnitudes of the weights. The traces of dots are the paths of the two units in weight space. \n\nIn general, the trajectory diagram has similar uses to a conventional phase plot: it can distinguish between different regions of configuration space; it can be used to detect critical stages of the dynamics of a system; and it gives a \"trace\" of its time evolution. \n\n4 CONCLUSION \n\nA set of computer graphics visualization programs has been designed and interfaced to a back-propagation simulator. Some new visualization tools were introduced, such as the bond and trajectory diagrams. These and other visual tools were integrated into an interactive multi-window environment. \n\nDuring the course of the work it was found that the graphics was useful in a number of ways: in giving a clearer picture of the internal representation of weights and the effects of parameters, in detecting errors in the code, and in pointing out aspects of the simulation that had not been expected beforehand. Also, insight was gained into principles of designing graphics for scientific processes. \n\nIt would be of interest to extend our visualization techniques to include large networks with thousands of nodes and tens of thousands of weights. We are currently examining a number of alternative techniques which are more appropriate for large data-set regimes. 
\n\nAcknowledgements \n\nWe wish to thank Scott Kirkpatrick for help and encouragement during the project. We also thank members of the visualization lab and the animation lab for use of their resources. \n\nReferences \n\n(1) McCormick B H, DeFanti T A, Brown M D (Eds), \"Visualization in Scientific Computing\", Computer Graphics 21, 6, November (1987). See also \"Visualization in Scientific Computing: A Synopsis\", IEEE Computer Graphics and Applications, July (1987). \n\n(2) Rumelhart D E, McClelland J L, \"Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1\", MIT Press, Cambridge, MA (1986). \n\n(3) Tufte E R, \"The Visual Display of Quantitative Information\", Graphics Press, Cheshire, CT (1983). \n", "award": [], "sourceid": 286, "authors": [{"given_name": "Jakub", "family_name": "Wejchert", "institution": null}, {"given_name": "Gerald", "family_name": "Tesauro", "institution": null}]}