{"title": "Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors", "book": "Advances in Neural Information Processing Systems", "page_first": 945, "page_last": 952, "abstract": "", "full_text": "Perception of the structure of the physical world\nusing unknown multimodal sensors and effectors\n\nD. Philipona\nSony CSL, 6 rue Amyot\n75005 Paris, France\ndavid.philipona@m4x.org\n\nJ.K. O’Regan\nLaboratoire de Psychologie Expérimentale, CNRS\nUniversité René Descartes, 71, avenue Edouard Vaillant\n92774 Boulogne-Billancourt Cedex, France\nhttp://nivea.psycho.univ-paris5.fr\n\nJ.-P. Nadal\nLaboratoire de Physique Statistique, ENS\nrue Lhomond\n75231 Paris Cedex 05\n\nO. J.-M. D. Coenen\nSony CSL, 6 rue Amyot\n75005 Paris, France\n\nAbstract\n\nIs there a way for an algorithm linked to an unknown body to infer by itself information about this body and the world it is in? Taking the case of space for example, is there a way for this algorithm to realize that its body is in a three-dimensional world? Is it possible for this algorithm to discover how to move in a straight line? And more basically: do these questions make any sense at all, given that the algorithm only has access to the very high-dimensional data consisting of its sensory inputs and motor outputs?\nWe demonstrate in this article how these questions can be given a positive answer. We show that it is possible to make an algorithm that, by analyzing the law that links its motor outputs to its sensory inputs, discovers information about the structure of the world regardless of the devices constituting the body it is linked to. 
We present results from simulations demonstrating a way to issue motor orders resulting in “fundamental” movements of the body as regards the structure of the physical world.\n\n1 Introduction\n\nWhat is it possible to discover from behind the interface of an unknown body, embedded in an unknown world? In previous work [4] we presented an algorithm that can deduce the dimensionality of the outside space in which it is embedded, by making random movements and studying the intrinsic properties of the relation linking outgoing motor orders to the resulting changes of sensory inputs (the so-called sensorimotor law [3]).\n\nIn the present article we provide a more advanced mathematical overview together with a more robust algorithm, and we also present a multimodal simulation.\n\nThe mathematical section provides a rigorous treatment, relying on concepts from differential geometry, of what are essentially two very simple ideas. The first idea is that transformations of the organism-environment system which leave the sensory inputs unchanged do so independently of the code or the structure of the sensors, and are in fact the only aspects of the sensorimotor law that are independent of the code (property 1). In a single given sensorimotor configuration the effects of such transformations induce what is called a tangent space, over which linear algebra can be used to extract a small number of independent basic elements, which we call “measuring rods”. The second idea is that there is a way of applying these measuring rods globally (property 2) so as to discover an overall substructure in the set of transformations that the organism-environment system can suffer, and that leave sensory inputs unchanged. 
Taken together, these ideas make it possible, if the sensory devices are sufficiently informative, to extract an algebraic group structure corresponding to the intrinsic properties of the space in which the organism is embedded.\n\nThe simulation section is for the moment limited to an implementation of the first idea. It briefly presents the main steps of an implementation giving access to the measuring rods, and presents the results of its application to a virtual rat with mixed visual, auditory and tactile sensors (see Figure 2). The group discovered reveals the properties of the Euclidean space implicit in the equations describing the physics of the simulated world.\n\nFigure 1: The virtual organism used for the simulations. Random motor commands produce random changes in the rat’s body configuration, involving uncoordinated movements of the head, changes in the gaze direction, and changes in the aperture of the eyelids and diaphragms.\n\n2 Mathematical formulation\n\nLet us denote by S the sensory inputs and by M the motor outputs. They are the only things the algorithm can access. 
Let us denote by P the configurations of the body controlled by the algorithm and by E the configurations of the environment.\n\nWe will assume that the body position is controlled by the multidimensional motor outputs through some law φa, and that the sensory devices together deliver a multidimensional input that is a function φb of the configuration of the body and the configuration of the environment:\n\nP = φa(M)   and   S = φb(P, E)\n\nWe shall write φ(M, E) := φb(φa(M), E), denote by S, M, P, E the sets of all S, M, P, E, and assume that M and E are manifolds.\n\n2.1 Isotropy group of the sensorimotor law\n\nThrough time, the algorithm will be able to explore a set of sensorimotor laws linking its outputs to its inputs:\n\nφ(·, E) := {M ↦ φ(M, E), E ∈ E}\n\nThis is a set of functions from M to S, parametrized by the environmental state E. Our goal is to extract from this set something that does not depend on the way the sensory information is provided; in other words, something that would be the same for all h ∘ φ(·, E), where h is an invertible function corresponding to a change of encoding, including changes of the sensory devices (as long as they provide access to the same information).\nIf we define Sym(X) := {f : X → X, f a one-to-one mapping} and consider:\n\nΓ(φ) = {f ∈ Sym(M × E) such that φ ∘ f = φ}\n\nthen\n\nProperty 1. Γ(φ1) = Γ(φ2) ⇔ ∃f ∈ Sym(S) such that φ1 = f ∘ φ2\n\nThus Γ(φ) is invariant under a change of encoding, and retains from φ all that is independent of the encoding. This result is easily understood using an example from physics: think of a light sensor with unknown characteristics in a world consisting of a single point light source. 
The values of the measures are very dependent on the sensor, but the fact that they are equal on concentric spheres is an intrinsic property of the physics of the situation (Γ(φ), in this case, would be the group of rotations) and is independent of the code and of the sensor’s characteristics.\n\nBut how can we understand the transformations f, which, first, involve a manifold E the algorithm does not know, and, second, are invisible since φ ∘ f = φ? We will show that, under one reasonable assumption, there is an algorithm that can discover the Lie algebra of the Lie subgroups of Γ(φ) that have independent actions over M and E, i.e. Lie groups G such that g(M, E) = (g1(M), g2(E)) for any g ∈ G, with\n\nφ(g1(M), g2(E)) = φ(M, E)   ∀g ∈ G   (1)\n\n2.2 Fundamental vector fields over the sensory inputs\n\nWe will assume that the sensory inputs provide enough information to observe unambiguously the changes of the environment when the exteroceptive sensors do not move. In mathematical form, we will assume that:\n\nCondition 1. There exists U × V ⊂ M × E such that φ(M, ·) is an injective immersion from V to S for any M ∈ U\n\nUnder this condition, φ(M, V) is a manifold for any M ∈ U and φ(M, ·) is a diffeomorphism from V to φ(M, V). We shall write φ⁻¹(M, ·) for its inverse. 
Choosing M0 ∈ U, it is thus possible to define an action φ_{M0} of G over the manifold φ(M0, V):\n\nφ_{M0}(g, S) := φ(M0, g2(φ⁻¹(M0, S)))   ∀S ∈ φ(M0, V)\n\nAs a consequence (see for instance [2]), for any left-invariant vector field X on G there is an associated fundamental vector field X^S on φ(M0, V)¹:\n\nX^S(S) := (d/dt) φ_{M0}(e^{−tX}, S)|_{t=0}   ∀S ∈ φ(M0, V)\n\n¹To avoid heavy notation we have written X^S instead of X^{φ(M0,V)}.\n\nThe key point for us is that this whole vector field can be discovered experimentally by the algorithm from one vector alone: let us suppose the algorithm knows the one vector (d/dt) φ1(e^{−tX}, M0)|_{t=0} ∈ TM|_{M0} (the tangent space of M at M0), which we will call a measuring rod. Then it can construct a motor command M_X(t) such that\n\nM_X(0) = M0   and   Ṁ_X(0) = −(d/dt) φ1(e^{−tX}, M0)|_{t=0}\n\nand observe the fundamental field, thanks to the property:\n\nProperty 2. X^S(S) = (d/dt) φ(M_X(t), φ⁻¹(M0, S))|_{t=0}   ∀S ∈ φ(M0, V)\n\nIndeed the movements of the environment reveal a sub-manifold φ(M0, V) of the manifold S of all sensory inputs, and this means they make it possible to transport the sensory image of the given measuring rod over this sub-manifold: X^S(S) is the time derivative of the sensory inputs at t = 0 in the movement implied by the motor command M_X in the configuration of the environment yielding S at t = 0.\n\nThe fundamental vector fields are the key to our problem because [2]:\n\n[X^S, Y^S] = [X, Y]^S\n\nwhere the left term uses the bracket of the vector fields on φ(M0, V) and the right term uses the bracket in the Lie algebra of G. Thus clearly we can get insight into the properties of the latter by the study of these fields. 
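As a numerical aside, the bracket relation above can be checked directly when the vector fields are known in coordinates. The following sketch (Python/NumPy here, not the paper’s Matlab; the example fields and the sign convention [X, Y] = DY·X − DX·Y are illustrative assumptions) approximates the bracket of two infinitesimal rotations by finite differences:

```python
import numpy as np

def cross_matrix(u):
    # returns C such that C @ p == np.cross(u, p)
    x, y, z = u
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def lie_bracket(X, Y, p, h=1e-5):
    # [X, Y](p) = DY(p) X(p) - DX(p) Y(p), Jacobians by central differences
    n = len(p)
    I = np.eye(n)
    DX = np.column_stack([(X(p + h * I[i]) - X(p - h * I[i])) / (2 * h) for i in range(n)])
    DY = np.column_stack([(Y(p + h * I[i]) - Y(p - h * I[i])) / (2 * h) for i in range(n)])
    return DY @ X(p) - DX @ Y(p)

# infinitesimal rotations about the z and x axes
X = lambda p: cross_matrix([0.0, 0.0, 1.0]) @ p
Y = lambda p: cross_matrix([1.0, 0.0, 0.0]) @ p

p = np.array([0.3, -1.2, 0.7])
# for linear fields p -> Ap and p -> Bp the bracket reduces to p -> (BA - AB)p,
# here minus the infinitesimal rotation about y
expected = -cross_matrix([0.0, 1.0, 0.0]) @ p
assert np.allclose(lie_bracket(X, Y, p), expected, atol=1e-6)
```

The closure of rotations under the bracket is exactly the kind of structure the fundamental fields expose to the algorithm.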
If the action φ_{M0} is effective (and it is possible to show that for any G there is a subgroup such that it is), we have the additional properties:\n\n1. X ↦ X^S is an injective Lie algebra morphism: we can understand the whole Lie algebra of G through the Lie bracket over the fundamental vector fields\n\n2. G is diffeomorphic to the group of finite compositions of fundamental flows: any element g of G can be written as g = e^{X1} e^{X2} … e^{Xk}, and\n\nφ_{M0}(g, S) = φ_{M0}(e^{X1}, φ_{M0}(e^{X2}, … φ_{M0}(e^{Xk}, S)))\n\n2.3 Discovery of the measuring rods\n\nThus the question is: how can the algorithm come to know the measuring rods? If φ is not singular (that is, if it is a subimmersion on U × V, see [1]), then it can be demonstrated that:\n\nProperty 3. (∂φ/∂M)(M0, E0) [Ṁ(0) − Ṁ_X(0)] = 0 ⇒ (d/dt) φ(M(t), ·)|_{t=0} = X^S(φ(M0, ·))\n\nThis means that the particular choice of one vector of TM|_{M0}, among those that have the same sensory image as a given measuring rod, is of no importance for the construction of the associated vector field. Consequently, the search for the measuring rods becomes the search for their sensory images, which form a linear subspace of the intersection of the tangent spaces of φ(M0, V) and φ(U, E0) (as a direct consequence of property 2):\n\n(∂φ/∂M)(M0, E0) (d/dt) φ1(e^{−tX}, M0)|_{t=0} ∈ T φ(M0, V)|_{S0} ∩ T φ(U, E0)|_{S0}   ∀X\n\nBut what about the rest of the intersection? 
Reciprocally, it can be shown that:\n\nProperty 4. Any measuring rod that has a sensory image in the intersection of the tangent spaces of φ(M0, V) and φ(U, E) for any E ∈ V reveals a one-dimensional subgroup of transformations over V that is invariant under any change of encoding.\n\n3 Simulation\n\n3.1 Description of the virtual rat\n\nWe have applied these ideas to a virtual body satisfying the different conditions necessary for the theory to be applied. Though our approach would also apply to the situation where the sensorimotor law involves time-varying functions, for simplicity here we shall take the restricted case where S and M are linked by a non-delayed relationship. We thus implemented a rat’s head with instantaneous reactions, so that M ∈ ℝ^m and S ∈ ℝ^s. In the simulation, m and s were arbitrarily assigned the value 300.\n\nThe head had visual, auditory and tactile input devices (see Figure 2). The visual device consisted of two eyes, each constituted by 40 photosensitive cells randomly distributed on a planar retina, one lens, one diaphragm (or pupil) and two eyelids. The images of the 9 light sources constituting the environment were projected through the lens onto the retina to locally stimulate photosensitive cells, with a total influx related to the aperture of the diaphragm and the eyelids. The auditory device was constituted by one amplitude sensor in each of the two ears, with a sensitivity profile favoring auditory sources at azimuth and elevation 0° with respect to the orientation of the head. The tactile device was constituted by 4 whiskers on each side of the rat’s jaw, which stuck to an object when touching it and delivered a signal related to the shift from the rest position. 
The global sensory inputs of dimension 90 (2 × 40 photosensors plus 2 auditory sensors plus 8 tactile sensors) were delivered to the algorithm through a linear mixing of all the signals delivered by these sensors, using a random matrix W_S ∈ M(s, 90) representing some sensory neural encoding in dimension s = 300.\n\nFigure 2: The sensory system. (a) the sensory part of both eyes is constituted of randomly distributed photosensitive cells (small dark dots). (b) the auditory sensors have a gain profile favoring sounds coming from the front of the ears. (c) tactile devices stick to the sources they come into contact with.\n\nThe motor device was as follows. Sixteen control parameters were constructed from linear combinations of the motor outputs of dimension m = 300, using a random matrix W_M ∈ M(16, m) representing some motor neural code. The configuration of the rat’s head was then computed from these sixteen variables as follows: six parameters controlled the position and orientation of the head and, for each eye, three controlled the eye orientation plus two the aperture of the diaphragm and the eyelids. The whiskers were not controllable, but were fixed to the head.\n\nIn the simulation we used linear encodings W_S and W_M in order to show that the algorithm worked even when the dimension of the sensory and motor vectors was high. Note first, however, that any continuous high-dimensional function, even a non-linear one, could have been used instead of the linear mixing matrices. 
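As a minimal sketch of the encodings just described (Python/NumPy rather than the paper’s Matlab; the Gaussian random-matrix draw and the function names are illustrative assumptions, only the dimensions come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
s, m = 300, 300          # encoded sensory and motor dimensions (values from the text)
n_raw, n_ctrl = 90, 16   # raw sensor signals and head control parameters

W_S = rng.standard_normal((s, n_raw))   # sensory neural encoding, W_S in M(s, 90)
W_M = rng.standard_normal((n_ctrl, m))  # motor neural code, W_M in M(16, m)

def control_parameters(M):
    # linear combinations of the motor outputs driving the head configuration
    return W_M @ M

def sensory_inputs(raw):
    # linear mixing of the 90 raw sensor signals into the s-dimensional input
    return W_S @ raw

assert control_parameters(rng.standard_normal(m)).shape == (16,)
assert sensory_inputs(rng.standard_normal(n_raw)).shape == (300,)
```

The algorithm never sees W_S or W_M; it only observes the 300-dimensional vectors on either side of them.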
More importantly, note that even when linear mixing is used, the sensorimotor law is highly nonlinear: the sensors deliver signals that are not linear with respect to the configuration of the rat’s head, and this configuration is itself not linear with respect to the motor outputs.\n\n3.2 The algorithm\n\nThe first important result of the mathematical section was that the sensory images of the measuring rods are in the intersection between the tangent space of the sensory inputs observed when issuing different motor outputs while the environment is immobile, and the tangent space of the sensory inputs observed when the command being issued is constant.\n\nIn the present simulation we will only be making use of this point, but keep in mind that the second important result was the relation between the fundamental vector fields and these measuring rods. This implies that the tangent vectors we are going to find by an experiment for a given sensory input S0 = φ(M0, E0) can be transported in a particular way over the whole sub-manifold φ(M0, V), thereby generating the sensory consequences of any transformation of E associated with the Lie subgroup of Γ(φ) whose measuring rods have been found.\n\nFigure 3: Amplitudes of the ratios of successive singular values of: (a) the estimated tangent sensorimotor law (when E is fixed at E0) during the bootstrapping process; (b) the matrix corresponding to an estimated generating family for the tangent space to the manifold of sensory inputs observed when M is fixed at M0; (c) the matrix constituted by concatenating the vectors found in the two previous cases. The nullspaces of the first two matrices reflect redundant variables; the nullspace of the last one is related to the intersection of the first two tangent spaces (see equation 2). 
The graphs show there are 14 control parameters with respect to the body, and 27 variables to parametrize the environment (see text). The nullspace of the last matrix leads to the computation of an intersection of dimension 6, reflecting the Lie group of Euclidean transformations SE(3) (see text).\n\nIn [4], the simulation aimed to demonstrate that the dimensions of the different vector spaces involved were accessible. We now present a simulation that goes beyond this by estimating these vector spaces themselves, in particular T φ(M0, V)|_{S0} ∩ T φ(U, E0)|_{S0}, in the case of multimodal sensory inputs and with a robust algorithm. The method previously used to estimate the first tangent space, and more specifically its dimension, indeed required an unrealistic level of accuracy. One of the reasons was the poor behavior of the Singular Value Decomposition when dealing with badly conditioned matrices. We have developed a much more stable method, which furthermore uses time derivatives as a more plausible way to estimate the differential than multivariate linear approximation. Indeed, the nonlinear functional relationship between the motor outputs and the sensory inputs implies an exact linear relationship between their respective time derivatives at a given motor output M0:\n\nS(t) = φ(M(t), E0) ⇒ Ṡ(0) = (∂φ/∂M)(M0, E0) Ṁ(0)\n\nand this linear relationship can be estimated as the linear mapping associating Ṁ(0), for any curve in the motor command space such that M(0) = M0, to the resulting Ṡ(0). The idea is then to use bootstrapping to estimate the time derivatives of the “good” sensory input combinations along the “good” movements, so that this linear relation is diagonal and the decomposition unnecessary: the purpose of the SVD used at each step is to provide an indication of which vectors seem to be of interest. 
At the end of the process, when the linear relationship is judged to be sufficiently diagonal, the singular values are taken as the diagonal elements, and are thus estimated with the precision of the time-derivative estimator. Figure 3a presents the evolution of the estimated dimension of the tangent space during this bootstrapping process.\n\nUsing this method in the first stage of the experiment, when the environment is immobile, makes it possible for the algorithm, at the same time as it finds a basis for the tangent space, to calibrate the signals coming from the head: it extracts sensory input combinations that are meaningful as regards its own mobility. Then, during a second stage, using these combinations, it estimates the tangent space to the sensory inputs resulting from movements of the environment while it keeps its motor output fixed at M0. Finally, using the tangent spaces estimated in these two stages, it computes their intersection: if T_SM is a matrix containing a basis of the first tangent space, and T_SE a basis of the second tangent space, then the nullspace of [T_SM, T_SE] generates the intersection of the two spaces:\n\n[T_SM, T_SE] λ = 0 ⇒ T_SM λ_M = −T_SE λ_E,   where λ = (λ_M^T, λ_E^T)^T   (2)\n\nTo conclude, using the pseudo-inverse of the tangent sensorimotor law, the algorithm computes measuring rods that have a sensory image in that intersection; and this computation is simple since the adaptation process has made the tangent law diagonal.\n\n3.3 Results²\n\nFigure 3a shows the evolution of the estimate of the ratios between successive singular values. The maximum of this ratio can be taken as the frontier between significantly non-zero values and zero ones, and thus reveals the dimension of the tangent space to the sensory inputs observed in an immobile environment. 
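The two numerical ingredients just described, choosing the dimension at the largest ratio of successive singular values and intersecting the two tangent spaces through the nullspace of the concatenated bases (equation 2), can be sketched as follows (Python/NumPy on synthetic data; the variable names follow the text, but the construction is an illustrative assumption, not the paper’s code):

```python
import numpy as np

rng = np.random.default_rng(1)

def estimated_dimension(A, eps=1e-12):
    # the frontier between non-zero and zero singular values is taken at
    # the largest ratio of successive singular values, as in the text
    sv = np.linalg.svd(A, compute_uv=False)
    ratios = sv[:-1] / (sv[1:] + eps)
    return int(np.argmax(ratios)) + 1

# two tangent-space bases in R^10 sharing a 2-dimensional intersection
common = rng.standard_normal((10, 2))
T_SM = np.hstack([common, rng.standard_normal((10, 3))])  # basis of the first tangent space
T_SE = np.hstack([common, rng.standard_normal((10, 4))])  # basis of the second tangent space

# equation 2: the nullspace of [T_SM, T_SE] generates the intersection
concat = np.hstack([T_SM, T_SE])
_, sv, Vt = np.linalg.svd(concat)
null_dim = concat.shape[1] - estimated_dimension(concat)
lam = Vt[-null_dim:].T                     # nullspace vectors (lambda_M^T, lambda_E^T)^T
intersection = T_SM @ lam[:T_SM.shape[1]]  # T_SM lambda_M spans the intersection

P = common @ np.linalg.pinv(common)        # projector onto the true intersection
assert null_dim == 2
assert np.allclose(P @ intersection, intersection, atol=1e-8)
```

In the simulation the same arithmetic runs with 14 and 27 basis vectors, giving the nullspace of dimension 41 − 35 = 6 reported below.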
There are indeed 14 effective parameters of control of the body with respect to the sensory inputs: of the 16 parameters described in section 3.1, for each eye the two parameters controlling the aperture of the diaphragm and the eyelids combine into a single effective one characterizing the total incoming light influx.\n\nAfter this adaptation process, the tangent space to the sensory inputs observed for a fixed motor output M0 can be estimated without bootstrapping, as shown, as regards its dimension (27 = 9 × 3 for the 9 light sources moving in a three-dimensional space), in Figure 3b. The intersection is computed from the nullspace of the matrix constituted by concatenating generating vectors of the two previous spaces, using equation 2. This nullspace is of dimension 41 − 35 = 6, as shown in Figure 3c. Note that the graph shows the ratios of successive singular values, and thus has one less value than the number of vectors.\n\n²The Matlab code of the simulation can be downloaded at http://nivea.psycho.univ-paris5.fr/~philipona for further examination.\n\nFigure 4 demonstrates the movements of the rat’s head associated with the measuring rods found using the pseudo-inverse of the sensorimotor law. Contrast these with the non-rigid movements of the rat’s head associated with the random motor commands of Figure 1.\n\nFigure 4: The effects of motor commands corresponding to a generating family of 6 independent measuring rods computed by the algorithm. They reveal the control of the head in a rigid fashion. Without the Lie bracket to understand commutativity, these movements involve arbitrary compositions of translations and rotations.\n\n4 Conclusion\n\nWe have shown that sensorimotor laws possess intrinsic properties related to the structure of the physical world in which an organism’s body is embedded. 
These properties have an overall group structure, for which smoothly parametrizable subgroups that act separately on the body and on the environment can be discovered. We have briefly presented a simulation demonstrating the way to access the measuring rods of these subgroups.\n\nWe are currently conducting our first successful experiments on the estimation of the Lie bracket. This will allow the groups whose measuring rods have been found to be decomposed. It will then be possible for the algorithm to distinguish, for instance, between translations and rotations, and between rotations around different centers.\n\nThe question now is to determine what can be done with these first results: is this intrinsic understanding of space enough to discover the subgroups of Γ(φ) that do not act on both the body and the environment? For example, those acting on the body alone should provide a decomposition of the body with respect to its articulations.\n\nThe ultimate goal is to show that there is a way of extracting objects in the environment from the sensorimotor law, even though nothing is known about the sensors and effectors.\n\nReferences\n\n[1] N. Bourbaki. Variétés différentielles et analytiques. Fascicule de résultats. Hermann, 1971-1997.\n\n[2] T. Masson. Géométrie différentielle, groupes et algèbres de Lie, fibrés et connexions. LPT, 2001.\n\n[3] J. K. O’Regan and A. Noë. A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5), 2001.\n\n[4] D. Philipona, J. K. O’Regan, and J.-P. Nadal. Is there something out there? Inferring space from sensorimotor dependencies. 
Neural Computation, 15(9), 2003.", "award": [], "sourceid": 2348, "authors": [{"given_name": "D.", "family_name": "Philipona", "institution": null}, {"given_name": "J.K.", "family_name": "O'Regan", "institution": null}, {"given_name": "J.-P.", "family_name": "Nadal", "institution": null}, {"given_name": "Olivier", "family_name": "Coenen", "institution": null}]}