{"title": "Learning transport operators for image manifolds", "book": "Advances in Neural Information Processing Systems", "page_first": 423, "page_last": 431, "abstract": "We describe a method for learning a group of continuous transformation operators to traverse smooth nonlinear manifolds. The method is applied to model how natural images change over time and scale. The group of continuous transform operators is represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a measurement orbit can be produced by a linear combination of a few basis elements. We illustrate how the method can be used to efficiently code time-varying images by describing changes across time and scale in terms of the learned operators.", "full_text": "Learning transport operators for image manifolds\n\nBenjamin J. Culpepper\nDepartment of EECS\nComputer Science Division\nUniversity of California, Berkeley\nBerkeley, CA 94720\nbjc@cs.berkeley.edu\n\nBruno A. Olshausen\nHelen Wills Neuroscience Institute\n& School of Optometry\nUniversity of California, Berkeley\nBerkeley, CA 94720\nbaolshausen@berkeley.edu\n\nAbstract\n\nWe describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. 
The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale.\n\n1 Introduction\n\nIt is well known that natural images occupy a small fraction of the space of all possible images. Moreover, as images change over time in response to observer motion or changes in the environment, they trace out particular trajectories along manifolds in this space. It is reasonable to expect that perceptual systems have evolved ways to efficiently model these manifolds, and thus mathematical models that capture their structure in operators that transport along them may be of use for understanding perceptual systems, as well as for engineering artificial vision systems. In this paper, we derive methods for learning these transport operators from data.\n\nRather than simply learning a mapping of individual data points to a low-dimensional space, we seek a compact representation of the entire manifold via the operators that traverse it. We investigate a direct application of the Lie approach to invariance [1], utilizing a matrix exponential generative model for transforming images. This is in contrast to previous methods that rely mainly upon a first-order Taylor series approximation of the matrix exponential [2,3], and to bilinear models, in which the transformation variables interact multiplicatively with the input [4,5,6]. It is also distinct from the class of methods that learn embeddings of manifolds from point cloud data [7,8,9,10]. The spirit of this work is similar to [11], which also uses a spectral decomposition to make learning tractable in extremely high dimensional Lie groups, such as those over images. We share with [12] the goal of learning a model of the manifold that can then be generalized to new data.\n\nHere we show how a particular class of transport operators for moving along manifolds may be learned from data. 
The model is first applied to synthetic datasets to demonstrate interesting cases where it can recover topology, and to show that for more difficult cases it neatly approximates the local structure. Subsequently, we apply it to time-varying natural images and extrapolate along inferred trajectories to demonstrate super-resolution and temporal filling-in of missing video frames.\n\n2 Problem formulation\n\nLet us consider an image of the visual world at time t as a point x ∈ R^N, where the elements of x correspond to image pixels. We describe the evolution of x as\n\nẋ = A x ,   (1)\n\nwhere the matrix A is a linear operator capturing some action in the environment that transforms the image. Such an action belongs to a family that occupies a subspace of R^{N×N} given by\n\nA = Σ_{m=1}^{M} Ψ_m c_m   (2)\n\nfor some M ≤ N² (usually M << N²), with Ψ_m ∈ R^{N×N}. The amount of a particular action from the dictionary Ψ_m that occurs is controlled by the corresponding c_m. At t = 0, a vision system takes an image x0, and then makes repeated observations at intervals Δt. Given x0, the solution to (1) traces out a continuously differentiable manifold of images given by x_t = exp(At) x0, which we observe periodically. Our goal is to learn an appropriate set of bases, Ψ, that allow for a compact description of this set of transformations by training on many pairs of related observations.\n\nThis generative model for transformed images has a number of attractive properties. First, it factors apart the time-varying image into an invariant part (the initial image, x0) and a variant part (the transformation, parameterized by the coefficient vector c), thus making explicit the underlying causes. Second, the learned exponential operators are quite powerful in terms of modeling capacity, compared to their linear counterparts. 
Lastly, the partial derivatives of the objective function have a simple form that may be computed efficiently.\n\n3 Algorithm\n\nThe model parameters are learned by maximizing the log-likelihood of the model. Consider two 'close' states of the system in isolation. Let x0 be our initial condition, and x1 be a second observation. These points are related through an exponentiated matrix that is itself composed of a few basis elements, plus zero-mean white i.i.d. Gaussian noise, n:\n\nx1 = T(c) x0 + n ,   (3)\n\nT(c) = exp(Σ_m Ψ_m c_m) .   (4)\n\nWe assume a factorial sparse prior over the transform variables c of the form P(c_m) ∝ exp(−ζ |c_m|). The negative log of the posterior probability of the data under the model is given by\n\nE = (1/2) ||x1 − T(c) x0||_2^2 + (γ/2) Σ_m ||Ψ_m||_F^2 + ζ ||c||_1 ,   (5)\n\nwhere || · ||_F is the Frobenius norm, which acts to regularize the dictionary element lengths. The 1-norm encourages sparsity. Given two data points, the solution for the c variables that relates them through Ψ is found by a fast minimization of E with respect to c.\n\nLearning of the basis Ψ proceeds by gradient descent with respect to E. (Note that this constitutes a variational approximation to the log-likelihood, similar to [13].) The Ψ variables are initialized randomly, and adjusted according to ΔΨ = −η ∂E/∂Ψ, using the solution, c, for a pair of observations x0, x1. Figure 1 outlines the steps of the algorithm.\n\n1 choose M ≤ N²\n2 initialize Ψ\n3 while stopping criterion is not met,\n4   pick x0, x1\n5   initialize c to zeros\n6   c ← arg min_c E\n7   ΔΨ = −η ∂E/∂Ψ\n8   sort Ψ_m by ||Ψ_m||_F\n9   M ← max m s.t. ||Ψ_m||_F > ε\n\nFigure 1: Pseudo-code for the learning algorithm. Steps 1-2 initialize. A typical stopping criterion in step 3 is that the reconstruction error or sparsity on some held-out data falls below a threshold. Steps 4-6 compute an E-step on some pair of data points. Step 7 computes a 'partial' M-step. Steps 8-9 shrink the subspace spanned by the dictionary if one or more of the elements have shrunk sufficiently in norm.\n\nThe partial derivatives of E w.r.t. c and Ψ can be cast in a simple form using the spectral decomposition of A, given by Σ_α λ_α u_α v_α^T, with right eigenvectors u_α, left eigenvectors v_α, and eigenvalues λ_α [14]. Let U = [u1 u2 ... uN], V = [v1 v2 ... vN] and D be a diagonal matrix of the eigenvalues λ_α. Then\n\n∂ exp(A)_ij / ∂A_kl = Σ_{αβ} F_{αβ} U_{iα} V_{kα} U_{lβ} V_{jβ} ,   (6)\n\nwhere the matrix F is given by:\n\nF_{αβ} = (exp(λ_β) − exp(λ_α)) / (λ_β − λ_α) if λ_β ≠ λ_α ;   exp(λ_α) otherwise.   (7)\n\nApplication of the chain rule and a re-arrangement of terms yields simplified forms for the partials of E w.r.t. c and Ψ. 
After computing two intermediate terms P and Q,\n\nP = U^T (x1 x0^T + x0 x0^T T^T) V ,   (8)\n\nQ_kl = Σ_{αβ} V_{kα} U_{lβ} F_{αβ} P_{αβ} ,   (9)\n\nthe two partial derivatives for inference and learning are:\n\n∂E/∂c_m = Σ_{kl} Q_kl Ψ_klm + ζ sgn(c_m) ,   (10)\n\n∂E/∂Ψ_klm = Q_kl c_m + γ Ψ_klm .   (11)\n\nThe order of complexity for both derivatives is determined by the computation of Q, which requires an eigen-decomposition and a few matrix multiplications, giving O(N^p) with 2 < p < 3.\n\n4 Experiments on point sets\n\nWe first test the model by applying it to simple datasets where the solutions are known: learning the topology of a sphere and a torus. Second, we apply the model to learn the manifold of time-varying responses to a natural movie from complex oriented filters. These demonstrations illustrate the algorithm's capability for learning significant non-linear structure.\n\nWe have also applied the model to the Klein bottle. Though closely related to the torus, it is an example of a low-dimensional surface whose topology can not be captured by a first-order Lie operator, though our model is able to interpolate between points on the surface using a piecewise approximation (see the supplementary material accompanying this paper for further discussion of this point).\n\nRelated pairs of points on a torus are generated by choosing two angles θ0, φ0 uniformly at random from [0, 2π]; two related angles θ1, φ1 are produced by sampling from two von Mises distributions with means θ0 and φ0 and concentration κ = 5, using the circular statistics toolbox of [15]. For the sphere, we generate the first pair of angles using the normal-deviate method, to avoid concentration of samples near the poles. 
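For concreteness, the sphere pair-sampling scheme just described can be sketched in a few lines of NumPy; this is our stand-in for the circular statistics toolbox, and the function name and defaults are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_pair(kappa=5.0):
    """Sample a related pair of points on the unit sphere: x0 is uniform
    (normal-deviate method), and x1's angles are von Mises perturbations
    of x0's angles with concentration kappa, as in the torus construction."""
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)                     # uniform point on the sphere
    theta0, phi0 = np.arccos(v[2]), np.arctan2(v[1], v[0])
    theta1, phi1 = rng.vonmises(theta0, kappa), rng.vonmises(phi0, kappa)
    to_xyz = lambda th, ph: np.array([np.sin(th) * np.cos(ph),
                                      np.sin(th) * np.sin(ph),
                                      np.cos(th)])
    return to_xyz(theta0, phi0), to_xyz(theta1, phi1)
```

Pairs drawn this way lie close together on the surface, so relating them requires only a small transformation and hence a sparse coefficient vector c.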
Though parameterized by two angles, the coordinates of points on these surfaces are 3- and 4-dimensional; pairs of points x_t for t = 0, 1 on the unit sphere are given by x_t = (sin θ_t cos φ_t, sin θ_t sin φ_t, cos θ_t), and points on a torus by x_t = (cos θ_t, sin θ_t, cos φ_t, sin φ_t).\n\nFigure 2: Orbits of learned sphere operators. (a) Three Ψ_m basis elements applied to points at the six poles of the sphere, (1, 0, 0), (0, 1, 0), (0, 0, 1), (−1, 0, 0), (0, −1, 0), and (0, 0, −1). The orbits are generated by setting x0 to a pole, then plotting x_t = exp(Ψ_m t) x0 for t = [−100, 100]. (b) When superimposed on top of each other, the three sets of orbits clearly define the surface of a sphere.\n\nFigure 3: Orbits of learned torus operators. Each row shows three projections of a Ψ_m basis element applied to a point on the surface of the torus. The orbits shown are generated by setting x0 = (0, 1, 0, 1) then plotting x_t = exp(Ψ_m t) x0 for t = [−1000, 1000] in projections constructed from each triplet of the four coordinates. In each plot, two coordinates always obey a circular relationship, while the third varies more freely.\n\nFigure 4: Learning transformations of oriented filter pairs across time. The orbits of three complex filter outputs in response to a natural movie. 
The blue points denote the complex output for each frame in the movie sequence and are linked to their neighbors via the blue line. The points circled in red were observed by the model, and the red curve shows an extrapolation along the estimated trajectory.\n\nFor the sphere, N = 3, thus setting M = 9 gives the model the freedom to generate the full space of A operators. The Ψ are initialized to mean-zero white Gaussian noise with variance 0.01, and 10,000 learning updates are computed by generating a pair of related points, minimizing E w.r.t. c, then updating Ψ according to ΔΨ = −η ∂E/∂Ψ. In all of the point set experiments, γ = 0.0001 and ζ = 0.01. For cases where topology can be recovered, the solution is robust to the settings of γ and ζ: changing either variable by an order of magnitude does not change the solution, though it may increase the number of learning steps required to get to it. In cases where the topology can not be recovered, the influence of the settings of γ and ζ on the solution is more subtle, as their relative values effectively trade off the importance of data reconstruction against the sparsity of the vector c. We adjust η during learning as follows: when ΔΨ causes E to decrease, we multiply η by 1.01; otherwise, we multiply η by 0.99. When the model has more parameters than it needs to fully capture the topology of the sphere, this fact is evident from the solution it learns: six of the dictionary elements Ψ_m drop out (they have norm less than 10^−6), since the F-norm 'weight decay' term kills off dictionary elements that are used rarely. Figure 2 shows orbits produced by applying each of the remaining Ψ_m operators to points on the sphere. 
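Each learning update above relies on the spectral-decomposition gradients of Section 3 (eqs. 6-9). The sketch below (an illustrative NumPy reimplementation, not the authors' code) computes the gradient of the data term of E with respect to A and checks it against finite differences:

```python
import numpy as np
from scipy.linalg import expm

def grad_E_wrt_A(A, x0, x1):
    """Gradient of E_data = 0.5 * ||x1 - expm(A) x0||^2 w.r.t. A, using the
    spectral form of d exp(A)/dA (eqs. 6-9); the sign is folded into the
    residual r = T x0 - x1."""
    lam, U = np.linalg.eig(A)               # right eigenvectors, eigenvalues
    V = np.linalg.inv(U).T                  # columns are left eigenvectors
    La, Lb = np.meshgrid(lam, lam, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        F = (np.exp(Lb) - np.exp(La)) / (Lb - La)   # eq. (7), off-diagonal
    same = np.isclose(La, Lb)
    F[same] = np.exp(La[same])                      # eq. (7), diagonal
    r = expm(A) @ x0 - x1
    P = U.T @ np.outer(r, x0) @ V           # analogue of eq. (8)
    return np.real(V @ (F * P) @ U.T)       # eq. (9): the gradient matrix Q

# finite-difference check of the spectral gradient
rng = np.random.default_rng(1)
A = 0.3 * rng.standard_normal((4, 4))
x0, x1 = rng.standard_normal(4), rng.standard_normal(4)
G = grad_E_wrt_A(A, x0, x1)
f = lambda M: 0.5 * np.sum((x1 - expm(M) @ x0) ** 2)
num = np.zeros_like(A)
eps = 1e-6
for k in range(4):
    for l in range(4):
        Ap, Am = A.copy(), A.copy()
        Ap[k, l] += eps
        Am[k, l] -= eps
        num[k, l] = (f(Ap) - f(Am)) / (2 * eps)
```

Contracting this gradient matrix with Ψ and c then gives eqs. (10) and (11) for inference and learning, respectively.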
Similar experiments are successful for the torus; Figure 3 shows trajectories of the operators learned for the torus.\n\nAs an intermediate step towards modeling time-varying natural images, we investigate the model's ability to learn the response surface of a single complex oriented filter to a moving image. A complex pyramid is built from each frame in a movie, and pairs of filter responses 1 to 4 frames apart are observed by the model. Four 2x2 basis functions are learned in the manner described above. Figure 4 shows three representative examples that illustrate how well the model is able to extrapolate from the solution estimated using the learned basis Ψ and complex responses from the same filter within a 4-frame time interval. In most cases, this trajectory follows the data closely for several frames.\n\n5 Experiments on movies\n\nIn the image domain, our model has potential applications in temporal interpolation/filling-in of video, super-resolution, compression, and geodesic distance estimation. We apply the model to moving natural images and investigate the first three applications; the fourth will be the subject of future work. Here we report on the ability of the model to learn transformations across time, as well as across scales of the Laplacian pyramid. Our data consist of many short grayscale video sequences of Africa from the BBC.\n\n5.1 Time\n\nWe apply the model to natural movies by presenting it with patches of adjacent frame pairs. Using an analytically generated infinitesimal shift operator, we first run a series of experiments to determine the effect of local minima on the recovery of a known displacement through the minimization of E w.r.t. c. When initialized to zero, the c vector often converges to the wrong displacement, but this problem can be avoided with high probability using a coarse-to-fine technique [16,17]. Doing so requires a slight alteration to our inference algorithm: now we must solve a sequence of optimization problems on frame pairs convolved with a Gaussian kernel whose variance is progressively decreased. At each step in the sequence, both frames are convolved by the kernel before a patch is selected. For the first step, the c variables are initialized to zero; for subsequent steps they are initialized to the solution of the previous step.\n\nFigure 5: Shift operator learned from synthetically transformed natural images. The operator Ψ1, displayed as an array of weights that, for each output pixel, shows the strength of its connection to each input pixel. Each of the 15x15 arrays represents one output pixel's connections. Because of the 1/f² falloff in the power spectrum of natural images, synthetic images with a wildly different distribution of spatial frequency content, such as uncorrelated noise, will not be properly shifted by this operator.\n\nFigure 6: Interpolating between shifted images to temporally fill in missing video frames. Two images x0 and x1 are generated by convolving an image of a diagonal line and a shifted diagonal line by a 3x3 Gaussian kernel with σ = 0.8, and the operator A is inferred. The top row shows the sequence of images x_t = exp(At) x0 for t = 0.25, 0.50, 0.75, 1.00. The middle row shows linear interpolation between x0 and x1. The bottom row shows the sequence of images x_t = (I + At) x0, that is, the first-order Taylor expansion of the matrix exponential, which performs poorly for shifts greater than one pixel. 
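The advantage of the full matrix exponential over its first-order Taylor expansion, illustrated in Figure 6, can be reproduced in miniature with a circular shift; this toy sketch uses scipy.linalg and is ours, not the paper's pipeline:

```python
import numpy as np
from scipy.linalg import expm, logm

N = 9                                  # odd length so a real logarithm exists
S = np.roll(np.eye(N), 1, axis=0)      # one-sample circular shift matrix
A = np.real(logm(S))                   # infinitesimal shift generator

x0 = np.exp(-0.5 * ((np.arange(N) - 4.0) / 1.2) ** 2)  # smooth test signal
x1 = S @ x0                                            # shifted by one sample

full = expm(A) @ x0            # exponential model: recovers the shift
taylor = (np.eye(N) + A) @ x0  # first-order Taylor: distorts the signal
half = expm(0.5 * A) @ x0      # fractional time: half-sample interpolation
```

Fractional times t in exp(At) give the in-between frames used for temporal up-sampling, as in the top row of Figure 6.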
For our analytical shifting operator, two blurring filters (first a 5x5 kernel with variance 10, then a 3x3 kernel with variance 5) reliably give a proper initialization for the final minimization that runs on the unaltered data.\n\nFor control purposes, the video for this experiment comes from a camera fly-over; thus, most of the motion in the scene is due to camera motion. We apply the model to pairs of 11x11 patches, selected from random locations in the video, but discarding patches near the horizon where there is little or no motion. We initialize M = 16; after learning, the basis function with the largest norm has the structure of a shift operator in the primary direction of motion taking place in the video sequence. Using these 16 operators, we run inference on 1,000 randomly selected pairs of patches from a second video, not used during learning, and measure the quality of the reconstruction as the trajectory is used to predict into the future. At 5 frames into the future, our model is able to maintain an average SNR of 7, compared to an SNR of 5 when a first-order Taylor approximation is used in place of the matrix exponential; for comparison, the average SNR for the identity transformation model on this data is 1.\n\nSince the primary form of motion in these small patches is translation, we also train a single operator using artificially translated natural data to make clear that the model can learn this case completely. For this last experiment we take a 360x360 pixel frame of our natural movie, and continuously translate the entire frame in the Fourier domain by a displacement chosen uniformly at random from [0, 3] pixels. We then randomly select a 15x15 region on the interior of the pair of frames and use the two 225 pixel vectors as our x0 and x1. 
We modify the objective function to be\n\nE = (1/2) ||W [x1 − T(c) x0]||_2^2 + (γ/2) Σ_m ||Ψ_m||_F^2 + ζ ||c||_1 ,   (12)\n\nwhere W is a binary windowing function that selects the central 9x9 region from a 15x15 patch; thus, the residual errors that come from new content translating into the patch are ignored. After learning, the basis function Ψ1 (shown in Figure 5) is capable of translating natural images up to 3 pixels while maintaining an average SNR of 16 in the 9x9 center region. Figure 6 shows how this operator is able to correctly interpolate between two measurements of a shifted image in order to temporally up-sample a movie.\n\n5.2 Scale\n\nThe model can also learn to transform between successive scales in the Laplacian pyramid built from a single frame of a video sequence. Figure 7 depicts the system transforming an image patch across scales 2 and 3 of a 256x256 pixel image. We initialize M = 100, but many basis elements shrink during learning; we use only the 16 Ψ_m with non-negligible norm to encode a scale change. The basis Ψ is initialized to mean-zero white Gaussian noise with variance 0.01; the same inference and learning procedures as described for the point sets are then run on pairs x0, x1 selected at random from the corpus of image sequences in the following way. First, we choose a random frame from a random sequence, then up-sample and blur scale 3 of its Laplacian pyramid. Second, we select an 8x8 patch from scale 2 (x0) and the corresponding patch of the up-sampled and blurred scale 3 (x1). 
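The pair-selection procedure for the scale experiment can be sketched as follows; this uses a simple Gaussian pyramid as a stand-in for the paper's Laplacian pyramid, and the blur width, sampling factor, and function name are our assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

rng = np.random.default_rng(0)

def scale_pair(frame, patch=8):
    """Return (x0, x1): an 8x8 patch from pyramid scale 2 and the matching
    patch of up-sampled, blurred scale 3, flattened to vectors."""
    blur = lambda im: gaussian_filter(im, sigma=1.0)
    s1 = blur(frame)[::2, ::2]              # scale 1
    s2 = blur(s1)[::2, ::2]                 # scale 2
    s3 = blur(s2)[::2, ::2]                 # scale 3
    up3 = blur(zoom(s3, 2.0, order=1))      # up-sampled and blurred scale 3
    i = rng.integers(0, s2.shape[0] - patch + 1)
    j = rng.integers(0, s2.shape[1] - patch + 1)
    x0 = s2[i:i + patch, j:j + patch].ravel()
    x1 = up3[i:i + patch, j:j + patch].ravel()
    return x0, x1
```

With a 64x64 frame, s2 and up3 are both 16x16, so the two patches align pixel-for-pixel and form a training pair for the scale-change operator.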
Were it not for the highly structured manifold on which natural images live, the proposition of finding an operator that maps a blurred, subsampled image to its high-resolution original state would seem untenable. However, our results show that in many cases a reduced representation of such two-way mappings can be found, even for small patches.\n\n6 Discussion and conclusions\n\nWe have shown that it is possible to learn low-dimensional parameterizations of operators that transport along non-linear manifolds formed by natural images, both across time and scale. Our focus thus far has been primarily on understanding the model and how to properly optimize its parameters, as little work has previously been done on learning such high dimensional Lie groups.\n\nFigure 7: Learning transformations across scale. (a) Scale 3 of the Laplacian pyramid for a natural scene we wish to code, by describing how it transforms across scale, in terms of our learned dictionary. (b) The estimated scale 2, computed by transforming 8x8 regions of the up-sampled and blurred scale 3. The estimated scale 2 has SNR 9.60; (c) shows the actual scale 2 and (d) shows the errors made by our estimation. For reconstruction we use only 16 dictionary elements.\n\nA promising direction for future work is to explore higher-order models capable of capturing non-commutative operators, such as\n\nx_t = exp(Ψ1 c1) exp(Ψ2 c2) ··· exp(ΨK cK) x0 ,   (13)\n\nas this formulation may be more parsimonious for factoring apart transformations which are prevalent in natural movies, such as combinations of translation and rotation.\n\nEarly attempts to model the manifold structure of images train on densely sampled point clouds and find an embedding into a small number of coordinates along the manifold. However, such an approach does not actually constitute a model, since there is no function for mapping arbitrary points, or moving along the manifold. 
One must always refer back to the original data points on which the model was trained; i.e., it works as a lookup table rather than being an abstraction of the data. Here, by learning operators that transport along the manifold, we have been able to learn a compact description of its structure.\n\nThis model-based representation can be leveraged to compute geodesics using a numerical approximation to the arc length integral:\n\nS = ∫_0^1 ||A exp(A t)||_2 dt = lim_{T→∞} Σ_{t=1}^{T} || exp(A t/T) x0 − exp(A (t−1)/T) x0 ||_2 ,   (14)\n\nwhere T is the number of segments chosen to use in the piecewise linear approximation of the curve, and each term in the summation gives the length of a segment. We believe that this aspect of our model will be of use in difficult classification problems, such as face identification, where Euclidean distances measured in pixel-space give poor results.\n\nPrevious attempts to learn Lie group operators have focused on linear approximations. Here we show that utilizing the full Lie operator/matrix exponential in learning, while computationally intensive, is tractable, even in the extremely high dimensional cases required by models of natural movies. Our spectral decomposition is the key component that enables this, and, in combination with careful mitigation of local minima in the objective function using a coarse-to-fine technique, gives us the power to factor out large transformations from data.\n\nOne shortcoming of the approach described here is that transformations are modeled in the original pixel domain. Potentially these transformations may be described more economically by working in a feature space, such as a sparse decomposition of the image. 
This is a direction of ongoing work.\n\nAcknowledgments\n\nThe authors gratefully acknowledge many useful discussions with Jascha Sohl-Dickstein, Jimmy Wang, Kilian Koepsell, Charles Cadieu, and Amir Khosrowshahi, and the insightful comments from our anonymous reviewers.\n\nReferences\n\n[1] Van Gool, L., Moons, T., Pauwels, E. & Oosterlinck, A. (1995) Vision and Lie's approach to invariance. Image and Vision Computing, 13(4): 259-277.\n[2] Miao, X. & Rao, R.P.N. (2007) Learning the Lie groups of visual invariance. Neural Computation, 19(10): 2665-2693.\n[3] Rao, R.P.N. & Ruderman, D.L. (1999) Learning Lie groups for invariant visual perception. Advances in Neural Information Processing Systems, 11: 810-816. Cambridge, MA: MIT Press.\n[4] Grimes, D.B. & Rao, R.P.N. (2002) A bilinear model for sparse coding. Advances in Neural Information Processing Systems, 15. Cambridge, MA: MIT Press.\n[5] Olshausen, B.A., Cadieu, C., Culpepper, B.J. & Warland, D. (2007) Bilinear models of natural images. SPIE Proceedings vol. 6492: Human Vision and Electronic Imaging XII (B.E. Rogowitz, T.N. Pappas, S.J. Daly, Eds.), Jan 28 - Feb 1, 2007, San Jose, California.\n[6] Tenenbaum, J.B. & Freeman, W.T. (2000) Separating style and content with bilinear models. Neural Computation, 12(6): 1247-1283.\n[7] Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500): 2323-2326.\n[8] Weinberger, K.Q. & Saul, L.K. (2004) Unsupervised learning of image manifolds by semidefinite programming. Computer Vision and Pattern Recognition.\n[9] Tenenbaum, J.B., de Silva, V. & Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500): 2319-2323.\n[10] Belkin, M. & Niyogi, P. (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 14. Cambridge, MA: MIT Press.\n[11] Wang, C.M., Sohl-Dickstein, J. & Olshausen, B.A. (2009) Unsupervised learning of Lie group operators from natural movies. Redwood Center for Theoretical Neuroscience, Technical Report RCTR 01-09.\n[12] Dollar, P., Rabaud, V. & Belongie, S. (2007) Non-isometric manifold learning: analysis and an algorithm. Int. Conf. on Machine Learning, 241-248.\n[13] Olshausen, B.A. & Field, D.J. (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 37: 3311-3325.\n[14] Ortiz, M., Radovitzky, R.A. & Repetto, E.A. (2001) The computation of the exponential and logarithmic mappings and their first and second linearizations. International Journal for Numerical Methods in Engineering, 52: 1431-1441.\n[15] Berens, P. & Velasco, M.J. (2009) The circular statistics toolbox for Matlab. MPI Technical Report, 184.\n[16] Anandan, P. (1989) A computational framework and an algorithm for the measurement of visual motion. Int. J. Comput. Vision, 2(3): 283-310.\n[17] Glazer, F. (1987) Hierarchical Motion Detection. Ph.D. thesis, Univ. of Massachusetts, Amherst, MA; COINS TR 87-02.\n", "award": [], "sourceid": 1147, "authors": [{"given_name": "Benjamin", "family_name": "Culpepper", "institution": null}, {"given_name": "Bruno", "family_name": "Olshausen", "institution": null}]}