{"title": "Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares", "book": "Advances in Neural Information Processing Systems", "page_first": 1361, "page_last": 1368, "abstract": null, "full_text": "Predicting EMG Data from M1 Neurons\nwith Variational Bayesian Least Squares\n\nJo-Anne Ting1, Aaron D\u2019Souza1\n\nKenji Yamamoto3, Toshinori Yoshioka2 , Donna Ho\ufb00man3\n\nShinji Kakei4, Lauren Sergio6, John Kalaska5\nMitsuo Kawato2, Peter Strick3, Stefan Schaal1,2\n\n1Comp. Science & Neuroscience, U.of S. California, Los Angeles, CA 90089, USA\n\n2ATR Computational Neuroscience Laboratories, Kyoto 619-0288, Japan\n\n3University of Pittsburgh, Pittsburgh, PA 15261, USA\n\n4Tokyo Metropolitan Institute for Neuroscience, Tokyo 183-8526, Japan\n\n5University of Montreal, Montreal, Canada H3C-3J7\n6York University, Toronto, Ontario, Canada M3J1P3\n\nAbstract\n\nAn increasing number of projects in neuroscience requires the sta-\ntistical analysis of high dimensional data sets, as, for instance, in\npredicting behavior from neural \ufb01ring or in operating arti\ufb01cial de-\nvices from brain recordings in brain-machine interfaces. Linear\nanalysis techniques remain prevalent in such cases, but classical\nlinear regression approaches are often numerically too fragile in\nhigh dimensions. In this paper, we address the question of whether\nEMG data collected from arm movements of monkeys can be faith-\nfully reconstructed with linear approaches from neural activity in\nprimary motor cortex (M1). 
To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, regularizing against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, LASSO regression and a brute-force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines.

1 Introduction

In recent years, there has been growing interest in large-scale analyses of brain activity with respect to associated behavioral variables. For instance, projects can be found in the area of brain-machine interfaces, where neural firing is used directly to control an artificial system like a robot [1, 2], to control a cursor on a computer screen via non-invasive brain signals [3], or to classify visual stimuli presented to a subject [4, 5]. In these projects, the brain signals to be processed are typically high dimensional, on the order of hundreds or thousands of inputs, with large numbers of redundant and irrelevant signals. Linear modeling techniques like linear regression are among the primary analysis tools [6, 7] for such data. However, the computational problem of data analysis involves not only data fitting, but requires that the model extracted from the data has good generalization properties. 
This is crucial for predicting behavior from future neural recordings, e.g., for continual online interpretation of brain activity to control prosthetic devices or for longitudinal scientific studies of information processing in the brain. Surprisingly, robust linear modeling of high-dimensional data is non-trivial, as the danger of fitting noise and encountering numerical problems is high. Classical techniques like ridge regression, stepwise regression or partial least squares regression are known to be prone to overfitting and require careful human supervision to ensure useful results.
In this paper, we focus on how to improve linear data analysis for the high-dimensional scenarios described above, with a view towards developing a statistically robust “black box” approach that automatically detects the input dimensions most relevant for generalization and excludes the remaining dimensions in a statistically sound way. For this purpose, we investigate a full Bayesian treatment of linear regression with automatic relevance detection [8]. This algorithm, called Variational Bayesian Least Squares (VBLS), can be formulated in closed form with the help of a variational Bayesian approximation and turns out to be computationally highly efficient. We apply VBLS to the reconstruction of EMG data from motor cortical firing, using data sets collected by [9] and [10, 11]. This data analysis addresses important neuroscientific questions: whether M1 neurons can directly predict EMG traces [12], whether M1 has a muscle-based topological organization, and whether information in M1 should be used to predict behavior in future brain-machine interfaces. Our main focus in this paper, however, is the robust statistical analysis of these kinds of data. 
Comparisons with classical linear analysis techniques and a brute-force combinatorial model search on a cluster computer demonstrate that our VBLS algorithm achieves the “black box” quality of a robust statistical analysis technique without any tunable parameters.
In the following sections, we first sketch the derivation of Variational Bayesian Least Squares and subsequently perform an extensive comparative data analysis of this technique in the context of predicting EMG data from M1 neural firing.

2 High Dimensional Regression

Before developing our VBLS algorithm, let us briefly revisit classical linear regression techniques. The standard model for linear regression is:

y = Σ_{m=1}^d b_m x_m + ε    (1)

where b is the regression vector composed of the components b_m, d is the number of input dimensions, ε is additive mean-zero noise, x are the inputs and y are the outputs. The Ordinary Least Squares (OLS) estimate of the regression vector is b = (X^T X)^{-1} X^T y. The main problem with OLS regression in high-dimensional input spaces is that the full-rank assumption on X^T X is often violated due to underconstrained data sets. Ridge regression can “fix” such problems numerically, but introduces uncontrolled bias. Additionally, if the input dimensionality exceeds around 1000 dimensions, the matrix inversion can become prohibitively computationally expensive.
Several ideas exist for improving over OLS. First, stepwise regression [13] can be employed. However, it has been strongly criticized for its potential for overfitting and its inconsistency in the presence of collinearity in the input data [14]. 
[Figure 1 here: three graphical models with inputs x_i1..x_id, hidden variables z_i1..z_id, outputs y_i (i = 1..N), regression parameters b_1..b_d and precisions α_1..α_d. Panels: (a) Linear regression, (b) Probabilistic backfitting, (c) VBLS.]

Figure 1: Graphical Models for Linear Regression. Random variables are in circular nodes, observed random variables are in double circles and point-estimated parameters are in square nodes.

To deal with such collinearity directly, dimensionality reduction techniques like Principal Components Regression (PCR) and Factor Regression (FR) [15] are useful. These methods retain components in input space with large variance, regardless of whether those components influence the prediction [16], and can even eliminate low-variance inputs that may have high predictive power for the outputs [17]. Another class of linear regression methods are projection regression techniques, most notably Partial Least Squares Regression (PLS) [18]. PLS performs computationally inexpensive O(d) univariate regressions along projection directions, chosen according to the correlation between inputs and outputs. While slightly heuristic in nature, PLS is a surprisingly successful algorithm for ill-conditioned and high-dimensional regression problems, although it also has a tendency towards overfitting [16]. LASSO (Least Absolute Shrinkage and Selection Operator) regression [19] shrinks certain regression coefficients to 0, giving interpretable models that are sparse. However, a tuning parameter needs to be set, which can be done using n-fold cross-validation or manual hand-tuning. 
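The numerical fragility of OLS under collinearity, and the biased but stable ridge alternative, can be illustrated with a small sketch (the synthetic two-dimensional data and all names here are our own illustration, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with two nearly collinear input dimensions.
N = 100
x1 = rng.standard_normal(N)
x2 = x1 + 1e-6 * rng.standard_normal(N)  # near-duplicate column
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 0.1 * rng.standard_normal(N)

# OLS: b = (X^T X)^{-1} X^T y. The Gram matrix is nearly singular, so the
# individual coefficients are poorly determined and can explode, even though
# their sum (the identifiable direction) stays near 2.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: a small diagonal penalty stabilizes the inversion at the
# cost of an uncontrolled bias; the weight is split across the two
# near-duplicate columns (about 1 each).
lam = 1e-3
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

The individually meaningless OLS coefficients are exactly the failure mode that motivates either regularization or the explicit feature exclusion developed below.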
Finally, there are also more efficient methods for matrix inversion [20, 21], which, however, assume a well-conditioned regression problem a priori and degrade in the presence of collinearities in the inputs.
In the following section, we develop a linear regression algorithm in a Bayesian framework that automatically regularizes against problems of overfitting. Moreover, the iterative nature of the algorithm, due to its formulation as an Expectation-Maximization problem [22], avoids the computational cost and numerical problems of matrix inversions. Thus, it addresses both major problems of high-dimensional OLS simultaneously. Conceptually, the algorithm can be interpreted as a Bayesian version of either backfitting or partial least squares regression.

3 Variational Bayesian Least Squares

Figure 1 illustrates the progression of graphical models that we need in order to develop a robust Bayesian version of linear regression. Figure 1a depicts the standard linear regression model. In the spirit of PLS, if we knew an optimal projection direction of the input data, then the entire regression problem could be solved by a univariate regression between the projected data and the outputs. This optimal projection direction is simply the true gradient between inputs and outputs. In the tradition of EM algorithms [22], we encode this projection direction as a hidden variable, as shown in Figure 1b. The unobservable variables z_im (where i = 1..N denotes the index into the data set of N data points) are the results of each input being multiplied with its corresponding component of the projection vector (i.e., b_m). The z_im are then summed up to form a predicted output y_i.
More formally, the linear regression model in Eq. 
(1) is modified to become:

z_im = b_m x_im
y_i = Σ_{m=1}^d z_im + ε

For a probabilistic treatment with EM, we make a standard normal assumption for all distributions in the form:

y_i | z_i ~ Normal(y_i; 1^T z_i, ψ_y)
z_im | x_i ~ Normal(z_im; b_m x_im, ψ_zm)

where 1 = [1, 1, .., 1]^T. While this model is still identical to OLS, notice that in the graphical model, the regression coefficients b_m are behind the fan-in to the outputs y_i. Given the data D = {x_i, y_i}_{i=1}^N, we can view this new regression model as an EM problem and maximize the incomplete log likelihood log p(y|X) by maximizing the expected complete log likelihood ⟨log p(y, Z|X)⟩:

log p(y, Z|X) = -(N/2) log ψ_y - (1/(2ψ_y)) Σ_{i=1}^N (y_i - 1^T z_i)^2
    - (N/2) Σ_{m=1}^d log ψ_zm - Σ_{m=1}^d (1/(2ψ_zm)) Σ_{i=1}^N (z_im - b_m x_im)^2 + const    (2)

where Z denotes the N by d matrix of all z_im. The resulting EM updates require standard manipulations of normal distributions and result in:

M-step:
b_m = ( Σ_{i=1}^N ⟨z_im⟩ x_im ) / ( Σ_{i=1}^N x_im^2 )
ψ_y = (1/N) Σ_{i=1}^N (y_i - 1^T ⟨z_i⟩)^2 + 1^T Σ_z 1
ψ_zm = (1/N) Σ_{i=1}^N (⟨z_im⟩ - b_m x_im)^2 + σ_zm^2

E-step:
⟨z_im⟩ = b_m x_im + (ψ_zm / s)(y_i - b^T x_i)
σ_zm^2 = ψ_zm (1 - ψ_zm / s)
1^T Σ_z 1 = ( Σ_{m=1}^d ψ_zm ) ( 1 - (1/s) Σ_{m=1}^d ψ_zm )

where we define s = ψ_y + Σ_{m=1}^d ψ_zm and Σ_z = Cov(z|y, X). It is very important to note that one EM update has a computational complexity of O(d), where d is the number of input dimensions, instead of the O(d^3) associated with OLS regression. This efficiency comes at the cost of an iterative solution, instead of a one-shot solution for b as in OLS. 
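As a concrete sketch, the E-step and M-step above can be written in a few lines (a NumPy illustration of ours; the function name is arbitrary, and variable names mirror the text):

```python
import numpy as np

def em_least_squares(X, y, iters=5000):
    # Sketch of the O(d) EM updates described above: psi_y is the output
    # noise variance, psi_z[m] the per-dimension variance of z_im, and
    # s = psi_y + sum_m psi_z[m].
    N, d = X.shape
    b = np.zeros(d)
    psi_y, psi_z = 1.0, np.ones(d)
    for _ in range(iters):
        s = psi_y + psi_z.sum()
        # E-step: posterior means/variances of the hidden variables z_im.
        resid = y - X @ b                        # y_i - b^T x_i
        Ez = X * b + np.outer(resid, psi_z / s)  # <z_im>
        sigma2_z = psi_z * (1.0 - psi_z / s)
        one_Sz_one = psi_z.sum() * (1.0 - psi_z.sum() / s)
        # M-step: d univariate regressions plus noise-variance updates.
        b = (Ez * X).sum(axis=0) / (X ** 2).sum(axis=0)
        psi_y = ((y - Ez.sum(axis=1)) ** 2).mean() + one_Sz_one
        psi_z = ((Ez - X * b) ** 2).mean(axis=0) + sigma2_z
    return b
```

Any fixed point of this loop with positive variances satisfies the OLS normal equations, so on well-conditioned toy data the iterates reproduce the OLS solution while trading the d-by-d inversion for O(d) sweeps.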
It can be proved that this EM version of least squares regression is guaranteed to converge to the same solution as OLS [23].
This new EM algorithm appears to merely replace the matrix inversion in OLS by an iterative method, as others have done with alternative algorithms [20, 21], although the convergence guarantees of EM are an improvement over previous approaches. The true power of this probabilistic formulation, though, becomes apparent when we add a Bayesian layer that achieves the desired robustness in the face of ill-conditioned data.

3.1 Automatic Relevance Determination

From a Bayesian point of view, the parameters b_m should be treated probabilistically so that we can integrate them out to safeguard against overfitting. For this purpose, as shown in Figure 1c, we introduce precision variables α_m over each regression parameter b_m:

p(b|α) = Π_{m=1}^d (α_m / 2π)^{1/2} exp{ -(α_m / 2) b_m^2 }
p(α) = Π_{m=1}^d ( b_α^{a_α} / Γ(a_α) ) α_m^{(a_α - 1)} exp{ -b_α α_m }    (3)

where α is the vector of all α_m. In order to obtain a tractable posterior distribution over all hidden variables b, z_im and α, we use a factorial variational approximation of the true posterior, Q(α, b, Z) = Q(α, b)Q(Z). Note that the connection from the α_m to the corresponding z_im in Figure 1c is an intentional design. Under this graphical model, the marginal distribution of b_m becomes a Student t-distribution that allows traditional hypothesis testing [24]. 
The minimal factorization of the posterior into Q(α, b)Q(Z) would not be possible without this special design.
The resulting augmented model has the following distributions:

y_i | z_i ~ Normal(y_i; 1^T z_i, ψ_y)
z_im | b_m, α_m, x_im ~ Normal(z_im; b_m x_im, ψ_zm / α_m)
b_m | α_m ~ Normal(b_m; 0, 1/α_m)
α_m ~ Gamma(α_m; a_α, b_α)

We now have a mechanism that infers the significance of each dimension's contribution to the observed output y. Since b_m is zero mean, a very large α_m (equivalent to a very small variance of b_m) suggests that b_m is very close to 0 and has no contribution to the output. An EM-like algorithm [25] can be used to find the posterior updates of all distributions. We omit most of the EM update equations due to space constraints, as they are similar to the EM updates above, and focus only on the posterior updates for b_m and α:

σ_{bm|αm}^2 = (ψ_zm / α_m) ( Σ_{i=1}^N x_im^2 + ψ_zm )^{-1}
⟨b_m|α_m⟩ = ( Σ_{i=1}^N x_im^2 + ψ_zm )^{-1} Σ_{i=1}^N ⟨z_im⟩ x_im
â_α = a_α + N/2
b̂_α^{(m)} = b_α + (1/(2ψ_zm)) ( Σ_{i=1}^N ⟨z_im^2⟩ - ( Σ_{i=1}^N ⟨z_im⟩ x_im )^2 ( Σ_{i=1}^N x_im^2 + ψ_zm )^{-1} )    (4)

Note that the update equation for ⟨b_m|α_m⟩ can be rewritten as:

⟨b_m|α_m⟩^{(n+1)} = ( Σ_{i=1}^N x_im^2 / ( Σ_{i=1}^N x_im^2 + ψ_zm ) ) ⟨b_m|α_m⟩^{(n)}
    + ( ψ_zm / (s α_m) ) ( Σ_{i=1}^N (y_i - ⟨b|α⟩^{(n)T} x_i) x_im ) / ( Σ_{i=1}^N x_im^2 + ψ_zm )    (5)

Eq. (5) demonstrates that, in the absence of a correlation between the current input dimension and the residual error, the first term causes the current regression coefficient to decay. 
The resulting regression solution regularizes the number of inputs retained in the final regression vector, providing functionality similar to Automatic Relevance Determination (ARD) [8]. The algorithmic complexity of the update equations remains O(d). One can further show that the marginal distribution of each b_m is a t-distribution with t = ⟨b_m|α_m⟩ / σ_{bm|αm} and 2â_α degrees of freedom, which allows a principled way of determining whether a regression coefficient was excluded, by means of standard hypothesis testing. Thus, Variational Bayesian Least Squares (VBLS) regression is a full Bayesian treatment of the linear regression problem.

4 Evaluation

We now turn to the application and evaluation of VBLS in the context of predicting EMG data from neural data recorded in M1 of monkeys. The key questions addressed in this application were i) whether EMG data can be reconstructed accurately with good generalization, ii) how many neurons contribute to the reconstruction of each muscle and iii) how well the VBLS algorithm compares to other analysis techniques. The underlying assumption of this analysis is that the relationship between neural firing and muscle activity is approximately linear.

4.1 Data sets

We investigated data from two different experiments. In the first experiment, by Sergio & Kalaska [9], the monkey moved a manipulandum in a center-out task in eight different directions, equally spaced in a horizontal planar circle of 8 cm radius. A variation of this experiment held the manipulandum rigidly in place while the monkey applied isometric forces in the same eight directions. In both conditions, movement or force, feedback was given through a visual display on a monitor. Neural activity of 71 M1 neurons was recorded in all conditions (2400 data points for each neuron), along with the EMG outputs of 11 muscles.
The second experiment, by Kakei et al. 
[10], involved a monkey trained to perform eight different combinations of wrist flexion-extension and radial-ulnar movements while in three different arm postures (pronated, supinated and midway between the two). The data set consisted of neural data of 92 M1 neurons that were recorded at all three wrist postures (producing 2664 data points for each neuron) and the EMG outputs of 7 contributing muscles. In all experiments, the neural data was represented as average firing rates and was time-aligned with the EMG data based on analyses that are outside the scope of this paper.

[Figure 2 here: bar plots of training and test nMSE for OLS, STEP, PLS, LASSO, VBLS and ModelSearch. Panels: (a) Sergio & Kalaska [9] data, (b) Kakei et al. [10] data.]

Figure 2: Normalized mean squared error for cross-validation sets (6-fold for [10] and 8-fold for [9])

                           VBLS   PLS    STEP   LASSO
Sergio & Kalaska data set  93.6%  7.44%  8.71%  8.42%
Kakei et al. data set      87.1%  40.1%  72.3%  76.3%

Table 1: Percentage of neuron matches between the baseline and all other algorithms, averaged over all muscles in the data set

4.2 Methods

For the Sergio & Kalaska data set, a baseline comparison of good EMG reconstruction was obtained through a limited combinatorial search over possible regression models. A particular model is characterized by a subset of neurons that is used to predict the EMG data. Given 71 neurons, theoretically 2^71 possible models exist. This number is too large for an exhaustive search. Therefore, we considered only combinations of up to 20 neurons, which required several weeks of computation on a 30-node cluster computer. The optimal predictive subset of neurons was determined from an 8-fold cross-validation. 
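The evaluation protocol, k-fold cross-validation scored by normalized mean squared error as plotted in Figure 2, can be sketched generically (the function names and the ridge-stabilized least-squares stand-in predictor are our own illustration; the paper compares OLS, stepwise regression, PLS, LASSO and VBLS under this kind of protocol):

```python
import numpy as np

def nmse(y_true, y_pred):
    # Normalized mean squared error: MSE divided by the variance of the target.
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

def kfold_nmse(X, y, k=8, ridge=1e-10):
    # k-fold cross-validation of a linear model fit on each training split by
    # ridge-stabilized least squares, scored by nMSE on the held-out fold.
    N, d = X.shape
    folds = np.array_split(np.arange(N), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(N), test_idx)
        A = X[train_idx].T @ X[train_idx] + ridge * np.eye(d)
        b = np.linalg.solve(A, X[train_idx].T @ y[train_idx])
        scores.append(nmse(y[test_idx], X[test_idx] @ b))
    return float(np.mean(scores))
```

The gap between training and held-out nMSE under such a protocol is what Figure 2 reports as overfitting.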
This baseline study served as a comparison for PLS, stepwise regression, LASSO regression, OLS and VBLS. The five other algorithms used the same validation sets employed in the baseline study. The number of PLS projections for each data fit was found by leave-one-out cross-validation. Stepwise regression used Matlab's “stepwisefit” function. LASSO regression was implemented with the optimal tuning parameter chosen manually over all cross-validation sets. OLS was implemented using a small ridge regression parameter of 10^{-10} in order to avoid ill-conditioned matrix inversions.

[Figure 3 here: bar plots of the average number of relevant neurons found per muscle by STEP, PLS, LASSO, VBLS and ModelSearch. Panels: (a) Sergio & Kalaska [9] data, muscles 1-11, (b) Kakei et al. [10] data, muscles 1-7.]

Figure 3: Average number of relevant neurons found over cross-validation sets (6-fold for [10] and 8-fold for [9])

The average number of relevant neurons was calculated over all 8 cross-validation sets, and a final set of relevant neurons was reached for each algorithm by taking the neurons commonly found to be relevant over the 8 cross-validation sets. Inference of relevant neurons in PLS was based on the subspace spanned by the PLS projections, while relevant neurons in VBLS were inferred from t-tests on the regression parameters, using a significance level of p < 0.05. Stepwise regression and LASSO regression determined the number of relevant neurons from the inputs that were included in the final model. 
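The t-test used to decide whether VBLS retained a neuron can be sketched as follows. Section 3.1 gives each coefficient a marginal t-distribution with t = ⟨b_m|α_m⟩ / σ_{bm|αm} and 2â_α degrees of freedom; since â_α = a_α + N/2 and N is in the thousands here, the degrees of freedom are large enough that we substitute a normal tail for the t tail. The function name and this approximation are our own illustration, not the paper's implementation:

```python
import math

def neuron_is_relevant(b_mean, b_std, alpha=0.05):
    # Two-sided test of H0: b_m = 0 using the statistic t = mean / std.
    # With 2*a_hat degrees of freedom on the order of N (thousands of data
    # points per neuron), the t-distribution is essentially normal, so we use
    # the normal tail probability via erfc.
    t = abs(b_mean / b_std)
    p_value = math.erfc(t / math.sqrt(2.0))
    return p_value < alpha
```

A coefficient whose posterior mean sits several posterior standard deviations from zero is kept; one within a fraction of a standard deviation is treated as pruned by the ARD prior.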
Note that since OLS retained all input dimensions, this algorithm was omitted from the relevant-neuron comparisons.
Analogous to the first data set, a combinatorial analysis was performed on the Kakei et al. data set in order to determine the optimal set of neurons contributing to each muscle (i.e., producing the lowest possible prediction error) in a 6-fold cross-validation. PLS, stepwise regression, LASSO regression, OLS and VBLS were applied using the same cross-validation sets, employing the same procedure described for the first data set.

4.3 Results

Figure 2 shows that, in general, EMG traces seem to be well predictable from M1 neural firing. VBLS resulted in a generalization error comparable to that produced by the baseline study. On the Kakei et al. data set, all algorithms performed similarly, with LASSO regression performing slightly better than the rest. However, OLS, stepwise regression, LASSO regression and PLS performed far worse on the Sergio & Kalaska data set, with OLS regression attaining the worst error. Such performance is typical for traditional linear regression methods on ill-conditioned high-dimensional data, motivating the development of VBLS. The average number of relevant neurons found by VBLS was slightly higher than that of the baseline study, as seen in Figure 3. This result is not surprising, as the baseline study did not consider all possible combinations of neurons. Given the good generalization results of VBLS, it seems that the Bayesian approach regularized the participating neurons sufficiently so that no overfitting occurred. Note that the results for muscles 6 and 7 in Figure 3b seem to be due to some irregularities in the data and should be considered outliers. 
Table 1 demonstrates that the relevant neurons identified by VBLS coincided at a very high percentage with those of the baseline results, while PLS, stepwise regression and LASSO regression had inferior outcomes.
Thus, in general, VBLS achieved performance comparable to the baseline study when reconstructing EMG data from M1 neurons. While VBLS is an iterative statistical method that runs slower than classical “one-shot” linear least squares methods (i.e., on the order of several minutes for the data sets in our analyses), it achieved results comparable to our combinatorial model search, which took weeks on a cluster computer.

5 Discussion

This paper addressed the problem of analyzing high-dimensional data with linear regression techniques, as encountered in neuroscience and the new field of brain-machine interfaces. To achieve robust statistical results, we introduced a novel Bayesian technique for linear regression analysis with automatic feature detection, called Variational Bayesian Least Squares. Comparisons with classical linear regression methods and a “gold standard” obtained from a brute-force search over possible linear models demonstrate that VBLS performs very well without any manual parameter tuning, such that it has the quality of a “black box” statistical analysis technique.
A point of concern regarding the VBLS algorithm is how its variational approximation affects the quality of function approximation. It is known that factorial approximations to a joint distribution create more peaked distributions, such that one could suspect that VBLS might tend to overfit. However, in the case of VBLS, a more peaked distribution over b_m pushes the regression parameter closer to zero. 
Thus, VBLS will err on the slightly pessimistic side of function fitting and is unlikely to overfit. Future evaluations and comparisons with Markov chain Monte Carlo methods will reveal more details of the nature of the variational approximation. Regardless, it appears that VBLS could become a useful drop-in replacement for various classical regression methods. It also lends itself to incremental implementation, as would be needed in real-time analyses of brain information.

Acknowledgments

This research was supported in part by National Science Foundation grants ECS-0325383, IIS-0312802, IIS-0082995, ECS-0326095, ANI-0224419, a NASA grant AC#98-516, an AFOSR grant on Intelligent Control, the ERATO Kawato Dynamic Brain Project funded by the Japanese Science and Technology Agency, the ATR Computational Neuroscience Laboratories and by funds from the Veterans Administration Medical Research Service.

References

[1] M.A. Nicolelis. Actions from thoughts. Nature, 409:403-407, 2001.
[2] D.M. Taylor, S.I. Tillery, and A.B. Schwartz. Direct cortical control of 3D neuroprosthetic devices. Science, 296:1829-1832, 2002.
[3] J.R. Wolpaw and D.J. McFarland. Control of a two-dimensional movement signal by a noninvasive brain-computer interface in humans. Proceedings of the National Academy of Sciences, 101:17849-17854, 2004.
[4] Y. Kamitani and F. Tong. Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8:679, 2005.
[5] J.D. Haynes and G. Rees. Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nature Neuroscience, 8:686, 2005.
[6] J. Wessberg and M.A. Nicolelis. Optimizing a linear algorithm for real-time robotic control using chronic cortical ensemble recordings in monkeys. Journal of Cognitive Neuroscience, 16:1022-1035, 2004.
[7] S. Musallam, B.D. Corneil, B. Greger, H. Scherberger, and R.A. Andersen. 
Cognitive control signals for neural prosthetics. Science, 305:258-262, 2004.
[8] R.M. Neal. Bayesian learning for neural networks. PhD thesis, Dept. of Computer Science, University of Toronto, 1994.
[9] L.E. Sergio and J.F. Kalaska. Changes in the temporal pattern of primary motor cortex activity in a directional isometric force versus limb movement task. Journal of Neurophysiology, 80:1577-1583, 1998.
[10] S. Kakei, D.S. Hoffman, and P.L. Strick. Muscle and movement representations in the primary motor cortex. Science, 285:2136-2139, 1999.
[11] S. Kakei, D.S. Hoffman, and P.L. Strick. Direction of action is represented in the ventral premotor cortex. Nature Neuroscience, 4:1020-1025, 2001.
[12] E. Todorov. Direct cortical control of muscle activation in voluntary arm movements: a model. Nature Neuroscience, 3:391-398, 2000.
[13] N.R. Draper and H. Smith. Applied Regression Analysis. Wiley, 1981.
[14] S. Derksen and H.J. Keselman. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45:265-282, 1992.
[15] W.F. Massey. Principal component regression in exploratory statistical research. Journal of the American Statistical Association, 60:234-246, 1965.
[16] S. Schaal, S. Vijayakumar, and C.G. Atkeson. Local dimensionality reduction. In M.I. Jordan, M.J. Kearns, and S.A. Solla, editors, Advances in Neural Information Processing Systems. MIT Press, 1998.
[17] I.E. Frank and J.H. Friedman. A statistical view of some chemometric regression tools. Technometrics, 35:109-135, 1993.
[18] H. Wold. Soft modeling by latent variables: The nonlinear iterative partial least squares approach. In J. Gani, editor, Perspectives in Probability and Statistics, Papers in Honor of M. S. Bartlett. Academic Press, 1975.
[19] R. Tibshirani. 
Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
[20] V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13:354-356, 1969.
[21] T.J. Hastie and R.J. Tibshirani. Generalized Additive Models. Number 43 in Monographs on Statistics and Applied Probability. Chapman and Hall, 1990.
[22] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[23] A. D'Souza, S. Vijayakumar, and S. Schaal. The Bayesian backfitting relevance vector machine. In Proceedings of the 21st International Conference on Machine Learning. ACM Press, 2004.
[24] A. Gelman, J. Carlin, H.S. Stern, and D.B. Rubin. Bayesian Data Analysis. Chapman and Hall, 2000.
[25] Z. Ghahramani and M.J. Beal. Graphical models and variational methods. In D. Saad and M. Opper, editors, Advanced Mean Field Methods - Theory and Practice. MIT Press, 2000.
", "award": [], "sourceid": 2841, "authors": [{"given_name": "Jo-anne", "family_name": "Ting", "institution": null}, {"given_name": "Aaron", "family_name": "D'souza", "institution": null}, {"given_name": "Kenji", "family_name": "Yamamoto", "institution": null}, {"given_name": "Toshinori", "family_name": "Yoshioka", "institution": null}, {"given_name": "Donna", "family_name": "Hoffman", "institution": null}, {"given_name": "Shinji", "family_name": "Kakei", "institution": null}, {"given_name": "Lauren", "family_name": "Sergio", "institution": null}, {"given_name": "John", "family_name": "Kalaska", "institution": null}, {"given_name": "Mitsuo", "family_name": "Kawato", "institution": null}]}