{"title": "Bayesian Experimental Design of Magnetic Resonance Imaging Sequences", "book": "Advances in Neural Information Processing Systems", "page_first": 1441, "page_last": 1448, "abstract": "We show how improved sequences for magnetic resonance imaging can be found through automated optimization of Bayesian design scores. Combining recent advances in approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first scalable Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires approximate inference for dense, non-Gaussian models on a scale seldom addressed before. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on a realistic setup with raw data from a 3T MR scanner.", "full_text": "Bayesian Experimental Design of Magnetic Resonance Imaging Sequences\n\nMatthias W. Seeger, Hannes Nickisch, Rolf Pohmann and Bernhard Sch\u00f6lkopf\n\n{seeger,hn,rolf.pohmann,bs}@tuebingen.mpg.de\n\nMax Planck Institute for Biological Cybernetics\n\nSpemannstra\u00dfe 38\n\n72012 T\u00fcbingen, Germany\n\nAbstract\n\nWe show how improved sequences for magnetic resonance imaging can be found through optimization of Bayesian design scores. Combining approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires large-scale approximate inference for dense, non-Gaussian models. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. 
Our approach is evaluated on raw data from a 3T MR scanner.\n\n1 Introduction\n\nMagnetic resonance imaging (MRI) [7, 2] is a key diagnostic technique in healthcare today, and of central importance for experimental brain research. Without applying any harmful ionizing radiation, this technique stands out by its remarkable versatility: by combining different types of radiofrequency irradiation with rapidly switched, spatially varying magnetic fields (called gradients) superimposed on the homogeneous main field, a large variety of parameters can be recorded, ranging from basic anatomy to blood flow, brain function, or metabolite distribution. For this broad spectrum of applications, a huge number of sequences, which describe the temporal flow of the measurement, has been developed, ranging from a relatively small number of multi-purpose techniques like FLASH [5], RARE [6], or EPI [9], to specialized methods for visualizing bones or perfusion. Selecting the optimal sequence for a given problem and tuning its parameters is a difficult task even for experts; designing new, customized sequences to address a particular question is more challenging still, making sequence development an entire field of research [1]. The main drawbacks of MRI are high initial and running costs, since a very strong homogeneous magnetic field has to be maintained, as well as long scanning times due to weak signals and limits on gradient amplitude. With this in mind, by far the majority of scientific work on improving MRI is motivated by obtaining diagnostically useful images in less time. Beyond reduced costs, faster imaging also leads to higher temporal resolution in dynamic sequences for functional MRI (fMRI), less discomfort for patients, and fewer artifacts due to patient motion.\n\nIn this paper, we employ Bayesian experimental design to optimize MRI sequences. 
Image reconstruction from MRI raw data is viewed as a problem of inference from incomplete observations. By contrast, current reconstruction techniques are non-iterative: for most sequences used in hospitals today, reconstruction is done by a single fast Fourier transform (FFT). However, natural and MR images show stable low-level statistical properties,1 which allow them to be reconstructed from fewer observations. In our work, a non-Gaussian prior distribution represents low-level spectral and local natural image statistics. A similar idea is known as compressed sensing (CS), which has been applied to MRI [8].\n\nFootnote 1: These come from the presence of edges and smooth areas, which on a low level define image structure, and which are not present in Gaussian data (noise).\n\nA different and more difficult problem is to improve the sequence itself. In our Bayesian method, a posterior distribution over images is maintained, which is essential for judging the quality of the sequence: the sequence can be modified so as to decrease uncertainty in regions or along directions of interest, where uncertainty is quantified by the posterior. Importantly, this is done without the need to run many MRI experiments in random a priori data collections. It has been proposed to design sequences by blindly randomizing aspects thereof [8], based on theoretical results from CS. Beyond being hard to achieve on a scanner, random measurements do not work well for real MR images, as our results indicate. Similar negative findings for a variety of natural images are given in [12].\n\nOur proposal requires efficient Bayesian inference for MR images of realistic resolution. We present a novel scalable variational approximate inference algorithm inspired by [16]. 
The problem is reduced to numerical mathematics primitives, and further to matrix-vector multiplications (MVMs) with large, structured matrices, which are computed by efficient signal processing code. Most previous algorithms [3, 14, 11] iterate over single non-Gaussian potentials, which renders them useless for our problem.2 Our solutions for the primitives required here should be useful for other machine learning applications as well. Finally, we are not aware of Bayesian or classical experimental design methods for dense non-Gaussian models that scale comparably to ours. The framework of [11] is similar, but could not be applied at the scale of interest here. Our model and experimental design framework are described in Section 2, a novel scalable approximate inference algorithm is developed in Section 3, and our framework is evaluated on a large-scale realistic setup with scanner raw data in Section 4.\n\n2 Sparse Linear Model. Experimental Design\n\nDenote the desired MR image by u \u2208 R^n, where n is the number of pixels. Under ideal conditions, the raw data y \u2208 R^m from the scanner is a linear map3 of u, motivating the likelihood\n\n$$y = Xu + \\varepsilon, \\qquad \\varepsilon \\sim N(0, \\sigma^2 I).$$\n\nHere, each row of X is a single Fourier filter, determined by the sequence. In the context of this paper, the problem of experimental design is how to choose X within a space of technically feasible sequences, so that u can be best recovered given y. As motivated in Section 1, we need to specify a prior P(u) which represents low-level statistics of (MR) images; these are distinctly super-Gaussian, so a Gaussian prior would not be a sensible choice. We use the one proposed in [12]. 
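A minimal sketch of this measurement model follows. It assumes a Cartesian setting in which X selects m entries of the unitary 2D DFT; the spiral trajectories of Section 4 would instead need non-uniform FFTs or gridding. The helper names make_mask, forward, and adjoint are hypothetical. The point is that X and its transpose are applied as MVMs through the FFT and never formed as matrices. Because the data are complex, the transpose here is the conjugate transpose. The adjoint test at the end is the usual sanity check for such matrix-free operators.

```python
import numpy as np

def make_mask(shape, frac, seed=0):
    # Hypothetical sampling pattern: keep a random subset of the k-space grid.
    rng = np.random.default_rng(seed)
    return rng.random(shape) < frac

def forward(u, mask):
    # y = X u: unitary 2D FFT, then keep only the sampled k-space entries.
    return np.fft.fft2(u, norm='ortho')[mask]

def adjoint(y, mask):
    # X^H y: zero-fill the unsampled entries, then apply the unitary inverse FFT.
    full = np.zeros(mask.shape, dtype=complex)
    full[mask] = y
    return np.fft.ifft2(full, norm='ortho')

rng = np.random.default_rng(1)
mask = make_mask((32, 32), 0.3)
m = int(mask.sum())
u = rng.standard_normal((32, 32))
sigma = 0.01
y = forward(u, mask) + sigma * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

# Adjoint test: <X u, v> must equal <u, X^H v> for arbitrary u and v.
v = rng.standard_normal(m) + 1j * rng.standard_normal(m)
lhs = np.vdot(v, forward(u, mask))
rhs = np.vdot(adjoint(v, mask), u)
print(y.shape == (m,), abs(lhs - rhs) < 1e-10)
```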
The posterior has the form\n\n$$P(u \\mid y) \\propto N(y \\mid Xu, \\sigma^2 I) \\prod_{j=1}^{q} e^{-\\tilde{\\tau}_j |s_j|}, \\qquad s = Bu, \\quad \\tilde{\\tau}_j = \\tau_j/\\sigma, \\qquad (1)$$\n\nthe prior being a product of Laplacians on linear projections s_j of u, among them the image gradient and wavelet coefficients. The Laplace distribution encourages sparsity of s. Further details are given in [12]. MVMs with B cost O(q) with q \u2248 3n. MAP estimation for the same model was used in [8].\n\nFootnote 2: The model we use has q = 196096 potentials and n = 65536 latent variables. Any algorithm that iterates over single potentials has to solve at least q linear systems of size n, while our method often converges after solving fewer than 50 of these.\n\nFootnote 3: Phase contributions in u are discussed in Section 4.\n\nBayesian inference for (1) is analytically intractable; an efficient deterministic approximation is discussed in Section 3. In the variant of Bayesian sequential experimental design used here, an extension of X by X_* \u2208 R^{d\u00d7n} is scored by the entropy difference\n\n$$\\Delta(X_*) := H[P(u \\mid y)] - \\mathrm{E}_{P(y_* \\mid y)}\\big[H[P(u \\mid y, y_*)]\\big], \\qquad (2)$$\n\nwhere P(u|y, y_*) is the posterior after including (X_*, y_*). This criterion measures the decrease in uncertainty about u, averaged over the posterior predictive distribution P(y_*|y). Our approach is sequential: a sequence is assembled from parts, each extension being chosen by maximizing the entropy difference over a candidate set {X_*}. After each extension, a new scanner measurement is obtained for the single extended sequence only. Our Bayesian predictive approach allows us to score many candidates (X_*, y_*) without performing costly MR measurements for them. The sequential restriction makes sense for several reasons. First, MR sequences naturally decompose in a sequential fashion: they describe a discontinuous path of several smooth trajectories (see Section 4). 
Also, a non-sequential approach would never make use of any real measurements, relying much more on the correctness of the model. Finally, the computational complexity of optimizing over complete sequences is staggering. Our sequential approach also seems better suited for dynamic MRI applications.\n\n3 Scalable Approximate Inference\n\nIn this section, we propose a novel scalable algorithm for the variational inference approximation proposed in [3]. We make use of ideas presented in [16]. First, using Legendre duality (the Laplace site is log-convex in s_j^2) [3],\n\n$$e^{-\\tilde{\\tau}_j |s_j|} = \\max_{\\pi_j > 0} e^{-\\pi_j s_j^2/(2\\sigma^2)}\\, e^{-(\\tau_j^2/2)\\,\\pi_j^{-1}}.$$\n\nLet \u03c0 = (\u03c0_j) and \u03a0 = diag \u03c0. To simplify the derivation, assume that B^T \u03a0 B is invertible,4 and let Q(u) \u221d exp(\u2212u^T B^T \u03a0 B u/(2\u03c3^2)), Q(y, u) := P(y|u)Q(u). The joint distribution is Gaussian, and\n\n$$Q(u \\mid y) = N(u \\mid h, \\sigma^2 \\Sigma), \\qquad \\Sigma^{-1} = A := X^T X + B^T \\Pi B, \\qquad h = \\Sigma X^T y. \\qquad (3)$$\n\nWe have that\n\n$$P(y) \\ge e^{-\\frac{1}{2} (\\tau^2)^T (\\pi^{-1})}\\, \\big|B^T \\Pi B/(2\\pi\\sigma^2)\\big|^{-1/2} \\int P(y \\mid u) Q(u)\\, du,$$\n\nand\n\n$$\\int P(y \\mid u) Q(u)\\, du = |2\\pi\\sigma^2 \\Sigma|^{1/2} \\max_u Q(u \\mid y) Q(y) = |2\\pi\\sigma^2 \\Sigma|^{1/2} \\max_u P(y \\mid u) Q(u),$$\n\nwhere the maximum is attained at u = h. Therefore, P(y) \u2265 C_1(\u03c3^2) e^{\u2212\u03c6(\u03c0)/2} with\n\n$$\\phi(\\pi) := \\log|A| + (\\tau^2)^T (\\pi^{-1}) + \\min_u \\big[\\sigma^{-2}\\|y - Xu\\|^2 + \\sigma^{-2} s^T \\Pi s\\big], \\qquad s = Bu,$$\n\nand the bound is tightened by minimizing \u03c6(\u03c0). Now, g(\u03c0) := log|A| is concave, so we can use another Legendre duality, g(\u03c0) = min_{z \u2ab0 0} z^T \u03c0 \u2212 g^*(z), to obtain an upper bound \u03c6_z(\u03c0) = min_u \u03c6_z(u, \u03c0) \u2265 \u03c6(\u03c0). In the outer loop steps of our algorithm, we need to find the minimizer z \u2208 R^q_+; the inner loop consists of minimizing the upper bound w.r.t. 
\u03c0 for fixed z. Introducing \u03b3 := \u03c0^{\u22121}, we find that (u, \u03b3) \u21a6 \u03c6_z(u, \u03b3^{\u22121}) is jointly convex, which follows just as in [16], and because z^T(\u03b3^{\u22121}) is convex (all z_j \u2265 0). Minimizing over \u03b3 gives the convex problem\n\n$$\\min_u\\; \\sigma^{-2}\\|y - Xu\\|^2 + 2\\sum_{j=1}^{q} \\tau_j \\sqrt{p_j}, \\qquad p_j := z_j + \\sigma^{-2} s_j^2, \\quad s = Bu, \\qquad (4)$$\n\nwhich is of standard form and can be solved very efficiently by the iteratively reweighted least squares (IRLS) algorithm, a special case of Newton-Raphson. In every iteration, we have to solve (X^T X + B^T (diag e) B) d = r, where r and e are simple functions of u. We use the linear conjugate gradients (LCG) algorithm [4], which requires one MVM each with X, X^T, B, and B^T per iteration. The line search along the Newton direction d can be done in O(q); no further MVMs are required. In our experiments, IRLS converged rapidly. At convergence,\n\n$$\\pi'_j = \\tau_j\\, (p'_j)^{-1/2}, \\qquad p' = p(u').$$\n\nFor updating z \u2192 z' given \u03c0, note that \u03c0^T z' \u2212 g(\u03c0) = g^*(z') = min_{\u03c0\u0303} \u03c0\u0303^T z' \u2212 g(\u03c0\u0303), so that 0 = \u2207_\u03c0 (\u03c0^T z' \u2212 g(\u03c0)) = z' \u2212 \u2207_\u03c0 g(\u03c0), and\n\n$$z' = \\mathrm{diag}^{-1}\\big(B A^{-1} B^T\\big) = \\sigma^{-2}\\big(\\mathrm{Var}_Q[s_j \\mid y]\\big)_j. \\qquad (5)$$\n\nUnlike the inner loop minimizer, z' cannot be computed by a few LCG runs, since it requires the whole diagonal of B A^{\u22121} B^T. Since A has no sparse graphical structure, we cannot use belief propagation either. However, the Lanczos algorithm can be used to estimate z' [10]. This algorithm is also essential for scoring many candidates in each design step of our method (see Section 3.1).\n\nOur algorithm iterates between updates of z (outer loop steps) and inner loop convex optimization of (u, \u03c0). 
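A minimal sketch of this inner loop on a toy 1D problem follows. Everything concrete here is a hypothetical stand-in:
- X is a small dense random matrix instead of Fourier MVMs.
- B is the 1D finite-difference operator instead of gradient and wavelet transforms.
- z is held fixed at a small positive constant.

The sketch does not use the Newton variant with a line search described above. It uses the simpler majorize-minimize form of IRLS. Each step sets pi_j = tau_j / sqrt(p_j), the same relation as at convergence. It then solves (X^T X + B^T diag(pi) B) u = X^T y by linear conjugate gradients warm-started at the current u. In exact arithmetic this cannot increase the criterion (4).

```python
import numpy as np

def B(u):
    # Hypothetical sparsity transform: 1D finite differences (stand-in for gradients/wavelets).
    return np.diff(u)

def Bt(s):
    # Adjoint of np.diff, i.e. B^T s.
    out = np.zeros(s.size + 1)
    out[1:] += s
    out[:-1] -= s
    return out

def lcg(matvec, b, x0, iters=500, tol=1e-10):
    # Linear conjugate gradients for a symmetric positive definite operator given as an MVM.
    x = x0.copy()
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) <= tol * np.linalg.norm(b):
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def objective(u, X, y, tau, z, sigma2):
    # Criterion (4): ||y - X u||^2 / sigma2 + 2 sum_j tau_j sqrt(z_j + s_j^2 / sigma2).
    p = z + B(u) ** 2 / sigma2
    return np.sum((y - X @ u) ** 2) / sigma2 + 2.0 * np.sum(tau * np.sqrt(p))

def inner_loop(X, y, tau, z, sigma2, n_iter=30):
    # Majorize-minimize reweighting: pi_j = tau_j / sqrt(p_j), then solve
    # (X^T X + B^T diag(pi) B) u = X^T y by LCG, using only MVMs with X, X^T, B, B^T.
    u = np.zeros(X.shape[1])
    history = [objective(u, X, y, tau, z, sigma2)]
    for _ in range(n_iter):
        pi = tau / np.sqrt(z + B(u) ** 2 / sigma2)
        u = lcg(lambda v: X.T @ (X @ v) + Bt(pi * B(v)), X.T @ y, u)
        history.append(objective(u, X, y, tau, z, sigma2))
    pi = tau / np.sqrt(z + B(u) ** 2 / sigma2)   # pi' = tau * p'^(-1/2) at the final u
    return u, pi, history

rng = np.random.default_rng(0)
n, m = 64, 24
u_true = np.repeat([0.0, 1.0, -0.5, 0.5], n // 4)   # piecewise constant: sparse differences
X = rng.standard_normal((m, n)) / np.sqrt(m)
sigma2 = 1e-4
y = X @ u_true + np.sqrt(sigma2) * rng.standard_normal(m)
tau = np.ones(n - 1)
z = np.full(n - 1, 1e-2)    # held fixed here; the outer loop would update it via (5)
u_hat, pi, hist = inner_loop(X, y, tau, z, sigma2)
print(hist[0], hist[-1], np.linalg.norm(u_hat - u_true))
```

Setting z = 0 gives the MAP criterion. In that case the reweighting step needs a small floor on p_j to avoid division by zero wherever s_j = 0.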
We show in [13] that min_\u03c0 \u03c6(\u03c0) is a convex problem whenever all model sites are log-concave (as is the case for Laplacians), a finding which is novel to the best of our knowledge.\n\nFootnote 4: The end result is valid for singular B^T \u03a0 B, by a continuity argument.\n\nOnce converged to the global optimum of \u03c6(\u03c0), the posterior is approximated by Q(\u00b7|y) of (3), whose mean is given by u. The main idea is to decouple \u03c6(\u03c0) by upper bounding the critical term log|A|. If the z updates are done exactly, the algorithm is globally convergent [16]. Our algorithm is inspired by [16], where a different problem is addressed: their method produces very sparse solutions of Xu \u2248 y, while our focus is on accurate approximate inference, especially w.r.t. the posterior covariance matrix. It was found in [12] that aggressive sparsification, while computationally convenient, hurts experimental design (and even reconstruction) for natural images. Their update of z requires (5) as well, but can be done more cheaply, since most \u03c0_j = +\u221e, so that A can be replaced by a much smaller matrix. Finally, note that MAP estimation [8] amounts to solving (4) once for z = 0, so it can be seen as a special case of our method.\n\n3.1 Lanczos Algorithm. Efficient Design\n\nThe Lanczos algorithm [4] is typically used to find extremal eigenvectors of large, positive definite matrices A. Requiring one MVM with A in each iteration, it produces Q^T A Q = T \u2208 R^{k\u00d7k} after k iterations, where Q^T Q = I and T is tridiagonal. Lanczos estimates of expressions linear in \u03a3 = A^{\u22121} are obtained by plugging in the low-rank approximation Q T^{\u22121} Q^T \u2248 \u03a3 [10]. In our case,\n\n$$z^{(k)} := \\mathrm{diag}^{-1}\\big(B Q T^{-1} Q^T B^T\\big) \\to z', \\qquad L^{(k)} := \\log|T| \\to g(\\pi).$$\n\n
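A minimal sketch of the Lanczos estimate z^(k) follows, small enough to compare against a direct solve. It assumes full reorthogonalization as one way to counter the loss of orthogonality in Q; large-scale codes use cheaper schemes. It also assumes a random start vector and hypothetical stand-ins for X, B, and pi. Q T^{-1} Q^T is the inverse of A compressed onto a growing Krylov subspace. Each entry of z^(k) therefore can only grow with k and never exceeds the exact z' of (5).

```python
import numpy as np

def lanczos(matvec, n, k, seed=0):
    # k Lanczos steps with full reorthogonalization; returns Q (n x k) and tridiagonal T = Q^T A Q.
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Remove components along all previous Lanczos vectors (twice, for numerical safety).
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if j + 1 < k:
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    return Q, T

def z_lanczos(Bmat, Q, T):
    # z^(k) = diag(B Q T^{-1} Q^T B^T), via a Cholesky factor of the small k x k matrix T.
    L = np.linalg.cholesky(T)
    W = np.linalg.solve(L, (Bmat @ Q).T)    # W = L^{-1} Q^T B^T, shape k x q
    return np.sum(W ** 2, axis=0)

rng = np.random.default_rng(0)
n, m = 50, 20
X = rng.standard_normal((m, n)) / np.sqrt(m)
Bmat = np.diff(np.eye(n), axis=0)          # hypothetical B: finite differences, q = n - 1
pi = rng.uniform(0.5, 2.0, n - 1)
A = X.T @ X + Bmat.T @ (pi[:, None] * Bmat)
z_exact = np.diag(Bmat @ np.linalg.solve(A, Bmat.T))   # z' of (5), feasible only for tiny n

Q, T = lanczos(lambda v: A @ v, n, 40)
estimates = {k: z_lanczos(Bmat, Q[:, :k], T[:k, :k]) for k in (5, 10, 20, 40)}
for k, zk in estimates.items():
    print(k, float(np.max(z_exact - zk)))   # largest underestimate shrinks as k grows
```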
We also use Lanczos to compute entropy difference scores, approximating (2) by using Q(u|y) instead of P(u|y), and Q'(u|y) \u221d Q(u|y) P(y_*|u) instead of P(u|y, y_*), with \u03c0' = \u03c0. The expectation over P(y_*|y) then need not be computed, and\n\n$$\\Delta(X_*) \\approx -\\log|A| + \\log\\big|A + X_*^T X_*\\big| = \\log\\big|I + X_* \\Sigma X_*^T\\big|.$$\n\nFor n_c candidates of d rows each, computing these scores exactly would need d \u00b7 n_c LCG runs, which is not feasible. Using the Lanczos approximation of \u03a3, we need k MVMs with X_* for each candidate, then n_c Cholesky decompositions of min{k, d} \u00d7 min{k, d} matrices. Both computations can readily be parallelized, as is done in our implementation. Note that we can compute \u2202\u2206(X_*)/\u2202\u03b1 for X_* = X_*(\u03b1) if \u2202X_*/\u2202\u03b1 is known, so that gradient-based score optimization can be used.\n\nThe basic recurrence of the Lanczos method is deceptively simple. The loss of orthogonality in Q has to be countered, so typical Lanczos codes are intricate, and Q has to be maintained in memory. The matrices A we encounter here have an almost linearly decaying spectrum, so standard Lanczos codes, designed for geometrically decaying spectra, have to be modified. Our matrices A have no close low-rank approximations, and eigenvalues from both ends of the spectrum converge rapidly in Lanczos. Therefore, our estimate z^{(k)} is not very close to the true z' even for quite large k. However, z^{(k)} \u2aaf z', since z^{(k)} increases monotonically towards z' (z_{k\u22121,j} \u2264 z_{k,j} for all j). Since the sparsity penalty on s_j in (4) is stronger for smaller z_j, underestimates from the Lanczos algorithm entail more sparsity (although still z_{k,j} > 0). In practice, a smaller k often leads to somewhat better results, besides running much faster. 
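A minimal sketch of the score computation follows, reusing one Lanczos run for all candidates. All sizes, the stand-in matrices X and B, and the random candidates X_* are hypothetical, and Lanczos again uses full reorthogonalization. Each candidate needs only the product X_* Q and one Cholesky factorization of a min{k, d} x min{k, d} matrix. Q T^{-1} Q^T underestimates Sigma, so the approximate score should never exceed the exact one.

```python
import numpy as np

def lanczos(matvec, n, k, seed=0):
    # k Lanczos steps with full reorthogonalization; returns Q (n x k) and tridiagonal T = Q^T A Q.
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if j + 1 < k:
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    return Q, T

def score_exact(A, Xs):
    # Delta(X_*) = log|I + X_* Sigma X_*^T| with Sigma = A^{-1}, via a direct solve.
    S = Xs @ np.linalg.solve(A, Xs.T)
    return np.linalg.slogdet(np.eye(Xs.shape[0]) + S)[1]

def score_lanczos(Q, T, Xs):
    # Sigma ~ Q T^{-1} Q^T = W W^T with W = Q L^{-T} and T = L L^T; only X_* Q is needed.
    L = np.linalg.cholesky(T)
    V = np.linalg.solve(L, (Xs @ Q).T).T        # V = X_* W, shape d x k
    d, k = V.shape
    # log|I_d + V V^T| = log|I_k + V^T V|: factor whichever Gram matrix is smaller.
    G = np.eye(d) + V @ V.T if d <= k else np.eye(k) + V.T @ V
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(G))))

rng = np.random.default_rng(0)
n, m, d = 40, 15, 3
X = rng.standard_normal((m, n)) / np.sqrt(m)
Bmat = np.diff(np.eye(n), axis=0)
A = X.T @ X + Bmat.T @ Bmat                       # hypothetical A with Pi = I
candidates = [rng.standard_normal((d, n)) / np.sqrt(n) for _ in range(5)]

Q, T = lanczos(lambda v: A @ v, n, 30)           # one Lanczos run shared by all candidates
scores = []
for Xs in candidates:
    scores.append((score_exact(A, Xs),
                   score_lanczos(Q[:, :10], T[:10, :10], Xs),
                   score_lanczos(Q, T, Xs)))
best = int(np.argmax([s[2] for s in scores]))     # candidate a greedy design step would pick
print(best, [tuple(round(float(v), 3) for v in s) for s in scores])
```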
While the global convergence proof for our algorithm hinges on exact updates of z, which to our knowledge cannot be done, the empirical success reported in Section 4 may be due to this observation, noting that natural image statistics are typically more super-Gaussian than the Laplacian. In conclusion, approximate inference requires the computation of marginal variances, which for general models cannot be approximated closely with generic techniques. In the context of sparse linear models, it seems to be sufficient to estimate the dominating covariance eigendirections, for which the Lanczos algorithm with a moderate number of steps can be used. More generally, the Lanczos method is a powerful tool for approximate inference in Gaussian models, an insight which does not seem to be widely known in machine learning.\n\n4 Experiments\n\nWe start with some MRI terminology. An MR scanner acquires Fourier coefficients Y(k) at spatial frequencies5 k (the 2D Fourier domain is called k-space), along smooth trajectories k(t) determined by magnetic field gradients g(t). This control flow is called the sequence. Its cost is determined by how long it takes to obtain a complete image, which depends on the number of trajectories and their shapes. Gradient amplitude and slew rate constraints enforce smooth trajectories. In Cartesian sampling, trajectories are parallel equispaced lines in k-space, so the FFT can be used for image reconstruction. Spiral sampling offers better coverage of k-space for a given gradient power, leading to faster acquisition. It is often used for dynamic studies, such as cardiac imaging and fMRI.\n\nFootnote 5: Both k and spatial locations r are seen as elements of R^2 or, equivalently, of C.\n\nFigure 1: MR signal acquisition: r-space and k-space representation of the signal on a rectangular grid, as well as the trajectory obtained by means of magnetic field gradients.\n\n
A trajectory k(t) leads to data y = X_k u, where\n\n$$X_k = \\big[e^{-i 2\\pi r_j^T k(t_\\ell)}\\big]_{\\ell j}.$$\n\nWe use gridding interpolation6 with a Kaiser-Bessel kernel [1, ch. 13.2] to approximate the multiplication with X_k, which would otherwise be too expensive. As for other reconstruction methods, most of our running time is spent in the gridding (MVMs with X, X^T, and X_*).\n\nFor our experiments, we acquired data on an equispaced grid.7 In theory, the image u is real-valued; in reality, due to resonance frequency offsets, magnetic field inhomogeneities, and eddy currents [1, ch. 13.4], the reconstruction contains a phase \u03d5(r). It is common practice to discard \u03d5 after reconstruction. Short of modelling a complex-valued u, we correct for low-frequency phase contributions by a cheap pre-measurement.8 Note that |u_true|, against which reconstructions are judged below, is not altered by this correction. From the corrected raw data, we simulate all further measurements under different sequences using gridding interpolation. While no noise is added to these measurements, significant high-frequency erroneous phase contributions remain in u_true.\n\nInterleaved outgoing Archimedean spirals employ trajectories k(t) \u221d \u03b8(t) e^{i2\u03c0[\u03b8(t)+\u03b8_0]}, \u03b8(0) = 0, where the gradient g(t) \u221d dk/dt grows to maximum strength at the maximum slew rate, then stays there [1, ch. 17.6]. Sampling along an interleave respects the Nyquist limit. The number of revolutions Nr and the number of interleaves Nshot determine the radial spacing. The scan time is proportional to Nshot. In our setup, Nr = 8, resulting in 3216 complex samples per interleave. For equispaced offset angles \u03b8_0, the Nyquist spiral (respecting the limit radially) has Nshot = 16. Our goal is to design spiral sequences with smaller Nshot, reducing scan time by a factor 16/Nshot. 
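The sketch below shows the radial-spacing arithmetic behind the Nyquist spiral and the operator X_k that gridding approximates. The following are assumptions for illustration:
- theta(t) grows uniformly.
- k is measured in cycles per pixel, with k_max = 0.5.
- theta0 is treated as an angle in radians.
- All function names are hypothetical.

With equispaced offsets over the full circle, adjacent turns lie k_max / (Nr Nshot) apart. Requiring this to be at most 1/256 with Nr = 8 reproduces Nshot = 16. The dense X_k costs O(mn) per MVM, which is why gridding with a Kaiser-Bessel kernel (for example via NFFT) is used instead.

```python
import numpy as np

def spiral_interleave(theta0, n_rev=8, n_samples=3216, k_max=0.5):
    # Outgoing Archimedean spiral, k in cycles per pixel. Hypothetical simplification:
    # theta(t) grows uniformly (no gradient or slew-rate model); theta0 is an angle in radians.
    theta = np.linspace(0.0, n_rev, n_samples)
    return k_max * (theta / n_rev) * np.exp(1j * (2 * np.pi * theta + theta0))

def radial_gap(n_rev, n_shot, k_max=0.5):
    # Radial distance between adjacent turns when n_shot interleaves are equispaced over 2*pi.
    return k_max / (n_rev * n_shot)

def nyquist_shots(n_pixels, n_rev, k_max=0.5):
    # Smallest n_shot whose radial gap does not exceed the Nyquist spacing 1 / n_pixels.
    return int(np.ceil(k_max * n_pixels / n_rev))

def ndft_matrix(kpts, n_pixels):
    # Dense X_k with entries exp(-i 2 pi r_j^T k_l); gridding approximates MVMs with this matrix.
    c = np.arange(n_pixels) - n_pixels // 2
    ry, rx = np.meshgrid(c, c, indexing='ij')
    r = (rx + 1j * ry).ravel()
    # With r and k encoded as complex numbers, r^T k = Re(conj(r) * k).
    dot = np.real(np.conj(r)[None, :] * kpts[:, None])
    return np.exp(-2j * np.pi * dot)

print(nyquist_shots(256, 8))   # 16 interleaves for a 256 x 256 image with Nr = 8

# Offsets only rotate an interleave: |k(t)|, hence the radial sampling density, ignores theta0.
same_radius = np.allclose(np.abs(spiral_interleave(0.0)), np.abs(spiral_interleave(1.0)))

# Tiny direct-NDFT example: 16 x 16 image, 200 samples along one shortened interleave.
rng = np.random.default_rng(0)
u = rng.standard_normal((16, 16))
k = spiral_interleave(0.3, n_rev=2, n_samples=200)
Xk = ndft_matrix(k, 16)
y = Xk @ u.ravel()             # y[0] is the k = 0 sample, i.e. the sum of all pixels
```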
We use the sequential method described in Section 2, where {X_* \u2208 R^{d\u00d7n}} is a set of potential interleaves, d = 6432 (the real and imaginary parts of 3216 complex samples). The image resolution is 256 \u00d7 256, so n = 65536. Since u_true is approximately real-valued, measurements at k and \u2212k are quite redundant, which is why we restrict9 ourselves to offset angles \u03b8_0 \u2208 [0, \u03c0). In each round, we score the candidates \u03b8_0 \u2208 (\u03c0/256)\u00b7{0, 1, ..., 255}, comparing to equispaced placements j\u03c0/Nshot and to drawing \u03b8_0 uniformly at random. The former, currently favoured by MRI practitioners, minimizes the maximum k-space distance between samples, while the latter is aligned with compressed sensing recommendations [8].\n\nFor a given sequence, we consider different image reconstructions: the posterior mode (convex MAP estimation) [8], linear least squares (LS; linear conjugate gradients), and zero filling with density compensation (ZFDC; based on a Voronoi diagram) [1, ch. 13.2.4]. ZFDC requires only a single MVM with X^T, and is the method most commonly used in practice. We selected the \u03c4 scale parameters (there are two of them, as in [12]) optimally for the Nyquist spiral X_nyq, and set \u03c3^2 to the variance of X_nyq(u_true \u2212 |u_true|). We worked on two slices (8, 12) and used 750 Lanczos iterations in our method.10 We report L2 distances between reconstruction and true image |u_true|. Results are given in Table 3, and some reconstructions (slice 8) are shown in Figure 2.\n\nFootnote 6: NFFT: http://www-user.tu-chemnitz.de/~potts/nfft/\n\nFootnote 7: Field of view (FOV) 260 mm (256 \u00d7 256 voxels, 1 mm^2), 16 brain slices with a turbo spin echo sequence, 23 echoes per excitation. Train of 120\u00b0 refocusing pulses, each phase encoded differently. Slices are 4 mm thick.\n\nFootnote 8: We sample the center of k-space on a p \u00d7 p Cartesian grid, obtaining a low-resolution reconstruction by FFT, whose phase \u03d5\u0303 we use to correct the raw data. 
We tried p \u2208 {16, 32, 64} (larger p means better correction); results below are for p = 32 only. While reconstruction errors generally decrease somewhat with larger p, the relative differences between all settings below are insensitive to p.\n\nFootnote 9: Dropping this restriction disfavours equispaced {\u03b8_0} setups with even Nshot.\n\nFootnote 10: This seems small, given that n = 65536. We also tried 1250 iterations, which needed more memory, ran almost twice as long, and gave slightly worse results (see end of Section 3.1).\n\n[Figure 1 panels: r-space U(r); k-space Y(k), axes from \u22121/2 to 1/2; gradients g(t): g_x and g_y in mT/m versus t in ms]\n\nFigure 2: Reconstruction results. (a) True image, scale [0, 1]; (b-f) differences to the true image, scale [\u22120.1, 0.1].\n\nTable (left): L2 reconstruction errors on slices 8 and 12.\nNshot | img | MAPop | MAPrd | MAPeq | LSop | LSrd | LSeq | ZFDCop | ZFDCrd | ZFDCeq\n5 | 8 | 12.99 | 16.01 \u00b1 2.49 | 14.18 | 17.23 | 19.97 \u00b1 1.33 | 16.80 | 25.13 | 38.04 \u00b1 6.14 | 23.51\n6 | 8 | 8.31 | 12.46 \u00b1 2.46 | 10.06 | 12.67 | 16.24 \u00b1 1.13 | 13.19 | 18.79 | 33.29 \u00b1 4.71 | 18.16\n7 | 8 | 3.95 | 11.81 \u00b1 2.71 | 4.40 | 7.80 | 13.71 \u00b1 2.25 | 7.80 | 14.55 | 33.67 \u00b1 5.90 | 12.73\n8 | 8 | 2.94 | 6.86 \u00b1 2.00 | 2.84 | 3.77 | 7.43 \u00b1 2.48 | 3.31 | 13.08 | 26.96 \u00b1 4.47 | 6.20\n5 | 12 | 8.01 | 10.17 \u00b1 1.63 | 9.32 | 12.77 | 14.95 \u00b1 1.08 | 12.01 | 20.58 | 28.88 \u00b1 4.25 | 19.74\n6 | 12 | 4.94 | 7.74 \u00b1 1.75 | 5.21 | 9.77 | 11.89 \u00b1 0.95 | 9.77 | 16.33 | 25.47 \u00b1 3.15 | 15.36\n7 | 12 | 2.84 | 7.46 \u00b1 1.80 | 3.18 | 6.40 | 9.95 \u00b1 1.73 | 6.18 | 12.34 | 26.02 \u00b1 3.44 | 10.62\n8 | 12 | 2.20 | 4.60 \u00b1 1.26 | 2.09 | 3.32 | 5.33 \u00b1 1.73 | 2.27 | 10.07 | 21.47 \u00b1 3.67 | 4.28\n\nTable (upper right): average L2 errors on other slices, using sequences designed on slice 8.\nNshot | MAPop | MAPeq | LSop | LSeq\n5 | 9.01 \u00b1 1.3 | 10.67 \u00b1 2.1 | 14.70 \u00b1 1.6 | 14.57 \u00b1 2.1\n6 | 5.43 \u00b1 1.1 | 6.51 \u00b1 2.1 | 10.80 \u00b1 1.5 | 10.95 \u00b1 1.8\n7 | 3.00 \u00b1 0.5 | 3.27 \u00b1 0.8 | 7.08 \u00b1 1.1 | 6.45 \u00b1 1.4\n8 | 2.42 \u00b1 0.3 | 2.34 \u00b1 0.3 | 3.16 \u00b1 0.6 | 2.70 \u00b1 0.6\n\nTable (lower right): Nyquist spiral.\nimg | MAPeq, Nshot = 16 (Nyq) | LSeq, Nshot = 16 (Nyq)\n8 | 2.75 | 3.31\n12 | 1.96 | 2.27\n\nFigure 3: Results for spiral interleaves on slices 8, 12 (table left). Reconstruction: MAP (posterior mode [8]), LS (least squares), ZFDC (zero filling, density compensation). Offset angles \u03b8_0 \u2208 [0, \u03c0): op (optimized; our method), rd (uniformly random; avg. 10 runs), eq (equispaced). Nshot: number of interleaves. Table upper right: avg. errors for slices 2, 4, 6, 10, 14, measured with sequences optimized on slice 8. Table lower right: results for Nyquist spiral eq[Nshot = 16].\n\nThe standard reconstruction method ZFDC is improved upon strongly by LS (both are linear, but LS is iterative), which in turn is improved upon significantly by MAP. This is true even for the Nyquist spiral (Nshot = 16). While the strongest errors of ZFDC lie outside the \u201ceffective field of view\u201d (roughly circular for spirals), panel f of Figure 2 shows that ZFDC errors contain important structures all over the image. Modern implementations of LS and MAP are more expensive than ZFDC only by moderate constant factors. Results such as ours, together with the availability of affordable high-performance computing, strongly motivate the transition away from direct signal processing reconstruction algorithms to modern iterative statistical estimators. Note that ZFDC (and, to a lesser extent, LS) copes best with equispaced designs, while MAP works best with optimized angles. This is because the optimized designs leave larger gaps in k-space (see Figure 4). Nonlinear estimators can interpolate across such gaps to some extent, using image sparsity priors. 
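The toy sketch below illustrates why a nonlinear estimator can interpolate across sampling gaps while a linear one cannot. It rests on strong simplifying assumptions:
- The 1D signal is sparse in the identity basis, rather than in image gradients and wavelets.
- A random matrix with orthonormal rows stands in for an undersampled Fourier design.
- There is no noise.
- Plain iterative soft thresholding (ISTA) computes the Laplace-prior MAP estimate.

The minimum-norm least-squares solution X^T y plays the role of a linear reconstruction. It leaves the component of u in the null space of X at zero, whereas the sparsity prior lets MAP recover it.

```python
import numpy as np

def soft(v, t):
    # Soft thresholding: the proximal map of t * ||v||_1 (Laplace prior).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=2000):
    # ISTA for min_u 0.5 ||y - X u||^2 + lam ||u||_1; unit step is valid because ||X||_2 = 1.
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = soft(u + X.T @ (y - X @ u), lam)
    return u

rng = np.random.default_rng(0)
n, m = 128, 64
Qn, _ = np.linalg.qr(rng.standard_normal((n, m)))
X = Qn.T                                   # m x n with orthonormal rows: X X^T = I
u_true = np.zeros(n)
support = rng.choice(n, 5, replace=False)
u_true[support] = rng.choice([-1.0, 1.0], 5) * rng.uniform(1.0, 2.0, 5)
y = X @ u_true

u_ls = X.T @ y                             # minimum-norm least squares: linear, no prior
u_map = ista(X, y, lam=0.01)               # MAP with a sparsity (Laplace) prior: nonlinear
err_ls = np.linalg.norm(u_ls - u_true)
err_map = np.linalg.norm(u_map - u_true)
print(round(float(err_ls), 3), round(float(err_map), 3))
```

With these settings the MAP error should be far below the least-squares error. The components of u_true in the null space of X are recovered from the sparsity assumption alone.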
Methods like ZFDC merely interpolate locally in k-space, uninformed about image statistics, so that violations of the Nyquist limit anywhere necessarily translate into errors.\n\nIt is clearly evident that drawing the spiral offset angles at random does not work well, even if MAP reconstruction is used as in [8]. The ratio MAPrd/MAPop in L2 error is 1.23, 1.50, 2.99, 2.33 for Nshot = 5, 6, 7, 8 on slice 8 (Table 3, upper left). While both MAPop and MAPeq essentially attain Nyquist performance with Nshot = 8, MAPrd does not decrease to that level even with Nshot = 16 (not shown).\n\n[Figure 2 panels: (a) Slice 8; (b) MAP-op, Nshot=7, E=3.95; (c) MAP-eq, Nshot=7, E=4.40; (d) MAP-rd, Nshot=7, E=12.08; (e) MAP-eq, Nshot=8, E=2.84; (f) ZFDC-eq, Nshot=8, E=6.20]\n\nOur results strongly suggest that randomizing MR sequences is not a useful design principle.11 Similar shortcomings of randomly drawn designs were reported in [12], in a more idealized setup. Reasons why CS theory as yet fails to guide measurement design for real images are reviewed there; see also [15]. Beyond the poor average performance of random designs, the large variance across trials in Table 3 means that in practice, a randomized sequence scan is much like a gamble. The outcome of our Bayesian optimized design is stable, in that sequences found in several repetitions gave almost identical reconstruction performance.\n\nThe closest competitors in Table 3 are MAPop and MAPeq. Since u_true is close to real, both attain close to Nyquist performance from Nshot = 8 onwards. In the true undersampling regime Nshot \u2208 {5, 6, 7}, MAPop improves significantly12 upon MAPeq. Comparing panels b and c of Figure 2, the artifact across the lower right of the equispaced reconstruction (c) leads to distortions in the mouth area. Undersampling artifacts are generally amplified by regular sampling, which is avoided in the optimized designs. 
Breaking up such regular designs seems to be the major role of randomization in CS theory, but our results show that much is lost in the process. We see that approximate Bayesian experimental design is useful for optimizing measurement architectures for subsequent MAP reconstruction. To our knowledge, no similar design optimization method based purely on MAP estimation has been proposed (ours needs approximate inference), which makes the beneficial interplay between our framework and subsequent MAP estimation all the more interesting. The computational primitives required for MAP estimation and for our method are the same. Our implementation requires about 5 hours on a single standard desktop machine to optimize 11 angles sequentially, with 256 candidates per extension and n and d as above. The score computations dominate the running time, but can readily be parallelized.\n\nOn most current MR scanners, it is neither feasible nor desirable to optimize the sequence during the measurement, so an important question is whether sequences optimized on some slices work better in general as well (for the same contrast and similar objects). We tested transferability by measuring five other slices not seen by the optimization method. The results (Table 3, upper right) indicate that the main improvements are not specific to the object the sequence was optimized for.13 Two spirals found by our method are shown in Figure 4 (2 of 8 interleaves, Nshot = 8). The spacing is not equidistant, and as noted above, only nonlinear MAP estimation can successfully interpolate across the resulting larger k-space gaps. On the other hand, the spacing is more regular than is typically achieved by random sampling.\n\nFigure 4: Spirals found by our algorithm. 
The ordering is color-coded: dark spirals were selected first.\n\n[Figure 4 panels: Slice 8, Nshot=8; Slice 12, Nshot=8; both axes from \u22120.03 to 0.03]\n\nFootnote 11: Images exhibit a decay in power as a function of spatial frequency (distance to the k-space origin), and the most evident failure of uniform random sampling is to ignore this fact [15]. While this point is noted in [8], the variable-density weighting suggested there is built into all designs compared here: any spiral interleave samples more densely around the origin. In fact, the sampling density as a function of spatial frequency |k(t)| does not depend on the offset angles \u03b8_0.\n\nFootnote 12: In another set of experiments (not shown), we compared optimization, randomization, and equispacing of \u03b8_0 \u2208 [0, 2\u03c0), disregarding the approximate real-valuedness of u_true. In this setting, equispacing performs poorly (worse than randomization).\n\nFootnote 13: However, it is important that the object exhibits realistic natural image statistics. Artificial phantoms of extremely simple structure, often used in MR sequence design, are not suitable in that respect. Real MR images are much more complicated than simple phantoms, even in low-level statistics, and results obtained on phantoms only should not be given too much weight.\n\n5 Discussion\n\nWe have presented the first scalable Bayesian experimental design framework for automatically optimizing MRI sequences, a problem of high impact on clinical diagnostics and brain research. The high demands on image resolution and processing time which come with this application are met in principle by our novel variational inference algorithm, which reduces computations to signal processing primitives such as FFT and gridding. We demonstrated the power of our approach in a study with spiral sequences, using raw data from a 3T MR scanner. 
The sequences found by our method lead to reconstructions of high quality, even though they are up to two times faster than traditionally used Nyquist setups. They improve strongly on sequences obtained by blind randomization. Moreover, across all designs, nonlinear Bayesian MAP estimation was found to be essential for reconstruction from undersampled data, and our design optimization framework is especially useful for subsequent MAP reconstruction.\n\nOur results strongly suggest that modifications to standard sequences can be found which produce similar images at lower cost. With so many handles to turn in sequence design nowadays, this is a high-dimensional optimization problem dealing with signals (images) of high complexity, and human experts can greatly benefit from goal-directed machine exploration. Randomizing parameters of a sequence, as suggested by compressed sensing theory, helps to break wasteful symmetries in regular standard sequences. As our results show, however, many of the advantages of regular sequences are lost through randomization. The optimization of Bayesian information leads to irregular sequences as well, improving on regular and especially on randomized designs. Our insights should be especially valuable in MR applications where high temporal resolution is essential (such as fMRI studies), so that dense spatial sampling is not even an option. An extension to 3D volume reconstruction, making use of non-Gaussian hidden Markov models, is work in progress. Finally, our framework also seems promising for real-time imaging [1, ch. 11.4], where the scanner allows for online adaptation of the sequence depending on measurement feedback. It could be used to help an operator home in on regions of interest, or could even run without human intervention. We intend to test our proposal directly on an MR scanner, using the sequential setup described in Section 2. 
This will raise new problems not addressed in Section 4, such as phase or image errors that depend on the sequence employed¹⁴ (which could be accounted for by a more elaborate noise model). In our experiments in Section 4, the choice of different offset angles is cost-neutral, but when a larger set of candidates is used, their respective costs have to be quantified in terms of real scan time, error-proneness, heating due to rapid gradient switching, and other factors.\n\nAcknowledgments\n\nWe thank Stefan Kunis for help and support with NFFT.\n\nReferences\n[1] M.A. Bernstein, K.F. King, and X.J. Zhou. Handbook of MRI Pulse Sequences. Elsevier Academic Press, 1st edition, 2004.\n[2] A. Garroway, P. Grannell, and P. Mansfield. Image formation in NMR by a selective irradiative pulse. J. Phys. C: Solid State Phys., 7:L457–L462, 1974.\n[3] M. Girolami. A variational method for learning sparse and overcomplete representations. N. Comp., 13:2517–2532, 2001.\n[4] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.\n[5] A. Haase, J. Frahm, D. Matthaei, W. Hänicke, and K. Merboldt. FLASH imaging: Rapid NMR imaging using low flip-angle pulses. J. Magn. Reson., 67:258–266, 1986.\n[6] J. Hennig, A. Nauerth, and H. Friedburg. RARE imaging: A fast imaging method for clinical MR. Magn. Reson. Med., 3(6):823–833, 1986.\n[7] P. Lauterbur. Image formation by induced local interactions: Examples employing nuclear magnetic resonance. Nature, 242:190–191, 1973.\n[8] M. Lustig, D. Donoho, and J. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn. Reson. Med., 58(6):1182–1195, 2007.\n[9] P. Mansfield. Multi-planar image formation using NMR spin-echoes. J. Phys. C, 10:L50–L58, 1977.\n[10] M. Schneider and A. Willsky. Krylov subspace estimation. SIAM J. Comp., 22(5):1840–1864, 2001.\n[11] M. Seeger.
Bayesian inference and optimal design for the sparse linear model. JMLR, 9:759–813, 2008.\n[12] M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. In ICML 25, 2008.\n[13] M. Seeger and H. Nickisch. Large scale variational inference and experimental design for sparse generalized linear models. Technical Report TR-175, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008.\n[14] M. Tipping and A. Faul. Fast marginal likelihood maximisation for sparse Bayesian models. In AI and Statistics 9, 2003.\n[15] Y. Weiss, H. Chang, and W. Freeman. Learning compressed sensing. Snowbird Learning Workshop, Allerton, CA, 2007.\n[16] D. Wipf and S. Nagarajan. A new view of automatic relevance determination. In NIPS 20, 2008.\n\n¹⁴ Some common problems with spirals, together with remedies, are discussed in [1, ch. 17.6.3].", "award": [], "sourceid": 605, "authors": [{"given_name": "Hannes", "family_name": "Nickisch", "institution": null}, {"given_name": "Rolf", "family_name": "Pohmann", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}, {"given_name": "Matthias", "family_name": "Seeger", "institution": null}]}