{"title": "Deep ADMM-Net for Compressive Sensing MRI", "book": "Advances in Neural Information Processing Systems", "page_first": 10, "page_last": 18, "abstract": "Compressive Sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing an MR image from a small amount of under-sampled data in k-space, thereby accelerating data acquisition in MRI. To improve the current MRI system in reconstruction accuracy and computational speed, in this paper we propose a novel deep architecture, dubbed ADMM-Net. ADMM-Net is defined over a data flow graph, which is derived from the iterative procedures of the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a CS-based MRI model. In the training phase, all parameters of the net, e.g., image transforms, shrinkage functions, etc., are discriminatively trained end-to-end using the L-BFGS algorithm. In the testing phase, it has computational overhead similar to ADMM, but uses optimized parameters learned from the training data for the CS-based reconstruction task. Experiments on MR image reconstruction under different sampling ratios in k-space demonstrate that it significantly improves the baseline ADMM algorithm and achieves high reconstruction accuracy with fast computational speed.", "full_text": "Deep ADMM-Net for Compressive Sensing MRI\n\nYan Yang\n\nXi\u2019an Jiaotong University\n\nyangyan92@stu.xjtu.edu.cn\n\nHuibin Li\n\nXi\u2019an Jiaotong University\n\nhuibinli@mail.xjtu.edu.cn\n\nJian Sun\n\nXi\u2019an Jiaotong University\n\njiansun@mail.xjtu.edu.cn\n\nZongben Xu\n\nXi\u2019an Jiaotong University\nzbxu@mail.xjtu.edu.cn\n\nAbstract\n\nCompressive Sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing an MR image from a small amount of under-sampled data in k-space, thereby accelerating data acquisition in MRI. 
To improve the current MRI system in reconstruction accuracy and computational speed, in this paper we propose a novel deep architecture, dubbed ADMM-Net. ADMM-Net is defined over a data flow graph, which is derived from the iterative procedures of the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a CS-based MRI model. In the training phase, all parameters of the net, e.g., image transforms, shrinkage functions, etc., are discriminatively trained end-to-end using the L-BFGS algorithm. In the testing phase, it has computational overhead similar to ADMM, but uses optimized parameters learned from the training data for the CS-based reconstruction task. Experiments on MR image reconstruction under different sampling ratios in k-space demonstrate that it significantly improves the baseline ADMM algorithm and achieves high reconstruction accuracy with fast computational speed.\n\n1 Introduction\n\nMagnetic Resonance Imaging (MRI) is a non-invasive imaging technique providing both functional and anatomical information for clinical diagnosis. Imaging speed is a fundamental challenge: fast MRI techniques are in high demand for accelerating data acquisition while still reconstructing high-quality images. Compressive sensing MRI (CS-MRI) is an effective approach that allows a data sampling rate much lower than the Nyquist rate without significantly degrading the image quality [1]. CS-MRI methods first sample data in k-space (i.e., Fourier space), then reconstruct the image using compressive sensing theory. Regularization related to the data prior is a key component of a CS-MRI model to reduce imaging artifacts and improve imaging precision. Sparse regularization can be explored in a specific transform domain or a general dictionary-based subspace [2]. Total Variation (TV) regularization in the gradient domain has been widely utilized in MRI [3, 4]. 
Although it is easy and fast to optimize, it introduces staircase artifacts in the reconstructed image. Methods in [5, 6] leverage sparse regularization in the wavelet domain. Dictionary learning methods rely on a dictionary of local patches to improve the reconstruction accuracy [7, 8]. Non-local methods use groups of similar local patches for joint patch-level reconstruction to better preserve image details [9, 10, 11]. In terms of performance, the basic CS-MRI methods run fast but produce less accurate reconstruction results, while the non-local and dictionary-learning-based methods generally output higher-quality MR images but suffer from slow reconstruction speed. In a CS-MRI model, it is commonly challenging to choose an optimal image transform domain / subspace and the corresponding sparse regularization.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nTo optimize the CS-MRI models, the Alternating Direction Method of Multipliers (ADMM) has proven to be an efficient variable-splitting algorithm with a convergence guarantee [4, 12, 13]. It considers the augmented Lagrangian function of a given CS-MRI model and splits the variables into subgroups, which can be alternately optimized by solving a few simple subproblems. Although ADMM is generally efficient, it is not trivial to determine the optimal parameters (e.g., update rates, penalty parameters) influencing its accuracy in CS-MRI.\n\nIn this work, we aim to design a fast yet accurate method to reconstruct high-quality MR images from under-sampled k-space data. We propose a novel deep architecture, dubbed ADMM-Net, inspired by the ADMM iterative procedures for optimizing a general CS-MRI model. This deep architecture consists of multiple stages, each of which corresponds to an iteration of the ADMM algorithm. More specifically, we define a deep architecture represented by a data flow graph [14] for the ADMM procedures. 
The operations in ADMM are represented as graph nodes, and the data flow between two operations in ADMM is represented by a directed edge. Therefore, the ADMM iterative procedures naturally determine a deep architecture over a data flow graph. Given under-sampled data in k-space, it flows over the graph and generates a reconstructed image. All the parameters (e.g., transforms, shrinkage functions, penalty parameters, etc.) in the deep architecture can be discriminatively learned from training pairs of under-sampled data in k-space and images reconstructed from fully sampled data, by backpropagation [15] over the data flow graph.\n\nOur experiments demonstrate that the proposed deep ADMM-Net is effective in both reconstruction accuracy and speed. Compared with the baseline methods using sparse regularization in a transform domain, it achieves significantly higher accuracy and takes comparable computational time. Compared with the state-of-the-art methods using dictionary learning and non-local techniques, it achieves high accuracy at significantly faster computational speed.\n\nThe main contributions of this paper can be summarized as follows. We propose a novel deep ADMM-Net by reformulating an ADMM algorithm into a deep network for CS-MRI. This is achieved by designing a data flow graph for ADMM to effectively build and train the ADMM-Net. ADMM-Net achieves high accuracy in MR image reconstruction with fast computational speed, as justified in experiments. The discriminative parameter learning approach has previously been applied to sparse coding and Markov Random Fields [16, 17, 18, 19]. 
But, to the best of our knowledge, this is the first computational framework that maps an ADMM algorithm to a learnable deep architecture.\n\n2 Deep ADMM-Net for Fast MRI\n\n2.1 Compressive Sensing MRI Model and ADMM Algorithm\n\nGeneral CS-MRI Model: Assume x \u2208 C^N is an MRI image to be reconstructed and y \u2208 C^{N'} (N' < N) is the under-sampled k-space data. According to the CS theory, the reconstructed image can be estimated by solving the following optimization problem:\n\nx\u0302 = arg min_x { (1/2)\u2016Ax \u2212 y\u2016\u2082\u00b2 + \u03a3_{l=1}^{L} \u03bb_l g(D_l x) },  (1)\n\nwhere A = PF \u2208 R^{N'\u00d7N} is a measurement matrix, P \u2208 R^{N'\u00d7N} is an under-sampling matrix, and F is a Fourier transform. D_l denotes a transform matrix for a filtering operation, e.g., Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), etc. g(\u00b7) is a regularization function derived from the data prior, e.g., the l_q-norm (0 \u2264 q \u2264 1) for a sparse prior. \u03bb_l is a regularization parameter.\n\nADMM solver [12]: The above optimization problem can be solved efficiently using the ADMM algorithm. By introducing auxiliary variables z = {z\u2081, z\u2082, \u2026, z_L}, Eqn. (1) is equivalent to:\n\nmin_{x,z} (1/2)\u2016Ax \u2212 y\u2016\u2082\u00b2 + \u03a3_{l=1}^{L} \u03bb_l g(z_l)  s.t.  z_l = D_l x, \u2200 l \u2208 [1, 2, \u2026, L].  (2)\n\nIts augmented Lagrangian function is:\n\nL_\u03c1(x, z, \u03b1) = (1/2)\u2016Ax \u2212 y\u2016\u2082\u00b2 + \u03a3_{l=1}^{L} \u03bb_l g(z_l) \u2212 \u03a3_{l=1}^{L} \u27e8\u03b1_l, z_l \u2212 D_l x\u27e9 + \u03a3_{l=1}^{L} (\u03c1_l/2)\u2016z_l \u2212 D_l x\u2016\u2082\u00b2,  (3)\n\nFigure 1: The data flow graph for the ADMM optimization of a general CS-MRI model. 
This graph consists of four types of nodes: reconstruction (X), convolution (C), non-linear transform (Z), and multiplier update (M). Under-sampled data in k-space is successively processed over the graph, and finally an MR image is generated. Our deep ADMM-Net is defined over this data flow graph.\n\nwhere \u03b1 = {\u03b1_l} are Lagrangian multipliers and \u03c1 = {\u03c1_l} are penalty parameters. ADMM alternately optimizes {x, z, \u03b1} by solving the following three subproblems:\n\nx^{(n+1)} = arg min_x (1/2)\u2016Ax \u2212 y\u2016\u2082\u00b2 \u2212 \u03a3_{l=1}^{L} \u27e8\u03b1_l^{(n)}, z_l^{(n)} \u2212 D_l x\u27e9 + \u03a3_{l=1}^{L} (\u03c1_l/2)\u2016z_l^{(n)} \u2212 D_l x\u2016\u2082\u00b2,\nz^{(n+1)} = arg min_z \u03a3_{l=1}^{L} \u03bb_l g(z_l) \u2212 \u03a3_{l=1}^{L} \u27e8\u03b1_l^{(n)}, z_l \u2212 D_l x^{(n+1)}\u27e9 + \u03a3_{l=1}^{L} (\u03c1_l/2)\u2016z_l \u2212 D_l x^{(n+1)}\u2016\u2082\u00b2,\n\u03b1_l^{(n+1)} = \u03b1_l^{(n)} + \u03b7_l \u03c1_l (D_l x^{(n+1)} \u2212 z_l^{(n+1)}) (a dual ascent step with update rate \u03b7_l),  (4)\n\nwhere n \u2208 [1, 2, \u2026, N_s] denotes the n-th iteration. For simplicity, let \u03b2_l = \u03b1_l / \u03c1_l (l \u2208 [1, 2, \u2026, L]), and substitute A = PF into Eqn. (4). Then the three subproblems have the following solutions:\n\nX^{(n)}: x^{(n)} = F^T [P^T P + \u03a3_{l=1}^{L} \u03c1_l F D_l^T D_l F^T]^{\u22121} [P^T y + \u03a3_{l=1}^{L} \u03c1_l F D_l^T (z_l^{(n\u22121)} \u2212 \u03b2_l^{(n\u22121)})],\nZ^{(n)}: z_l^{(n)} = S(D_l x^{(n)} + \u03b2_l^{(n\u22121)}; \u03bb_l / \u03c1_l),\nM^{(n)}: \u03b2_l^{(n)} = \u03b2_l^{(n\u22121)} + \u03b7_l (D_l x^{(n)} \u2212 z_l^{(n)}),  (5)\n\nwhere x^{(n)} can be efficiently computed by the fast Fourier transform, and S(\u00b7) is a nonlinear shrinkage function, usually a soft or hard thresholding function corresponding to the sparse regularization of the l\u2081-norm and l\u2080-norm respectively [20]. The parameter \u03b7_l is an update rate.\n\nIn CS-MRI, it commonly takes dozens of ADMM iterations to obtain a satisfactory reconstruction result. 
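For concreteness, the three update steps in Eqn. (5) can be sketched in a few lines of NumPy. This is a simplified, hypothetical 1-D setting (circulant filters D_l diagonalized by the FFT, a binary sampling mask as P, and an l1 prior solved by soft thresholding), not the paper's implementation:

```python
import numpy as np

def soft_threshold(t, tau):
    # S(t; tau): soft thresholding, the shrinkage for an l1 prior
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)

def admm_stage(z, beta, y, mask, d_freq, rho, lam, eta):
    """One ADMM stage of Eqn. (5) for a 1-D toy problem.

    y      : full-length k-space vector, zero off the sampled positions (P^T y)
    mask   : binary sampling mask (the diagonal of P^T P)
    d_freq : list of L filter frequency responses (FFT of each circulant D_l)
    """
    L = len(d_freq)
    # X-step: the normal equations are diagonal in the Fourier domain
    numer = y.astype(complex).copy()
    denom = mask.astype(float).copy()
    for l in range(L):
        numer += rho[l] * np.conj(d_freq[l]) * np.fft.fft(z[l] - beta[l])
        denom += rho[l] * np.abs(d_freq[l]) ** 2
    x = np.fft.ifft(numer / denom)
    # Z-step: shrinkage in each transform domain
    Dx = [np.fft.ifft(d_freq[l] * np.fft.fft(x)).real for l in range(L)]
    z = [soft_threshold(Dx[l] + beta[l], lam[l] / rho[l]) for l in range(L)]
    # M-step: scaled multiplier update with rate eta_l
    beta = [beta[l] + eta[l] * (Dx[l] - z[l]) for l in range(L)]
    return x, z, beta
```

Repeating this stage corresponds to running the baseline ADMM solver; ADMM-Net replaces the fixed quantities (filters, shrinkage, rates) with learned ones.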
However, it is challenging to choose the transform D_l and the shrinkage function S(\u00b7) for a general regularization function g(\u00b7). Moreover, it is also not trivial to tune the parameters \u03c1_l and \u03b7_l for k-space data with different sampling ratios. To overcome these difficulties, we design a data flow graph for the ADMM algorithm, over which we can define a deep ADMM-Net to discriminatively learn all of the above transforms, functions, and parameters.\n\n2.2 Data Flow Graph for the ADMM Algorithm\n\nTo design our deep ADMM-Net, we first map the ADMM iterative procedures in Eqn. (5) to a data flow graph [14]. As shown in Fig. 1, this graph comprises nodes corresponding to different operations in ADMM, and directed edges corresponding to the data flows between operations. In this case, the n-th iteration of the ADMM algorithm corresponds to the n-th stage of the data flow graph. In the n-th stage of the graph, there are four types of nodes mapped from the four types of operations in ADMM, i.e., the reconstruction operation (X^{(n)}), the convolution operation (C^{(n)}) defined by {D_l x^{(n)}}_{l=1}^{L}, the nonlinear transform operation (Z^{(n)}) defined by S(\u00b7), and the multiplier update operation (M^{(n)}) in Eqn. (5). The whole data flow graph is a multiple repetition of the above stages corresponding to successive iterations of ADMM. Given under-sampled data in k-space, it flows over the graph and finally generates a reconstructed image. In this way, we map the ADMM iterations to a data flow graph, which is used to define and train our deep ADMM-Net in the following sections.\n\n2.3 Deep ADMM-Net\n\nOur deep ADMM-Net is defined over the data flow graph. It keeps the graph structure but generalizes the four types of operations to have learnable parameters as network layers. 
These operations are now generalized as the reconstruction layer, convolution layer, non-linear transform layer, and multiplier update layer. We next discuss them in detail.\n\nReconstruction layer (X^{(n)}): This layer reconstructs an MRI image following the reconstruction operation X^{(n)} in Eqn. (5). Given z_l^{(n\u22121)} and \u03b2_l^{(n\u22121)}, the output of this layer is defined as:\n\nx^{(n)} = F^T (P^T P + \u03a3_{l=1}^{L} \u03c1_l^{(n)} F H_l^{(n)T} H_l^{(n)} F^T)^{\u22121} [P^T y + \u03a3_{l=1}^{L} \u03c1_l^{(n)} F H_l^{(n)T} (z_l^{(n\u22121)} \u2212 \u03b2_l^{(n\u22121)})],  (6)\n\nwhere H_l^{(n)} is the l-th filter, \u03c1_l^{(n)} is the l-th penalty parameter, l = 1, \u2026, L, and y is the input under-sampled data in k-space. In the first stage (n = 1), z_l^{(0)} and \u03b2_l^{(0)} are initialized to zeros, therefore x^{(1)} = F^T (P^T P + \u03a3_{l=1}^{L} \u03c1_l^{(1)} F H_l^{(1)T} H_l^{(1)} F^T)^{\u22121} (P^T y).\n\nConvolution layer (C^{(n)}): It performs a convolution operation to transform an image into a transform domain. Given an image x^{(n)}, i.e., a reconstructed image in stage n, the output is\n\nc_l^{(n)} = D_l^{(n)} x^{(n)},  (7)\n\nwhere D_l^{(n)} is a learnable filter matrix in stage n. Different from the original ADMM, we do not constrain the filters D_l^{(n)} and H_l^{(n)} to be the same, in order to increase the network capacity.\n\nNonlinear transform layer (Z^{(n)}): This layer performs a nonlinear transform inspired by the shrinkage function S(\u00b7) defined in Z^{(n)} in Eqn. (5). Instead of setting it to be a shrinkage function determined by the regularization term g(\u00b7) in Eqn. (1), we aim to learn a more general function using a piecewise linear function. 
Given c_l^{(n)} and \u03b2_l^{(n\u22121)}, the output of this layer is defined as:\n\nz_l^{(n)} = S_{PLF}(c_l^{(n)} + \u03b2_l^{(n\u22121)}; {p_i, q_{l,i}^{(n)}}_{i=1}^{N_c}),  (8)\n\nwhere S_{PLF}(\u00b7) is a piecewise linear function determined by a set of control points {p_i, q_{l,i}^{(n)}}_{i=1}^{N_c}, i.e.,\n\nS_{PLF}(a; {p_i, q_{l,i}^{(n)}}_{i=1}^{N_c}) =\n  a + q_{l,1}^{(n)} \u2212 p_1,  if a < p_1,\n  a + q_{l,N_c}^{(n)} \u2212 p_{N_c},  if a > p_{N_c},\n  q_{l,k}^{(n)} + (a \u2212 p_k)(q_{l,k+1}^{(n)} \u2212 q_{l,k}^{(n)}) / (p_{k+1} \u2212 p_k),  if p_1 \u2264 a \u2264 p_{N_c},  (9)\n\nwhere k = \u230a(a \u2212 p_1)/(p_2 \u2212 p_1)\u230b, {p_i}_{i=1}^{N_c} are predefined positions uniformly located within [\u22121, 1], and {q_{l,i}^{(n)}}_{i=1}^{N_c} are the values at these positions for the l-th filter in the n-th stage. Figure 2 gives an illustrative example. Since a piecewise linear function can approximate any function, we can learn a flexible nonlinear transform function from data, beyond the off-the-shelf hard or soft thresholding functions.\n\nFigure 2: Illustration of a piecewise linear function determined by a set of control points.\n\nMultiplier update layer (M^{(n)}): This layer is defined by the Lagrangian multiplier updating procedure M^{(n)} in Eqn. (5). The output of this layer in stage n is defined as:\n\n\u03b2_l^{(n)} = \u03b2_l^{(n\u22121)} + \u03b7_l^{(n)} (c_l^{(n)} \u2212 z_l^{(n)}),  (10)\n\nwhere \u03b7_l^{(n)} are learnable parameters.\n\nNetwork Parameters: These layers are organized in the data flow graph shown in Fig. 1. In the deep architecture, we aim to learn the following parameters: H_l^{(n)} and \u03c1_l^{(n)} in the reconstruction layer, filters D_l^{(n)} in the convolution layer, {q_{l,i}^{(n)}}_{i=1}^{N_c} in the nonlinear transform layer, and \u03b7_l^{(n)} in the multiplier update layer, where l \u2208 [1, 2, \u2026, L] and n \u2208 [1, 2, \u2026, N_s] are the indexes for the filters and stages respectively. 
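A small NumPy sketch of the piecewise linear function in Eqn. (9), vectorized over an array of inputs (the clipping of the segment index at the boundaries is an implementation detail assumed here, not spelled out in the paper):

```python
import numpy as np

def splf(a, p, q):
    """Piecewise linear function of Eqn. (9).
    p: uniformly spaced control-point positions; q: learnable values at p."""
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    lo, hi = a < p[0], a > p[-1]
    out[lo] = a[lo] + q[0] - p[0]        # unit slope below the grid
    out[hi] = a[hi] + q[-1] - p[-1]      # unit slope above the grid
    mid = ~(lo | hi)
    # k = floor((a - p_1) / (p_2 - p_1)), kept inside the valid segments
    k = np.floor((a[mid] - p[0]) / (p[1] - p[0])).astype(int)
    k = np.clip(k, 0, len(p) - 2)
    t = (a[mid] - p[k]) / (p[k + 1] - p[k])
    out[mid] = q[k] + t * (q[k + 1] - q[k])
    return out
```

Initializing each q_{l,i} to S(p_i; \u03bb/\u03c1_l) makes this layer reproduce the soft thresholding of the baseline ADMM exactly, after which the values q_{l,i} are free to move during training.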
All of these parameters are taken as network parameters to be learned.\n\nFigure 3 shows an example of a deep ADMM-Net with three stages. The under-sampled data in k-space flows over three stages in an order from circled number 1 to number 12, followed by a final reconstruction layer with circled number 13, and generates a reconstructed image. The immediate reconstruction result at each stage is shown under each reconstruction layer.\n\nFigure 3: An example of a deep ADMM-Net with three stages. The sampled data in k-space is successively processed by operations in an order from 1 to 12, followed by a reconstruction layer X^{(4)} to output the final reconstructed image. The reconstructed image in each stage is shown under each reconstruction layer.\n\n3 Network Training\n\nWe take the MR image reconstructed from fully sampled data in k-space as the ground-truth MR image x^{gt}, and the under-sampled data y in k-space as the input. A training set \u0393 is then constructed containing pairs of under-sampled data and ground-truth MR images. We choose the normalized mean square error (NMSE) as the loss function in network training. Given pairs of training data, the loss between the network output and the ground truth is defined as:\n\nE(\u0398) = (1/|\u0393|) \u03a3_{(y, x^{gt}) \u2208 \u0393} \u221a(\u2016x\u0302(y, \u0398) \u2212 x^{gt}\u2016\u2082\u00b2) / \u221a(\u2016x^{gt}\u2016\u2082\u00b2),  (11)\n\nwhere x\u0302(y, \u0398) is the network output based on the network parameters \u0398 and the under-sampled data y in k-space. We learn the parameters \u0398 = {(q_{l,i}^{(n)})_{i=1}^{N_c}, D_l^{(n)}, H_l^{(n)}, \u03b7_l^{(n)}, \u03c1_l^{(n)}}_{n=1}^{N_s} \u222a {H_l^{(N_s+1)}, \u03c1_l^{(N_s+1)}} (l = 1, \u2026, L) by minimizing the loss w.r.t. them using L-BFGS\u00b9. 
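The loss in Eqn. (11) is straightforward to express in code; a hypothetical NumPy version over a list of (reconstruction, ground truth) pairs:

```python
import numpy as np

def nmse_loss(recon_list, gt_list):
    """Average normalized mean square error, Eqn. (11):
    the mean over the training set of ||x_hat - x_gt||_2 / ||x_gt||_2."""
    total = 0.0
    for x_hat, x_gt in zip(recon_list, gt_list):
        total += np.linalg.norm(x_hat - x_gt) / np.linalg.norm(x_gt)
    return total / len(gt_list)
```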
In the following, we first discuss the initialization of these parameters, and then compute the gradients of the loss function E(\u0398) w.r.t. the parameters \u0398 using backpropagation (BP) [21] over the data flow graph.\n\n3.1 Initialization\n\nWe initialize the network parameters \u0398 according to the ADMM solver of the following baseline CS-MRI model:\n\narg min_x { (1/2)\u2016Ax \u2212 y\u2016\u2082\u00b2 + \u03bb \u03a3_{l=1}^{L} \u2016D_l x\u2016\u2081 }.  (12)\n\nIn this model, we set D_l as a DCT basis and impose l\u2081-norm regularization in the DCT transform space. The function S(\u00b7) in the ADMM algorithm (Eqn. (5)) is then a soft thresholding function: S(t; \u03bb/\u03c1_l) = sgn(t)(|t| \u2212 \u03bb/\u03c1_l) when |t| > \u03bb/\u03c1_l, and 0 otherwise. For each n-th stage of the deep ADMM-Net, the filters D_l^{(n)} in the convolution layers and H_l^{(n)} in the reconstruction layers are initialized to be D_l in Eqn. (12). In the nonlinear transform layer, we uniformly choose 101 positions located within [\u22121, 1], and each value q_{l,i}^{(n)} is initialized as S(p_i; \u03bb/\u03c1_l). The parameters \u03bb, \u03c1_l^{(n)}, \u03b7_l^{(n)} are initialized to be the corresponding values in the ADMM algorithm. In this case, the initialized net is exactly a realization of ADMM optimizing Eqn. (12), and therefore outputs the same reconstructed image as the ADMM algorithm. The optimization of the network parameters is expected to produce an improved reconstruction result.\n\n3.2 Gradient Computation by Backpropagation over the Data Flow Graph\n\nIt is challenging to compute the gradients of the loss w.r.t. the parameters using backpropagation over the deep architecture in Fig. 1, because it is a directed graph. In the forward pass, we process the data of the n-th stage in the order of X^{(n)}, C^{(n)}, Z^{(n)} and M^{(n)}. 
In the backward pass, the gradients are computed in an inverse order. Figure 3 shows an example, where the gradients can be computed backwardly from the layer with circled number 13 to the layer with circled number 1 successively. For a stage n, Fig. 4 shows the four types of nodes (i.e., network layers) and the data flow over them. Each node has multiple inputs and (or) outputs. We next briefly introduce the gradient computation for each layer in a typical stage n (n < N_s). Please refer to the supplementary material for details.\n\n\u00b9http://users.eecs.northwestern.edu/~nocedal/lbfgsb.html\n\nFigure 4: Illustration of the four types of graph nodes (i.e., layers in the network) and their data flows in stage n. The solid arrows indicate the data flow in the forward pass, and the dashed arrows indicate the backward pass when computing gradients in backpropagation.\n\nMultiplier update layer (M^{(n)}): As shown in Fig. 4(a), this layer has three sets of inputs: {\u03b2_l^{(n\u22121)}}, {c_l^{(n)}} and {z_l^{(n)}}. Its output {\u03b2_l^{(n)}} is the input to compute {\u03b2_l^{(n+1)}}, {z_l^{(n+1)}} and x^{(n+1)}. The parameters of this layer are \u03b7_l^{(n)}, l = 1, \u2026, L. The gradients of the loss w.r.t. the parameters can be computed as:\n\n\u2202E/\u2202\u03b7_l^{(n)} = (\u2202E/\u2202\u03b2_l^{(n)}) \u2202\u03b2_l^{(n)}/\u2202\u03b7_l^{(n)}, where \u2202E/\u2202\u03b2_l^{(n)} = (\u2202E/\u2202\u03b2_l^{(n+1)}) \u2202\u03b2_l^{(n+1)}/\u2202\u03b2_l^{(n)} + (\u2202E/\u2202z_l^{(n+1)}) \u2202z_l^{(n+1)}/\u2202\u03b2_l^{(n)} + (\u2202E/\u2202x^{(n+1)}) \u2202x^{(n+1)}/\u2202\u03b2_l^{(n)}.\n\n\u2202E/\u2202\u03b2_l^{(n)} is the summation of gradients along the three dashed blue arrows in Fig. 4(a). We also compute the gradients of the output of this layer w.r.t. its inputs: \u2202\u03b2_l^{(n)}/\u2202\u03b2_l^{(n\u22121)}, \u2202\u03b2_l^{(n)}/\u2202c_l^{(n)}, and \u2202\u03b2_l^{(n)}/\u2202z_l^{(n)}.\n\nNonlinear transform layer (Z^{(n)}): As shown in Fig. 4(b), this layer has two sets of inputs: {\u03b2_l^{(n\u22121)}} and {c_l^{(n)}}, and its output {z_l^{(n)}} is the input for computing {\u03b2_l^{(n)}} and x^{(n+1)} in the next stage. The parameters of this layer are {q_{l,i}^{(n)}}_{i=1}^{N_c}, l = 1, \u2026, L. The gradients of the loss w.r.t. the parameters can be computed as\n\n\u2202E/\u2202q_{l,i}^{(n)} = (\u2202E/\u2202z_l^{(n)}) \u2202z_l^{(n)}/\u2202q_{l,i}^{(n)}, where \u2202E/\u2202z_l^{(n)} = (\u2202E/\u2202\u03b2_l^{(n)}) \u2202\u03b2_l^{(n)}/\u2202z_l^{(n)} + (\u2202E/\u2202x^{(n+1)}) \u2202x^{(n+1)}/\u2202z_l^{(n)}.\n\nWe also compute the gradients of the layer output w.r.t. its inputs: \u2202z_l^{(n)}/\u2202\u03b2_l^{(n\u22121)} and \u2202z_l^{(n)}/\u2202c_l^{(n)}.\n\nConvolution layer (C^{(n)}): The parameters of this layer are D_l^{(n)} (l = 1, \u2026, L). We represent the filter by D_l^{(n)} = \u03a3_{m=1}^{t} \u03c9_{l,m}^{(n)} B_m, where B_m is a basis element and {\u03c9_{l,m}^{(n)}} is the set of filter coefficients to be learned. The gradients of the loss w.r.t. the filter coefficients are computed as\n\n\u2202E/\u2202\u03c9_{l,m}^{(n)} = (\u2202E/\u2202c_l^{(n)}) \u2202c_l^{(n)}/\u2202\u03c9_{l,m}^{(n)}, where \u2202E/\u2202c_l^{(n)} = (\u2202E/\u2202\u03b2_l^{(n)}) \u2202\u03b2_l^{(n)}/\u2202c_l^{(n)} + (\u2202E/\u2202z_l^{(n)}) \u2202z_l^{(n)}/\u2202c_l^{(n)}.\n\nThe gradient of the layer output w.r.t. its input is computed as \u2202c_l^{(n)}/\u2202x^{(n)}.\n\nReconstruction layer (X^{(n)}): The parameters of this layer are H_l^{(n)} and \u03c1_l^{(n)} (l = 1, \u2026, L). Similar to the convolution layer, we represent the filter by H_l^{(n)} = \u03a3_{m=1}^{s} \u03b3_{l,m}^{(n)} B_m, where {\u03b3_{l,m}^{(n)}} is the set of filter coefficients to be learned. The gradients of the loss w.r.t. the 
parameters are computed as\n\n\u2202E/\u2202\u03b3_{l,m}^{(n)} = (\u2202E/\u2202x^{(n)}) \u2202x^{(n)}/\u2202\u03b3_{l,m}^{(n)},  \u2202E/\u2202\u03c1_l^{(n)} = (\u2202E/\u2202x^{(n)}) \u2202x^{(n)}/\u2202\u03c1_l^{(n)},\n\nwhere \u2202E/\u2202x^{(n)} = \u03a3_{l=1}^{L} (\u2202E/\u2202c_l^{(n)}) \u2202c_l^{(n)}/\u2202x^{(n)} if n \u2264 N_s, and \u2202E/\u2202x^{(n)} = (1/|\u0393|) (x^{(n)} \u2212 x^{gt}) / (\u221a(\u2016x^{(n)} \u2212 x^{gt}\u2016\u2082\u00b2) \u221a(\u2016x^{gt}\u2016\u2082\u00b2)) if n = N_s + 1.\n\nThe gradients of the layer output w.r.t. its inputs are computed as \u2202x^{(n)}/\u2202\u03b2_l^{(n\u22121)} and \u2202x^{(n)}/\u2202z_l^{(n\u22121)}.\n\n4 Experiments\n\nWe train and test ADMM-Net on brain and chest MR images\u00b2. For each dataset, we randomly take 100 images for training and 50 images for testing. An ADMM-Net is learned separately for each sampling ratio. The reconstruction accuracies are reported as the average NMSE and Peak Signal-to-Noise Ratio (PSNR) over the test images. The sampling pattern in k-space is the commonly used pseudo radial sampling. 
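The two reported metrics can be computed as follows; a hedged sketch, where the PSNR peak-value convention is an assumption since the paper does not spell out its image scaling:

```python
import numpy as np

def nmse(x_hat, x_gt):
    # Normalized mean square error, as in Eqn. (11)
    return np.linalg.norm(x_hat - x_gt) / np.linalg.norm(x_gt)

def psnr(x_hat, x_gt, peak=1.0):
    # Peak Signal-to-Noise Ratio in dB, assuming images scaled to [0, peak]
    mse = np.mean(np.abs(x_hat - x_gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```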
All experiments are performed on a desktop with an Intel Core i7-4790K CPU.\n\nTable 1: Performance comparisons on brain data with different sampling ratios (NMSE / PSNR).\n\nMethod | 20% | 30% | 40% | 50% | Test time\nZero-filling | 0.1700 / 29.96 | 0.1247 / 32.59 | 0.0968 / 34.76 | 0.0770 / 36.73 | 0.0013s\nTV [2] | 0.0929 / 35.20 | 0.0673 / 37.99 | 0.0534 / 40.00 | 0.0440 / 41.69 | 0.7391s\nRecPF [4] | 0.0917 / 35.32 | 0.0668 / 38.06 | 0.0533 / 40.03 | 0.0440 / 41.71 | 0.3105s\nSIDWT | 0.0885 / 35.66 | 0.0620 / 38.72 | 0.0484 / 40.88 | 0.0393 / 42.67 | 7.8637s\nPBDW [6] | 0.0814 / 36.34 | 0.0627 / 38.64 | 0.0518 / 40.31 | 0.0437 / 41.81 | 35.3637s\nPANO [10] | 0.0800 / 36.52 | 0.0592 / 39.13 | 0.0477 / 41.01 | 0.0390 / 42.76 | 53.4776s\nFDLCP [8] | 0.0759 / 36.95 | 0.0592 / 39.13 | 0.0500 / 40.62 | 0.0428 / 42.00 | 52.2220s\nBM3D-MRI [11] | 0.0674 / 37.98 | 0.0515 / 40.33 | 0.0426 / 41.99 | 0.0359 / 43.47 | 40.9114s\nInit-Net13 | 0.1394 / 31.58 | 0.1225 / 32.71 | 0.1128 / 33.44 | 0.1066 / 33.95 | 0.6914s\nADMM-Net13 | 0.0752 / 37.01 | 0.0553 / 39.70 | 0.0456 / 41.37 | 0.0395 / 42.62 | 0.6964s\nADMM-Net14 | 0.0742 / 37.13 | 0.0548 / 39.78 | 0.0448 / 41.54 | 0.0380 / 42.99 | 0.7400s\nADMM-Net15 | 0.0739 / 37.17 | 0.0544 / 39.84 | 0.0447 / 41.56 | 0.0379 / 43.00 | 0.7911s\n\nIn Tab. 1, we compare our method to conventional compressive sensing MRI methods on brain data. These methods include Zero-filling [22], TV [2], RecPF [4], and SIDWT\u00b3, as well as the state-of-the-art methods PBDW [6], PANO [10], FDLCP [8] and BM3D-MRI [11]. For ADMM-Net, we initialize the filters in each stage to be eight 3 \u00d7 3 DCT basis filters (the average DCT basis is discarded). Compared with the baseline methods such as Zero-filling, TV, RecPF and SIDWT, our proposed method produces the best quality with comparable reconstruction speed. Compared with the state-of-the-art methods PBDW, PANO and FDLCP, our ADMM-Net gives more accurate reconstruction results at the fastest computational speed. 
For the sampling ratio of 30%, our method (ADMM-Net15) outperforms the state-of-the-art methods PANO and FDLCP by 0.71 dB. Moreover, our reconstruction speed is around 66 times faster. The BM3D-MRI method relies on a well-designed BM3D denoiser; it produces higher accuracy, but runs around 50 times slower than ours. The visual comparisons in Fig. 5 show that the proposed network can preserve fine image details without obvious artifacts. In Fig. 6(a), we compare the NMSEs and the average test time for different methods using a scatter plot. It is easy to observe that our method is the best when considering both reconstruction accuracy and running time. Examples of the learned nonlinear functions and filters are shown in Fig. 7.\n\nTable 2: Comparisons of NMSE and PSNR on chest data with 20% sampling ratio.\n\nMethod | TV | RecPF | PANO | FDLCP | ADMM-Net15-B | ADMM-Net15 | ADMM-Net17\nNMSE | 0.1019 | 0.1017 | 0.0858 | 0.0775 | 0.0790 | 0.0775 | 0.0768\nPSNR | 35.49 | 35.51 | 37.01 | 37.77 | 37.68 | 37.84 | 37.92\n\nNetwork generalization ability: We test the generalization ability of ADMM-Net by applying the net learned from brain data to chest data. Table 2 shows that our net learned from brain data (ADMM-Net15-B) still achieves competitive reconstruction accuracy on chest data, demonstrating a remarkable generalization ability. This might be because the learned filters and nonlinear transforms operate on local patches, which are repetitive across different organs. Moreover, ADMM-Net17, learned from chest data, achieves better reconstruction accuracy on the test chest data.\n\nEffectiveness of network training: In Tab. 1, we also present the results of the initialized network for ADMM-Net13. 
As discussed in Section 3.1, this initialized network (Init-Net13) is a realization of the ADMM algorithm optimizing Eqn. (12). The network after training produces significantly improved accuracy, e.g., the PSNR is increased from 32.71 dB to 39.84 dB at a sampling ratio of 30%.\n\n\u00b2CAF Project: https://masi.vuse.vanderbilt.edu/workshop2013/index.php/Segmentation_Challenge_Details\n\u00b3Rice Wavelet Toolbox: http://dsp.rice.edu/software/rice-wavelet-toolbox\n\nFigure 5: Examples of reconstruction results with 20% (the first row) and 30% (the second row) sampling ratios. The left four columns show results of ADMM-Net15, RecPF, PANO, and BM3D-MRI.\n\nFigure 6: (a) Scatter plot of NMSEs and average test time for different methods; (b) The NMSEs of ADMM-Net using different numbers of stages (20% sampling ratio for brain data).\n\nFigure 7: Examples of learned filters in the convolution layer and the corresponding nonlinear transforms (the first stage of ADMM-Net15 with 20% sampling ratio for brain data).\n\nEffect of the number of stages: To test the effect of the number of stages (i.e., N_s), we greedily train deeper networks by adding one stage at a time. Fig. 6(b) shows the average testing NMSE values using different numbers of stages in ADMM-Net under the sampling ratio of 20%. The reconstruction error decreases quickly when N_s < 8 and decreases only marginally when further increasing the number of stages.\n\nEffect of the filter sizes: We also train an ADMM-Net initialized by two gradient filters with sizes of 1 \u00d7 3 and 3 \u00d7 1 respectively for all convolution and reconstruction layers; the corresponding trained net with 13 stages under a 20% sampling ratio achieves an NMSE value of 0.0899 and a PSNR value of 36.52 dB on brain data, compared with 0.0752 and 37.01 dB using eight 3 \u00d7 3 filters as shown in Tab. 
1. We also learn an ADMM-Net13 with 8 filters sized 5 \u00d7 5 initialized by the DCT basis; the performance is not significantly improved, but the training and testing times are significantly longer.\n\n5 Conclusions\n\nWe proposed a novel deep network for compressive sensing MRI. It is a deep architecture defined over a data flow graph determined by an ADMM algorithm. Due to its flexibility in parameter learning, this deep net achieves high reconstruction accuracy while keeping the computational efficiency of the ADMM algorithm. As a general framework, the idea of modeling an ADMM algorithm as a deep network can potentially be applied to other applications in future work.\n\nReferences\n\n[1] Michael Lustig, David L Donoho, Juan M Santos, and John M Pauly. Compressed sensing MRI. IEEE Signal Processing Magazine, 25(2):72\u201382, 2008.\n\n[2] Michael Lustig, David Donoho, and John M Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182\u20131195, 2007.\n\n[3] Kai Tobias Block, Martin Uecker, and Jens Frahm. Undersampled radial MRI with multiple coils: Iterative image reconstruction using a total variation constraint. Magnetic Resonance in Medicine, 57(6):1086\u20131098, 2007.\n\n[4] Junfeng Yang, Yin Zhang, and Wotao Yin. A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data. IEEE Journal of Selected Topics in Signal Processing, 4(2):288\u2013297, 2010.\n\n[5] Chen Chen and Junzhou Huang. Compressive sensing MRI with wavelet tree sparsity. 
In Advances in Neural Information Processing Systems, pages 1115-1123, 2012.

[6] Xiaobo Qu, Di Guo, Bende Ning, et al. Undersampled MRI reconstruction with patch-based directional wavelets. Magnetic Resonance Imaging, 30(7):964-977, 2012.

[7] Saiprasad Ravishankar and Yoram Bresler. MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging, 30(5):1028-1041, 2011.

[8] Zhifang Zhan, Jian-Feng Cai, Di Guo, Yunsong Liu, Zhong Chen, and Xiaobo Qu. Fast multi-class dictionaries learning with geometrical directions in MRI reconstruction. IEEE Transactions on Biomedical Engineering, 2016.

[9] Sheng Fang, Kui Ying, Li Zhao, and Jianping Cheng. Coherence regularization for SENSE reconstruction with a nonlocal operator (CORNOL). Magnetic Resonance in Medicine, 64(5):1413-1425, 2010.

[10] Xiaobo Qu, Yingkun Hou, Fan Lam, Di Guo, Jianhui Zhong, and Zhong Chen. Magnetic resonance image reconstruction from undersampled measurements using a patch-based nonlocal operator. Medical Image Analysis, 18(6):843-856, 2014.

[11] Ender M Eksioglu. Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI. Journal of Mathematical Imaging and Vision, pages 1-11, 2016.

[12] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

[13] Huahua Wang, Arindam Banerjee, and Zhi-Quan Luo. Parallel direction method of multipliers. In Advances in Neural Information Processing Systems, pages 181-189, 2014.

[14] Krishna M Kavi, Bill P Buckles, and U Narayan Bhat.
A formal definition of data flow graph models. IEEE Transactions on Computers, 100(11):940-948, 1986.

[15] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

[16] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning, pages 399-406, 2010.

[17] Uwe Schmidt and Stefan Roth. Shrinkage fields for effective image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2774-2781, 2014.

[18] Jian Sun and Zongben Xu. Color image denoising via discriminatively learned iterative shrinkage. IEEE Transactions on Image Processing, 24(11):4148-4159, 2015.

[19] John R Hershey, Jonathan Le Roux, and Felix Weninger. Deep unfolding: Model-based inspiration of novel deep architectures. arXiv preprint arXiv:1409.2574, 2014.

[20] Francis Bach, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1-106, 2012.

[21] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1, 1988.

[22] Matt A Bernstein, Sean B Fain, and Stephen J Riederer. Effect of windowing and zero-filled reconstruction of MRI data on spatial resolution and acquisition strategy.
Magnetic Resonance Imaging, 14(3):270-280, 2001.
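The experiments above report reconstruction quality as NMSE and PSNR. As a minimal sketch of how such numbers can be computed from a reconstruction and its ground truth, the snippet below uses one common convention (NMSE as the relative L2 error and image intensities scaled to [0, 1]); the paper does not spell out its exact normalization, so treat these definitions as illustrative assumptions rather than the authors' precise evaluation code.

```python
import numpy as np

def nmse(recon, reference):
    # Normalized mean squared error: relative L2 distance to the reference
    # (one common convention; the paper's exact normalization is not stated).
    return np.linalg.norm(recon - reference) / np.linalg.norm(reference)

def psnr(recon, reference, peak=1.0):
    # Peak signal-to-noise ratio in dB, assuming intensities lie in [0, peak].
    mse = np.mean((recon - reference) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Illustrative example: a reconstruction that is uniformly 10% too dark.
ref = np.ones((8, 8))
rec = 0.9 * ref
print(round(nmse(rec, ref), 3), round(psnr(rec, ref), 1))  # -> 0.1 20.0
```

Under these definitions, lower NMSE and higher PSNR both indicate a reconstruction closer to the fully sampled ground truth, which matches the direction of the comparisons reported in Fig. 5 and Tab. 1.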