{"title": "Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 5020, "page_last": 5028, "abstract": "We propose a generalized Gibbs sampler algorithm for obtaining samples approximately distributed from a high-dimensional Gaussian distribution. Similarly to Hogwild methods, our approach does not target the original Gaussian distribution of interest, but an approximation to it. Contrary to Hogwild methods, a single parameter allows us to trade bias for variance. We show empirically that our method is very flexible and performs well compared to Hogwild-type algorithms.", "full_text": "Clone MCMC: Parallel High-Dimensional Gaussian\n\nGibbs Sampling\n\nAndrei-Cristian B\u02d8arbos\n\nIMS Laboratory\n\nUniv. Bordeaux - CNRS - BINP\nandbarbos@u-bordeaux.fr\n\nJean-Fran\u00e7ois Giovannelli\n\nIMS Laboratory\n\nUniv. Bordeaux - CNRS - BINP\n\ngiova@ims-bordeaux.fr\n\nFran\u00e7ois Caron\n\nDepartment of Statistics\n\nUniversity of Oxford\n\ncaron@stats.ox.ac.uk\n\nArnaud Doucet\n\nDepartment of Statistics\n\nUniversity of Oxford\n\ndoucet@stats.ox.ac.uk\n\nAbstract\n\nWe propose a generalized Gibbs sampler algorithm for obtaining samples approx-\nimately distributed from a high-dimensional Gaussian distribution. Similarly to\nHogwild methods, our approach does not target the original Gaussian distribution\nof interest, but an approximation to it. Contrary to Hogwild methods, a single pa-\nrameter allows us to trade bias for variance. We show empirically that our method\nis very \ufb02exible and performs well compared to Hogwild-type algorithms.\n\n1\n\nIntroduction\n\nSampling high-dimensional distributions is notoriously dif\ufb01cult in the presence of strong dependence\nbetween the different components. The Gibbs sampler proposes a simple and generic approach, but\nmay be slow to converge, due to its sequential nature. 
A number of recent papers have advocated the use of so-called \"Hogwild Gibbs samplers\", which perform conditional updates in parallel, without synchronizing the outputs. Although the corresponding algorithms do not target the correct distribution, this class of methods has been shown to give interesting empirical results, in particular for Latent Dirichlet Allocation models [1, 2] and Gaussian distributions [3].\nIn this paper, we focus on the simulation of high-dimensional Gaussian distributions. In numerous applications, such as computer vision, satellite imagery, medical imaging, tomography or weather forecasting, simulation of high-dimensional Gaussians is needed for prediction, or as part of a Markov chain Monte Carlo (MCMC) algorithm. For example, [4] simulate high-dimensional Gaussian random fields for the prediction of hydrological and meteorological quantities. For posterior inference via MCMC in a hierarchical Bayesian model, elementary blocks of a Gibbs sampler often require simulating high-dimensional Gaussian variables. In image processing, the typical number of variables (pixels/voxels) is of the order of 10^6 to 10^9. At this scale, Cholesky factorization is not applicable; see for example [5] or [6].\nIn [7, 8] the sampling problem is recast as an optimisation one: a sample is obtained by minimising a perturbed quadratic criterion. The cost of the algorithm depends on the choice of the optimisation technique. Exact resolution is prohibitively expensive, so an iterative solver with a truncated number of iterations is typically used [5], and the distribution of the samples one obtains is unknown.\nIn this paper, we propose an alternative class of iterative algorithms for approximately sampling high-dimensional Gaussian distributions. The class of algorithms we propose borrows ideas from optimization and linear solvers. 
Similarly to Hogwild algorithms, our sampler does not target the distribution of interest but an approximation to this distribution. A single scalar parameter allows us to tune both the error and the convergence rate of the Markov chain, allowing us to trade variance for bias. We show empirically that the method is very flexible and performs well compared to Hogwild algorithms. Its performance is illustrated on a large-scale image inpainting-deconvolution application.\nThe rest of the article is organized as follows. In Section 2, we review the matrix splitting techniques that have been used to propose novel algorithms to sample high-dimensional normals. In Section 3, we present our novel methodology. Section 4 provides the intuition for such a scheme, which we refer to as clone MCMC, and discusses some generalizations of the idea to non-Gaussian target distributions. We compare empirically Hogwild and our methodology on a variety of simulated examples in Section 5. The application to image inpainting-deconvolution is developed in Section 6.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n2 Background on matrix splitting and Hogwild Gaussian sampling\n\nWe consider a d-dimensional Gaussian random variable X with mean µ and positive definite covariance matrix Σ. The probability density function of X, evaluated at x = (x1, . . . , xd)T, is\n\nπ(x) ∝ exp{−(1/2)(x − µ)T Σ−1 (x − µ)} ∝ exp{−(1/2) xT J x + hT x}\n\nwhere J = Σ−1 is the precision matrix and h = Jµ the potential vector. Typically, the pair (h, J) is available, and the objective is to estimate (µ, Σ) or to simulate from π. 
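Concretely, the conversion from the available pair (h, J) to (µ, Σ), and exact simulation from π via a Cholesky factor of the precision, can be sketched in a few lines of numpy; the matrix and potential below are illustrative toy choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative pair (h, J): a small strictly diagonally dominant precision matrix.
J = np.array([[ 2.0, -0.5,  0.0,  0.0],
              [-0.5,  2.0, -0.5,  0.0],
              [ 0.0, -0.5,  2.0, -0.5],
              [ 0.0,  0.0, -0.5,  2.0]])
h = np.array([1.0, 0.0, -1.0, 2.0])

mu = np.linalg.solve(J, h)      # mean: solve J mu = h
Sigma = np.linalg.inv(J)        # covariance Sigma = J^{-1}

# Exact simulation from pi via the Cholesky factor J = C C^T:
# x = mu + C^{-T} z with z ~ N(0, I) has covariance C^{-T} C^{-1} = J^{-1}.
C = np.linalg.cholesky(J)
z = rng.standard_normal((4, 100000))
X = mu[:, None] + np.linalg.solve(C.T, z)   # one exact sample per column
```

For large d this is exactly the route the text rules out next: the factorization costs O(d^3) in general, which motivates the iterative samplers below.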
For moderate-size or sparse precision matrices, the standard method for exact simulation from π is based on the Cholesky decomposition of Σ, which has computational complexity O(d^3) in the most general case [9]. If d is very large, the cost of the Cholesky decomposition becomes prohibitive and iterative methods are favoured due to their smaller cost per iteration and low memory requirements. A principled iterative approach to draw samples approximately distributed from π is the single-site Gibbs sampler, which simulates a Markov chain (X(i))i=1,2,... with stationary distribution π by updating each variable in turn from its conditional distribution. A complete update of the d variables can be written in matrix form as\n\nX(i+1) = −(D + L)−1LTX(i) + (D + L)−1Z(i+1), Z(i+1) ∼ N(h, D)   (1)\n\nwhere D is the diagonal part of J and L is the strictly lower triangular part of J. Equation (1) highlights the connection between the Gibbs sampler and linear iterative solvers, as\n\nE[X(i+1)|X(i) = x] = −(D + L)−1LTx + (D + L)−1h\n\nis the expression of the Gauss-Seidel linear iterative solver update for solving the system Jµ = h for a given pair (h, J). The single-site Gaussian Gibbs sampler can therefore be interpreted as a stochastic version of the Gauss-Seidel linear solver. This connection has been noted by [10] and [11], and later exploited by [3] to analyse the Hogwild Gibbs sampler and by [6] to derive a family of Gaussian Gibbs samplers.\nThe Gauss-Seidel iterative solver is just a particular example of a larger class of matrix splitting solvers [12]. In general, consider the linear system Jµ = h and the matrix splitting J = M − N, where M is invertible. Gauss-Seidel corresponds to setting M = D + L and N = −LT. 
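The matrix-form sweep (1) can be transcribed directly in numpy; the sketch below uses a toy precision matrix (an illustrative choice, not from the paper) and writes each full Gibbs sweep as one triangular solve.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy precision matrix J and potential h (illustrative values only).
d = 4
J = np.array([[ 2.0, -0.5,  0.0,  0.0],
              [-0.5,  2.0, -0.5,  0.0],
              [ 0.0, -0.5,  2.0, -0.5],
              [ 0.0,  0.0, -0.5,  2.0]])
h = np.array([0.5, -1.0, 1.0, 0.0])
mu, Sigma = np.linalg.solve(J, h), np.linalg.inv(J)

D = np.diag(np.diag(J))   # diagonal part of J
L = np.tril(J, k=-1)      # strictly lower triangular part of J
M = D + L                 # Gauss-Seidel splitting: J = M - N, N = -L^T

n_iter, burn_in = 20000, 2000
x = np.zeros(d)
samples = np.empty((n_iter, d))
for i in range(n_iter):
    z = h + np.sqrt(np.diag(D)) * rng.standard_normal(d)  # Z ~ N(h, D)
    x = np.linalg.solve(M, z - L.T @ x)                   # update (1)
    samples[i] = x
samples = samples[burn_in:]
```

Since M is lower triangular the solve is a forward substitution, which is exactly why this update is sequential and hard to parallelize.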
More generally, [6] established that the Markov chain with transition\n\nX(i+1) = M−1N X(i) + M−1Z(i+1), Z(i+1) ∼ N(h, MT + N)   (2)\n\nadmits π as stationary distribution if and only if the associated iterative solver with update\n\nx(i+1) = M−1N x(i) + M−1h\n\nis convergent; that is, if and only if ρ(M−1N) < 1, where ρ denotes the spectral radius. Using this result, [6] built on the large literature on linear iterative solvers in order to derive generalized Gibbs samplers with the correct Gaussian target distribution, extending the approaches proposed by [10, 11, 13].\nThe practicality of the iterative samplers with transition (2) and matrix splitting (M, N) depends on\n\n• How easy it is to solve the system M x = r for any r,\n• How easy it is to sample from N(0, MT + N).\n\nAs noted by [6], there is a necessary trade-off here. The Jacobi splitting M = D would lead to a simple solution to the linear system, but sampling from a Gaussian distribution with covariance matrix MT + N would be as complicated as solving the original sampling problem. The Gauss-Seidel splitting M = D + L provides an interesting trade-off, as M x = r can be solved by substitution and MT + N = D is diagonal. The method of successive over-relaxation (SOR) uses a splitting M = ω−1D + L with an additional tuning parameter ω > 0. In both the SOR and Gauss-Seidel cases, the system M x = r can be solved by substitution in O(d^2), but the resolution of the linear system cannot be parallelized.\nAll the methods discussed so far asymptotically sample from the correct target distribution. The Hogwild Gaussian Gibbs sampler does not, but its properties can also be analyzed using techniques from the linear iterative solver literature, as demonstrated by [3]. For simplicity of exposition, we focus here on the Hogwild sampler with blocks of size 1. 
In this case, the Hogwild algorithm simulates a Markov chain with transition\n\nX(i+1) = MHog−1NHog X(i) + MHog−1Z(i+1), Z(i+1) ∼ N(h, MHog)\n\nwhere MHog = D and NHog = −(L + LT). This update is highly amenable to parallelization as MHog is diagonal, thus one can easily solve the system MHog x = r and sample from N(0, MHog). [3] showed that if ρ(MHog−1NHog) < 1, the Markov chain admits N(µ, Σ̃) as stationary distribution, where\n\nΣ̃ = (I + MHog−1NHog)−1Σ.\n\nThe above approach can be generalized to blocks of larger sizes. However, beyond the block size, the Hogwild sampler does not have any tunable parameter allowing us to modify its incorrect stationary distribution. Depending on the computational budget, we may want to trade bias for variance. In the next section, we describe our approach, which offers such flexibility.\n\n3 High-dimensional Gaussian sampling\n\nLet J = M − N be a matrix splitting, with M positive semi-definite. Consider the Markov chain (X(i))i=1,2,... with initial state X(0) and transition\n\nX(i+1) = M−1N X(i) + M−1Z(i+1), Z(i+1) ∼ N(h, 2M).   (3)\n\nThe following theorem shows that, if the corresponding iterative solver converges, the Markov chain converges to a Gaussian distribution with the correct mean and an approximate covariance matrix.\nTheorem 1. If ρ(M−1N) < 1, the Markov chain (X(i))i=1,2,... defined by (3) has stationary distribution N(µ, Σ̃) where\n\nProof. The equivalence between the convergence of the iterative linear solvers and their stochastic counterparts was established in [6, Theorem 1]. 
Σ̃ = 2(I + M−1N)−1Σ = (I − (1/2)M−1Σ−1)−1Σ.\n\nThe mean µ̃ of the stationary distribution satisfies the recurrence\n\nµ̃ = M−1N µ̃ + M−1Σ−1µ,\n\nhence\n\n(I − M−1N)µ̃ = M−1Σ−1µ ⇔ µ̃ = µ,\n\nas Σ−1 = M − N. For the covariance matrix, consider the 2d-dimensional random variable\n\n(Y1, Y2) ∼ N( (µ, µ), [ M/2  −N/2 ; −N/2  M/2 ]−1 )   (4)\n\nThen using standard manipulations of multivariate Gaussians and the inversion lemma on block matrices we obtain\n\nY1|Y2 ∼ N(M−1N Y2 + M−1h, 2M−1),\nY2|Y1 ∼ N(M−1N Y1 + M−1h, 2M−1),\n\nand\n\nY1 ∼ N(µ, Σ̃), Y2 ∼ N(µ, Σ̃).\n\nThe above proof is not constructive, and we give in Section 4 the intuition behind the choice of the transition and the name clone MCMC.\nWe will focus here on the following matrix splitting\n\nM = D + 2ηI, N = 2ηI − L − LT   (5)\n\nwhere η ≥ 0. Under this matrix splitting, M is a diagonal matrix and an iteration only involves a matrix-vector multiplication of computational cost O(d^2). This operation can be easily parallelized. Each update thus has the same computational complexity as the Hogwild algorithm. We have\n\n
We have\n\n(D + 2\u03b7I)\u22121\u03a3\u22121)\u22121\u03a3.\n\n2\n\n\u03b7\u2192\u221e \u03c1(M\u22121N ) = 1.\n\nlim\n\nSince M\u22121 \u2192 0 and M\u22121N \u2192 I for \u03b7 \u2192 \u221e, we have\n\n(cid:101)\u03a3 = (I \u2212 1\n\u03b7\u2192\u221e(cid:101)\u03a3 = \u03a3,\n\nlim\n\nThe parameter \u03b7 is an easily interpretable tuning parameter for the method: as \u03b7 increases, the\nstationary distribution of the Markov chain becomes closer to the target distribution, but the samples\nbecome more correlated.\nFor example, consider the target precision matrix J = \u03a3\u22121 with Jii = 1, Jij = \u22121/(d + 1) for\n(cid:80)ns\ni (cid:54)= j and d = 1000. The proposed sampler is run for different values of \u03b7 in order to estimate the\n(cid:80)ns\ni=1 X (i) is the estimated mean. The Figure 1(a) reports the bias term ||\u03a3 \u2212(cid:101)\u03a3||,\ni=1(X (i) \u2212 \u02c6\u00b5)T(X (i) \u2212 \u02c6\u00b5) be the estimated covariance matrix\ncovariance matrix \u03a3. Let \u02c6\u03a3 = 1/ns\nthe variance term ||(cid:98)\u03a3 \u2212(cid:101)\u03a3|| and the overall error ||\u03a3 \u2212(cid:98)\u03a3|| as a function of \u03b7, using ns = 10000\nwhere \u02c6\u00b5 = 1/ns\nsamples and 100 replications, with || \u00b7 || the (cid:96)2 (Frobenius) norm. As \u03b7 increases, the bias term\ndecreases while the variance term increases, yielding an optimal value at \u03b7 (cid:39) 10. Figure 1(b-c) show\nthe estimation error for the mean and covariance matrix as a function of \u03b7, for different sample sizes.\nFigure 2 shows the estimation error as a function of the sample size for different values of \u03b7.\nThe following theorem gives a suf\ufb01cient condition for the Markov chain to converge for any value \u03b7.\nTheorem 2. Let M = D + 2\u03b7I, N = 2\u03b7I \u2212 L \u2212 LT. A suf\ufb01cient condition for \u03c1(M\u22121N ) < 1 for\nall \u03b7 \u2265 0 is that \u03a3\u22121 is strictly diagonally dominant.\n\nProof. 
M is nonsingular, hence\n\ndet(M−1N − λI) = 0 ⇔ det(N − λM) = 0.\n\nΣ−1 = M − N is strictly diagonally dominant, hence λM − N = (λ − 1)M + (M − N) is also strictly diagonally dominant for any λ ≥ 1. By Gershgorin's theorem, a strictly diagonally dominant matrix is nonsingular, so det(N − λM) ≠ 0 for all λ ≥ 1. We conclude that ρ(M−1N) < 1.\n\n(a) ns = 20000 (b) ||Σ − Σ̂|| (c) ||µ − µ̂||\nFigure 1: Influence of the tuning parameter η on the estimation error\n\n(a) ||Σ − Σ̂|| (b) ||µ − µ̂||\nFigure 2: Influence of the sample size on the estimation error\n\n4 Clone MCMC\n\nWe now provide some intuition on the construction given in Section 3, and justify the name given to the method. The joint pdf of (Y1, Y2) on R2d defined in (4) with matrix splitting (5) can be expressed as\n\nπ̃η(y1, y2) ∝ exp{−(η/2)(y1 − y2)T(y1 − y2)}\n× exp{−(1/4)(y1 − µ)TD(y1 − µ) − (1/4)(y1 − µ)T(L + LT)(y2 − µ)}\n× exp{−(1/4)(y2 − µ)TD(y2 − µ) − (1/4)(y2 − µ)T(L + LT)(y1 − µ)}\n\nWe can interpret the joint pdf above as having cloned the original random variable X into two dependent random variables Y1 and Y2. The parameter η tunes the correlation between the two variables, and π̃η(y1|y2) = ∏k=1..d π̃η(y1k|y2), which allows for straightforward parallelization of the method. As η → ∞, the clones become more and more correlated, with corr(Y1, Y2) → 1 and π̃η(y1) → π(y1).\nThe idea can be generalized further to pairwise Markov random fields. 
Consider the target distribution\n\nπ(x) ∝ exp( −∑ 1≤i≤j≤d ψij(xi, xj) )\n\nfor some potential functions ψij, 1 ≤ i ≤ j ≤ d. The clone pdf is\n\nπ̃(y1, y2) ∝ exp{ −(η/2)(y1 − y2)T(y1 − y2) − (1/2) ∑ 1≤i≤j≤d (ψij(y1i, y2j) + ψij(y2i, y1j)) }\n\nwhere\n\nπ̃(y1|y2) = ∏k=1..d π̃(y1k|y2).\n\nAssuming π̃ is a proper pdf, we have π̃(y1) → π(y1) as η → ∞.\n\n(a) 10s (b) 80s (c) 120s\nFigure 3: Estimation error for the covariance matrix Σ1 for fixed computation time, d = 1000.\n\n(a) 10s (b) 80s (c) 120s\nFigure 4: Estimation error for the covariance matrix Σ2 for fixed computation time, d = 1000.\n\n5 Comparison with Hogwild and Gibbs sampling\n\nIn this section, we provide an empirical comparison of the proposed approach with the Gibbs sampler and the Hogwild algorithm, using the splitting (5). Note that in order to provide a fair comparison between the algorithms, we only consider the single-site Gibbs sampler and the block-1 Hogwild algorithm, whose updates are respectively given in Equations (1) and (2). Versions of all three algorithms could also be developed with blocks of larger sizes.\nWe consider the following two precision matrices: Σ1−1, tridiagonal with diagonal entries 1 in the two corners and 1 + α2 elsewhere, entries −α on the first sub- and super-diagonals, and α = 0.95; and Σ2−1, a banded matrix with unit diagonal and entries 0.3 and 0.15 on the first and second sub- and super-diagonals, respectively. Experiments are run on a GPU with 2688 CUDA cores. 
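Under the splitting (5), the stationary covariances of both approximate samplers are available in closed form, so the Hogwild and clone biases can be compared without any simulation. The sketch below uses a small version of Σ1−1 (d = 20 here rather than the paper's d = 1000; an illustrative reduction):

```python
import numpy as np

# Small version of the tridiagonal precision Sigma_1^{-1}, alpha = 0.95.
d, alpha = 20, 0.95
J = np.diag([1.0] + [1 + alpha**2] * (d - 2) + [1.0]) \
    - alpha * (np.eye(d, k=1) + np.eye(d, k=-1))
Sigma = np.linalg.inv(J)
D = np.diag(np.diag(J))
off = J - D                       # L + L^T

# Block-1 Hogwild: stationary covariance (I + M^{-1}N)^{-1} Sigma,
# with M = D and N = -(L + L^T).
A_hog = np.linalg.solve(D, -off)
Sigma_hog = np.linalg.solve(np.eye(d) + A_hog, Sigma)
bias_hog = np.linalg.norm(Sigma - Sigma_hog)

# Clone MCMC, splitting (5): stationary covariance 2 (I + M^{-1}N)^{-1} Sigma.
def clone_bias(eta):
    M = D + 2 * eta * np.eye(d)
    N = 2 * eta * np.eye(d) - off
    Sigma_tilde = 2 * np.linalg.solve(np.eye(d) + np.linalg.solve(M, N), Sigma)
    return np.linalg.norm(Sigma - Sigma_tilde)

etas = [0.1, 1.0, 10.0, 100.0]
clone_biases = [clone_bias(eta) for eta in etas]  # decreases as eta grows
```

The Hogwild bias is a fixed quantity, whereas the clone bias can be driven as small as desired by increasing η, at the cost of more correlated samples.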
In order to compare the algorithms, we run each algorithm for a fixed execution time (10s, 80s and 120s). The computation time per iteration is similar for Hogwild and clone MCMC, and they return a similar number of samples. The computation time per iteration of the Gibbs sampler is much higher, due to the lack of parallelization, and it returns fewer samples. For Hogwild and clone MCMC, we report both the approximation error ||Σ − Σ̃|| and the estimation error ||Σ − Σ̂||. For Gibbs, only the estimation error is reported.\nFigures 3 and 4 show that, for a range of values of η, our method outperforms both Hogwild and Gibbs, whatever the execution time. As the computational budget increases, the optimal value for η increases.\n\n6 Application to image inpainting-deconvolution\n\nIn order to demonstrate the usefulness of the approach, we consider an application to image inpainting-deconvolution. Let\n\nY = T HX + B, B ∼ N(0, Σb)   (6)\n\nbe the observation model, where Y ∈ Rn is the observed image, X ∈ Rd is the true image, B ∈ Rn is the noise component, H ∈ Rd×d is the convolution matrix and T ∈ Rn×d is the truncation matrix. The observation noise is assumed to be independent of X, with Σb−1 = γbI and γb = 10−2.\n\n(a) True image (b) Observed image (c) Posterior mean (optimization) (d) Posterior mean (clone MCMC)\nFigure 5: Deconvolution-interpolation results\n\n
Assume\n\nX ∼ N(0, Σx), Σx−1 = γ0 1d 1dT + γ1 CCT,\n\nwhere 1d is a column vector of size d having all elements equal to 1/d, C is the block-Toeplitz convolution matrix corresponding to the 2D Laplacian filter, and γ0 = γ1 = 10−2.\nThe objective is to sample from the posterior distribution\n\nX|Y = y ∼ N(µx|y, Σx|y)\n\nwhere\n\nΣx|y−1 = HTT TΣb−1T H + Σx−1,\nµx|y = Σx|y HTT TΣb−1 y.\n\nThe true unobserved image is of size 1000 × 1000, hence the posterior distribution corresponds to a random variable of size d = 10^6. We consider that 20% of the pixels are not observed. The true image is given in Figure 5(a); the observed image is given in Figure 5(b).\nIn this high-dimensional setting with d = 10^6, direct sampling via Cholesky decomposition or the standard single-site Gibbs algorithm is not applicable. We have implemented the block-1 Hogwild algorithm; however, in this scenario the algorithm diverges, which is certainly due to the fact that the spectral radius of MHog−1NHog is greater than 1.\nWe run our clone MCMC algorithm for ns = 19000 samples, out of which the first 4000 were discarded as burn-in samples, using as initialization the observed image, with missing entries padded with zeros. The tuning parameter η is set to 1. Figure 5(c) contains the reconstructed image obtained by numerically maximizing the posterior distribution using gradient ascent. We take this image as the reference when evaluating the reconstruction computed as the posterior mean of the drawn samples. That reconstructed image is given in Figure 5(d).\nIf we compare the restored image with the one obtained by the optimization approach, we can immediately see that the two images are visually very similar. 
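A 1D miniature of this model shows how the posterior precision Σx|y−1 is assembled from the pieces above; the sizes, the blur kernel, and the circulant boundary handling are illustrative choices, not the paper's 2D setup.

```python
import numpy as np

rng = np.random.default_rng(2)

d = 16                                # tiny 1D "image" (the paper uses d = 10^6)
gamma_b = gamma_0 = gamma_1 = 1e-2    # precision parameters as in the paper

def circulant(first_col):
    """Circulant matrix whose k-th column is first_col rolled down by k."""
    return np.stack([np.roll(first_col, k) for k in range(d)], axis=1)

# H: convolution with a symmetric blur kernel; C: convolution with a 1D Laplacian
# (stand-ins for the paper's 2D convolution and Laplacian-filter matrices).
H = circulant(np.array([0.5, 0.25] + [0.0] * (d - 3) + [0.25]))
C = circulant(np.array([-2.0, 1.0] + [0.0] * (d - 3) + [1.0]))

# T: truncation matrix keeping 80% of the entries (inpainting mask).
keep = np.sort(rng.choice(d, size=int(0.8 * d), replace=False))
T = np.eye(d)[keep]

ones_d = np.full(d, 1.0 / d)                              # the vector 1_d
J_prior = gamma_0 * np.outer(ones_d, ones_d) + gamma_1 * C @ C.T
# Likelihood term H^T T^T Sigma_b^{-1} T H with Sigma_b^{-1} = gamma_b I:
J_post = gamma_b * H.T @ T.T @ T @ H + J_prior            # Sigma_{x|y}^{-1}
```

The rank-one term γ0 1d 1dT is what makes the prior precision invertible: the Laplacian term CCT alone is singular in the constant-image direction.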
This observation is further reinforced by the top plot of Figure 6, where we have depicted the same line of pixels from both images. The line of pixels displayed is indicated by the blue line segments in Figure 5(d). The traces in grey represent the 99% credible intervals. We can see that for most of the pixels, if not all of them, the estimated value lies well within the 99% credible intervals. The bottom plot of Figure 6 displays the estimated image together with the true image for the same line of pixels, showing an accurate estimation of the true image. Figure 7 shows traces of the Markov chains for 4 selected pixels. Their exact positions are indicated in Figure 5(b). The red marker corresponds to an observed pixel from a region having a mid-grey tone. The green marker corresponds to an observed pixel from a white tone region. The dark blue marker corresponds to an observed pixel from a dark tone region.\n\nFigure 6: Line of pixels from the restored image\n\nFigure 7: Markov chains for selected pixels, clone MCMC\n\nThe cyan marker corresponds to an observed pixel from a region having a tone between mid-grey and white.\nThe choice of η can be a sensitive issue for the practical implementation of the algorithm. We observed empirically convergence of our algorithm for any value of η greater than 0.075. This is a clear advantage over Hogwild: our approach is applicable in settings where Hogwild diverges, and it offers an interesting way of controlling the bias/variance trade-off. We plan to investigate methods to automatically choose the tuning parameter η in future work.\n\nReferences\n\n[1] D. Newman, P. Smyth, M. Welling, and A. Asuncion. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, pages 1081–1088, 2008.\n\n[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling up machine learning: Parallel and distributed approaches. 
Cambridge University Press, 2011.\n\n[3] M. Johnson, J. Saunderson, and A. Willsky. Analyzing Hogwild parallel Gaussian Gibbs sampling. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2715–2723. Curran Associates, Inc., 2013.\n\n[4] Y. Gel, A. E. Raftery, T. Gneiting, C. Tebaldi, D. Nychka, W. Briggs, M. S. Roulston, and V. J. Berrocal. Calibrated probabilistic mesoscale weather field forecasting: The geostatistical output perturbation method. Journal of the American Statistical Association, 99(467):575–590, 2004.\n\n[5] C. Gilavert, S. Moussaoui, and J. Idier. Efficient Gaussian sampling for solving large-scale inverse problems using MCMC. IEEE Transactions on Signal Processing, 63(1):70–80, January 2015.\n\n[6] C. Fox and A. Parker. Accelerated Gibbs sampling of normal distributions using matrix splittings and polynomials. Bernoulli, 23(4B):3711–3743, 2017.\n\n[7] G. Papandreou and A. L. Yuille. Gaussian sampling by local perturbations. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1858–1866. Curran Associates, Inc., 2010.\n\n[8] F. Orieux, O. Féron, and J. F. Giovannelli. Sampling high-dimensional Gaussian distributions for general linear inverse problems. IEEE Signal Processing Letters, 19(5):251–254, 2012.\n\n[9] H. Rue. Fast sampling of Gaussian Markov random fields. Journal of the Royal Statistical Society: Series B, 63(2):325–338, 2001.\n\n[10] S. L. Adler. Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions. Physical Review D, 23(12):2901, 1981.\n\n[11] P. Barone and A. Frigessi. Improving stochastic relaxation for Gaussian random fields. 
Probability in the Engineering and Informational Sciences, 4(3):369–389, 1990.\n\n[12] G. Golub and C. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, fourth edition, 2013.\n\n[13] G. O. Roberts and S. K. Sahu. Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. Journal of the Royal Statistical Society: Series B, 59(2):291–317, 1997.\n", "award": [], "sourceid": 2590, "authors": [{"given_name": "Andrei-Cristian", "family_name": "Barbos", "institution": "University of Bordeaux"}, {"given_name": "Francois", "family_name": "Caron", "institution": null}, {"given_name": "Jean-Fran\u00e7ois", "family_name": "Giovannelli", "institution": "University of Bordeaux"}, {"given_name": "Arnaud", "family_name": "Doucet", "institution": "Oxford"}]}