{"title": "Quality Aware Generative Adversarial Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 2948, "page_last": 2958, "abstract": "Generative Adversarial Networks (GANs) have become a very popular tool for im-\nplicitly learning high-dimensional probability distributions. Several improvements\nhave been made to the original GAN formulation to address some of its shortcom-\nings like mode collapse, convergence issues, entanglement, poor visual quality etc.\nWhile a significant effort has been directed towards improving the visual quality\nof images generated by GANs, it is rather surprising that objective image quality\nmetrics have neither been employed as cost functions nor as regularizers in GAN\nobjective functions. In this work, we show how a distance metric that is a variant\nof the Structural SIMilarity (SSIM) index (a popular full-reference image quality\nassessment algorithm), and a novel quality aware discriminator gradient penalty\nfunction that is inspired by the Natural Image Quality Evaluator (NIQE, a popular\nno-reference image quality assessment algorithm) can each be used as excellent\nregularizers for GAN objective functions. Specifically, we demonstrate state-of-\nthe-art performance using the Wasserstein GAN gradient penalty (WGAN-GP)\nframework over CIFAR-10, STL10 and CelebA datasets.", "full_text": "Quality Aware Generative Adversarial Networks\n\nParimala Kancharla, Sumohana S. Channappayya\n\n{ee15m17p100001, sumohana}@iith.ac.in\n\nDepartment of Electrical Engineering\n\nIndian Institute of Technology Hyderabad\n\nAbstract\n\nGenerative Adversarial Networks (GANs) have become a very popular tool for im-\nplicitly learning high-dimensional probability distributions. 
Several improvements\nhave been made to the original GAN formulation to address some of its shortcomings\nlike mode collapse, convergence issues, entanglement, poor visual quality etc.\nWhile a significant effort has been directed towards improving the visual quality\nof images generated by GANs, it is rather surprising that objective image quality\nmetrics have neither been employed as cost functions nor as regularizers in GAN\nobjective functions. In this work, we show how a distance metric that is a variant\nof the Structural SIMilarity (SSIM) index (a popular full-reference image quality\nassessment algorithm), and a novel quality aware discriminator gradient penalty\nfunction that is inspired by the Natural Image Quality Evaluator (NIQE, a popular\nno-reference image quality assessment algorithm) can each be used as excellent\nregularizers for GAN objective functions. Specifically, we demonstrate state-of-the-art\nperformance using the Wasserstein GAN gradient penalty (WGAN-GP)\nframework over CIFAR-10, STL-10 and CelebA datasets.\n\n1 Introduction\n\nGenerative Adversarial Networks (GANs) [Goo+14] have become a very popular tool for implicitly\nlearning high-dimensional probability distributions. A large number of very interesting and useful\napplications have emerged due to the ability of GANs to learn complex real-world distributions.\nSome of these include image translation [Iso+17], image super-resolution [Led+17], image saliency\ndetection [Pan+17] etc. While GANs have indeed become very popular, they suffer from drawbacks\nsuch as mode collapse, convergence issues, entanglement, poor visual quality etc. A significant\namount of research effort has focused on addressing these shortcomings in the original GAN formulation.\nWhile the literature does consider the quality of the generated images as a performance metric,\nit is rather surprising that the use of objective image quality metrics in the GAN cost function has\nbeen very limited. 
We address this lacuna with our contributions as summarized below:\n\n\u2022 We make explicit use of objective image quality assessment (IQA) metrics and their variants\nfor regularizing WGAN with gradient penalty (WGAN-GP), and propose Quality Aware\nGANs (QAGANs).\n\n\u2022 We propose a novel quality aware discriminator gradient penalty function based on the local\nstatistical signature of natural images as in NIQE [MSB13].\n\n\u2022 We demonstrate state-of-the-art performance on CIFAR-10, STL-10 and CelebA datasets\nfor non-progressive GANs.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n2 Background\n\nWe review relevant works on GANs and IQA algorithms to set up the background necessary to present\nour proposed quality aware GANs.\n\n2.1 Generative Adversarial Networks (GANs)\n\nGiven the explosive growth in the literature on GANs, we only review representative works in the\nfollowing. GANs [Goo+14] pose the problem of learning a high-dimensional distribution from\ndata samples in a game-theoretic framework. A typical GAN architecture consists of a generator\nmodelled by a neural network, denoted by G and parameterized by \u03b8g, and a discriminator also\nmodelled by a neural network, denoted by D and parameterized by \u03b8d. The goal of the generator is to\ngenerate samples denoted G(z) (where z is a noise random variable with prior Pz) that \u201cmimic\u201d true\ndata samples x drawn from a distribution Pr. The discriminator\u2019s goal is to maximize its ability to\ntell G(z) apart from x. The generator and discriminator engage in an adversarial combat or game to\ntrain each other (simultaneously). 
This game is formulated as\n\n$$\\min_G \\max_D V(D, G) = \\mathbb{E}_{x \\sim P_r}[\\log D(x)] + \\mathbb{E}_{z \\sim P_z}[\\log(1 - D(G(z)))] \\qquad (1)$$\n\nThe value function V (D, G) is defined so that the discriminator tries to maximize the probability\nof assigning the correct label to both x and G(z) (it maximizes both terms), while the generator\nsimultaneously tries to minimize the term Ez\u223cPz [log(1 \u2212 D(G(z)))]. The model parameters \u03b8g, \u03b8d\nare learnt by solving this optimization in an iterative fashion. This formulation suffers from the\ndrawbacks mentioned earlier: mode collapse, convergence issues while training, poor visual quality\netc.\n\n2.1.1 Wasserstein GAN with Gradient Penalty (WGAN-GP)\n\nWasserstein GAN (WGAN) [ACB17] was one of the first works to address training issues in the\noriginal GAN formulation. It introduced the Wasserstein distance to compare distributions and showed\nthat it possesses a number of useful convergence and continuity properties. It also showed that\n1-Lipschitz functions can be used in practice to find the Wasserstein distance between distributions (at\nthe discriminator). These 1-Lipschitz functions are realized using a neural network and the Lipschitz\ncondition is enforced by clipping the network weights. Despite these improvements, Gulrajani et\nal. [Gul+17] showed that weight clipping in WGAN led to undesirable behaviour such as poor\nsamples or failing to converge. They propose a more stable solution called WGAN with Gradient Penalty\n(WGAN-GP) where they penalize the norm of the discriminator\u2019s gradient with respect to its input.\nThey showed that the optimal 1-Lipschitz discriminator has a gradient norm of 1 almost everywhere\nunder the real and generated distributions, which motivates penalizing deviations of the norm from 1. 
We will make use of the WGAN-GP in our work given its stability in training GANs.\n\n2.1.2 Banach Wasserstein GAN (BWGAN)\n\nAdler and Lunz [AL18] introduced Banach WGANs (BWGANs) that provide a framework for using\narbitrary norms to train Wasserstein GANs (WGANs). They note that WGANs and their variants\nonly consider l2 as the underlying metric. Specifically, they generalize the WGAN-GP framework\nto Banach spaces and show how WGANs can be trained with arbitrary norms. The motivation for\nBWGANs is to allow the generator to emphasize desired image features such as edges as demonstrated\nusing Sobolev and Lp norms. The motivation for our work comes from BWGANs and is similar in\nspirit. They also suggest that WGAN training can be extended to a general metric space (with metric\nd(X, Y )) by using a penalty term of the form\n\n$$\\mathbb{E}_{X \\sim P_r, Y \\sim P_G}\\left[\\left(\\frac{|D(X) - D(Y)|}{d(X, Y)} - 1\\right)^2\\right]. \\qquad (2)$$\n\n2.1.3 Zero-Centered Gradient Penalty Approaches\n\nOther recent approaches to improving the stability of GAN training revisit the original formulation in\n[Goo+14]. Roth et al. [Rot+17] present a regularization approach where the inputs to the discriminator\nare smoothed by convolving them with noise (realized by adding noise to the samples during training).\nThis is shown to result in a zero-centered gradient penalty regularizer. Mescheder et al. [MGN18]\nprove that zero-centered gradient penalties make training more stable. Thanh-Tung et al. [TTV19]\npropose another variant of the zero-centered gradient penalty for improving the convergence and\ngeneralizing capability of GANs. 
These works are significant in that they provide a clear theoretical\nexplanation of the issues in training GANs and how they can be overcome.\n\n2.2 Image Quality Assessment\n\nObjective image quality assessment (IQA) metrics can be classified into three classes depending on\ntheir use of the reference (or undistorted) image for quality assessment. Full-reference (FR) IQA\nmetrics make use of the complete reference image while reduced-reference (RR) IQA metrics make\nuse of partial reference image information for quality prediction. No-reference (NR) IQA metrics on\nthe other hand predict the quality of an image in a reference-free or stand-alone fashion. It should be\nnoted that the IQA metrics assume that the images in question are of natural (photographic) scenes.\nIn this work, we show how an FR and an NR IQA algorithm can each be used for the quality aware\ndesign of GANs.\n\n2.2.1 The Structural SIMilarity (SSIM) index\n\nIt would not be an exaggeration to claim that the invention of the SSIM index [Wan+04] heralded a\nrevolution in the design of objective quality assessment algorithms. The SSIM index is an FR IQA\nmetric that is based on the premise that distortions lead to changes in local image structure, and that the\nhuman visual system is sensitive to these structural changes. The SSIM index quantifies the change\nin structural information in the test image relative to the reference and computes a quality score.\nSpecifically, the SSIM index computes changes to local mean, local variance and local structure (or\ncorrelation) and pools them to find the local quality score. These local scores are then averaged across\nthe image to find the image quality score. 
This is summarized as follows.\n\n$$\\mathrm{SSIM}(P_{(i,j)}, T_{(i,j)}) = L(P_{(i,j)}, T_{(i,j)}) \\cdot C(P_{(i,j)}, T_{(i,j)}) \\cdot S(P_{(i,j)}, T_{(i,j)}), \\qquad (3)$$\n\nwhere P, T refer to the pristine and test image respectively, the subscript (i, j) is the pixel index, and\nL(P(i,j), T(i,j)), C(P(i,j), T(i,j)), S(P(i,j), T(i,j)) are the local luminance, contrast and structure\nscores at pixel (i, j) respectively. Further,\n\n$$L(P_{(i,j)}, T_{(i,j)}) = \\frac{2\\mu_P(i,j)\\mu_T(i,j) + C_1}{\\mu_P^2(i,j) + \\mu_T^2(i,j) + C_1}, \\quad C(P_{(i,j)}, T_{(i,j)}) = \\frac{2\\sigma_P(i,j)\\sigma_T(i,j) + C_2}{\\sigma_P^2(i,j) + \\sigma_T^2(i,j) + C_2}, \\qquad (4)$$\n\n$$S(P_{(i,j)}, T_{(i,j)}) = \\frac{\\sigma_{PT}(i,j) + C_3}{\\sigma_P(i,j)\\sigma_T(i,j) + C_3},$$\n\nwhere\n\n$$\\mu_P(i,j) = \\frac{1}{(2K+1)^2} \\sum_{m=-K}^{K} \\sum_{n=-K}^{K} P(i-m, j-n), \\quad \\sigma_P^2(i,j) = \\frac{1}{(2K+1)^2} \\sum_{m=-K}^{K} \\sum_{n=-K}^{K} (P(i-m, j-n) - \\mu_P(i,j))^2$$\n\nare the local mean and variance of the pristine image patch of size (2K + 1) \u00d7 (2K + 1) centered at (i, j).\nThe local mean and variance of the test image T are defined similarly. The cross covariance is defined as\n\n$$\\sigma_{PT}(i,j) = \\frac{1}{(2K+1)^2} \\sum_{m=-K}^{K} \\sum_{n=-K}^{K} (P(i-m, j-n) - \\mu_P(i,j)) (T(i-m, j-n) - \\mu_T(i,j)).$$\n\n
The constants C1, C2, C3 are used to avoid division-by-zero issues.\nFor simplicity, C3 = C2/2 in the standard implementation, which leads to\n\n$$\\mathrm{SSIM}(P_{(i,j)}, T_{(i,j)}) = L(P_{(i,j)}, T_{(i,j)}) \\cdot CS(P_{(i,j)}, T_{(i,j)}), \\qquad (5)$$\n\nwhere\n\n$$CS(P_{(i,j)}, T_{(i,j)}) = \\frac{2\\sigma_{PT}(i,j) + C_2}{\\sigma_P^2(i,j) + \\sigma_T^2(i,j) + C_2}. \\qquad (6)$$\n\nThe image level SSIM index is given by:\n\n$$\\mathrm{SSIM}(P, T) = \\frac{1}{M \\times N} \\sum_{i=1}^{M} \\sum_{j=1}^{N} \\mathrm{SSIM}(P_{(i,j)}, T_{(i,j)}), \\qquad (7)$$\n\nwhere the images are of size M \u00d7 N (assuming appropriate boundary handling).\n\n2.2.2 Natural Image Quality Evaluator (NIQE)\n\nNIQE [MSB13] is a popular NR IQA metric that is based on the statistics of mean subtracted and\ncontrast normalized (MSCN) natural scenes. An MSCN image $\\hat{I}$ is generated from an input image I\naccording to:\n\n$$\\hat{I}(i,j) = \\frac{I(i,j) - \\mu(i,j)}{\\sigma(i,j) + 1}, \\qquad (8)$$\n\nwhere (i, j) is the pixel index and \u00b5(i, j) and \u03c3(i, j) are the local mean and standard deviation\ncomputed as in the case of SSIM index (see Section 2.2.1). The constant 1 in the denominator is\nto prevent division-by-zero issues. NIQE relies on the following observations about the statistics\nof MSCN natural scenes: a) the statistics of MSCN natural images reliably follow a Gaussian\ndistribution [Rud94], b) the statistics of MSCN pristine and distorted images can be modeled well\nusing a generalized Gaussian distribution (GGD) [MB10]. NIQE, as the name suggests, quantifies\nthe naturalness of an image. To do so, it proposes a statistical model for the class of pristine natural\nscenes and uses the model\u2019s parameters for quality estimation. Specifically, it models MSCN pristine\nimage coefficients using a GGD, and models the products of neighbouring MSCN coefficients\nusing an asymmetric GGD (AGGD). 
The parameters of these GGD and AGGD models are in turn\nmodeled using a Multivariate Gaussian (MVG) distribution whose parameters \u00b5P , \u03a3P are then used\nas representatives of the entire class of pristine natural images. The quality of a test image is measured\nin terms of the \u201cdistance\u201d of its MVG parameters \u00b5T , \u03a3T from the pristine MVG parameters. This is\nquantified as:\n\n$$D(\\mu_P, \\mu_T, \\Sigma_P, \\Sigma_T) = \\sqrt{(\\mu_P - \\mu_T)^T \\left(\\frac{\\Sigma_P + \\Sigma_T}{2}\\right)^{-1} (\\mu_P - \\mu_T)}, \\qquad (9)$$\n\nassuming that the sum of the matrices is invertible.\n\n3 Quality Aware GANs (QAGANs)\n\nWhile significant progress has been made in the design of objective IQA metrics, their usage as cost\nfunctions has been limited by their unwieldy mathematical formulation. Non-convexity, difficulty\nin gradient computation, and not satisfying the properties of distances and norms are some of the\nimpediments to their usage in formulating optimization problems. In the following, we identify\nvariants of the SSIM index and NIQE that are mathematically amenable and lend themselves to\nbeing used as regularizers in the WGAN-GP optimization framework. Importantly, we empirically\ndemonstrate the ability of our approach in augmenting the capability of the generator.\n\n3.1 Quality Aware BWGAN Regularization\n\nFrom the definitions in (3) and (7), the SSIM index is bounded in the interval [-1, 1], which\nimmediately renders it an invalid distance metric (since it can take negative values). This makes\nthe direct application of the SSIM index in the flexible BWGAN framework infeasible. Brunet\net al. [BVW11] have analyzed the mathematical properties of the SSIM index and show that valid\ndistance metrics can be derived from the components of the SSIM index defined in (4). 
For example,\n\n$$d_1(P_{(i,j)}, T_{(i,j)}) := \\sqrt{1 - L(P_{(i,j)}, T_{(i,j)})}, \\quad d_2(P_{(i,j)}, T_{(i,j)}) := \\sqrt{1 - CS(P_{(i,j)}, T_{(i,j)})}$$\n\nare shown to be valid normalized distance metrics. Importantly, they show that\n\n$$d_Q(P_{(i,j)}, T_{(i,j)}) = \\sqrt{2 - L(P_{(i,j)}, T_{(i,j)}) - CS(P_{(i,j)}, T_{(i,j)})} \\qquad (10)$$\n\nis a valid distance metric that also preserves the quality discerning properties of the SSIM index.\nFurther, as in the SSIM index, the image level distance metric for an M \u00d7 N image is defined as\n\n$$d_Q(P, T) = \\frac{1}{M \\times N} \\sum_{i=1}^{M} \\sum_{j=1}^{N} d_Q(P_{(i,j)}, T_{(i,j)}). \\qquad (11)$$\n\nWe refer the reader to [BVW11] for a detailed exposition of the properties of this distance metric. We\ncall this a quality aware distance metric that serves as a good candidate for regularizing GANs. We\nhypothesize that by making the discriminator Lipschitz with respect to dQ(X, Y ) in the image space,\nthe gradients computed from such a regularized discriminator emphasize the structural information in\nthe generated images. Further, since dQ(X, Y ) is bounded below and we are operating in the general\nmetric space of images, we are in a position to impose the Lipschitz constraint directly. In order\nto do so, we follow the approach suggested in BWGAN [AL18] and introduce a gradient penalty\nregularization term of the form\n\n$$\\text{SSIM GP} = \\mathbb{E}_{X \\sim P_r, Y \\sim P_G}\\left[\\left(\\frac{|D(X) - D(Y)|}{d_Q(X, Y)} - 1\\right)^2\\right]. \\qquad (12)$$\n\nIn addition to the SSIM GP regularizer, we also employ a 1-GP regularizer (i.e., WGAN-GP) to\nensure stable training. 
The overall discriminator loss function is\n\n$$L_d = \\min_{D \\in \\mathcal{D}} \\left(\\mathbb{E}_{z \\sim P_z} D(G(z)) - \\mathbb{E}_{X \\sim P_r} D(X)\\right) + \\lambda_1 \\mathbb{E}_{\\hat{x} \\sim P_{\\hat{x}}} (\\|\\nabla_{\\hat{x}} D(\\hat{x})\\|_2 - 1)^2 + \\lambda_2 \\mathbb{E}_{X \\sim P_r, G(z) \\sim P_G}\\left[\\left(\\frac{|D(X) - D(G(z))|}{d_Q(X, G(z))} - 1\\right)^2\\right], \\qquad (13)$$\n\nwhere \u03bb1 and \u03bb2 are empirically chosen. Further, as in the WGAN-GP setting, $\\hat{x}$ is sampled along\nstraight lines joining pairs of real and generated samples. Our coupled gradient penalty with respect to\ndQ(X, Y ) imposes a strong Lipschitz constraint. We believe that this quality aware discriminator\npenalty results in a quality aware GAN formulation.\n\n3.2 Quality Aware Gradient Penalty\n\nAs discussed in Section 2.2.2, the MSCN coefficient distributions of pristine and distorted images of\nnatural scenes have a unique statistical signature. We claim that if the MSCN coefficient statistics\npossess a unique and reliably consistent signature, then so must the MSCN coefficient statistics\nof the spatial gradients and discriminator gradients of natural images. We empirically show that\nthis is indeed the case, and that a NIQE-like formulation works well in quantifying the naturalness\nof discriminator gradients as well. Fig. 1b shows the empirical histograms of the MSCN spatial\ngradient image of a representative natural scene and its distorted versions. Fig. 1c shows the empirical\nhistograms of the MSCN discriminator gradient image of a representative natural scene and its\ndistorted versions. We have chosen a minimally trained (\u223c 100 iterations) deep neural network for\nfinding the discriminator gradients. We make the following observations about the histograms: they are\na) unimodal and b) Gaussian-like, and c) distortions alter their statistics. All these observations are identical to the\nMSCN coefficient statistics of natural images shown in Fig. 1a. 
We believe that the discriminator\ngradients show this statistical behavior since discriminators are smooth functions and natural images\npossess a unique local statistical signature. Another visual example that clearly illustrates our\nobservations and motivates our work is shown in Fig. 2.\n\nBased on these observations, we propose a NIQE-like score for the naturalness of the discriminator\ngradients of natural scenes. As in NIQE, a GGD is used to model the MSCN coefficients of pristine\nimage discriminator gradients, and an AGGD is used to model the product of these coefficients.\nFinally, an MVG is used to model the parameters of the GGD and AGGD of the MSCN discriminator\ngradients from pristine images. The pristine discriminator gradient MVG model is characterized by\nits parameters \u00b5P , \u03a3P . These parameters represent the class of all discriminator gradients computed\nwith respect to pristine natural images.\n\nThe naturalness of a test discriminator gradient image T is computed to be its \u201cdistance\u201d from the\npristine image gradient class and is given by\n\n$$\\|(T \\mid \\mu_P, \\Sigma_P)\\|_{\\mathrm{NIQE}} := \\sqrt{(\\mu_P - \\mu_T)^T \\left(\\frac{\\Sigma_P + \\Sigma_T}{2}\\right)^{-1} (\\mu_P - \\mu_T)}, \\qquad (14)$$\n\nwhere \u00b5T , \u03a3T are the model parameters of the test image\u2019s MVG model. This function serves as a\nquality aware gradient penalty that we use to regularize a GAN as discussed next. As an aside, Fig.\n1b suggests that our hypothesis would hold even if a spatial gradient regularizer were used. As discussed\nin Section 2, several regularization approaches have been proposed in the literature to improve the\nimage quality, stability and generalization ability of GANs. These include regularizing with non-zero\nmean and zero mean gradient penalty, Sobolev norm penalty etc. 
While these approaches consider\nthe norm of the discriminator gradient, they do not make use of the local correlation present in the\ngradient.\n\nFigure 1: The empirical histograms of MSCN coefficients. (1a) Pristine natural scenes and their\ndistorted versions. (1b) Spatial gradient of pristine natural scenes and their distorted versions. (1c)\nDiscriminator gradient of pristine natural scenes and their distorted versions. Panels: (a) Natural scenes, (b) Spatial gradient, (c) Discriminator gradient.\n\nFigure 2: Top row (L-R) shows a real image, its corresponding spatial gradient and discriminator\ngradient maps. Middle row (L-R) shows their corresponding mean subtracted contrast normalized\n(MSCN) coefficients. Bottom row (L-R) shows the normalized histograms of the respective MSCN\ncoefficients.\n\nBased on our hypothesis on the statistics of the discriminator gradient values of natural\nscenes presented above, we propose a novel regularization term that helps impose\nthese statistical constraints on the generated images. We have shown through Fig. 1c that \u2207xD(x)\ncontains useful local spatial statistics information. Our regularizer is designed to force the local\nstatistics of the discriminator gradient of $\\hat{x}$ to be as close to those of real images as possible. Our\nclaim is that such a regularization strategy results in improving visual quality of the generated images.\nThe NIQE \u201cdistance\u201d function in (14) serves as the statistics preserving regularizer. As mentioned\nearlier, we work in the WGAN-GP framework to demonstrate our method. 
The overall discriminator\ncost function includes the 1-GP regularizer and the NIQE function regularizer as defined in\n\n$$L_d = \\min_{D \\in \\mathcal{D}} \\left(\\mathbb{E}_{z \\sim P_z} D(G(z)) - \\mathbb{E}_{X \\sim P_r} D(X)\\right) + \\lambda_1 \\mathbb{E}_{\\hat{x} \\sim P_{\\hat{x}}} (\\|\\nabla_{\\hat{x}} D(\\hat{x})\\|_2 - 1)^2 + \\lambda_2 \\mathbb{E}_{\\hat{x} \\sim P_{\\hat{x}}} \\left(\\|(\\nabla_{\\hat{x}} D(\\hat{x}) \\mid \\mu_P, \\Sigma_P)\\|_{\\mathrm{NIQE}}\\right), \\qquad (15)$$\n\nwhere \u03bb1 and \u03bb2 are hyper parameters chosen empirically. As before, $\\hat{x}$ is sampled along straight lines joining\npairs of real and generated samples.\n\n4 Experiments\n\nDatasets: We have evaluated the efficacy of the proposed regularizers on three datasets: 1) CIFAR-10\n[35] (60K images of 32 \u00d7 32 resolution), 2) CelebA [Liu+15] (202.6K face images cropped and\nresized to a resolution of 64 \u00d7 64), and 3) STL-10 [CNL11] (100K images of resolution 96 \u00d7 96 and 48 \u00d7 48).\n\nNetwork Details: All our experiments are done using the residual architecture for the discriminator and\ngenerator used in WGAN-GP [Gul+17]. Batch normalization is applied to each resnet layer in the\ngenerator. We have used Adam as the optimizer with the standard momentum parameters \u03b21 = 0.0\nand \u03b22 = 0.9. The initial learning rate was set to 0.0002 for CIFAR-10 and STL-10 datasets and\n0.0001 for the CelebA dataset. The learning rate is decreased adaptively. We have empirically chosen\nthe hyper parameters \u03bb1 and \u03bb2 to be 1 and 0.1 respectively. All our models are trained for 100K\niterations with a batch size of 64. The discriminator is updated five times for every update of the\ngenerator.\n\nEvaluation: We evaluated our method using two quantitative measures: 1) Inception Score (IS)\n[Sal+16] and 2) Fr\u00e9chet Inception Distance (FID) [Heu+17]. IS measures the sample quality and\ndiversity by finding the entropy of the predicted labels. Higher IS indicates a better model. 
FID score\nmeasures the similarity between real and fake samples by fitting a multivariate Gaussian (MVG)\nmodel to the intermediate representation for the real and fake samples respectively and computing the\nFr\u00e9chet distance between the two fitted Gaussians. In case of FID,\nlower scores indicate a better model. We have used 50K randomly generated samples for computing\nthe inception score and FID score. For comparison with previous models, we have computed the\nFID scores for CIFAR-10 and CelebA datasets using the official TensorFlow implementation, and\nfor computing the FID scores of the STL-10 dataset, we have used the Chainer implementation used\nby SNGAN [Miy+18]. IS and FID are computed five times for the best model and the mean and\nvariance are reported.\n\nResults: Our primary motivation in this work is to formulate quality aware loss functions for GANs\nin the non-progressive setting. We present representative samples from the SSIM based\nQAGAN in Fig. 3 and the NIQE based QAGAN in Fig. 4. IS and FID are reported in Tables 1, 2, 3\nand 4.\n\nFigure 3: Randomly sampled images generated using QAGANs with quality aware distance metric\nregularizer (SSIM). Panels: (a) CIFAR-10 dataset (32 \u00d7 32), (b) STL-10 dataset (48 \u00d7 48), (c) CelebA dataset (64 \u00d7 64).\n\nFrom the figures and tables, we see that QAGANs are very competitive with the state-of-the-art\nmethods on all three datasets. Importantly and interestingly, QAGANs deliver consistently good\nperformance with respect to both IS and FID, while other methods do well mostly with respect to IS.\nThis provides clear quantitative evidence of the improved quality of images generated by QAGANs.\nAlso, it underscores our claim that explicitly using objective IQA metrics in GAN cost functions is\nnot only a promising way forward but also long overdue.\n\nWe see that the NIQE-based regularizer shows inconsistent performance only with respect to IS on\nthe CIFAR-10 dataset. 
We attribute this inconsistency to the small image size (32 \u00d7 32) of this dataset.\nThis would lead to poorer estimates of the model parameters (compared to other resolutions), which\nin turn reduces performance.\n\nFurther, we believe that the proposed quality aware regularizers can be applied to progressive\narchitectures as well. We demonstrate this by applying the proposed regularizers to the PGGAN\narchitecture [Kar+18] (both original and growing) at resolutions of 128 \u00d7 128 and 256 \u00d7 256 on the\nCelebA dataset. The qualitative results at an image resolution of 256 \u00d7 256 are shown in Fig. 5 and\nthe quantitative results are given in Table 5. These results are shown after 6K iterations. Interestingly,\nwe observed that the proposed regularizers resulted in faster convergence and improved visual quality\nof the generated images. The quantitative improvement in performance is clear from the FID values\nin Table 5.\n\nFigure 4: Randomly sampled images generated using QAGANs with quality aware gradient penalty\nregularizer (NIQE). Panels: (a) CIFAR-10 dataset (32 \u00d7 32), (b) STL-10 dataset (48 \u00d7 48), (c) CelebA dataset (64 \u00d7 64).\n\nTable 1: Inception Score (IS) and Fr\u00e9chet Inception Distance (FID) computed from 50,000 samples\nof the CIFAR-10 dataset (32 \u00d7 32). Scores that are unavailable are marked with a \u2018-\u2019.\n\nModel | IS | FID\nReal data | 11.24 \u00b1 0.12 | 7.80\nDCGAN [RMC15] | 6.16 \u00b1 0.07 | -\nWGAN-GP [Gul+17] | 7.86 \u00b1 0.10 | 40.2 \u00b1 0.0\nCTGAN [Kim+18] | 8.12 \u00b1 0.12 | 21.5 \u00b1 0.21\nSNGAN [Miy+18] | 8.12 \u00b1 0.12 | -\nW^{-3/2,2} - Banach WGAN [AL18] | 8.26 \u00b1 0.07 | -\nL^{10} - Banach WGAN [AL18] | 8.31 \u00b1 0.07 | -\nMMD GAN-rep-b [Li+17] | 8.29 \u00b1 0.0 | 16.21 \u00b1 0.0\nQAGAN (SSIM) | 8.37 \u00b1 0.04 | 13.91 \u00b1 0.105\nQAGAN (NIQE) | 7.87 \u00b1 0.027 | 12.4697 \u00b1 0.068\n\nTable 2: FID on the CelebA dataset (64 \u00d7 64).\n\nModel | FID\nReal Faces (CelebA) | 1.09\nWGAN-GP [Gul+17] | 12.89\nBanach WGAN [AL18] | 10.5\nMMD GAN-rep-b [Li+17] | 6.79\nQAGAN (SSIM) | 6.421\nQAGAN (NIQE) | 6.504\n\nTable 3: IS and FID on the STL-10 dataset (48 \u00d7 48).\n\nModel | IS | FID\nReal Data (48 \u00d7 48) | 26.08 \u00b1 0.26 | 7.9\nWGAN-GP [Gul+17] | 9.05 \u00b1 0.12 | 55.1 \u00b1 0.0\nSNGAN [Miy+18] | 9.10 \u00b1 0.04 | 40.10 \u00b1 0.50\nMMD GAN-rep [Li+17] | 9.36 \u00b1 0.0 | 36.67 \u00b1 0.0\nQAGAN (SSIM) | 9.29 \u00b1 0.05 | 19.77 \u00b1 0.0091\nQAGAN (NIQE) | 9.1720 \u00b1 0.08 | 19.45 \u00b1 0.0013\n\n
Table 4: IS and FID on the STL-10 dataset (96 \u00d7 96).\n3.7155 \u00b1 0.004\n3.1951 \u00b1 0.013\n\nQAGAN (SSIM)\nQAGAN (NIQE)\n\n9.66 \u00b1 0.18\n8.948 \u00b1 0.01\n\nModel\n\nFID\n\nIS\n\nFigure 5: Randomly sampled images generated using QAGANs for the CelebA dataset with a progressively\ngrowing architecture (256 \u00d7 256). Top row: PGGAN [Kar+18]. Middle row: PGGAN with\nSSIM. Bottom row: PGGAN with NIQE.\n\nTable 5: FID on the CelebA dataset for PGGAN.\n\nModel | FID\nResolution (128 \u00d7 128)\nPGGAN [Kar+18] | 64.50\nPGGAN + QAGAN (SSIM) | 47.46\nPGGAN + QAGAN (NIQE) | 49.80\nResolution (256 \u00d7 256)\nPGGAN [Kar+18] | 62.86\nPGGAN + QAGAN (SSIM) | 40.83\nPGGAN + QAGAN (NIQE) | 47.27\n\n5 Conclusions\n\nBased on insights from both FR and NR IQA metrics, we have proposed two novel regularization\napproaches for the WGAN-GP framework. The key takeaway from our work is that the unique\nlocal structural and statistical signature of pristine natural images must be preserved in the generated\nimages. We demonstrated how the SSIM and NIQE based regularizers guide the generator towards the\nclass of pristine natural images and thereby ensure its unique local structural and statistical signature.\nThe performance of QAGANs was shown to be very competitive with the state-of-the-art methods\nover three popular datasets. 
We believe that this work opens up new and exciting directions in image\nand video generative modeling, given the plethora of excellent QA metrics. The challenge however\nlies in translating the QA metrics into a form that fits the GAN framework.\n\n6 Acknowledgement\n\nWe would like to thank Dr. J. Balasubramaniam from the Math department at IIT Hyderabad for his\nvaluable suggestions and insights during this work. We would also like to thank NVIDIA for the\nGPU donation. SSC would like to acknowledge the sound track of Kavaludaari for inspiration while\nwriting.\n\nReferences\n\n[Rud94] Daniel L Ruderman. \u201cThe statistics of natural images\u201d. In: Network: computation in\nneural systems 5.4 (1994), pp. 517\u2013548.\n\n[Wan+04] Zhou Wang et al. \u201cImage quality assessment: from error visibility to structural similarity\u201d.\nIn: IEEE transactions on image processing 13.4 (2004), pp. 600\u2013612.\n\n[MB10] Anush K Moorthy and Alan C Bovik. \u201cStatistics of natural image distortions\u201d. In: 2010\nIEEE International Conference on Acoustics, Speech and Signal Processing. IEEE. 2010,\npp. 962\u2013965.\n\n[BVW11] Dominique Brunet, Edward R Vrscay, and Zhou Wang. \u201cOn the mathematical properties\nof the structural similarity index\u201d. In: IEEE Transactions on Image Processing 21.4\n(2011), pp. 1488\u20131499.\n\n[CNL11] Adam Coates, Andrew Ng, and Honglak Lee. \u201cAn analysis of single-layer networks in\nunsupervised feature learning\u201d. In: Proceedings of the fourteenth international conference\non artificial intelligence and statistics. 2011, pp. 215\u2013223.\n\n[MSB13] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. \u201cMaking a \u201ccompletely blind\u201d\nimage quality analyzer\u201d. In: IEEE Signal Processing Letters 20.3 (2013), pp. 209\u2013212.\n\n[Goo+14] Ian Goodfellow et al. \u201cGenerative adversarial nets\u201d. In: Advances in neural information\nprocessing systems. 2014, pp. 
2672\u20132680.\n\n[Liu+15] Ziwei Liu et al. \u201cDeep learning face attributes in the wild\u201d. In: Proceedings of the IEEE international conference on computer vision. 2015, pp. 3730\u20133738.\n\n[RMC15] Alec Radford, Luke Metz, and Soumith Chintala. \u201cUnsupervised representation learning with deep convolutional generative adversarial networks\u201d. In: arXiv preprint arXiv:1511.06434 (2015).\n\n[Sal+16] Tim Salimans et al. \u201cImproved techniques for training gans\u201d. In: Advances in neural information processing systems. 2016, pp. 2234\u20132242.\n\n[ACB17] Martin Arjovsky, Soumith Chintala, and L\u00e9on Bottou. \u201cWasserstein generative adversarial networks\u201d. In: International Conference on Machine Learning. 2017, pp. 214\u2013223.\n\n[Gul+17] Ishaan Gulrajani et al. \u201cImproved training of wasserstein gans\u201d. In: Advances in Neural Information Processing Systems. 2017, pp. 5767\u20135777.\n\n[Heu+17] Martin Heusel et al. \u201cGans trained by a two time-scale update rule converge to a local nash equilibrium\u201d. In: Advances in Neural Information Processing Systems. 2017, pp. 6626\u20136637.\n\n[Iso+17] Phillip Isola et al. \u201cImage-to-image translation with conditional adversarial networks\u201d. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 1125\u20131134.\n\n[Led+17] Christian Ledig et al. \u201cPhoto-realistic single image super-resolution using a generative adversarial network\u201d. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 4681\u20134690.\n\n[Li+17] Chun-Liang Li et al. \u201cMmd gan: Towards deeper understanding of moment matching network\u201d. In: Advances in Neural Information Processing Systems. 2017, pp. 2203\u20132213.\n\n[Pan+17] Junting Pan et al. \u201cSalgan: Visual saliency prediction with adversarial networks\u201d. In: CVPR Scene Understanding Workshop (SUNw). 2017.\n\n[Rot+17] Kevin Roth et al. \u201cStabilizing training of generative adversarial networks through regularization\u201d. In: Advances in neural information processing systems. 2017, pp. 2018\u20132028.\n\n[AL18] Jonas Adler and Sebastian Lunz. \u201cBanach wasserstein gan\u201d. In: Advances in Neural Information Processing Systems. 2018, pp. 6754\u20136763.\n\n[Kar+18] Tero Karras et al. \u201cProgressive Growing of GANs for Improved Quality, Stability, and Variation\u201d. In: International Conference on Learning Representations. 2018. URL: https://openreview.net/forum?id=Hk99zCeAb.\n\n[Kim+18] Sangpil Kim et al. \u201cCT-GAN: Conditional Transformation Generative Adversarial Network for Image Attribute Modification\u201d. In: arXiv preprint arXiv:1807.04812 (2018).\n\n[MGN18] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. \u201cWhich training methods for GANs do actually Converge?\u201d In: arXiv preprint arXiv:1801.04406 (2018).\n\n[Miy+18] Takeru Miyato et al. \u201cSpectral normalization for generative adversarial networks\u201d. In: arXiv preprint arXiv:1802.05957 (2018).\n\n[TTV19] Hoang Thanh-Tung, Truyen Tran, and Svetha Venkatesh. \u201cImproving generalization and stability of generative adversarial networks\u201d. In: arXiv preprint arXiv:1902.03984 (2019).\n", "award": [], "sourceid": 1693, "authors": [{"given_name": "KANCHARLA", "family_name": "PARIMALA", "institution": "Indian Institute of Technology, Hyderabad"}, {"given_name": "Sumohana", "family_name": "Channappayya", "institution": "Indian Institute of Technology Hyderabad"}]}