{"title": "A Gaussian Process Model of Quasar Spectral Energy Distributions", "book": "Advances in Neural Information Processing Systems", "page_first": 2494, "page_last": 2502, "abstract": "We propose a method for combining two sources of astronomical data, spectroscopy and photometry, that carry information about sources of light (e.g., stars, galaxies, and quasars) at extremely different spectral resolutions. Our model treats the spectral energy distribution (SED) of the radiation from a source as a latent variable that jointly explains both photometric and spectroscopic observations. We place a flexible, nonparametric prior over the SED of a light source that admits a physically interpretable decomposition, and allows us to tractably perform inference. We use our model to predict the distribution of the redshift of a quasar from five-band (low spectral resolution) photometric data, the so called ``photo-z'' problem. Our method shows that tools from machine learning and Bayesian statistics allow us to leverage multiple resolutions of information to make accurate predictions with well-characterized uncertainties.", "full_text": "A Gaussian Process Model of Quasar\n\nSpectral Energy Distributions\n\nAndrew Miller\u2217 , Albert Wu\n\nSchool of Engineering and Applied Sciences\n\nHarvard University\n\nacm@seas.harvard.edu, awu@college.harvard.edu\n\nJeffrey Regier, Jon McAuliffe\n\nDepartment of Statistics\n\nUniversity of California, Berkeley\n\n{jeff, jon}@stat.berkeley.edu\n\nPrabhat, David Schlegel\n\nLawrence Berkeley National Laboratory\n\n{prabhat, djschlegel}@lbl.gov\n\nDustin Lang\n\nMcWilliams Center for Cosmology\n\nCarnegie Mellon University\n\ndstn@cmu.edu\n\nRyan Adams \u2020\n\nSchool of Engineering and Applied Sciences\n\nHarvard University\n\nrpa@seas.harvard.edu\n\nAbstract\n\nWe propose a method for combining two sources of astronomical data, spec-\ntroscopy and photometry, that carry information about sources of light (e.g., stars,\ngalaxies, 
and quasars) at extremely different spectral resolutions. Our model treats\nthe spectral energy distribution (SED) of the radiation from a source as a latent\nvariable that jointly explains both photometric and spectroscopic observations.\nWe place a \ufb02exible, nonparametric prior over the SED of a light source that ad-\nmits a physically interpretable decomposition, and allows us to tractably perform\ninference. We use our model to predict the distribution of the redshift of a quasar\nfrom \ufb01ve-band (low spectral resolution) photometric data, the so called \u201cphoto-\nz\u201d problem. Our method shows that tools from machine learning and Bayesian\nstatistics allow us to leverage multiple resolutions of information to make accu-\nrate predictions with well-characterized uncertainties.\n\n1 Introduction\n\nEnormous amounts of astronomical data are collected by a range of instruments at multiple spectral\nresolutions, providing information about billions of sources of light in the observable universe [1,\n10]. Among these data are measurements of the spectral energy distributions (SEDs) of sources of\nlight (e.g. stars, galaxies, and quasars). The SED describes the distribution of energy radiated by a\nsource over the spectrum of wavelengths or photon energy levels. 
SEDs are of interest because\nthey convey information about a source\u2019s physical properties, including type, chemical composition,\nand redshift, which will be an estimand of interest in this work.\n\n\u2217http://people.seas.harvard.edu/~acm/\n\u2020http://people.seas.harvard.edu/~rpa/\n\nFigure 1: Left: example of a BOSS-measured quasar SED with SDSS band \ufb01lters, Sb(\u03bb), b \u2208\n{u, g, r, i, z}, overlaid. Right: the same quasar\u2019s photometrically measured band \ufb02uxes. Spectro-\nscopic measurements include noisy samples at thousands of wavelengths, whereas SDSS photomet-\nric \ufb02uxes re\ufb02ect the (weighted) response over a large range of wavelengths.\n\nThe SED can be thought of as a latent function of which we can only obtain noisy measurements.\nMeasurements of SEDs, however, are produced by instruments at widely varying spectral resolu-\ntions \u2013 some instruments measure many wavelengths simultaneously (spectroscopy), while others\naverage over large swaths of the energy spectrum and report a low dimensional summary (pho-\ntometry). Spectroscopic data describe a source\u2019s SED in \ufb01ner detail than broadband photometric\ndata. For example, the Baryon Oscillation Spectroscopic Survey [5] measures SED samples at\nover four thousand wavelengths between 3,500 and 10,500 \u00c5. In contrast, the Sloan Digital Sky\nSurvey (SDSS) [1] collects spectral information in only 5 broad spectral bins by using broadband\n\ufb01lters (called u, g, r, i, and z), but at a much higher spatial resolution. Photometric preprocessing\nmodels can then aggregate pixel information into \ufb01ve band-speci\ufb01c \ufb02uxes and their uncertainties\n[17], re\ufb02ecting the weighted average response over a large range of the wavelength spectrum. 
The\ntwo methods of spectral information collection are graphically compared in Figure 1.\nDespite carrying less spectral information, broadband photometry is more widely available and ex-\nists for a larger number of sources than spectroscopic measurements. This work develops a method\nfor inferring physical properties of sources by jointly modeling spectroscopic and photometric data.\nOne use of our model is to measure the redshift of quasars for which we only have photometric ob-\nservations. Redshift is a phenomenon in which the observed SED of a source of light is stretched to-\nward longer (redder) wavelengths. This effect is due to a combination of radial velocity with respect\nto the observer and the expansion of the universe (termed cosmological redshift) [8, 7]. Quasars, or\nquasi-stellar radio sources, are extremely distant and energetic sources of electromagnetic radiation\nthat can exhibit high redshift [16]. Accurate estimates and uncertainties of redshift measurements\nfrom photometry have the potential to guide the use of higher spectral resolution instruments to study\nsources of interest. Furthermore, accurate photometric models can aid the automation of identifying\nsource types and estimating physical characteristics of faintly observed sources in large photometric\nsurveys [14].\nTo jointly describe both resolutions of data, we directly model a quasar\u2019s latent SED and the process\nby which it generates spectroscopic and photometric observations. Representing a quasar\u2019s SED as\na latent random measure, we describe a Bayesian inference procedure to compute the marginal prob-\nability distribution of a quasar\u2019s redshift given observed photometric \ufb02uxes and their uncertainties.\nThe following section provides relevant application and statistical background. Section 3 describes\nour probabilistic model of SEDs and broadband photometric measurements. 
Section 4 outlines\nour MCMC-based inference method for ef\ufb01ciently computing statistics of the posterior distribu-\ntion. Section 5 presents redshift and SED predictions from photometric measurements, among other\nmodel summaries, and a quantitative comparison between our method and two existing \u201cphoto-z\u201d methods.\nWe conclude with a discussion of directions for future work.\n\n2 Background\n\nThe SEDs of most stars are roughly approximated by Planck\u2019s law for black body radiators and\nstellar atmosphere models [6]. Quasars, on the other hand, have complicated SEDs characterized by\nsome salient features, such as the Lyman-\u03b1 forest, which is the absorption of light at many wave-\nlengths from neutral hydrogen gas between the earth and the quasar [19]. One of the most interesting\nproperties of quasars (and galaxies) conveyed by the SED is redshift, which gives us insight into an\nobject\u2019s distance and age. Redshift affects our observation of SEDs by \u201cstretching\u201d the wavelengths,\n\u03bb \u2208 \u039b, of the quasar\u2019s rest frame SED, skewing toward longer (redder) wavelengths. 
Denoting the\nrest frame SED of a quasar n as a function, $f_n^{(\\mathrm{rest})} : \\Lambda \\to \\mathbb{R}_+$, the effect of redshift with value zn\n(typically between 0 and 7) on the observation-frame SED is described by the relationship\n\n\\[ f_n^{(\\mathrm{obs})}(\\lambda) = f_n^{(\\mathrm{rest})}\\left(\\frac{\\lambda}{1 + z_n}\\right). \\tag{1} \\]\n\nSome observed quasar spectra and their \u201cde-redshifted\u201d rest frame spectra are depicted in Figure 2.\n\nFigure 2: Spectroscopic measurements of multiple quasars at different redshifts, z. The upper graph\ndepicts the sample spectrograph in the observation frame, intuitively thought of as \u201cstretched\u201d by a\nfactor (1 + z). The lower \ufb01gure depicts the \u201cde-redshifted\u201d (rest frame) version of the same quasar\nspectra. The two lines show the corresponding locations of the characteristic peak in each reference\nframe. Note that the x-axis has been changed to ease the visualization - the transformation is much\nmore dramatic. The appearance of translation is due to missing data; we don\u2019t observe SED samples\noutside the range 3,500-10,500 \u00c5.\n\n3 Model\n\nThis section describes our probabilistic model of spectroscopic and photometric observations.\nSpectroscopic \ufb02ux model The SED of a quasar is a non-negative function f : \u039b \u2192 R+, where \u039b\ndenotes the range of wavelengths and R+ are non-negative real numbers representing \ufb02ux density.\nOur model speci\ufb01es a quasar\u2019s rest frame SED as a latent random function. Quasar SEDs are highly\nstructured, and we model this structure by imposing the assumption that each SED is a convex\nmixture of K latent, positive basis functions. The model assumes there are a small number (K) of\nlatent features or characteristics and that each quasar can be described by a short vector of mixing\nweights over these features.\nWe place a normalized log-Gaussian process prior on each of these basis functions (described in\nsupplementary material). The generative procedure for quasar spectra begins with a shared basis\n\n\\[ \\beta_k(\\cdot) \\overset{\\mathrm{iid}}{\\sim} \\mathcal{GP}(0, K_\\theta), \\quad k = 1, \\ldots, K, \\qquad B_k(\\cdot) = \\frac{\\exp(\\beta_k(\\cdot))}{\\int_\\Lambda \\exp(\\beta_k(\\lambda)) \\, d\\lambda}, \\tag{2} \\]\n\nwhere K\u03b8 is the kernel and Bk is the exponentiated and normalized version of \u03b2k. For each quasar n,\n\n\\[ w_n \\sim p(w), \\quad \\text{s.t.} \\quad \\sum_{k} w_{n,k} = 1, \\qquad m_n \\sim p(m), \\quad \\text{s.t.} \\quad m_n > 0, \\qquad z_n \\sim p(z), \\tag{3} \\]\n\nwhere wn mixes over the latent types, mn is the apparent brightness, zn is the quasar\u2019s redshift,\nand distributions p(w), p(m), and p(z) are priors to be speci\ufb01ed later. 
As each positive SED basis\nfunction, Bk, is normalized to integrate to one, and each quasar\u2019s weight vector wn also sums to\none, the latent normalized SED is then constructed as\n\n\\[ f_n^{(\\mathrm{rest})}(\\cdot) = \\sum_{k} w_{n,k} B_k(\\cdot) \\tag{4} \\]\n\nand we de\ufb01ne the unnormalized SED $\\tilde{f}_n^{(\\mathrm{rest})}(\\cdot) \\equiv m_n \\cdot f_n^{(\\mathrm{rest})}(\\cdot)$. This parameterization admits the\ninterpretation of $f_n^{(\\mathrm{rest})}(\\cdot)$ as a probability density scaled by mn. This interpretation allows us to\nseparate out the apparent brightness, which is a function of distance and overall luminosity, from the\nSED itself, which carries information pertinent to the estimand of interest, redshift.\n\nFigure 3: Graphical model representation\nof the joint photometry and spectroscopy\nmodel. The left shaded variables represent\nspectroscopically measured samples and\ntheir variances. The right shaded variables\nrepresent photometrically measured \ufb02uxes\nand their variances. The upper box rep-\nresents the latent basis, with GP prior pa-\nrameters \u2113 and \u03bd. Note that Nspec + Nphoto\nreplicates of wn, mn and zn are instanti-\nated.\n\nFor each quasar with spectroscopic data, we observe noisy samples of the redshifted and scaled spec-\ntral energy distribution at a grid of P wavelengths \u03bb \u2208 {\u03bb1, . . . , \u03bbP}. 
For quasar n, our observation\nframe samples are conditionally distributed as\n\n\\[ x_{n,\\lambda} \\mid z_n, w_n, \\{B_k\\} \\overset{\\mathrm{ind}}{\\sim} \\mathcal{N}\\left( \\tilde{f}_n^{(\\mathrm{rest})}\\left(\\frac{\\lambda}{1 + z_n}\\right), \\sigma_{n,\\lambda}^2 \\right) \\tag{5} \\]\n\nwhere $\\sigma_{n,\\lambda}^2$ is known measurement variance from the instruments used to make the observations.\nThe BOSS spectra (and our rest frame basis) are stored in units $10^{-17} \\cdot \\mathrm{erg} \\cdot \\mathrm{cm}^{-2} \\cdot \\mathrm{s}^{-1} \\cdot \\text{\\AA}^{-1}$.\n\nPhotometric \ufb02ux model\nPhotometric data summarize the amount of energy observed over a\nlarge swath of the wavelength spectrum. Roughly, a photometric \ufb02ux measures (proportionally) the\nnumber of photons recorded by the instrument over the duration of an exposure, \ufb01ltered by a band-\nspeci\ufb01c sensitivity curve. We express \ufb02ux in nanomaggies [15]. Photometric \ufb02uxes and measure-\nment error derived from broadband imagery have been computed directly from pixels [17]. For each\nquasar n, SDSS photometric data are measured in \ufb01ve bands, b \u2208 {u, g, r, i, z}, yielding a vector of\n\ufb01ve \ufb02ux values and their variances, yn and $\\tau_{n,b}^2$. Each band, b, measures photon observations at each\nwavelength in proportion to a known \ufb01lter sensitivity, Sb(\u03bb). The \ufb01lter sensitivities for the SDSS\nugriz bands are depicted in Figure 1, with an example observation frame quasar SED overlaid. The\nactual measured \ufb02uxes can be computed by integrating the full object\u2019s spectrum, $m_n \\cdot f_n^{(\\mathrm{obs})}(\\lambda)$,\nagainst the \ufb01lters. For a band b \u2208 {u, g, r, i, z},\n\n\\[ \\mu_b(f_n^{(\\mathrm{rest})}, z_n) = \\int f_n^{(\\mathrm{obs})}(\\lambda) \\, S_b(\\lambda) \\, C(\\lambda) \\, d\\lambda, \\tag{6} \\]\n\nwhere C(\u03bb) is a conversion factor to go from the units of fn(\u03bb) to nanomaggies (details of this\nconversion are available in the supplementary material). 
The function \u00b5b takes a rest frame SED and\na redshift (z) and maps them to the observed b-band speci\ufb01c \ufb02ux. The results of this projection onto\nSDSS bands are modeled as independent Gaussian random variables with known variance\n\n\\[ y_{n,b} \\mid f_n^{(\\mathrm{rest})}, z_n \\overset{\\mathrm{ind}}{\\sim} \\mathcal{N}(\\mu_b(f_n^{(\\mathrm{rest})}, z_n), \\tau_{n,b}^2). \\tag{7} \\]\n\nConditioned on the basis, B = {Bk}, we can represent $f_n^{(\\mathrm{rest})}$\nwith a low-dimensional vector. Note\nthat $f_n^{(\\mathrm{rest})}$\nis a function of wn, zn, mn, and B (see Equation 4), so we can think of \u00b5b as a function\nof wn, zn, mn, and B. We overload notation, and re-write the conditional likelihood of photometric\nobservations as\n\n\\[ y_{n,b} \\mid w_n, z_n, m_n, B \\sim \\mathcal{N}(\\mu_b(w_n, z_n, m_n, B), \\tau_{n,b}^2). \\tag{8} \\]\n\nIntuitively, what gives us statistical traction in inferring the posterior distribution over zn is the struc-\nture learned in the latent basis, B, and weights w, i.e., the features that correspond to distinguishing\nbumps and dips in the SED.\nNote on priors For photometric weight and redshift inference, we use a \ufb02at prior on zn \u2208 [0, 8],\nand empirically derived priors for mn and wn, from the sample of spectroscopically measured\nsources. Choice of priors is described in the supplementary material.\n\n4 Inference\n\nBasis estimation For computational tractability, we \ufb01rst compute a maximum a posteriori (MAP)\nestimate of the basis, Bmap, to condition on. 
Using the spectroscopic data, $\\{x_{n,\\lambda}, \\sigma_{n,\\lambda}^2, z_n\\}$, we com-\npute a discretized MAP estimate of {Bk} by directly optimizing the unnormalized (log) posterior\nimplied by the likelihood in Equation 5, the GP prior over B, and diffuse priors over wn and mn,\n\n\\[ p\\left(\\{w_n, m_n\\}, \\{B_k\\} \\mid \\{x_{n,\\lambda}, \\sigma_{n,\\lambda}^2, z_n\\}\\right) \\propto \\prod_{n=1}^{N} p(x_{n,\\lambda} \\mid z_n, w_n, m_n, \\{B_k\\}) \\, p(\\{B_k\\}) \\, p(w_n) \\, p(m_n). \\tag{9} \\]\n\nWe use gradient descent with momentum and LBFGS [12] directly on the parameters \u03b2k, \u03c9n,k, and\nlog(mn) for the Nspec spectroscopically measured quasars. Gradients were automatically computed\nusing autograd [9]. Following [18], we \ufb01rst resample the observed spectra into a common rest\nframe grid, \u03bb0 = (\u03bb0,1, . . . , \u03bb0,V ), easing computation of the likelihood. We note that although our\nmodel places a full distribution over Bk, ef\ufb01ciently integrating out those parameters is left for future\nwork.\n\nSampling wn, mn, and zn The Bayesian \u201cphoto-z\u201d task requires that we compute posterior\nmarginal distributions of z, integrating out w and m. To compute these distributions, we con-\nstruct a Markov chain over the state space including z, w, and m that leaves the target posterior\ndistribution invariant. We treat the inference problem for each photometrically measured quasar,\nyn, independently. Conditioned on a basis Bk, k = 1, . . . , K, our goal is to draw posterior samples\nof wn, mn and zn for each n. The unnormalized posterior can be expressed as\n\n\\[ p(w_n, m_n, z_n \\mid y_n, B) \\propto p(y_n \\mid w_n, m_n, z_n, B) \\, p(w_n, m_n, z_n) \\tag{10} \\]\n\nwhere the left likelihood term is de\ufb01ned in Equation 8. Note that due to analytic intractability, we\nnumerically integrate expressions involving $\\int_\\Lambda f_n^{(\\mathrm{obs})}(\\lambda) \\, d\\lambda$ and Sb(\u03bb). 
Because the observation yn\ncan often be well explained by various redshifts and weight settings, the resulting marginal poste-\nrior, p(zn|X, yn, B), is often multi-modal, with regions of near zero probability between modes.\nIntuitively, this is due to the information loss in the SED-to-photometric \ufb02ux integration step.\nThis multi-modal property is problematic for many standard MCMC techniques. Single chain\nMCMC methods have to jump between modes or travel through a region of near-zero probabil-\nity, resulting in slow mixing. To combat this effect, we use parallel tempering [4], a method that is\nwell-suited to constructing Markov chains on multi-modal distributions. Parallel tempering instan-\ntiates C independent chains, each sampling from the target distribution raised to an inverse temper-\nature. Given a target distribution, \u03c0(x), the constructed chains sample $\\pi_c(x) \\propto \\pi(x)^{1/T_c}$, where Tc\ncontrols how \u201chot\u201d (i.e., how close to uniform) each chain is. At each iteration, swaps between\nchains are proposed and accepted with a standard Metropolis-Hastings acceptance probability\n\n\\[ \\Pr(\\text{accept swap } c, c') = \\min\\left(1, \\frac{\\pi_c(x_{c'}) \\, \\pi_{c'}(x_c)}{\\pi_c(x_c) \\, \\pi_{c'}(x_{c'})}\\right). \\tag{11} \\]\n\nWithin each chain, we use component-wise slice sampling [11] to generate samples that leave each\nchain\u2019s distribution invariant. Slice-sampling is a (relatively) tuning-free MCMC method, a conve-\nnient property when sampling from thousands of independent posteriors. We found parallel tem-\npering to be essential for convincing posterior simulations. 
MCMC diagnostics and comparisons to\nsingle-chain samplers are available in the supplementary material.\n\n5 Experiments and Results\n\nWe conduct three experiments to test our model, where each experiment measures redshift predictive\naccuracy for a different train/test split of spectroscopically measured quasars from the DR10QSO\ndataset [13] with con\ufb01rmed redshifts in the range z \u2208 (.01, 5.85). Our experiments split train/test\nin the following ways: (i) randomly, (ii) by r-band \ufb02uxes, (iii) by redshift values. In split (ii), we\ntrain on the brightest 90% of quasars, and test on a subset of the remaining. Split (iii) takes the\nlowest-redshift 85% of quasars as training data, and a subset of the highest-redshift 15% as test cases. Splits (ii)\nand (iii) are intended to test the method\u2019s robustness to different training and testing distributions,\nmimicking the discovery of fainter and farther sources. For each split, we \ufb01nd a MAP estimate of the\nbasis, B1, . . . , BK, and weights, wn to use as a prior for photometric inference. For computational\npurposes, we limit our training sample to a random subsample of 2,000 quasars. The following\nsections outline the resulting model \ufb01t and inferred SEDs and redshifts.\n\nFigure 4: Top: MAP estimate of the\nlatent bases $B = \\{B_k\\}_{k=1}^{K}$. Note the\ndifferent ranges of the x-axis (wave-\nlength). Each basis function distributes\nits mass across different regions of the\nspectrum to explain different salient\nfeatures of quasar spectra in the rest\nframe. Bottom: model reconstruction\nof a training-sample SED.\n\nBasis validation We examined multiple choices of K using out of sample likelihood on a valida-\ntion set. In the following experiments we set K = 4, which balances generalizability and computa-\ntional tradeoffs. Discussion of this validation is provided in the supplementary material.\n\nSED Basis We depict a MAP estimate of B1, . . . , BK in Figure 4. 
Our basis decomposition\nenjoys the bene\ufb01t of physical interpretability due to our density-estimate formulation of the problem.\nBasis B4 places mass on the Lyman-\u03b1 peak around 1,216 \u00c5, allowing the model to capture the co-\noccurrence of more peaked SEDs with a bump around 1,550 \u00c5. Basis B1 captures the H-\u03b1 emission\nline at around 6,500 \u00c5. Because of the \ufb02exible nonparametric priors on Bk our model is able to\nautomatically learn these features from data. The positivity of the basis and weights distinguishes\nour model from PCA-based methods, which sacri\ufb01ce physical interpretability.\n\nPhotometric measurements For each test quasar, we construct an 8-chain parallel tempering sam-\npler and run for 8,000 iterations, and discard the \ufb01rst 4,000 samples as burn-in. Given posterior sam-\nples of zn, we take the posterior mean as a point estimate. Figure 5 compares the posterior mean to\nspectroscopic measurements (for three different data-split experiments), where the gray lines denote\nposterior sample quantiles. In general there is a strong correspondence between spectroscopically\nmeasured redshift and our posterior estimate. In cases where the posterior mean is off, our distri-\nbution often covers the spectroscopically con\ufb01rmed value with probability mass. This is clear upon\ninspection of posterior marginal distributions that exhibit extreme multi-modal behavior. To combat\nthis multi-modality, it is necessary to inject the model with more information to eliminate plausible\nhypotheses; this information could come from another measurement (e.g., a new photometric band),\nor from structured prior knowledge over the relationship between zn, wn, and mn. Our method\nsimply \ufb01ts a mixture of Gaussians to the spectroscopically measured wn, mn sample to formulate\na prior distribution. 
However, dependencies between zn, wn and mn, similar to the\nXDQSOz technique, will be incorporated in future work.\n\nFigure 5: Comparison of spectroscopically (x-axis) and photometrically (y-axis) measured redshifts\nfrom the SED model for three different data splits. The left re\ufb02ects a random selection of 4,000\nquasars from the DR10QSO dataset. The right graph re\ufb02ects a selection of 4,000 test quasars from\nthe upper 15% ($z_{\\mathrm{cutoff}} \\approx 2.7$), where all training was done on lower redshifts. The red estimates\nare posterior means.\n\nFigure 6: Left: inferred SEDs from photometric data. The black line is a smoothed approximation to\nthe \u201ctrue\u201d SED using information from the full spectral data. The red line is a sample from the pos-\nterior, $f_n^{(\\mathrm{obs})}(\\lambda) \\mid X, y_n, B$, which imputes the entire SED from only \ufb01ve \ufb02ux measurements. Note\nthat the bottom sample is from the left mode, which under-predicts redshift. Right: correspond-\ning posterior predictive distributions, p(zn|X, yn, B). The black line marks the spectroscopically\ncon\ufb01rmed redshift; the red line marks the posterior mean. Note the difference in scale of the x-axis.\n\n5.1 Comparisons\n\nWe compare the performance of our redshift estimator with two recent photometric redshift estima-\ntors, XDQSOz [2] and a neural network [3]. The method in [2] is a conditional density estimator\nthat discretizes the range of one \ufb02ux band (the i-band) and \ufb01ts a mixture of Gaussians to the joint\ndistribution over the remaining \ufb02uxes and redshifts. One disadvantage to this approach is that there\nis no physical signi\ufb01cance to the mixture of Gaussians, and no model of the latent SED. Further-\nmore, the original method trains and tests the model on a pre-speci\ufb01ed range of i-magnitudes, which\nis problematic when predicting redshifts on much brighter or dimmer stars. 
The regression approach\nfrom [3] employs a neural network with two hidden layers, and the SDSS \ufb02uxes as inputs. More\nfeatures (e.g., more photometric bands) can be incorporated into all models, but we limit our exper-\niments to the \ufb01ve SDSS bands for the sake of comparison. Further detail on these two methods and\na broader review of \u201cphoto-z\u201d approaches are available in the supplementary material.\n\nAverage error and test distribution We compute mean absolute error (MAE), mean absolute\npercentage error (MAPE), and root mean square error (RMSE) to measure predictive performance.\nTable 1 compares prediction errors for the three different approaches (XD, NN, Spec). Our ex-\nperiments show that, by directly modeling the SED itself, accurate redshift measurements are attainable\neven when the distribution of the training set differs from that of the test set. Our method dramatically\noutperforms [2] and [3] in split (iii), particularly for very high redshift \ufb02uxes. We also note that\nour training set is derived from only 2,000 examples, whereas the training sets for XDQSOz and the\nneural network were \u2248 80,000 quasars and 50,000 quasars, respectively. This shortcoming can be\novercome with more sophisticated inference techniques for the non-negative basis. 
Despite this, the\nSED-based predictions are comparable. Additionally, because we are directly modeling the latent\nSED, our method admits a posterior estimate of the entire SED. Figure 6 displays posterior SED\nsamples and their corresponding redshift marginals for test-set quasars inferred from only SDSS\nphotometric measurements.\n\nsplit | MAE XD | MAE NN | MAE Spec | MAPE XD | MAPE NN | MAPE Spec | RMSE XD | RMSE NN | RMSE Spec\nrandom (all) | 0.359 | 0.773 | 0.485 | 0.293 | 0.533 | 0.430 | 0.519 | 0.974 | 0.808\n\ufb02ux (all) | 0.308 | 0.483 | 0.497 | 0.188 | 0.283 | 0.339 | 0.461 | 0.660 | 0.886\nredshift (all) | 0.841 | 0.736 | 0.619 | 0.237 | 0.214 | 0.183 | 1.189 | 0.923 | 0.831\nrandom (z > 2.35) | 0.247 | 0.530 | 0.255 | 0.091 | 0.183 | 0.092 | 0.347 | 0.673 | 0.421\n\ufb02ux (z > 2.33) | 0.292 | 0.399 | 0.326 | 0.108 | 0.143 | 0.124 | 0.421 | 0.550 | 0.531\nredshift (z > 3.20) | 1.327 | 1.149 | 0.806 | 0.357 | 0.317 | 0.226 | 1.623 | 1.306 | 0.997\nrandom (z > 3.11) | 0.171 | 0.418 | 0.289 | 0.050 | 0.117 | 0.082 | 0.278 | 0.540 | 0.529\n\ufb02ux (z > 2.86) | 0.373 | 0.493 | 0.334 | 0.112 | 0.144 | 0.103 | 0.606 | 0.693 | 0.643\nredshift (z > 3.80) | 2.389 | 2.348 | 0.829 | 0.582 | 0.569 | 0.198 | 2.504 | 2.405 | 1.108\n\nTable 1: Prediction error for three train-test splits, (i) random, (ii) \ufb02ux-based, (iii) redshift-based,\ncorresponding to XDQSOz [2] (XD), the neural network approach [3] (NN), our SED-based model\n(Spec). The middle and lowest sections correspond to test redshifts in the upper 50% and 10%,\nrespectively. The XDQSOz and NN models were trained on (roughly) 80,000 and 50,000 example\nquasars, respectively, while the Spec models were trained on 2,000.\n\n6 Discussion\n\nWe have presented a generative model of two sources of information at very different spectral res-\nolutions to form an estimate of the latent spectral energy distribution of quasars. We also described\nan ef\ufb01cient MCMC-based inference algorithm for computing posterior statistics given photometric\nobservations. 
Our model accurately predicts and characterizes uncertainty about redshifts from only\nphotometric observations and a small number of separate spectroscopic examples. Moreover, we\nshowed that we can make reasonable estimates of the unobserved SED itself, from which we can\nmake inferences about other physical properties informed by the full SED.\nWe see multiple avenues of future work. Firstly, we can extend the model of SEDs to incorporate\nmore expert knowledge. One such augmentation would include a \ufb01xed collection of features, cu-\nrated by an expert, corresponding to physical properties already known about a class of sources.\nFurthermore, we can also extend our model to directly incorporate photometric pixel observations,\nas opposed to preprocessed \ufb02ux measurements. Secondly, we note that our method is more\ncomputationally burdensome than XDQSOz and the neural network approach. Another avenue of\nfuture work is to \ufb01nd accurate approximations of these posterior distributions that are cheaper to\ncompute. Lastly, we can extend our methodology to galaxies, whose SEDs can be quite compli-\ncated. Galaxy observations have spatial extent, complicating their SEDs. The combination of SED\nand spatial appearance modeling and computationally ef\ufb01cient inference procedures is a promising\nroute toward the automatic characterization of millions of sources from the enormous amounts of\ndata available in massive photometric surveys.\n\nAcknowledgments\n\nThe authors would like to thank Matthew Hoffman and members of the HIPS lab for helpful dis-\ncussions. This work is supported by the Applied Mathematics Program within the Of\ufb01ce of Science\nAdvanced Scienti\ufb01c Computing Research of the U.S. Department of Energy under contract No.\nDE-AC02-05CH11231. This work used resources of the National Energy Research Scienti\ufb01c Com-\nputing Center (NERSC). 
We would like to thank Tina Butler, Tina Declerck and Yushu Yao for their\nassistance.\n\nReferences\n[1] Shadab Alam, Franco D Albareti, Carlos Allende Prieto, F Anders, Scott F Anderson, Brett H\nAndrews, Eric Armengaud, \u00c9ric Aubourg, Stephen Bailey, Julian E Bautista, et al. The\neleventh and twelfth data releases of the Sloan digital sky survey: Final data from SDSS-III.\narXiv preprint arXiv:1501.00963, 2015.\n\n[2] Jo Bovy, Adam D Myers, Joseph F Hennawi, David W Hogg, Richard G McMahon, David\nSchiminovich, Erin S Sheldon, Jon Brinkmann, Donald P Schneider, and Benjamin A Weaver.\nPhotometric redshifts and quasar probabilities from a single, data-driven generative model. The\nAstrophysical Journal, 749(1):41, 2012.\n\n[3] M Brescia, S Cavuoti, R D\u2019Abrusco, G Longo, and A Mercurio. Photometric redshifts for\n\nquasars in multi-band surveys. The Astrophysical Journal, 772(2):140, 2013.\n\n[4] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of Markov Chain\n\nMonte Carlo. CRC press, 2011.\n\n[5] Kyle S Dawson, David J Schlegel, Christopher P Ahn, Scott F Anderson, \u00c9ric Aubourg,\nStephen Bailey, Robert H Barkhouser, Julian E Bautista, Alessandra Bei\ufb01ori, Andreas A\nBerlind, et al. The baryon oscillation spectroscopic survey of SDSS-III. The Astronomical\nJournal, 145(1):10, 2013.\n\n[6] RO Gray, PW Graham, and SR Hoyt. The physical basis of luminosity classi\ufb01cation in the late\na-, f-, and early g-type stars. ii. basic parameters of program stars and the role of microturbu-\nlence. The Astronomical Journal, 121(4):2159, 2001.\n\n[7] Edward Harrison. The redshift-distance and velocity-distance laws. The Astrophysical Journal,\n\n403:28\u201331, 1993.\n\n[8] David W Hogg. Distance measures in cosmology. arXiv preprint astro-ph/9905116, 1999.\n[9] Dougal Maclaurin, David Duvenaud, and Ryan P. Adams. Autograd: Reverse-mode differen-\n\ntiation of native python. 
ICML workshop on Automatic Machine Learning, 2015.\n\n[10] D Christopher Martin, James Fanson, David Schiminovich, Patrick Morrissey, Peter G Fried-\nman, Tom A Barlow, Tim Conrow, Robert Grange, Patrick N Jelinksy, Bruno Millard, et al.\nThe galaxy evolution explorer: A space ultraviolet survey mission. The Astrophysical Journal\nLetters, 619(1), 2005.\n\n[11] Radford M Neal. Slice sampling. Annals of statistics, pages 705\u2013741, 2003.\n[12] Jorge Nocedal. Updating quasi-newton matrices with limited storage. Mathematics of compu-\n\ntation, 35(151):773\u2013782, 1980.\n\n[13] Isabelle P\u00e2ris, Patrick Petitjean, \u00c9ric Aubourg, Nicholas P Ross, Adam D Myers, Alina\nStreblyanska, Stephen Bailey, Patrick B Hall, Michael A Strauss, Scott F Anderson, et al.\nThe Sloan digital sky survey quasar catalog: tenth data release. Astronomy & Astrophysics,\n563:A54, 2014.\n\n[14] Jeffrey Regier, Andrew Miller, Jon McAuliffe, Ryan Adams, Matt Hoffman, Dustin Lang,\nDavid Schlegel, and Prabhat. Celeste: Variational inference for a generative model of astro-\nnomical images. In Proceedings of The 32nd International Conference on Machine Learning,\n2015.\n\n[15] SDSSIII. Measures of \ufb02ux and magnitude. 2013. https://www.sdss3.org/dr8/\n\nalgorithms/magnitudes.php.\n\n[16] Joseph Silk and Martin J Rees. Quasars and galaxy formation. Astronomy and Astrophysics,\n\n1998.\n\n[17] Chris Stoughton, Robert H Lupton, Mariangela Bernardi, Michael R Blanton, Scott Burles,\nFrancisco J Castander, AJ Connolly, Daniel J Eisenstein, Joshua A Frieman, GS Hennessy,\net al. Sloan digital sky survey: early data release. The Astronomical Journal, 123(1):485,\n2002.\n\n[18] Jakob Walcher, Brent Groves, Tam\u00e1s Budav\u00e1ri, and Daniel Dale. Fitting the integrated spectral\n\nenergy distributions of galaxies. Astrophysics and Space Science, 331(1):1\u201351, 2011.\n\n[19] David H Weinberg, Romeel Dav\u00e9, Neal Katz, and Juna A Kollmeier. 
The Lyman-alpha forest\nas a cosmological tool. Proceedings of the 13th Annual Astrophysics Conference in Maryland,\n666, 2003.\n", "award": [], "sourceid": 1485, "authors": [{"given_name": "Andrew", "family_name": "Miller", "institution": "Harvard"}, {"given_name": "Albert", "family_name": "Wu", "institution": "Harvard"}, {"given_name": "Jeff", "family_name": "Regier", "institution": "Berkeley"}, {"given_name": "Jon", "family_name": "McAuliffe", "institution": null}, {"given_name": "Dustin", "family_name": "Lang", "institution": null}, {"given_name": "Mr.", "family_name": "Prabhat", "institution": "LBL/NERSC"}, {"given_name": "David", "family_name": "Schlegel", "institution": null}, {"given_name": "Ryan", "family_name": "Adams", "institution": "Harvard"}]}