{"title": "Sequential Neural Processes", "book": "Advances in Neural Information Processing Systems", "page_first": 10254, "page_last": 10264, "abstract": "Neural Processes combine the strengths of neural networks and Gaussian processes to achieve both flexible learning and fast prediction in stochastic processes. However, a large class of problems comprises underlying temporal dependency structures in a sequence of stochastic processes that Neural Processes (NP) do not explicitly consider. In this paper, we propose Sequential Neural Processes (SNP), which incorporates a temporal state-transition model of stochastic processes and thus extends its modeling capabilities to dynamic stochastic processes. In applying SNP to dynamic 3D scene modeling, we introduce the Temporal Generative Query Networks. To our knowledge, this is the first 4D model that can deal with the temporal dynamics of 3D scenes. In experiments, we evaluate the proposed methods in dynamic (non-stationary) regression and 4D scene inference and rendering.", "full_text": "Sequential Neural Processes\n\nGautam Singh\u2217\nRutgers University\nsingh.gautam@rutgers.edu\n\nJaesik Yoon\u2217\nSAP\njaesik.yoon01@sap.com\n\nYoungsung Son\nETRI\nysson@etri.re.kr\n\nSungjin Ahn\nRutgers University\nsungjin.ahn@rutgers.edu\n\nAbstract\n\nNeural Processes combine the strengths of neural networks and Gaussian processes to achieve both flexible learning and fast prediction in stochastic processes. However, a large class of problems comprises underlying temporal dependency structures in a sequence of stochastic processes that Neural Processes (NP) do not explicitly consider. In this paper, we propose Sequential Neural Processes (SNP), which incorporates a temporal state-transition model of stochastic processes and thus extends its modeling capabilities to dynamic stochastic processes. In applying SNP to dynamic 3D scene modeling, we introduce the Temporal Generative Query Networks. 
To our knowledge, this is the first 4D model that can deal with the temporal dynamics of 3D scenes. In experiments, we evaluate the proposed methods in dynamic (non-stationary) regression and 4D scene inference and rendering.\n\n1 Introduction\n\nNeural networks consume all of their training data and computation in a costly training phase that engraves a single function into their weights. While this enables fast prediction with the learned function, under this rigid regime changing the target function means costly retraining of the network. This lack of flexibility is thus a major obstacle in tasks such as meta-learning and continual learning, where the function needs to change over time or on demand. Gaussian processes (GP) do not suffer from this problem: conditioning on observations, a GP directly performs inference on the target stochastic process. Consequently, Gaussian processes show the opposite properties to neural networks: they are flexible in making predictions because of their non-parametric nature, but this flexibility comes at the cost of slow prediction. GPs can also capture the uncertainty in the estimated function.\nNeural Processes (NP) (Garnelo et al., 2018b) are a new class of methods that combine the strengths of both worlds. By taking the meta-learning framework, Neural Processes learn to learn a stochastic process quickly from observations while experiencing multiple tasks of stochastic process modeling. Thus, in Neural Processes, unlike typical neural networks, learning a function is fast and uncertainty-aware while, unlike Gaussian processes, prediction at test time remains efficient.\nAn important direction in which Neural Processes can be extended is that, in many cases, certain temporal dynamics underlie a sequence of stochastic processes. This covers a broad range of problems, from training RL agents exposed to increasingly challenging tasks to modeling dynamic 3D scenes. 
For instance, Eslami et al. (2018) proposed a variant of Neural Processes, called the Generative Query Networks (GQN), to learn representation and rendering of 3D scenes. Although this was successful in modeling static scenes like fixed objects in a room, we argue that to handle more general cases such as dynamic scenes, where objects can move or interact over time, we need to explicitly incorporate a temporal transition model into Neural Processes.\n\n\u2217Equal contribution\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nIn this paper, we introduce Sequential Neural Processes (SNP) to incorporate the temporal state-transition model into Neural Processes. The proposed model extends the potential of Neural Processes from modeling a stochastic process to modeling a dynamically changing sequence of stochastic processes. That is, SNP can model a (sequential) stochastic process of stochastic processes. We also propose to apply SNP to dynamic 3D scene modeling by developing the Temporal Generative Query Networks (TGQN). In experiments, we show that TGQN outperforms GQN in terms of capturing transition stochasticity, generation quality, and generalization to time horizons longer than those used during training.\nOur main contributions are: We introduce Sequential Neural Processes (SNP), a meta-transfer learning framework for a sequence of stochastic processes. We realize SNP for dynamic 3D scene inference by introducing Temporal Generative Query Networks (TGQN). To our knowledge, this is the first 4D generative model that models dynamic 3D scenes. We describe the training challenge of transition-collapse unique to SNP modeling and resolve it by introducing the posterior-dropout ELBO. We demonstrate the generalization capability of TGQN beyond the sequence lengths used during training. 
We also demonstrate meta-transfer learning and, owing to the decoupling of temporal dynamics from the scene representations, improved generation quality in contrast to Consistent Generative Query Networks (Kumar et al., 2018).\n\n2 Background\n\nIn this section, we introduce notation and the foundational concepts that underlie the design of our proposed model as well as its motivating applications.\nNeural Processes. Neural Processes (NP) model a stochastic process mapping an input x \u2208 Rdx to a random variable Y \u2208 Rdy. In particular, an NP is defined as a conditional latent variable model where a set of context observations C = (XC, YC) = {(xi, yi)}i\u2208I(C) is given to model a conditional prior on the latent variable, P(z|C), and the target observations D = (X, Y) = {(xi, yi)}i\u2208I(D) are modeled by the observation model p(yi|xi, z). Here, I(S) stands for the set of data-point indices in a dataset S. This generative process can be written as follows:\n\nP(Y|X, C) = \u222b P(Y|X, z) P(z|C) dz,\n\n(1)\n\nwhere P(Y|X, z) = \u220fi\u2208I(D) P(yi|xi, z). The dataset {(Ci, Di)}i\u2208I(dataset) as a whole contains multiple pairs of context and target sets. Each such pair (C, D) is associated with its own stochastic process from which its observations are drawn. Therefore, NP flexibly models multiple tasks, i.e., stochastic processes, and this results in a meta-learning framework. It is sometimes useful to condition the observation model directly on the context C as well, i.e., p(yi|xi, sC, z) where sC = fs(C) with fs a deterministic context encoder invariant to the ordering of the contexts. A similar encoder is also used for the conditional prior, giving p(z|C) = p(z|rC) with rC = fr(C). 
In this case, the observation model uses the context in two ways: a noisy latent path via z and a deterministic path via sC.\nThe design principle underlying this modeling is to infer the target stochastic process from contexts in such a way that sampling z from P(z|C) corresponds to a function that is a realization of a stochastic process. Because the true posterior is intractable, the model is trained via variational approximation, which gives the following evidence lower bound (ELBO) objective:\n\nlog P\u03b8(Y|X, C) \u2265 EQ\u03c6(z|C,D) [log P\u03b8(Y|X, z)] \u2212 KL(Q\u03c6(z|C, D) \u2225 P\u03b8(z|C)).\n\n(2)\n\nThe ELBO is optimized using the reparameterization trick (Kingma & Welling, 2013).\nGenerative Query Networks. The Generative Query Network (GQN) can be seen as an application of Neural Processes specifically geared towards 3D scene inference and rendering. In GQN, a query x corresponds to a camera viewpoint in a 3D space, and the output y is an image taken from that camera viewpoint. Thus, the problem in GQN is cast as: given a context set of viewpoint-image pairs, (i) infer the representation of the full 3D space and then (ii) generate the observation image corresponding to a given query viewpoint.\nIn the original GQN, the prior is conditioned on the query viewpoint in addition to the context, i.e., P(z|x, rC), and this results in inconsistent samples across different viewpoints when modeling uncertainty in the scene. The Consistent GQN (CGQN) (Kumar et al., 2018) resolved this by removing the dependency on the query viewpoint from the prior. This makes z a summary of the full 3D scene independent of the query viewpoint. Hence, it is consistent across viewpoints and more similar to the original Neural Processes. For the remainder of the paper, we use the abbreviation GQN for CGQN unless stated otherwise.\nFor inferring representations of 3D scenes, a more complex modeling of the latents is needed. 
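As an aside, the NP objective in Eq. (2) can be made concrete with a minimal numerical sketch. The mean-pooling encoder, the linear toy observation model, and all constants below are illustrative assumptions for exposition only, not the architecture used by NP or GQN.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(pairs):
    # Toy permutation-invariant encoder: mean-pool the (x, y) pairs, then
    # read off the mean and log-variance of a diagonal Gaussian over z.
    # Mean-pooling makes the summary invariant to the ordering of the context.
    r = pairs.mean(axis=0)               # aggregate representation
    return r, np.full_like(r, -1.0)      # (mu, log_var), fixed toy variance

def gaussian_kl(mu_q, lv_q, mu_p, lv_p):
    # KL(q || p) between diagonal Gaussians, summed over latent dimensions.
    return 0.5 * np.sum(
        lv_p - lv_q + (np.exp(lv_q) + (mu_q - mu_p) ** 2) / np.exp(lv_p) - 1.0
    )

def elbo(context, target_x, target_y, n_samples=32):
    # Conditional prior P(z|C) from the context alone; approximate posterior
    # Q(z|C, D) from the context plus the target pairs, as in Eq. (2).
    mu_p, lv_p = encode(context)
    targets = np.column_stack([target_x, target_y])
    mu_q, lv_q = encode(np.concatenate([context, targets]))
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    eps = rng.standard_normal((n_samples, mu_q.size))
    z = mu_q + np.exp(0.5 * lv_q) * eps
    # Toy observation model: y_i ~ N(z0 * x_i + z1, 1).
    pred = z[:, :1] * target_x[None, :] + z[:, 1:2]
    log_lik = -0.5 * np.sum((target_y[None, :] - pred) ** 2 + np.log(2 * np.pi), axis=1)
    # Monte-Carlo estimate of E_Q[log P(Y|X,z)] minus the KL term.
    return log_lik.mean() - gaussian_kl(mu_q, lv_q, mu_p, lv_p)
```

Training would ascend this bound with gradients flowing through the reparameterized samples; here the encoder has no learnable weights, so the snippet only shows how the two terms of Eq. (2) fit together. GQN's actual latent model is far richer than this single diagonal Gaussian.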
For\nthis, GQN uses ConvDRAW (Gregor et al., 2016), an auto-regressive density estimator performing\nl=1 P (zl|z