{"title": "Hamiltonian Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 15379, "page_last": 15389, "abstract": "Even though neural networks enjoy widespread use, they still struggle to learn the basic laws of physics. How might we endow them with better inductive biases? In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner. We evaluate our models on problems where conservation of energy is important, including the two-body problem and pixel observations of a pendulum. Our model trains faster and generalizes better than a regular neural network. An interesting side effect is that our model is perfectly reversible in time.", "full_text": "Hamiltonian Neural Networks\n\nSam Greydanus\n\nGoogle Brain\n\nsgrey@google.com\n\nMisko Dzamba\n\nPetCube\n\nmouse9911@gmail.com\n\nJason Yosinski\nUber AI Labs\n\nyosinski@uber.com\n\nAbstract\n\nEven though neural networks enjoy widespread use, they still struggle to learn the\nbasic laws of physics. How might we endow them with better inductive biases? In\nthis paper, we draw inspiration from Hamiltonian mechanics to train models that\nlearn and respect exact conservation laws in an unsupervised manner. We evaluate\nour models on problems where conservation of energy is important, including the\ntwo-body problem and pixel observations of a pendulum. Our model trains faster\nand generalizes better than a regular neural network. An interesting side effect is\nthat our model is perfectly reversible in time.\n\nIdeal mass-spring system\n\nBaseline NN\n\nPrediction\n\nNoisy observations\n\nHamiltonian NN\n\nPrediction\n\nFigure 1: Learning the Hamiltonian of a mass-spring system. The variables q and p correspond to\nposition and momentum coordinates. As there is no friction, the baseline\u2019s inner spiral is due to\nmodel errors. 
By comparison, the Hamiltonian Neural Network learns to exactly conserve a quantity that is analogous to total energy.

1 Introduction

Neural networks have a remarkable ability to learn and generalize from data. This lets them excel at tasks such as image classification [21], reinforcement learning [45, 26, 37], and robotic dexterity [1, 22]. Even though these tasks are diverse, they all share the same underlying physical laws. For example, a notion of gravity is important for reasoning about objects in an image, training an RL agent to walk, or directing a robot to manipulate objects. Based on this observation, researchers have become increasingly interested in finding physics priors that transfer across tasks [43, 34, 17, 10, 6, 40].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Untrained neural networks do not have physics priors; they learn approximate physics knowledge directly from data. This generally prevents them from learning exact physical laws. Consider the frictionless mass-spring system shown in Figure 1. Here the total energy of the system is being conserved. More specifically, this particular system conserves a quantity proportional to q² + p², where q is the position and p is the momentum of the mass. The baseline neural network in Figure 1 learns an approximation of this conservation law, and yet the approximation is imperfect enough that a forward simulation of the system drifts over time to higher or lower energy states. Can we define a class of neural networks that will precisely conserve energy-like quantities over time?

In this paper, we draw inspiration from Hamiltonian mechanics, a branch of physics concerned with conservation laws and invariances, to define Hamiltonian Neural Networks, or HNNs.
We begin with an equation called the Hamiltonian, which relates the state of a system to some conserved quantity (usually energy) and lets us simulate how the system changes with time. Physicists generally use domain-specific knowledge to find this equation, but here we try a different approach:

Instead of crafting the Hamiltonian by hand, we propose parameterizing it with a neural network and then learning it directly from data.

Since almost all physical laws can be expressed as conservation laws, our approach is quite general [27]. In practice, our model trains quickly and generalizes well¹. Figure 1, for example, shows the outcome of training an HNN on the same mass-spring system. Unlike the baseline model, it learns to conserve an energy-like quantity.

2 Theory

Predicting dynamics. The hallmark of a good physics model is its ability to predict changes in a system over time. This is the challenge we now turn to. In particular, our goal is to learn the dynamics of a system using a neural network. The simplest way of doing this is by predicting the next state of a system given the current one. A variety of previous works have taken this path and produced excellent results [41, 14, 43, 34, 17, 6]. There are, however, a few problems with this approach.

The first problem is its notion of discrete "time steps" that connect neighboring states. Since time is actually continuous, a better approach would be to express dynamics as a set of differential equations and then integrate them from an initial state at t0 to a final state at t1. Equation 1 shows how this might be done, letting S denote the time derivatives of the coordinates of the system².
This approach has been under-explored so far, but techniques like Neural ODEs take a step in the right direction [7].

    (q1, p1) = (q0, p0) + ∫_{t0}^{t1} S(q, p) dt    (1)

The second problem with existing methods is that they tend not to learn exact conservation laws or invariant quantities. This often causes them to drift away from the true dynamics of the system as small errors accumulate. The HNN model that we propose ameliorates both of these problems. To see how it does this, and to situate our work in the proper context, we first briefly review Hamiltonian mechanics.

Hamiltonian Mechanics. William Hamilton introduced Hamiltonian mechanics in the 19th century as a mathematical reformulation of classical mechanics. Its original purpose was to express classical mechanics in a more unified and general manner. Over time, though, scientists have applied it to nearly every area of physics from thermodynamics to quantum field theory [29, 32, 39].

In Hamiltonian mechanics, we begin with a set of coordinates (q, p). Usually, q = (q1, ..., qN) represents the positions of a set of objects whereas p = (p1, ..., pN) denotes their momentum. Note how this gives us N coordinate pairs (q1, p1)...(qN, pN). Taken together, they offer a complete description of the system. Next, we define a scalar function, H(q, p), called the Hamiltonian so that

    dq/dt = ∂H/∂p,    dp/dt = −∂H/∂q.    (2)

¹We make our code available at github.com/greydanus/hamiltonian-nn.
²Any coordinates that describe the state of the system. Later we will use position and momentum (q, p).

Equation 2 tells us that moving coordinates in the direction S_H = (∂H/∂p, −∂H/∂q) gives us the time evolution of the system. We can think of S as a vector field over the inputs of H. In fact, it is a special kind of vector field called a "symplectic gradient".
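As a concrete illustration of Equations 1 and 2, the following sketch (ours, not the paper's code) integrates the symplectic gradient of the mass-spring Hamiltonian from the introduction, H = (q² + p²)/2, and checks that H stays essentially constant:

```python
import numpy as np

# Sketch (not the paper's code): Equation 2 for the frictionless
# mass-spring system from the introduction, with H = (q^2 + p^2)/2.

def H(state):
    q, p = state
    return 0.5 * (q**2 + p**2)

def symplectic_gradient(state):
    q, p = state
    dH_dq, dH_dp = q, p               # analytic gradients of H
    return np.array([dH_dp, -dH_dq])  # S_H = (dH/dp, -dH/dq)

def rk4_step(f, state, dt):
    # One fourth-order Runge-Kutta step of the integral in Equation 1.
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.0, 0.0])  # (q0, p0)
e0 = H(state)
for _ in range(1000):         # integrate from t0 = 0 to t1 = 10
    state = rk4_step(symplectic_gradient, state, 0.01)
print(abs(H(state) - e0))     # energy drift stays tiny
```

Following S_H traces out circles of constant H in phase space; the only energy drift comes from the integrator's truncation error.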
Whereas moving in the direction of the gradient of H changes the output as quickly as possible, moving in the direction of the symplectic gradient keeps the output exactly constant. Hamilton used this mathematical framework to relate the position and momentum vectors (q, p) of a system to its total energy E_tot = H(q, p). Then, he found S_H using Equation 2 and obtained the dynamics of the system by integrating this field according to Equation 1. This is a powerful approach because it works for almost any system where the total energy is conserved.

Hamiltonian mechanics, like Newtonian mechanics, can predict the motion of a mass-spring system or a single pendulum. But its true strengths only become apparent when we tackle systems with many degrees of freedom. Celestial mechanics, which are chaotic for more than two bodies, are a good example. A few other examples include many-body quantum systems, fluid simulations, and condensed matter physics [29, 32, 39, 33, 9, 12].

Hamiltonian Neural Networks. In this paper, we propose learning a parametric function for H instead of S_H. In doing so, we endow our model with the ability to learn exactly conserved quantities from data in an unsupervised manner. During the forward pass, it consumes a set of coordinates and outputs a single scalar "energy-like" value. Then, before computing the loss, we take an in-graph gradient of the output with respect to the input coordinates (Figure A.1). It is with respect to this gradient that we compute and optimize an L2 loss (Equation 3).

    L_HNN = ‖∂H_θ/∂p − ∂q/∂t‖₂ + ‖∂H_θ/∂q + ∂p/∂t‖₂    (3)

For a visual comparison between this approach and the baseline, refer to Figure 1 or Figure 1(b). This training procedure allows HNNs to learn conserved quantities analogous to total energy straight from data. Apart from conservation laws, HNNs have several other interesting and potentially useful properties. First, they are perfectly reversible in that the mapping from (q, p) at one time to (q, p) at another time is bijective. Second, we can manipulate the HNN-conserved quantity (analogous to total energy) by integrating along the gradient of H, giving us an interesting counterfactual tool (e.g. "What would happen if we added 1 Joule of energy?"). We'll discuss these properties later in Section 6.

3 Learning a Hamiltonian from Data

Optimizing the gradients of a neural network is a rare approach. There are a few previous works which do this [42, 35, 28], but their scope and implementation details diverge from this work and from one another. With this in mind, our first step was to investigate the empirical properties of HNNs on three simple physics tasks.

Task 1: Ideal Mass-Spring. Our first task was to model the dynamics of the frictionless mass-spring system shown in Figure 1. The system's Hamiltonian is given in Equation 4 where k is the spring constant and m is the mass constant. For simplicity, we set k = m = 1. Then we sampled initial coordinates with total energies uniformly distributed between [0.2, 1]. We constructed training and test sets of 25 trajectories each and added Gaussian noise with standard deviation σ² = 0.1 to every data point. Each trajectory had 30 observations; each observation was a concatenation of (q, p).

    H = ½kq² + p²/2m    (4)

Task 2: Ideal Pendulum. Our second task was to model a frictionless pendulum. Pendulums are nonlinear oscillators so they present a slightly more difficult problem. Writing the gravitational constant as g and the length of the pendulum as l, the general Hamiltonian is

    H = 2mgl(1 − cos q) + l²p²/2m    (5)

Once again we set m = l = 1 for simplicity. This time, we set g = 3 and sampled initial coordinates with total energies in the range [1.3, 2.3].
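The Hamiltonians in Equations 4 and 5 translate directly to code. The following sketch (our illustration, using the constants above) also shows one way to sample noisy mass-spring observations in the spirit of Task 1:

```python
import numpy as np

# Sketch: the task Hamiltonians with k = m = l = 1 and g = 3,
# plus noisy mass-spring observations in the spirit of Task 1.

def spring_H(q, p, k=1.0, m=1.0):           # Equation 4
    return 0.5 * k * q**2 + p**2 / (2 * m)

def pendulum_H(q, p, m=1.0, l=1.0, g=3.0):  # Equation 5
    return 2 * m * g * l * (1 - np.cos(q)) + (l**2 * p**2) / (2 * m)

def spring_trajectory(energy, n_obs=30, noise=0.1, seed=0):
    # Mass-spring trajectories are circles of radius sqrt(2*E) in
    # (q, p) space, so observations can be sampled analytically
    # before Gaussian noise is added.
    rng = np.random.default_rng(seed)
    r = np.sqrt(2 * energy)
    t = np.linspace(0, 2 * np.pi, n_obs)
    qp = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)
    return qp + noise * rng.standard_normal(qp.shape)

traj = spring_trajectory(energy=0.5)
print(traj.shape)  # (30, 2): each observation is a concatenation of (q, p)
```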
We chose these numbers in order to situate the dataset along the system's transition from linear to nonlinear dynamics. As with Task 1, we constructed training and test sets of 25 trajectories each and added the same amount of noise.

Task 3: Real Pendulum. Our third task featured the position and momentum readings from a real pendulum. We used data from a Science paper by Schmidt & Lipson [35] which also tackled the problem of learning conservation laws from data. This dataset was noisier than the synthetic ones and it did not strictly obey any conservation laws since the real pendulum had a small amount of friction. Our goal here was to examine how HNNs fared on noisy and biased real-world data.

3.1 Methods

In all three tasks, we trained our models with a learning rate of 10⁻³ and used the Adam optimizer [20]. Since the training sets were small, we set the batch size to be the total number of examples. On each dataset we trained two fully-connected neural networks: the first was a baseline model that, given a vector input (q, p), output the vector (∂q/∂t, ∂p/∂t) directly. The second was an HNN that estimated the same vector using the derivative of a scalar quantity as shown in Equation 2 (also see Figure A.1). Where possible, we used analytic time derivatives as the targets. Otherwise, we calculated finite difference approximations. All of our models had three layers, 200 hidden units, and tanh activations. We trained them for 2000 gradient steps and evaluated them on the test set.

We logged three metrics: L2 train loss, L2 test loss, and mean squared error (MSE) between the true and predicted total energies. To determine the energy metric, we integrated our models according to Equation 1 starting from a random test point. Then we used MSE to measure how much a given model's dynamics diverged from the ground truth.
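Where analytic derivatives are unavailable, finite-difference targets like those mentioned above can be computed along each trajectory. A minimal sketch (ours, using central differences via numpy.gradient):

```python
import numpy as np

# Sketch: finite-difference approximations of (dq/dt, dp/dt) along a
# trajectory, used as training targets when analytic derivatives are
# unavailable. np.gradient uses central differences in the interior
# and one-sided differences at the endpoints.

def finite_difference_targets(states, dt):
    # states: array of shape (timesteps, 2) with columns (q, p)
    return np.gradient(states, dt, axis=0)

# Example on a known trajectory: q = sin(t), p = cos(t),
# so the true derivatives are (cos(t), -sin(t)).
t = np.linspace(0, 2 * np.pi, 200)
states = np.stack([np.sin(t), np.cos(t)], axis=1)
targets = finite_difference_targets(states, t[1] - t[0])
true = np.stack([np.cos(t), -np.sin(t)], axis=1)
print(np.max(np.abs(targets - true)))  # small discretization error
```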
Intuitively, the loss metrics measure our model's ability to fit individual data points while the energy metric measures its stability and conservation of energy over long timespans. To obtain dynamics, we integrated our models with the fourth-order Runge-Kutta integrator in scipy.integrate.solve_ivp and set the error tolerance to 10⁻⁹ [30].

3.2 Results

[Figure 2 panels: one row per task (ideal mass-spring, ideal pendulum, real pendulum); columns show predictions, MSE between coordinates, total HNN-conserved quantity, and total energy.]

Figure 2: Analysis of models trained on three simple physics tasks. In the first column, we observe that the baseline model's dynamics gradually drift away from the ground truth. The HNN retains a high degree of accuracy, even obscuring the black baseline in the first two plots. In the second column, the baseline's coordinate MSE error rapidly diverges whereas the HNN's does not. In the third column, we plot the quantity conserved by the HNN. Notice that it closely resembles the total energy of the system, which we plot in the fourth column. In consequence, the HNN roughly conserves total energy whereas the baseline does not.

We found that HNNs train as quickly as baseline models and converge to similar final losses. Table 1 shows their relative performance over the three tasks. But even as HNNs tied with the baseline on loss, they dramatically outperformed it on the MSE energy metric. Figure 2 shows why this is the case: as we integrate the two models over time, various errors accumulate in the baseline and it eventually diverges. Meanwhile, the HNN conserves a quantity that closely resembles total energy and diverges more slowly or not at all.

It's worth noting that the quantity conserved by the HNN is not equivalent to the total energy; rather, it's something very close to the total energy.
The third and fourth columns of Figure 2 provide a useful comparison between the HNN-conserved quantity and the total energy. Looking closely at the spacing of the y axes, one can see that the HNN-conserved quantity has the same scale as total energy, but differs by a constant factor. Since energy is a relative quantity, this is perfectly acceptable³.

The total energy plot for the real pendulum shows another interesting pattern. Whereas the ground truth data does not quite conserve total energy, the HNN roughly conserves this quantity. This, in fact, is a fundamental limitation of HNNs: they assume a conserved quantity exists and thus are unable to account for things that violate this assumption, such as friction. In order to account for friction, we would need to model it separately from the HNN.

4 Modeling Larger Systems

Having established baselines on a few simple tasks, our next step was to tackle a larger system involving more than one pair of (p, q) coordinates. One well-studied problem that fits this description is the two-body problem, which requires four (p, q) pairs.

    H = |p_CM|²/(m1 + m2) + (|p1|² + |p2|²)/2μ − g·m1m2/|q1 − q2|²    (6)

Task 4: Two-body problem. In the two-body problem, point particles interact with one another via an attractive force such as gravity. Once again, we let g be the gravitational constant and m represent mass. Equation 6 gives the Hamiltonian of the system where μ is the reduced mass and p_CM is the momentum of the center of mass. As in previous tasks, we set m1 = m2 = g = 1 for simplicity. Furthermore, we restricted our experiments to systems where the momentum of the center of mass was zero. Even so, with eight degrees of freedom (given by the x and y position and momentum coordinates of the two bodies) this system represented an interesting challenge.

4.1 Methods

Our first step was to generate a dataset of 1000 near-circular, two-body trajectories.
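Equation 6, with m1 = m2 = g = 1 and an attractive potential as described in the text, can be sketched as follows (our illustration, not the paper's code):

```python
import numpy as np

# Sketch of Equation 6 with m1 = m2 = g = 1 (so the reduced mass is
# mu = 0.5). Assumes the center-of-mass momentum is p_cm = p1 + p2
# and that the potential is attractive (negative), as in the text.

def two_body_H(q1, q2, p1, p2, m1=1.0, m2=1.0, g=1.0):
    mu = m1 * m2 / (m1 + m2)   # reduced mass
    p_cm = p1 + p2
    kinetic = (p_cm @ p_cm) / (m1 + m2) \
            + (p1 @ p1 + p2 @ p2) / (2 * mu)
    r2 = np.sum((q1 - q2) ** 2)    # squared separation, per Equation 6
    potential = -g * m1 * m2 / r2  # attractive
    return kinetic + potential

# A symmetric configuration with zero center-of-mass momentum:
q1, q2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
p1, p2 = np.array([0.0, 0.5]), np.array([0.0, -0.5])
print(two_body_H(q1, q2, p1, p2))  # 0.25
```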
We initialized every trajectory with center of mass zero, total momentum zero, and radius r = ‖q2 − q1‖ in the range [0.5, 1.5]. In order to control the level of numerical stability, we chose initial velocities that gave perfectly circular orbits and then added Gaussian noise to them. We found that scaling this noise by a factor of σ² = 0.05 produced trajectories with a good balance between stability and diversity. We used fourth-order Runge-Kutta integration to find 200 trajectories of 50 observations each and then performed an 80/20% train/test set split over trajectories. Our models and training procedure were identical to those described in Section 3 except this time we trained for 10,000 gradient steps and used a batch size of 200.

4.2 Results

The HNN model scaled well to this system. The first row of Figure 3 suggests that it learned to conserve a quantity nearly equal to the total energy of the system whereas the baseline model did not.

The second row of Figure 3 gives a qualitative comparison of trajectories. After one orbit, the baseline dynamics have completely diverged from the ground truth whereas the HNN dynamics have only accumulated a small amount of error. As we continue to integrate up to t = 50 and beyond (Figure B.1), both models diverge but the HNN does so at a much slower rate. Even as the HNN diverges from the ground truth orbit, its total energy remains stable rather than decaying to zero or spiraling to infinity. We report quantitative results for this task in Table 1. Both train and test losses of the HNN model were about an order of magnitude lower than those of the baseline.

³To see why energy is relative, imagine a cat that is at an elevation of 0 m in one reference frame and 1 m in another. Its potential energy (and total energy) will differ by a constant factor depending on frame of reference.
The HNN did a better job of conserving total energy, with an energy MSE that was several orders of magnitude below the baseline.

Having achieved success on the two-body problem, we ran the same set of experiments on the chaotic three-body problem. We show preliminary results in Appendix B where once again the HNN outperforms its baseline by a considerable margin. We opted to focus on the two-body results here because the three-body results still need improvement.

[Figure 3 panels: energy vs. time for ground truth, baseline NN, and Hamiltonian NN; and the corresponding x-y trajectories.]

Figure 3: Analysis of an example 2-body trajectory. The dynamics of the baseline model do not conserve total energy and quickly diverge from ground truth. The HNN, meanwhile, approximately conserves total energy and accrues a small amount of error after one full orbit.

5 Learning a Hamiltonian from Pixels

One of the key strengths of neural networks is that they can learn abstract representations directly from high-dimensional data such as pixels or words. Having trained HNN models on position and momentum coordinates, we were eager to see whether we could train them on arbitrary coordinates like the latent vectors of an autoencoder.

Task 5: Pixel Pendulum. With this in mind, we constructed a dataset of pixel observations of a pendulum and then combined an autoencoder with an HNN to model its dynamics. To our knowledge this is the first instance of a Hamiltonian learned directly from pixel data.

5.1 Methods

In recent years, OpenAI Gym has been widely adopted by the machine learning community as a means for training and evaluating reinforcement learning agents [5]. Some works have even trained world models on these environments [15, 16]. Seeing these efforts as related and complementary to our work, we used OpenAI Gym's Pendulum-v0 environment in this experiment.

First, we generated 200 trajectories of 100 frames each⁴.
We required that the maximum absolute displacement of the pendulum arm be π/6 radians. Starting from 400 x 400 x 3 RGB pixel observations, we cropped, desaturated, and downsampled them to 28 x 28 x 1 frames and concatenated each frame with its successor so that the input to our model was a tensor of shape batch x 28 x 28 x 2. We used two frames so that velocity would be observable from the input. Without the ability to observe velocity, an autoencoder without recurrence would be unable to ascertain the system's full state space.

In designing the autoencoder portion of the model, our main objective was simplicity and trainability. We chose to use fully-connected layers in lieu of convolutional layers because they are simpler. Furthermore, convolutional layers sometimes struggle to extract even simple position information [23]. Both the encoder and decoder were composed of four fully-connected layers with relu activations and residual connections. We used 200 hidden units on all layers except the latent vector z, where we used two units. As for the HNN component of this model, we used the same architecture and parameters as described in Section 3. Unless otherwise specified, we used the same training procedure as described in Section 4.1. We found that using a small amount of weight decay, 10⁻⁵ in this case, was beneficial.

Losses.
The most notable difference between this experiment and the others was the loss function. This loss function was composed of three terms: the first being the HNN loss, the second being a classic autoencoder loss (L2 loss over pixels), and the third being an auxiliary loss on the autoencoder's latent space:

    L_CC = ( z_p^t − (z_q^{t+1} − z_q^t) )²    (7)

⁴Choosing the "no torque" action at every timestep.

The purpose of the auxiliary loss term, given in Equation 7, was to make the second half of z, which we'll label z_p, resemble the derivatives of the first half of z, which we'll label z_q. This loss encouraged the latent vector (z_q, z_p) to have roughly the same properties as canonical coordinates (q, p). These properties, measured by the Poisson bracket relations, are necessary for writing a Hamiltonian. We found that the auxiliary loss did not degrade the autoencoder's performance. Furthermore, it is not domain-specific and can be used with any autoencoder with an even-sized latent space.

Figure 4: Predicting the dynamics of the pixel pendulum. We train an HNN and its baseline to predict dynamics in the latent space of an autoencoder. Then we project to pixel space for visualization. The baseline model rapidly decays to lower energy states whereas the HNN remains close to ground truth even after hundreds of frames. It mostly obscures the ground truth line in the bottom plot.

5.2 Results

Unlike the baseline model, the HNN learned to conserve a scalar quantity analogous to the total energy of the system. This enabled it to predict accurate dynamics for the system over much longer timespans. Figure 4 shows a qualitative comparison of trajectories predicted by the two models. As in previous experiments, we computed these dynamics using Equation 2 and a fourth-order Runge-Kutta integrator. Unlike previous experiments, we performed this integration in the latent space of the autoencoder.
Then, after integration, we projected to pixel space using the decoder network. The HNN and its baseline reached comparable train and test losses, but once again, the HNN dramatically outperformed the baseline on the energy metric (Table 1).

Table 1: Quantitative results across all five tasks. Whereas the HNN is competitive with the baseline on train/test loss, it dramatically outperforms the baseline on the energy metric. All values are multiplied by 10³ unless noted otherwise. See Appendix A for a note on train/test split for Task 3.

Task                    Train loss             Test loss              Energy
                        Baseline   HNN         Baseline   HNN         Baseline      HNN
1: Ideal mass-spring    37 ± 2     37 ± 2      36 ± 2     37 ± 2      170 ± 20      .38 ± .1
2: Ideal pendulum       33 ± 2     33 ± 2      35 ± 2     36 ± 2      42 ± 10       25 ± 5
3: Real pendulum        2.7 ± .2   9.2 ± .5    2.2 ± .3   6.0 ± .6    390 ± 7       14 ± 5
4: Two body (×10⁻⁶)     33 ± 1     3.0 ± .1    30 ± .1    2.8 ± .1    6.3e4 ± 3e4   39 ± 5
5: Pixel pendulum       18 ± .2    19 ± .2     17 ± .3    18 ± .3     9.3 ± 1       .15 ± .01

6 Useful properties of HNNs

While the main purpose of HNNs is to endow neural networks with better physics priors, in this section we ask what other useful properties these models might have.

Adding and removing energy. So far, we have seen that integrating the symplectic gradient of the Hamiltonian can give us the time evolution of a system but we have not tried following the Riemann gradient R_H = (∂H/∂q, ∂H/∂p). Intuitively, this corresponds to adding or removing some of the HNN-conserved quantity from the system. It's especially interesting to alternate between integrating R_H and S_H. Figure 5 shows how we can take advantage of this effect to "bump" the pendulum to a higher energy level. We could imagine using this technique to answer counterfactual questions e.g.
"What would have happened if we applied a torque?"

Figure 5: Visualizing integration in the latent space of the Pixel Pendulum model. We alternately integrate S_H at low energy (blue circle), R_H (purple line), and then S_H at higher energy (red circle).

Perfect reversibility. As neural networks have grown in size, the memory consumption of transient activations, the intermediate activations saved for backpropagation, has become a notable bottleneck. Several works propose semi-reversible models that construct one layer's activations from the activations of the next [13, 25, 19]. Neural ODEs also have this property [7]. Many of these models are only approximately reversible: their mappings are not quite bijective. Unlike those methods, our approach is guaranteed to produce trajectories that are perfectly reversible through time. We can simply refer to a result from Hamiltonian mechanics called Liouville's Theorem: the density of particles in phase space is constant. What this implies is that any mapping (q0, p0) → (q1, p1) is bijective/invertible.

7 Related work

Learning physical laws from data. Schmidt & Lipson [35] used a genetic algorithm to search a space of mathematical functions for conservation laws and recovered the Lagrangians and Hamiltonians of several real systems. We were inspired by their approach, but used a neural network to avoid constraining our search to a set of hand-picked functions. Two recent works are similar to this paper in that the authors sought to uncover physical laws from data using neural networks [18, 4]. Unlike our work, they did not explicitly parameterize Hamiltonians.

Physics priors for neural networks. A wealth of previous works have sought to furnish neural networks with better physics priors.
Many of these works are domain-specific: the authors used domain knowledge about molecular dynamics [31, 38, 8, 28], quantum mechanics [36], or robotics [24] to help their models train faster or generalize. Others, such as Interaction Networks or Relational Networks, were meant to be fully general [43, 34, 2]. Here, we also aimed to keep our approach fully general while introducing a strong and theoretically-motivated prior.

Modeling energy surfaces. Physicists, particularly those studying molecular dynamics, have seen success using neural networks to model energy surfaces [3, 11, 36, 44]. In particular, several works have shown dramatic computation speedups compared to density functional theory [31, 38, 8]. Molecular dynamics researchers integrate the derivatives of energy in order to obtain dynamics, just as we did in this work. Key differences between these approaches and our own are that 1) we emphasize the Hamiltonian formalism and 2) we optimize the gradients of our model (though some works do optimize the gradients of a molecular dynamics model [42, 28]).

8 Discussion

Whereas Hamiltonian mechanics is an old and well-established theory, the science of deep learning is still in its infancy. Whereas Hamiltonian mechanics describes the real world from first principles, deep learning does so starting from data. We believe that Hamiltonian Neural Networks, and models like them, represent a promising way of bringing together the strengths of both approaches.

9 Acknowledgements

Sam Greydanus would like to thank the Google AI Residency Program for providing extraordinary mentorship and resources.
The authors would like to thank Nic Ford, Trevor Gale, Rapha Gontijo Lopes, Keren Gu, Ben Caine, Mark Woodward, Stephan Hoyer, Jascha Sohl-Dickstein, and many others for insightful conversations and support. Special thanks to James and Judy Greydanus for their feedback and support from beginning to end.

References

[1] Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., et al. Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177, 2018.

[2] Battaglia, P., Pascanu, R., Lai, M., Rezende, D. J., et al. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pp. 4502–4510, 2016.

[3] Behler, J. Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. Physical Chemistry Chemical Physics, 13(40):17930–17955, 2011.

[4] Bondesan, R. and Lamacraft, A. Learning symmetries of classical integrable systems. arXiv preprint arXiv:1906.04645, 2019.

[5] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.

[6] Chang, M. B., Ullman, T., Torralba, A., and Tenenbaum, J. B. A compositional object-based approach to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.

[7] Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary differential equations. pp. 6571–6583, 2018. URL http://papers.nips.cc/paper/7892-neural-ordinary-differential-equations.pdf.

[8] Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., and Müller, K.-R. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017.

[9] Cohen-Tannoudji, C., Dupont-Roc, J., and Grynberg, G. Photons and atoms: introduction to quantum electrodynamics.
Wiley-VCH, 1997.

[10] de Avila Belbute-Peres, F., Smith, K., Allen, K., Tenenbaum, J., and Kolter, J. Z. End-to-end differentiable physics for learning and control. In Advances in Neural Information Processing Systems, pp. 7178–7189, 2018.

[11] Gastegger, M. and Marquetand, P. High-dimensional neural network potentials for organic reactions and an improved training algorithm. Journal of Chemical Theory and Computation, 11(5):2187–2198, 2015.

[12] Girvin, S. M. and Yang, K. Modern Condensed Matter Physics. Cambridge University Press, 2019.

[13] Gomez, A. N., Ren, M., Urtasun, R., and Grosse, R. B. The reversible residual network: Backpropagation without storing activations. In Advances in Neural Information Processing Systems, pp. 2214–2224, 2017.

[14] Grzeszczuk, R. NeuroAnimator: Fast neural network emulation and control of physics-based models. University of Toronto.

[15] Ha, D. and Schmidhuber, J. Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems, pp. 2450–2462, 2018.

[16] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., and Davidson, J. Learning latent dynamics for planning from pixels. arXiv preprint arXiv:1811.04551, 2018.

[17] Hamrick, J. B., Allen, K. R., Bapst, V., Zhu, T., McKee, K. R., Tenenbaum, J. B., and Battaglia, P. W. Relational inductive bias for physical construction in humans and machines. arXiv preprint arXiv:1806.01203, 2018.

[18] Iten, R., Metger, T., Wilming, H., Del Rio, L., and Renner, R. Discovering physical concepts with neural networks. arXiv preprint arXiv:1807.10300, 2018.

[19] Jacobsen, J.-H., Smeulders, A., and Oyallon, E. i-RevNet: Deep invertible networks.
arXiv preprint arXiv:1802.07088, 2018.

[20] Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2015.

[21] Krizhevsky, A., Sutskever, I., and Hinton, G. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1106–1114, 2012.

[22] Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J., and Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research, 37(4-5):421–436, 2018.

[23] Liu, R., Lehman, J., Molino, P., Such, F. P., Frank, E., Sergeev, A., and Yosinski, J. An intriguing failing of convolutional neural networks and the CoordConv solution. In Advances in Neural Information Processing Systems, pp. 9605–9616, 2018.

[24] Lutter, M., Ritter, C., and Peters, J. Deep Lagrangian networks: Using physics as model prior for deep learning. International Conference on Learning Representations, 2019.

[25] MacKay, M., Vicol, P., Ba, J., and Grosse, R. B. Reversible recurrent neural networks. In Advances in Neural Information Processing Systems, pp. 9029–9040, 2018.

[26] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[27] Noether, E. Invariant variation problems. Transport Theory and Statistical Physics, 1(3):186–207, 1971.

[28] Pukrittayakamee, A., Malshe, M., Hagan, M., Raff, L., Narulkar, R., Bukkapatnum, S., and Komanduri, R. Simultaneous fitting of a potential-energy surface and its corresponding force fields using feedforward neural networks. The Journal of Chemical Physics, 130(13):134101, 2009.

[29] Reichl, L. E. A modern course in statistical physics. AAPT, 1999.

[30] Runge, C.
\u00dcber die numerische au\ufb02\u00f6sung von differentialgleichungen. Mathematische Annalen,\n\n46(2):167\u2013178, 1895.\n\n[31] Rupp, M., Tkatchenko, A., M\u00fcller, K.-R., and Von Lilienfeld, O. A. Fast and accurate modeling\nof molecular atomization energies with machine learning. Physical review letters, 108(5):\n058301, 2012.\n\n[32] Sakurai, J. J. and Commins, E. D. Modern quantum mechanics, revised edition. AAPT, 1995.\n[33] Salmon, R. Hamiltonian \ufb02uid mechanics. Annual review of \ufb02uid mechanics, 20(1):225\u2013256,\n\n1988.\n\n[34] Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., and Lillicrap,\nT. A simple neural network module for relational reasoning. In Advances in neural information\nprocessing systems, pp. 4967\u20134976, 2017.\n\n[35] Schmidt, M. and Lipson, H. Distilling free-form natural laws from experimental data. Science,\n\n324(5923):81\u201385, 2009.\n\n10\n\n\f[36] Sch\u00fctt, K. T., Arbabzadah, F., Chmiela, S., M\u00fcller, K. R., and Tkatchenko, A. Quantum-\nchemical insights from deep tensor neural networks. Nature communications, 8:13890, 2017.\n[37] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T.,\nBaker, L., Lai, M., Bolton, A., et al. Mastering the game of go without human knowledge.\nNature, 550(7676):354, 2017.\n\n[38] Smith, J. S., Isayev, O., and Roitberg, A. E. Ani-1: an extensible neural network potential with\n\ndft accuracy at force \ufb01eld computational cost. Chemical science, 8(4):3192\u20133203, 2017.\n\n[39] Taylor, J. R. Classical mechanics. University Science Books, 2005.\n[40] Tenenbaum, J. B., De Silva, V., and Langford, J. C. A global geometric framework for nonlinear\n\ndimensionality reduction. science, 290(5500):2319\u20132323, 2000.\n\n[41] Tompson, J., Schlachter, K., Sprechmann, P., and Perlin, K. Accelerating eulerian \ufb02uid\nsimulation with convolutional networks. 
In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3424–3433. JMLR.org, 2017.

[42] Wang, J., Olsson, S., Wehmeyer, C., Perez, A., Charron, N. E., de Fabritiis, G., Noe, F., and Clementi, C. Machine learning of coarse-grained molecular dynamics force fields. ACS Central Science, 2018.

[43] Watters, N., Zoran, D., Weber, T., Battaglia, P., Pascanu, R., and Tacchetti, A. Visual interaction networks: Learning a physics simulator from video. In Advances in Neural Information Processing Systems, pp. 4539–4547, 2017.

[44] Yao, K., Herr, J. E., Toth, D. W., Mckintyre, R., and Parkhill, J. The TensorMol-0.1 model chemistry: A neural network augmented with long-range physics. Chemical Science, 9(8):2261–2269, 2018.

[45] Yosinski, J., Clune, J., Hidalgo, D., Nguyen, S., Zagal, J. C., and Lipson, H. Evolving robot gaits in hardware: the HyperNEAT generative encoding vs. parameter optimization. In Proceedings of the 20th European Conference on Artificial Life, pp. 890–897, August 2011.