{"title": "End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 4436, "page_last": 4446, "abstract": "Machine learning models are changing the paradigm of molecular modeling, which is a fundamental tool for material science, chemistry, and computational biology. Of particular interest is the inter-atomic potential energy surface (PES). Here we develop Deep Potential - Smooth Edition (DeepPot-SE), an end-to-end machine learning-based PES model, which is able to efficiently represent the PES for a wide variety of systems with the accuracy of ab initio quantum mechanics models. By construction, DeepPot-SE is extensive and continuously differentiable, scales linearly with system size, and preserves all the natural symmetries of the system. Further, we show that DeepPot-SE describes finite and extended systems including organic molecules, metals, semiconductors, and insulators with high fidelity.", "full_text": "End-to-end Symmetry Preserving Inter-atomic\nPotential Energy Model for Finite and Extended\n\nSystems\n\nLinfeng Zhang1, Jiequn Han1, Han Wang2,3,\u2217, Wissam A. 
Saidi4,\u2020,\n\nRoberto Car1,5,6, Weinan E1,7,8,\u2021\n\n1 Program in Applied and Computational Mathematics, Princeton University, USA\n\n2 Institute of Applied Physics and Computational Mathematics, China\n\n3 CAEP Software Center for High Performance Numerical Simulation, China\n\n4 Department of Mechanical Engineering and Materials Science, University of Pittsburgh, USA\n\n5 Department of Chemistry and Department of Physics, Princeton University, USA\n\n6 Princeton Institute for the Science and Technology of Materials, Princeton University, USA\n\n7 Department of Mathematics, Princeton University, USA\n\n8 Beijing Institute of Big Data Research, China\n\n\u2217wang_han@iapcm.ac.cn, \u2020alsaidi@pitt.edu, \u2021weinan@math.princeton.edu\n\nAbstract\n\nMachine learning models are changing the paradigm of molecular modeling, which\nis a fundamental tool for material science, chemistry, and computational biology.\nOf particular interest is the inter-atomic potential energy surface (PES). Here we\ndevelop Deep Potential - Smooth Edition (DeepPot-SE), an end-to-end machine\nlearning-based PES model, which is able to ef\ufb01ciently represent the PES of a\nwide variety of systems with the accuracy of ab initio quantum mechanics models.\nBy construction, DeepPot-SE is extensive and continuously differentiable, scales\nlinearly with system size, and preserves all the natural symmetries of the system.\nFurther, we show that DeepPot-SE describes \ufb01nite and extended systems including\norganic molecules, metals, semiconductors, and insulators with high \ufb01delity.\n\n1\n\nIntroduction\n\nRepresenting the inter-atomic potential energy surface (PES), both accurately and ef\ufb01ciently, is\none of the most challenging problems in molecular modeling. 
Traditional approaches have either resorted to the direct application of quantum mechanics models, such as density functional theory (DFT) [1, 2], or to empirically constructed atomic potential models, such as the embedded atom method (EAM) [3]. The former approach is severely limited by the size of the system that one can handle, while the latter class of methods is limited by the accuracy and transferability of the model. This dilemma has confronted the molecular modeling community for several decades. In recent years, machine learning (ML) methods have tackled this classical problem, and a large body of work has been published in this area [4-17]. These studies have clearly demonstrated the potential of using ML methods, and particularly neural network models, to represent the PES. Considering the importance of the PES in molecular modeling, more work is needed to provide a general framework for an ML-based PES that can describe different systems equally well, with high fidelity.
Before proceeding further, let us list the requirements of the PES models that we consider to be fundamental: 1) The model should have the potential to be as accurate as quantum mechanics for both finite and extended systems. By finite system we mean a system that is isolated and surrounded by vacuum, e.g., gas-phase molecules; by extended system we mean a system in a simulation cell subject to periodic boundary conditions.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

2) The only inputs of a PES model should be the chemical species and the atomic coordinates; the use of other input information should be avoided. 3) The PES model should be size extensive, i.e., if a system is composed of subsystems A and B, its energy should be close to the sum of A's and B's energies. This property is essential for handling bulk systems of varying sizes. 
4) The PES model should preserve the natural symmetries of the system, such as translational, rotational, and permutational symmetries. 5) Human intervention should be minimized; in other words, the model should be end-to-end. This is particularly relevant for multi-component or multi-phase systems, since we typically have limited knowledge about suitable empirical descriptors for these systems. 6) The model should be reasonably smooth, typically continuously differentiable, such that forces are properly defined for molecular dynamics simulation. In short, from the viewpoint of a practitioner, the model should be comparable to first-principles quantum mechanical models in its ease of use and accuracy, but at a significantly lower computational cost.
Existing ML models generally satisfy only a subset of the above requirements. The Bonds-in-Molecules Neural Network method (BIM-NN) [15], for example, uses empirical information on the chemical bonds as input, violating requirement 2). The Gradient Domain Machine Learning (GDML) scheme [11] uses a global descriptor for the whole molecular pattern, violating 3). The Deep Potential model [16, 17] represents the PES as a sum of "atomic" energies that depend on the coordinates of the atoms in each atomic environment in a symmetry-preserving way. This is achieved, however, at the price of introducing discontinuities in the model, thus violating 6). The Behler-Parrinello Neural Network (BPNN) model [4] uses hand-crafted local symmetry functions as descriptors. These require human intervention, violating 5).
From the viewpoint of supervised learning, there have been many interesting and challenging large-scale examples for classification tasks, but relatively few for regression. In this regard, the PES provides a natural candidate for a challenging regression task.
The main contributions of this paper are twofold. 
First, we propose and test a new PES model\nthat satis\ufb01es all the requirements listed above. We call this model Deep Potential \u2013 Smooth Edition\n(DeepPot-SE). We believe that the methodology proposed here is also applicable to other ML tasks\nthat require a symmetry-preserving procedure. Second, we test the DeepPot-SE model on various\nsystems, which extend previous studies by incorporating DFT data for challenging materials such as\nhigh entropy alloys (HEAs). We used the DeePMD-kit package [18] for all training and testing tasks.\nThe corresponding code1 and data2 are released online.\n\n2 Related Work\n\nSpherical CNN and DeepSets. From the viewpoint of preserving symmetries, the Spherical CNN [19]\nand DeepSets [20] models are the most relevant to our work. The spherical CNN model incor-\nporates the de\ufb01nition of S2 and SO(3) cross-correlations and has shown impressive performance\nin preserving rotational invariance. The DeepSets model provides a family of functions to which\nany permutation invariant objective function must belong and has been tested on several different\ntasks, including population statistic estimation, point cloud classi\ufb01cation, set expansion, and outlier\ndetection.\nML-based PES models. In addition to the previously mentioned BIM-NN, BPNN, DeepPot, and\nGDML approaches, some other ML models for representing the PES include: The Smooth Overlap\nof Atomic Positions model (SOAP) [21] uses a kernel method based on a smooth similarity measure\nof two neighboring densities. The Deep Tensor Neural Network (DTNN) model [10] uses as input a\nvector of nuclear charges and an inter-atomic distance matrix, and introduces a sequence of interaction\npasses where \u201cthe atom representations in\ufb02uence each other in a pair-wise fashion\u201d. 
Recently, the SchNet model [12] proposed a new continuous-filter convolutional layer to model local atomic correlations and successfully modeled quantum interactions in small molecules.

1 https://github.com/deepmodeling/deepmd-kit
2 http://www.deepmd.org/database/deeppot-se-data/

3 Theory

3.1 Preliminaries

Consider a system of N atoms, r = {r_1, r_2, ..., r_N}, in 3-dimensional Euclidean space. We define the coordinate matrix R ∈ R^{N×3}, whose i-th row contains the 3 Cartesian coordinates of atom i, i.e.,

    R = {r_1^T, ..., r_i^T, ..., r_N^T}^T,   r_i = (x_i, y_i, z_i).   (1)

The PES E(R) ≡ E is a function that maps the atomic coordinates and their chemical characters to a real number. Using the energy function E, we define the force matrix F(R) ≡ F ∈ R^{N×3} and the 3 × 3 virial tensor Ξ(R) ≡ Ξ by

    F = −∇_R E   (F_ij = −∇_{R_ij} E),   and   Ξ = tr[R ⊗ F]   (Ξ_ij = Σ_{k=1}^{N} R_ki F_kj),   (2)

respectively. Finally, we denote the full parameter set used to parametrize E by w, and we write the corresponding PES model as E^w(R) ≡ E^w. The force F^w and the virial Ξ^w can be computed directly from E^w.
As illustrated in Fig. 1, in the DeepPot-SE model, the extensive property of the total energy is preserved by decomposing it into "atomic contributions" that are represented by the so-called sub-networks, i.e.:

    E^w(R) = Σ_i E^{w_{α_i}}(R_i) ≡ Σ_i E_i,   (3)

where α_i denotes the chemical species of atom i. The subscript w_{α_i} indicates that the parameters used to represent the "atomic energy" E_i depend only on the chemical species α_i of atom i. Let r_c be a pre-defined cut-off radius. 
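The locality and extensivity expressed by Eq. (3) and the cut-off radius can be sketched in a few lines of NumPy. This is an illustrative toy, not the DeePMD-kit implementation; `atomic_energy` is a hypothetical placeholder for the per-species sub-network described next.

```python
import numpy as np

def neighbors(coords, i, rc):
    """Indices j != i with ||r_j - r_i|| < rc, i.e. the set N_rc(i)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.where((d < rc) & (np.arange(len(coords)) != i))[0]

def total_energy(coords, species, rc, atomic_energy):
    """Extensive total energy E = sum_i E_i of Eq. (3).

    `atomic_energy` stands in for the sub-network: it maps the local
    environment R_i (relative coordinates of the neighbors of atom i)
    and the species of atom i to a scalar E_i.
    """
    E = 0.0
    for i in range(len(coords)):
        j = neighbors(coords, i, rc)
        R_i = coords[j] - coords[i]  # relative coordinates r_ji
        E += atomic_energy(R_i, species[i])
    return E
```

Because each E_i sees only atoms within r_c, the cost grows linearly with N, and translating all coordinates by the same vector leaves the energy unchanged.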
For each atom i, we consider its neighbors {j | j ∈ N_rc(i)}, where N_rc(i) denotes the set of atom indices j such that r_ji < r_c, with r_ji being the Euclidean distance between atoms i and j. We define N_i = |N_rc(i)|, the cardinality of the set N_rc(i), and use R_i ∈ R^{N_i×3} to denote the local environment of atom i in terms of Cartesian coordinates:

    R_i = {r_1i^T, ..., r_ji^T, ..., r_{N_i}i^T}^T,   r_ji = (x_ji, y_ji, z_ji).   (4)

Note that here r_ji ≡ r_j − r_i are defined as relative coordinates, and the index j (1 ≤ j ≤ N_i) runs over the neighbors of the i-th atom. Correspondingly, we have r_ji = ||r_ji||.
The construction in Eq. (3) is shared by other empirical potential models such as the EAM method [3], and by many size-extensive ML models like the BPNN method [4]. However, these approaches differ in the representation of E_i.
The sub-network for E_i consists of an encoding and a fitting neural network. The encoding network is specially designed to map the local environment R_i to an embedded feature space that preserves the translational, rotational, and permutational symmetries of the system. The fitting network is a fairly standard fully-connected feedforward neural network with skip connections, which maps the embedded features to an "atomic energy". The optimal parameters of both the encoding and fitting networks are obtained by a single end-to-end training process, to be specified later.

3.2 Construction of symmetry preserving functions

Before going into the details of the sub-network for E_i, we consider how to represent a scalar function f(r) that is invariant under translation, rotation, and permutation, i.e.:

    T̂_b f(r) = f(r + b),   R̂_U f(r) = f(rU),   P̂_σ f(r) = f(r_σ(1), r_σ(2), ..., r_σ(N)),   (5)

respectively. 
Here b ∈ R^3 is an arbitrary 3-dimensional translation vector, U ∈ R^{3×3} is an orthogonal rotation matrix, and σ denotes an arbitrary permutation of the set of indices.
Granted the fitting ability of neural networks, the key to a general representation is an embedding procedure that maps the original input r to symmetry-preserving components. The embedding components should be faithful in the sense that their pre-image should be equal to r up to a symmetry operation. We draw inspiration from the following two observations.

Figure 1: Schematic plot of the DeepPot-SE model. (a) The mapping from the coordinate matrix R to the PES E. First, R is transformed to the local environment matrices {R_i}, i = 1, ..., N. Then each R_i is mapped, through a sub-network, to a local "atomic" energy E_i. Finally, E = Σ_i E_i. (b) The zoom-in of a sub-network. (b1) The transformation from R_i to the generalized local environment matrix R̃_i; (b2) the radial part of R̃_i is mapped, through an encoding network, to the embedding matrices G_i1 ∈ R^{N_i×M_1} and G_i2 ∈ R^{N_i×M_2}; (b3) the M_1 × M_2 symmetry-preserving features, contained in D_i, are given by the matrix product of (G_i1)^T, R̃_i, (R̃_i)^T, and G_i2. (c) Illustrative plot of the embedding function G_i, taking Cu as an example. (c1) The radial distribution function g of the training data; (c2) the M_2 (= 4) axis filters, defined as the product of G_i2 and s(r), as functions of r; (c3) 6 out of the M_1 (= 80) coordinate filters, defined as the product of G_i1 and s(r), as functions of r.

Translation and Rotation. For each object i, the symmetric matrix

    Ω_i ≡ R_i (R_i)^T   (6)

is an over-complete array of invariants with respect to translation and rotation [21, 22], i.e., it contains the complete information of the neighboring point pattern of atom i. 
However, this symmetric matrix switches rows and columns under a permutation operation.
Permutation. Theorem 2 of Ref. [20] states that any permutation symmetric function f(r) can be represented in the form ρ(Σ_i φ(r_i)), where φ(r_i) is a multidimensional function and ρ(...) is another general function. For example,

    Σ_i g(r_i) r_i   (7)

is invariant under permutation for any scalar function g.

3.3 The DeepPot-SE sub-networks

As shown in Fig. 1, we construct the sub-networks in three steps. First, the relative coordinates R_i ∈ R^{N_i×3} are mapped onto generalized coordinates R̃_i ∈ R^{N_i×4}. In this mapping, each row of R_i, {x_ji, y_ji, z_ji}, is transformed into a row of R̃_i:

    {x_ji, y_ji, z_ji} ↦ {s(r_ji), x̂_ji, ŷ_ji, ẑ_ji},   (8)

where x̂_ji = s(r_ji) x_ji / r_ji, ŷ_ji = s(r_ji) y_ji / r_ji, ẑ_ji = s(r_ji) z_ji / r_ji, and s(r_ji) : R ↦ R is a continuous and differentiable scalar weighting function applied to each component, defined as:

    s(r_ji) = 1/r_ji,                                                      r_ji < r_cs,
    s(r_ji) = (1/r_ji) { (1/2) cos[π (r_ji − r_cs)/(r_c − r_cs)] + 1/2 },  r_cs ≤ r_ji < r_c,   (9)
    s(r_ji) = 0,                                                           r_ji ≥ r_c.

Here r_cs is a smooth cutoff parameter that allows the components of R̃_i to go smoothly to zero at the boundary of the local region defined by r_c. The weighting function s(r_ji) reduces the weight of the particles that are more distant from atom i; in addition, it removes from the DeepPot-SE model the discontinuity introduced by the cut-off radius r_c.
Next, we define the local embedding network G_{α_j,α_i}(s(r_ji)), shorthanded as G(s(r_ji)): a neural network mapping a single value s(r_ji), through multiple hidden layers, to M_1 outputs. 
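Equations (8) and (9) are straightforward to implement. The following NumPy sketch (an illustration under the paper's definitions, not the DeePMD-kit code) constructs s(r) and the generalized coordinates R̃_i:

```python
import numpy as np

def switch(r, rcs, rc):
    """Smooth weighting s(r) of Eq. (9): 1/r inside rcs, a cosine
    taper between rcs and rc, and exactly zero beyond rc."""
    r = np.asarray(r, dtype=float)
    s = np.zeros_like(r)
    inner = r < rcs
    mid = (r >= rcs) & (r < rc)
    s[inner] = 1.0 / r[inner]
    u = (r[mid] - rcs) / (rc - rcs)
    s[mid] = (0.5 * np.cos(np.pi * u) + 0.5) / r[mid]
    return s

def generalized_coords(R_i, rcs, rc):
    """Map relative coordinates R_i (N_i x 3) to R~_i (N_i x 4):
    each row {x, y, z} -> {s(r), s(r)x/r, s(r)y/r, s(r)z/r} (Eq. 8)."""
    r = np.linalg.norm(R_i, axis=1)
    s = switch(r, rcs, rc)
    return np.column_stack([s, (s / r)[:, None] * R_i])
```

At r_ji = r_cs the two branches agree (both give 1/r_cs), and both s and its first derivative vanish at r_c, which is what makes the resulting model continuously differentiable at the cutoff.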
Note that the network parameters of G depend on the chemical species of both atom i and its neighbor atom j. The local embedding matrix G_i ∈ R^{N_i×M_1} is the matrix form of G(s(r_ji)):

    (G_i)_jk = (G(s(r_ji)))_k.   (10)

Observe that R̃_i (R̃_i)^T is a generalization of the symmetry matrix Ω_i in Eq. (6) that preserves rotational symmetry, and (G_i)^T R̃_i is a special realization of the permutation invariant operations in Eq. (7). This motivates us to define, finally, the encoded feature matrix D_i ∈ R^{M_1×M_2} of atom i:

    D_i = (G_i1)^T R̃_i (R̃_i)^T G_i2,   (11)

which preserves both the rotational and the permutational symmetry; the translational symmetry is likewise preserved in (11), since only relative coordinates enter. Here G_i1 and G_i2 are matrices of the form (10).
In practice, we take G_i1 = G_i and take the first M_2 (< M_1) columns of G_i to form G_i2 ∈ R^{N_i×M_2}. Lastly, the M_1 × M_2 components contained in the feature matrix D_i are reshaped into a vector that serves as the input of the fitting network and yields the "atomic energy" E_i. In the Supplementary Materials, we show explicitly that D_i, and hence the DeepPot-SE model, preserves all the necessary symmetries. Moreover, the computational complexity of the DeepPot-SE model scales linearly with N. Suppose there are at most N_c neighboring atoms within the cut-off radius of each atom, and the complexity of evaluating the atomic energy E_i is f(N_c); then, according to the local energy decomposition of the PES, the total complexity of the model is ∼ f(N_c) N. No matter how large N is, N_c depends only on r_c and is essentially bounded due to physical constraints.
We remark that, considering the explanation of the Deep Potential [16] and the fact that M_1 is much larger than M_2 in practice, we view the role of (G_i1)^T R̃_i as the mapping from the atomic point pattern to a feature space that preserves permutation symmetry. 
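The invariances of D_i in Eq. (11) can be verified numerically. In the sketch below, the trained embedding network G is replaced by a toy one-layer tanh map with fixed random weights, and M_1 = 8, M_2 = 4 are arbitrary demonstration sizes; permuting the rows of R̃_i (relabeling neighbors) or rotating its last three columns (a rigid rotation) leaves D_i unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2 = 8, 4  # toy sizes; the paper uses M1 ~ 100, M2 ~ 4

# Hypothetical stand-in for the embedding network G: a one-layer tanh
# map from s(r) to M1 outputs, with fixed random weights for the demo.
W, b = rng.normal(size=(1, M1)), rng.normal(size=M1)

def embed(s):
    """s: (N_i,) -> G_i: (N_i, M1); depends on the radial part only."""
    return np.tanh(s[:, None] @ W + b)

def feature_matrix(R_tilde):
    """Encoded features D_i = (G_i1)^T R~_i (R~_i)^T G_i2 of Eq. (11).
    G_i2 is taken as the first M2 columns of G_i1, as in the paper."""
    G1 = embed(R_tilde[:, 0])  # first column of R~_i is s(r_ji)
    G2 = G1[:, :M2]
    return G1.T @ R_tilde @ R_tilde.T @ G2  # shape (M1, M2)
```

Rotation invariance follows because R̃_i enters only through R̃_i(R̃_i)^T, and permutation invariance because a row permutation P cancels as G^T P^T P R̃_i = G^T R̃_i.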
The role of (R̃_i)^T G_i2 is to select symmetry-preserving axes onto which (G_i1)^T R̃_i is projected. Therefore, we call G_i1 the coordinate filters and G_i2 the axis filters. More specifically, each output of the embedding network G_i can be thought of as a distance- and chemical-species-dependent filter that assigns a weight to each neighboring atom. To provide an intuitive idea of G_i1 and G_i2, we show in Fig. 1(c) these filters obtained after training a model of crystalline Cu at finite temperatures. To help understand these results, we also display the radial distribution function, g(r), of Cu. Note that, unlike fixed filters such as Gaussians, these embedded filters are adaptive in nature. Generally, we have found that choosing M_1 ∼ 100, which is of the order of the number of neighbors of each atom within the cutoff radius r_c, and M_2 ∼ 4 gives good empirical performance. As shown in Fig. 1(c), for Cu, the M_2 = 4 outputs of G_i2 mainly give weight to neighbors within the first two shells, i.e., the first two peaks of g(r), while the shapes of the other filters, as outputs of G_i1, are more diversified and general.

3.4 The training process

The parameters w contained in the encoding and fitting networks are obtained by a training process using the Adam stochastic gradient descent method [23]. We define a family of loss functions,

    L(p_ε, p_f, p_ξ) = (1/|B|) Σ_{l∈B} ( p_ε |E_l − E^w_l|^2 + p_f |F_l − F^w_l|^2 + p_ξ ||Ξ_l − Ξ^w_l||^2 ).   (12)

Here B denotes the minibatch, |B| is the batch size, and l indexes the training data, which typically consist of a snapshot of the atomic configuration (given by the atomic coordinates, the atomic species, and the cell tensor) and the labels (the energy, the forces, and the virial). In Eq. 
(12), p_ε, p_f, and p_ξ are tunable prefactors. When one or two labels are missing from the data, we set the corresponding prefactor(s) to zero. The training process thus makes maximal use of the available training data. Using only the energies for training should, in principle, give a good PES model; however, using the forces in the training process significantly reduces the number of snapshots needed to train a good model.

4 Data and Experiments

We test the DeepPot-SE model on a wide variety of systems comprising molecular and extended systems. The extended systems include single- and multi-element metallic, semiconducting, and insulating materials. We also include supported nanoparticles and HEAs, which constitute very challenging systems to model. See Table 2 for a general view of the data. The data for the molecular systems are from Refs. [10, 11] and are available online3. The data for C5H5N (pyridine) are from Ref. [24]. We generated the rest of the data using the CP2K package [25]. For each system, we used a large super cell constructed from the optimized unit cell. The atomic structures are collected from different ab initio molecular dynamics trajectories obtained from NVT-ensemble simulations with temperatures ranging from 100 to 2000 K. To minimize correlations between the atomic configurations in the ab initio MD trajectories, we swapped atomistic configurations between different temperatures or randomly displaced the atomic positions after 1 ps. 
Furthermore, to enhance the sampling of the configuration space, we used a relatively large time step of 10 fs, even though this increased the number of steps needed to achieve self-consistency when solving the Kohn-Sham equations [1] at each step. More details of each extended system are given in Section 4.2, and the corresponding data description is available online in the data reservoir4.
For clarity, we use the term system to denote a set of data on which a unified DeepPot-SE model is fitted, and the term sub-system to denote data with a different composition of atoms or a different phase within a system. For all systems, we also test the DeePMD model for comparison, which is more accurate and robust than the original Deep Potential model [16]. The network structure and the training scheme (learning rate, decay step, etc.) are summarized in the Supplementary Materials.

4.1 Small organic molecules

molecule        | DeepPot-SE             | DeePMD [17] | GDML [11]  | SchNet [12]
Aspirin         | 6.7, 12.1 (10.2, 19.4) | 8.7, 19.1   | 11.7, 42.9 | 5.2, 14.3
Ethanol         | 2.2, 3.1 (3.1, 7.7)    | 2.4, 8.3    | 6.5, 34.3  | 2.2, 2.2
Malonaldehyde   | 3.3, 4.4 (4.7, 9.7)    | 4.0, 12.7   | 6.9, 34.7  | 3.5, 3.5
Naphthalene     | 5.2, 5.5 (6.5, 13.1)   | 4.1, 7.1    | 5.2, 10.0  | 4.8, 4.8
Salicylic acid  | 5.0, 6.6 (6.3, 13.0)   | 4.6, 10.9   | 5.2, 12.1  | 4.3, 8.2
Toluene         | 4.4, 5.8 (7.8, 13.3)   | 3.7, 8.5    | 5.2, 18.6  | 3.9, 3.9
Uracil          | 4.7, 2.8 (5.0, 9.2)    | 3.7, 9.8    | 4.8, 10.4  | 4.3, 4.8

Table 1: Mean absolute errors (MAEs) of the energy and force predictions, in meV and meV/Å respectively, denoted by a pair of numbers. Results obtained with the DeepPot-SE, DeePMD, GDML, and SchNet methods are summarized. With the DeepPot-SE method we trained both a unified model (results in brackets) that describes all seven molecular systems, and individual models that treat each molecule alone. The GDML and SchNet benchmarks are from Ref. [12]. SchNet, DeepPot-SE, and DeePMD used 50,000 structures for training, obtained from molecular dynamics trajectories of the small organic molecules. As explained in Ref. [12], GDML does not scale well with the number of atoms and training structures, and therefore used only 1,000 structures for training. The best results among the considered models for each molecule are displayed in bold.

The small molecular system consists of seven different sub-systems, namely aspirin, ethanol, malonaldehyde, naphthalene, salicylic acid, toluene, and uracil. The dataset has been benchmarked by

3 See http://www.quantum-machine.org
4 http://www.deepmd.org/database/deeppot-se-data/

Figure 2: Comparison of the DFT energies and the DeepPot-SE predicted energies on the testing snapshots. The range of DFT energies of the different systems is large; therefore, for illustrative purposes, for each sub-system we compute the mean μ_E and standard deviation σ_E of the DFT energies, and standardize both the DFT energies and the DeepPot-SE predicted energies by subtracting μ_E and dividing by σ_E. We then plot the standardized energies within ±4.5σ_E. (a) The unified DeepPot-SE model for the small molecular system. These molecules contain up to 4 types of atoms, namely C, H, O, and N; therefore, essentially 4 atomic sub-networks are learned, and the corresponding parameters are shared by the different molecules. (b) The DeepPot-SE model for the MoS2 and Pt system. To make it robust for a real problem of structural optimization of Pt clusters on MoS2 slabs, this model learns different sub-systems, in particular Pt clusters of various sizes on MoS2 slabs; 6 representative sub-systems are selected in this figure. (c) The DeepPot-SE model for the CoCrFeMnNi HEA system. 
The sub-systems differ in the random occupations of the elements on the lattice sites; 2 out of 48 sub-systems are selected in this figure. (d) The DeepPot-SE model for the TiO2 system, which contains 3 different polymorphs. (e) The DeepPot-SE model for the pyridine (C5H5N) system, which contains 2 different polymorphs. (f) Other systems: Al2O3, Cu, Ge, and Si.

GDML, SchNet, and DeePMD [11, 12, 17]. Unlike previous models, our emphasis here is to train one unified model for all these molecules. A unified model can be used to study chemical reactions and could be transferable to unknown molecules; it is therefore interesting and highly desirable to train a unified model for all of these sub-systems. The molecules in the dataset contain at most 4 different types of atoms, namely C, H, O, and N, so we need 4 sub-networks corresponding to the four types of atoms with their different environments. We also compare the results of the unified model with those of models trained individually for each sub-system. As shown in Table 1, all the methods show good performance in fitting both the energies and the forces of the small organic molecules. The MAEs of the total energy are in all cases below chemical accuracy (0.04 eV), a commonly used benchmark. The performance of the unified model is slightly worse than that of the individual models, but is still generally comparable.

System          | sub-system        | # snapshots | Energy [meV] | Force [meV/Å]
bulk Cu         | FCC solid         | 3250        | 0.18 (0.25)  | 90 (90)
bulk Ge         | diamond solid     | 4468        | 0.35 (0.60)  | 38 (35)
bulk Si         | diamond solid     | 6027        | 0.24 (0.51)  | 36 (31)
bulk Al2O3      | Trigonal solid    | 5624        | 0.23 (0.48)  | 49 (55)
bulk C5H5N      | Pyridine-I        | 20121       | 0.38 (0.25)  | 25 (25)
bulk C5H5N      | Pyridine-II       | 18103       | 0.65 (0.43)  | 39 (39)
bulk TiO2       | Rutile            | 2779        | 0.96 (1.97)  | 137 (163)
bulk TiO2       | Anatase           | 2371        | 1.78 (3.37)  | 181 (216)
bulk TiO2       | Brookite          | 4877        | 0.59 (1.97)  | 94 (109)
MoS2+Pt         | MoS2 slab         | 555         | 5.26 (17.2)  | 23 (34)
MoS2+Pt         | bulk Pt           | 1717        | 2.00 (1.85)  | 84 (226)
MoS2+Pt         | Pt surface        | 2468        | 6.77 (7.12)  | 105 (187)
MoS2+Pt         | Pt cluster        | 927         | 30.6 (35.4)  | 201 (255)
MoS2+Pt         | Pt on MoS2 (a)    | 46915       | 2.62 (5.89)  | 94 (127)
CoCrFeMnNi HEA  | rand. occ. I (b)  | 13910       | 1.68 (6.99)  | 394 (481)
CoCrFeMnNi HEA  | rand. occ. II (c) | 958         | 5.29 (21.7)  | 410 (576)

(a) Since the Pt clusters have different sizes, this case contains more than one sub-system; the reported values are averages over all these sub-systems.
(b) This case includes 40 different random occupations of the elements on the lattice sites of the HEA system within the training dataset.
(c) This case includes 16 other random occupations that are different from those in the training dataset.

Table 2: The number of snapshots and the root mean square errors (RMSEs) of the DeepPot-SE predictions for various systems, in terms of energies and forces. The RMSEs of the energies are normalized by the number of atoms in the system. The numbers in parentheses are the DeePMD results. For all sub-systems, 90% of randomly selected snapshots are used for training, and the remaining 10% are used for testing. Moreover, for the HEA system, additional data corresponding to 16 random occupations that are significantly different from the training dataset are added to the test dataset. Better results are in bold.

4.2 Bulk systems

Bulk systems are more challenging ML tasks due to their extensive character. In addition, in many cases, difficulties also arise from the complexity of the system under consideration. For example, for systems containing many different phases or many different atomic components, physical/chemical intuition is hard to come by. This is an essential obstacle for constructing hand-crafted features or kernels. 
Here we prepare two types of systems for the dataset and present results obtained with both the DeepPot-SE and DeePMD methods. The first type includes Cu, Ge, Si, Al2O3, C5H5N, and TiO2; these datasets serve as moderately challenging tasks for a general end-to-end method. The second type includes supported (Pt)n (n ≤ 155) nano-clusters on MoS2 and a high-entropy 5-element alloy; these are more challenging systems due to the different components of the atoms in the system. See Fig. 2 for an illustration.
General systems. As shown in Table 2, the systems of the first type Cu, Ge, Si, and Al2O3 contain only a single solid phase and are relatively easy. For these systems, both the DeePMD and the DeepPot-SE methods yield good results. The cases of C5H5N (pyridine) and TiO2 are more challenging. There are two polymorphs, or phases, of crystalline C5H5N, called pyridine-I and pyridine-II, respectively (see their structures in Ref. [24]). There are three phases of TiO2, namely rutile, anatase, and brookite; both rutile and anatase have a tetragonal unit cell, while brookite has an orthorhombic unit cell.
Grand-canonical-like system: supported Pt clusters on a MoS2 slab. Supported noble metal nanometer clusters (NCs) play a pivotal role in different technologies such as nano-electronics, energy storage/conversion, and catalysis. Here we investigate supported Pt clusters on a MoS2 substrate, which have been the subject of intense investigation recently [26-31]. The sub-systems include the pristine MoS2 substrate, bulk Pt, the Pt (100), (110), and (111) surfaces, Pt clusters, and supported Pt clusters on a MoS2 substrate. The supported Pt clusters contain from 6 to 20 atoms, as well as 30, 55, 82, 92, 106, 134, and 155 atoms. 
The multi-component nature of this system, the extended character of the substrate, and the different sizes of the supported clusters, with grand-canonical-like features, make this system very challenging for an end-to-end framework. Yet, as shown in Table 2 and Fig. 2, a unified DeepPot-SE model is able to capture these effects with satisfactory accuracy.
The CoCrFeMnNi HEA system. HEAs are a new class of emerging advanced materials based on a novel alloy-design concept: in an HEA, five or more equi-molar or near equi-molar alloying elements are deliberately incorporated into a single lattice with random site occupancy [32, 33]. Given the extremely large number of potential configurations of the alloy, entropic contributions to the thermodynamic landscape dictate the stability of the system in place of the cohesive energy. The HEA poses a significant challenge for ab initio calculations due to the chemical disorder and the large number of spatial configurations. Here we focus on a CoCrFeMnNi HEA, assuming an equi-molar alloying-element distribution. We employ a 3×3×5 supercell based on the FCC unit cell, with different random distributions of the elements at the lattice sites. In our calculations we used the experimental lattice constant reported in Ref. [34]. Traditionally, it has been hard to obtain a PES model even for alloy systems containing fewer than 3 components. As shown in Table 2, the DeepPot-SE model not only fits snapshots with random allocations of atoms in the training data, but also shows great promise in transferring to systems with random occupations that are significantly different from the training data.

5 Summary

In this paper, we developed DeepPot-SE, an end-to-end, scalable, symmetry-preserving, and accurate potential energy model. We tested this model on a wide variety of systems, both molecular and periodic. 
For extended periodic systems, we show that this model can describe cases with diverse electronic structures, such as metals, insulators, and semiconductors, as well as diverse degrees of complexity, such as bulk crystals, surfaces, and high-entropy alloys. In the future, it will be of interest to expand the datasets for more challenging scientific and engineering studies, and to seek strategies for easing the task of collecting training data. In addition, an idea similar to the feature matrix has recently been employed to solve the many-electron Schrödinger equation [35]. It will be of interest to see similar ideas applied to other ML tasks in which invariance under translation, rotation, and/or permutation plays a central role.

Acknowledgments

We thank the anonymous reviewers for their careful reading of our manuscript and their insightful comments and suggestions. The work of L. Z., J. H., and W. E is supported in part by ONR grant N00014-13-1-0338, DOE grants DE-SC0008626 and DE-SC0009248, and NSFC grants U1430237 and 91530322. The work of R. C. is supported in part by DOE grant DE-SC0008626. The work of H. W. is supported by the National Science Foundation of China under Grants 11501039 and 91530322, the National Key Research and Development Program of China under Grants 2016YFB0201200 and 2016YFB0201203, and the Science Challenge Project No. JCKY2016212A502. W.A.S. acknowledges financial support from the National Science Foundation (DMR-1809085). We are grateful for computing time provided in part by the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (# NSF OCI-1053575), the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357, the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S.
Department of Energy under Contract No. DE-AC02-05CH11231, and the Terascale Infrastructure for Groundbreaking Research in Science and Engineering (TIGRESS) High Performance Computing Center and Visualization Laboratory at Princeton University.

References

[1] Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Physical Review 140, A1133 (1965).

[2] Car, R. & Parrinello, M. Unified approach for molecular dynamics and density-functional theory. Physical Review Letters 55, 2471 (1985).

[3] Daw, M. S. & Baskes, M. I. Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals. Physical Review B 29, 6443 (1984).

[4] Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical Review Letters 98, 146401 (2007).

[5] Morawietz, T., Singraber, A., Dellago, C. & Behler, J. How van der Waals interactions determine the unique properties of water. Proceedings of the National Academy of Sciences 201602375 (2016).

[6] Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Physical Review Letters 104, 136403 (2010).

[7] Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters 108, 058301 (2012).

[8] Montavon, G. et al. Machine learning of molecular electronic properties in chemical compound space. New Journal of Physics 15, 095003 (2013).

[9] Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning (ICML) (2017).

[10] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A.
Quantum-chemical insights from deep tensor neural networks. Nature Communications 8, 13890 (2017).

[11] Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Science Advances 3, e1603015 (2017).

[12] Schütt, K. et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems (NIPS) (2017).

[13] Bartók, A. P. et al. Machine learning unifies the modeling of materials and molecules. Science Advances 3, e1701816 (2017).

[14] Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science 8, 3192–3203 (2017).

[15] Yao, K., Herr, J. E., Brown, S. N. & Parkhill, J. Intrinsic bond energies from a bonds-in-molecules neural network. Journal of Physical Chemistry Letters 8, 2689–2694 (2017).

[16] Han, J., Zhang, L., Car, R. & E, W. Deep Potential: a general representation of a many-body potential energy surface. Communications in Computational Physics 23, 629–639 (2018).

[17] Zhang, L., Han, J., Wang, H., Car, R. & E, W. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Physical Review Letters 120, 143001 (2018).

[18] Wang, H., Zhang, L., Han, J. & E, W. DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics. Computer Physics Communications 228, 178–184 (2018).

[19] Cohen, T. S., Geiger, M., Köhler, J. & Welling, M. Spherical CNNs. In International Conference on Learning Representations (ICLR) (2018).

[20] Zaheer, M. et al. Deep sets. In Advances in Neural Information Processing Systems (NIPS) (2017).

[21] Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Physical Review B 87, 184115 (2013).

[22] Weyl, H. The Classical Groups: Their Invariants and Representations (Princeton University Press, 2016).

[23] Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR) (2015).

[24] Ko, H.-Y., DiStasio, R. A., Santra, B. & Car, R. Thermal expansion in dispersion-bound molecular crystals. Physical Review Materials 2, 055603 (2018).

[25] Hutter, J., Iannuzzi, M., Schiffmann, F. & Vandevondele, J. cp2k: atomistic simulations of condensed matter systems. Wiley Interdisciplinary Reviews: Computational Molecular Science 4, 15–25 (2014).

[26] Huang, X. et al. Solution-phase epitaxial growth of noble metal nanostructures on dispersible single-layer molybdenum disulfide nanosheets. Nature Communications 4, 1444 (2013).

[27] Saidi, W. A. Influence of strain and metal thickness on metal-MoS2 contacts. The Journal of Chemical Physics 141, 094707 (2014).

[28] Saidi, W. A. Trends in the adsorption and growth morphology of metals on the MoS2(001) surface. Crystal Growth & Design 15, 3190–3200 (2015).

[29] Saidi, W. A. Density functional theory study of nucleation and growth of Pt nanoparticles on MoS2(001) surface. Crystal Growth & Design 15, 642–652 (2015).

[30] Gong, C. et al. Metal contacts on physical vapor deposited monolayer MoS2. ACS Nano 7, 11350–11357 (2013).

[31] Shi, Y., Song, B., Shahbazian-Yassar, R., Zhao, J. & Saidi, W. A. Experimentally validated interface structures of metal nanoclusters on MoS2. The Journal of Physical Chemistry Letters 9, 2972–2978 (2018).

[32] Yeh, J.-W. et al. Nanostructured high-entropy alloys with multiple principal elements: novel alloy design concepts and outcomes. Advanced Engineering Materials 6, 299–303 (2004).

[33] Cantor, B., Chang, I., Knight, P. & Vincent, A.
Microstructural development in equiatomic multicomponent alloys. Materials Science and Engineering: A 375, 213–218 (2004).

[34] Zhang, F. et al. Polymorphism in a high-entropy alloy. Nature Communications 8, 15687 (2017).

[35] Han, J., Zhang, L. & E, W. Solving many-electron Schrödinger equation using deep neural networks. arXiv preprint arXiv:1807.07014 (2018).