{"title": "A Model to Search for Synthesizable Molecules", "book": "Advances in Neural Information Processing Systems", "page_first": 7937, "page_last": 7949, "abstract": "Deep generative models are able to suggest new organic molecules by generating strings, trees, and graphs representing their structure. While such models allow one to generate molecules with desirable properties, they give no guarantees that the molecules can actually be synthesized in practice. We propose a new molecule generation model, mirroring a more realistic real-world process, where (a) reactants are selected, and (b) combined to form more complex molecules. More specifically, our generative model proposes a bag of initial reactants (selected from a pool of commercially-available molecules) and uses a reaction model to predict how they react together to generate new molecules. We first show that the model can generate diverse, valid and unique molecules due to the useful inductive biases of modeling reactions. Furthermore, our model allows chemists to interrogate not only the properties of the generated molecules but also the feasibility of the synthesis routes. We conclude by using our model to solve retrosynthesis problems, predicting a set of reactants that can produce a target product.", "full_text": "A Model to Search for Synthesizable Molecules\n\nJohn Bradshaw\n\nUniversity of Cambridge\nMPI for Intelligent Systems\n\njab255@cam.ac.uk\n\nBrooks Paige\n\nUniversity of Cambridge\nThe Alan Turing Institute\nbpaige@turing.ac.uk\n\nMatt J. Kusner\n\nUniversity College London\nThe Alan Turing Institute\nm.kusner@ucl.ac.uk\n\nMarwin H. S. 
Segler\n\nBenevolentAI\n\nWestf\u00e4lische Wilhelms-Universit\u00e4t M\u00fcnster\n\nmarwin.segler@benevolent.ai\n\nJos\u00e9 Miguel Hern\u00e1ndez-Lobato\n\nUniversity of Cambridge\nThe Alan Turing Institute\n\nMicrosoft Research Cambridge\n\njmh233@cam.ac.uk\n\nAbstract\n\nDeep generative models are able to suggest new organic molecules by generating\nstrings, trees, and graphs representing their structure. While such models allow\none to generate molecules with desirable properties, they give no guarantees that\nthe molecules can actually be synthesized in practice. We propose a new molecule\ngeneration model, mirroring a more realistic real-world process, where (a) reactants\nare selected, and (b) combined to form more complex molecules. More speci\ufb01cally,\nour generative model proposes a bag of initial reactants (selected from a pool of\ncommercially-available molecules) and uses a reaction model to predict how they\nreact together to generate new molecules. We \ufb01rst show that the model can generate\ndiverse, valid and unique molecules due to the useful inductive biases of modeling\nreactions. Furthermore, our model allows chemists to interrogate not only the\nproperties of the generated molecules but also the feasibility of the synthesis routes.\nWe conclude by using our model to solve retrosynthesis problems, predicting a set\nof reactants that can produce a target product.\n\n1\n\nIntroduction\n\nThe ability of machine learning to generate structured objects has progressed dramatically in the\nlast few years. One particularly successful example of this is the \ufb02urry of developments devoted to\ngenerating small molecules [14, 46, 27, 10, 49, 11, 20, 30, 55, 41, 2]. 
These models have been shown\nto be extremely effective at finding molecules with desirable properties: drug-like molecules [14],\nmolecules with biological target activity [46], and soluble molecules [11].\nHowever, these improvements in molecule discovery come at a cost: these methods do not describe\nhow to synthesize such molecules, a prerequisite for experimental testing. Traditionally, in computer-aided\nmolecular design, this has been addressed by virtual screening [48], where molecule data sets\nof size |D| ≈ 10^8 are first generated via the expensive combinatorial enumeration of molecular fragments\nstitched together using hand-crafted bonding rules, and then are scored in an O(|D|) step.\nIn this paper we propose a generative model for molecules (shown in Figure 1) that describes how to\nmake such molecules from a set of commonly-available reactants. Our model first generates a set of\nreactant molecules, and second maps them to a predicted product molecule via a reaction prediction\nmodel. It allows one to simultaneously search for better molecules and describe how such molecules\ncan be made. By closely mimicking the real-world process of designing new molecules, we show\nthat our model: 1. Is able to generate a wide range of molecules not seen in the training data; 2.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nFigure 1: An overview of approaches used to find molecules with desirable properties. Left: Virtual\nscreening [48] aims to find novel molecules by the (computationally expensive) enumeration over\nall possible combinations of fragments. Center: More recent ML approaches, e.g. [14], aim to find\nuseful, novel molecules by optimizing in a continuous latent space; however, there are no clues\nas to whether (and how) these molecules can be synthesized. 
Right: We approach the generation of\nmolecules through a multistage process mirroring how complex molecules are created in practice,\nwhile maintaining a continuous latent space to use for optimization. Our model, MOLECULE CHEF,\nfirst finds suitable reactants which then react together to create a final molecule.\n\nAddresses practical synthesis concerns such as reaction stability and toxicity; and 3. Allows us to\npropose new reactants for given target molecules that may be more practical to manage.\n\n2 Background\n\nWe start with an overview of traditional computational techniques to discover novel molecules with\ndesirable properties. We then review recent work in machine learning (ML) that seeks to improve\nparts of this process. We then identify aspects of molecule discovery we believe deserve much more\nattention from the ML community. We end by laying out our contributions to address these concerns.\n\n2.1 Virtual Screening\n\nTo discover new molecules with certain properties, one popular technique is virtual screening (VS)\n[48, 17, 37, 8, 34]. VS works by (a) enumerating all combinations of a set of building-block molecules\n(which are combined via virtual chemical bonding rules), (b) for each molecule, calculating the\ndesired properties via simulations or prediction models, and (c) filtering the most interesting molecules to\nsynthesize in the lab. While VS is general, it has the important downside that the generation process\nis not targeted: VS needs to get lucky to find molecules with desirable properties; it does not search\nfor them. Given that the number of possible drug-like compounds is estimated to be ∈ [10^23, 10^100]\n[52], the chemical space usually screened in VS, ∈ [10^7, 10^10], is tiny. 
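The enumerate-score-filter loop of steps (a)-(c) can be sketched in miniature (everything here, the fragment names, the bonding rule, and the scoring function, is a hypothetical placeholder rather than real chemistry):

```python
from itertools import combinations_with_replacement

# Hypothetical fragment pool; joining name strings stands in for the
# hand-crafted virtual bonding rules of step (a).
fragments = ["frag_A", "frag_B", "frag_C"]

def combine(pair):
    # Placeholder bonding rule: concatenate the two fragments.
    return "-".join(pair)

def score(molecule):
    # Placeholder for the property simulation/prediction of step (b).
    return molecule.count("B")

# (a) enumerate all combinations, (b) score every candidate,
# (c) filter down to the most interesting molecules.
library = [combine(p) for p in combinations_with_replacement(fragments, 2)]
top_hits = sorted(library, key=score, reverse=True)[:2]
```

The point of the sketch is the untargeted O(|D|) shape of the computation: the whole library is built and scored before any filtering happens.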
Searching in combinatorial\nfragment spaces has been proposed, but is limited to simpler similarity queries [38].\n\n2.2 The Molecular Search Problem\n\nTo address these downsides, one idea is to replace this full enumeration with a search algorithm; an\nidea called de novo design (DND) [42]. Instead of generating a large set of molecules with small\nvariations, DND searches for molecules with particular properties, recomputes the properties of the newfound\nmolecules, and searches again. We call this the molecular search problem. Early work on the\nmolecular search problem used genetic algorithms, ant-colony optimization, or other discrete search\ntechniques to make local changes to molecules [16]. While more directed than library generation,\nthese approaches still explored locally, limiting the diversity of discovered molecules.\nThe first work to apply current ML techniques to this problem was Gómez-Bombarelli et al. [14]\n(in a late 2016 preprint). Their idea was to search by learning a mapping from molecular space to\ncontinuous space and back. With this mapping it is possible to leverage well-studied optimization\ntechniques to do search: local search can be done via gradient descent and global search via Bayesian\noptimization [50, 13]. For such a mapping, the authors chose to represent molecules as SMILES\nstrings [54] and leverage advances in generative models for text [5] to learn a character variational\nautoencoder (CVAE) [26]. Shortly after this work, in an early 2017 preprint, Segler et al. 
[46] trained\nrecurrent neural networks (RNNs) to take properties as input and output SMILES strings with these\nproperties, with molecular search done using reinforcement learning (RL).\n\nIn Search of Molecular Validity. However, the SMILES string representation is very brittle: if\nindividual characters are changed or swapped, it may no longer represent any molecule (called an\ninvalid molecule). Thus, the CVAE often produced invalid molecules (in one experiment, Kusner\net al. [27] found that sampling from the continuous space produced valid molecules only 0.7% of the time). To\naddress this validity problem, recent works have proposed using alternative molecular representations\nsuch as parse trees [27, 10] or graphs [49, 11, 29, 20, 30, 55, 21, 24, 41], where some of the more\nrecent among these enforce or strongly encourage validity [20, 30, 55, 24, 41]. In parallel, there has\nbeen work based on RL that has aimed to learn a validity function during training directly [15, 18].\n\n2.3 The Molecular Recipe Problem\n\nCrucially, all of the works in the previous section solving the molecular search problem focus purely\non optimizing molecules towards desirable properties. These works, in addressing the downsides of\nVS, removed a benefit of it: knowledge of the synthesis pathway of each molecule. Without this we\ndo not know how practical it is to make ML-generated molecules.\nAddressing this concern means addressing the molecular recipe problem: what molecules are we able to\nmake, given a set of readily-available starting molecules? 
So far, this problem has been addressed\nindependently of the molecular search problem through synthesis planning (SP) [47]. SP works by\nrecursively deconstructing a molecule. This deconstruction is done via (reversed) reaction predictors:\nmodels that predict how reactant molecules produce a product molecule. More recently, novel ML\nmodels have been designed for reaction prediction [53, 45, 19, 43, 6, 44].\n\n2.4 This Work\n\nIn this paper, we propose to address both the molecular search problem and the molecular recipe\nproblem jointly. To do so, we propose a generative model over molecules using the following\nmap: first, a mapping from continuous space to a set of known, reliable, easy-to-obtain reactant\nmolecules; second, a mapping from this set of reactant molecules to a final product molecule, based\non a reaction prediction model [53, 45, 19, 44, 6]. Thus our generative model generates not only\nmolecules but also a synthesis route using available reactants. This addresses the molecular recipe\nproblem, and also the molecular search problem, as the learned continuous space can also be used for\nsearch. Compared to previous work, we are searching for new molecules through virtual\nchemical reactions, more directly simulating how molecules are actually discovered in the lab.\nConcretely, we argue that our model, which we shall introduce in the next section, has several\nadvantages over the current deep generative models of molecules reviewed previously:\n\nBetter extrapolation properties We hope that generating molecules through graph editing operations\nrepresenting reactions gives us strong inductive biases for extrapolating well.\n\nValidity of generated molecules Naive generation of molecular SMILES strings or graphs can lead\nto molecules that are invalid. Although the syntactic validity can be fixed by using masking\n[27, 30], the molecules generated can often still be semantically invalid. 
By generating\nmolecules from chemically stable reactants by means of reactions, our model proposes more\nsemantically valid molecules.\n\nProvide synthesis routes Molecules proposed by other methods often cannot be evaluated in\npractice, as chemists do not know how to synthesize them. As a byproduct of our model we\nsuggest synthetic routes, which could have a useful, practical value.\n\n3 Model\nIn this section we describe our model¹. We define the set of all possible valid molecular graphs as G,\nwith an individual graph g ∈ G representing the atoms of a molecule as its nodes, and the type of\nbonds between these atoms (we consider single, double and triple bonds) as its edge types. The set of\ncommon reactant molecules, easily procurable by a chemist, which we want to act as building blocks\nfor any final molecule, is a subset of this, R ⊂ G.\n\n¹ Further details can also be found in our appendix, and code is available at\nhttps://github.com/john-bradshaw/molecule-chef\n\nAs discussed in the previous section (and shown in Figure 1), our generative model for molecules\nconsists of the composition of two parts: (1) a decoder from a continuous latent space, z ∈ R^m,\nto a bag (i.e. multiset²) of easily procurable reactants, x ⊂ R; (2) a reaction predictor model that\ntransforms this bag of molecules into a multiset of product molecules y ⊂ G.\nThe benefit of this approach is that for step (2) we can pick from several existing reaction predictor\nmodels, including recently proposed methods that have used ML techniques [25, 45, 44, 6, 9]. In this\nwork we use the Molecular Transformer (MT) of Schwaller et al. [44], as it has recently been shown\nto provide state-of-the-art performance on this task [44, Table 4].\nThis leaves us with the task of (1): learning a way to decode to (and encode from) a bag of reactants,\nusing a parameterized encoder q(z|x) and decoder p(x|z). 
We call this co-occurrence model\nMOLECULE CHEF, and by moving around in the latent space we can use MOLECULE CHEF to select\ndifferent "bags of reactants".\nAgain there are several viable options for how to learn MOLECULE CHEF. For instance, one could\nchoose to use a VAE for this task [26, 40]. However, when paired with a complex decoder these\nmodels are often difficult to train [5, 1], such that much of the previous work on generating graphs\nhas tuned down the KL regularization term in these models [30, 27]. We therefore instead propose\nusing the WAE objective [51], which involves minimizing\n\nL = Ex∼D Eq(z|x)[c(x, p(x|z))] + λ D(Ex∼D[q(z|x)], p(z))\n\nwhere c is a cost function that enforces the reconstructed bag to be similar to the encoded one, and D is\na divergence measure, weighted in relative importance by λ, that forces the marginalized\ndistribution of all encodings to match the prior on the latent space. Following Tolstikhin et al. [51]\nwe use the maximum mean discrepancy (MMD) divergence measure, with λ = 10 and a standard\nnormal prior over the latents. We choose c so that this first term matches the reconstruction term we\nwould obtain in a VAE, i.e. with c(x, z) = −log p(x|z). This means that the objective only differs\nfrom a VAE in the second, regularization term, such that we are not trying to match each encoding to\nthe prior but instead the marginalized distribution over all datapoints. Empirically, we find that this\ntrains well and does not suffer from the same local optimum issues as the VAE.\n\n3.1 Encoder and Decoder\n\nWe can now begin describing the structure of our encoder and decoder. In these functions it is often\nconvenient to work with n-dimensional vector embeddings of graphs, mg ∈ R^n. Again we are faced\nwith a series of possible alternative ways to compute these embeddings. 
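The WAE-MMD objective introduced above can be sketched as follows (a minimal NumPy illustration; the RBF kernel choice, its bandwidth, and the toy shapes are assumptions, while λ = 10 and the standard normal prior follow the text):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(z_q, z_p, gamma=1.0):
    # Biased (V-statistic) estimate of the squared maximum mean
    # discrepancy between encoder samples and prior samples.
    return (rbf_kernel(z_q, z_q, gamma).mean()
            + rbf_kernel(z_p, z_p, gamma).mean()
            - 2.0 * rbf_kernel(z_q, z_p, gamma).mean())

def wae_loss(recon_nll, z_q, lam=10.0, rng=None):
    # recon_nll plays the role of c(x, p(x|z)) = -log p(x|z); the MMD
    # term pushes the *marginal* distribution of encodings, not each
    # individual encoding, towards the N(0, I) prior.
    rng = rng or np.random.default_rng(0)
    z_p = rng.standard_normal(z_q.shape)  # samples from the prior
    return recon_nll.mean() + lam * mmd2(z_q, z_p)
```

Note how, unlike the per-datapoint KL term of a VAE, the regularizer only sees the aggregate batch of codes `z_q`.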
For instance, we could\nignore the structure of the molecular graph and learn a distinct embedding for each molecule, or use\nfixed molecular fingerprints, such as Morgan fingerprints [33]. We instead choose to use deep graph\nneural networks [32, 12, 3] that can produce graph-isomorphic representations.\nDeep graph neural networks have been shown to perform well on a variety of tasks involving\nsmall organic molecules, and their advantages compared to the previously mentioned alternative\napproaches are that (1) they take the structure of the graph into account and (2) they can learn which\ncharacteristics are important when forming higher-level representations. In particular, in this work we\nuse 4-layer Gated Graph Neural Networks (GGNNs) [28]. These compute higher-level representations\nfor each node. These node-level representations in turn can be combined by a weighted sum, to\nform a graph-level representation invariant to the order of the nodes, in an operation referred to as an\naggregation transformation [22, §3].\nEncoder The structure of MOLECULE CHEF's encoder, q(z|x), is shown in Figure 2. For the ith\ndata point the encoder has as input the multiset of reactants xi = {xi1, xi2, · · ·}.² It first computes the\nrepresentation of each individual reactant molecule graph using the GGNN, before summing these\nrepresentations to get a representation that is invariant to the order of the multiset. A feed-forward\nnetwork is then used to parameterize the mean and variance of a Gaussian distribution over z.\n\n² Note how we allow molecules to be present multiple times as reactants in our reaction, although practically\nmany reactions only have one instance of a particular reactant.\n\nFigure 2: The encoder of MOLECULE CHEF. This maps from a multiset of reactants to a distribution\nover latent space. There are three main steps: (1) the reactant molecules are embedded into a\ncontinuous space by using GGNNs [28] to form molecule embeddings; (2) the molecule embeddings\nin the multiset are summed to form one order-invariant embedding for the whole multiset; (3) this is\nthen used as input to a neural network which parameterizes a Gaussian distribution over z.\n\nFigure 3: The decoder of MOLECULE CHEF. The decoder generates the multiset of reactants in\nsequence through calls to an RNN. At each step the model either picks one reactant from the pool or\nhalts, finishing the sequence. The latent vector, z, is used to parameterize the initial hidden layer of\nthe RNN. Reactants that are selected are fed back into the RNN on the next step. The reactant bag\nformed is later fed through a reaction predictor to form a final product.\n\nDecoder The decoder, p(x|z), (Figure 3) maps from the latent space to a multiset of reactant\nmolecules. These reactants are typically small molecules, which means we could fit a deep generative\nmodel which produces them from scratch. However, to better mimic the process of selecting reactant\nmolecules from an easily obtainable set, we instead restrict the output of the decoder to pick the\nmolecules from a fixed set of reactant molecules, R.\nThis happens in a sequential process using a recurrent neural network (RNN), with the full process\ndescribed in Algorithm 1. The latent vector, z, is used to parametrize the initial hidden layer of the\nRNN. The selected reactants are fed back in as inputs to the RNN at the next generation stage. 
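The selection loop just described (and given formally in Algorithm 1) can be sketched with toy stand-ins for the learnt components; the random matrices below take the place of the trained GGNN embeddings, RNN weights, and halt embedding, and a greedy argmax replaces sampling from the softmax:

```python
import numpy as np

rng = np.random.default_rng(0)
n_reactants, d = 5, 8                          # |R| and hidden size (toy values)
B = rng.standard_normal((n_reactants + 1, d))  # reactant embeddings + halt row s
A = rng.standard_normal((d, d))                # projects latent z to h0
W_h, W_m = rng.standard_normal((d, d)), rng.standard_normal((d, d))
HALT = n_reactants                             # index of the halt embedding

def rnn_cell(m_prev, h_prev):
    # Minimal stand-in for the RNN update.
    return np.tanh(h_prev @ W_h + m_prev @ W_m)

def decode(z, t_max=10):
    h, m = z @ A, np.zeros(d)      # h0 from the latent, m0 = start symbol
    bag = []
    for _ in range(t_max):
        h = rnn_cell(m, h)
        logits = h @ B.T           # score every reactant plus HALT
        x = int(np.argmax(logits)) # greedy choice in place of sampling
        if x == HALT:
            break                  # halt embedding selected: stop early
        bag.append(x)
        m = B[x]                   # selected reactant fed back in
    return bag

bag = decode(rng.standard_normal(d))
```

The returned `bag` is a multiset of indices into the fixed reactant pool, so every decoded reactant is valid by construction.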
Whilst\ntraining we randomly sample the ordering of the reactants, and use teacher forcing.\n\n3.2 Adding a predictive penalty loss to the latent space\n\nAs discussed in section 2.2 we are interested in using and evaluating our model\u2019s performance in the\nmolecular search problem, that is using the learnt latent space to \ufb01nd new molecules with desirable\nproperties. In reality we would wish to measure some complex chemical property that can only be\nmeasured experimentally. However, as a surrogate for this, following [14], we optimize instead for\nthe QED (Quantitative Estimate of Drug-likeness [4]) score of a molecule, w, as a deterministic\nmapping from molecules to this score, y (cid:55)\u2192 w, exists in RDKit [39].\nTo this end, in a similar manner to Liu et al. [30, \u00a74.3] & Jin et al. [20, \u00a73.3], we can simultaneously\ntrain a 2 hidden layer property predictor NN for use in local optimization tasks. This network tries to\npredict the QED property, w, of the \ufb01nal product y from the latent encoding of the associated bag of\nreactants. 
The use of this property predictor network for local optimization is described in Section\n4.2.\n\n5\n\nBrX(null)(null)(null)(null)\u00b5(null)(null)(null)(null)(null)(null)(null)(null)z(null)(null)(null)(null)reactant bag embeddinglatent spacereactant embeddingorder-invariant combination(null)(null)(null)(null)z(null)(null)(null)(null)\u00b5(null)(null)(null)(null)RNN cellHalt(null)(null)(null)(null)Halt(null)(null)(null)(null)Halt(null)(null)(null)(null)reactant probabilityreaction predictorgenerate reactant baglatent space\fAlgorithm 1 MOLECULE CHEF\u2019s Decoder\nRequire: zi (latent space sample), GGNN (for embedding molecules), RNN (recurrent neural network), R (set\nof easy-to-obtain reactant molecules), s (learnt \u201chalt\u201d embedding), A (learnt matrix that projects the size of\nthe latent space to the size of RNN\u2019s hidden space)\nh0 \u2190 Azi ; m0 \u2190 0 {Start symbol}\nfor t = 1 to Tmax do\n\nht \u2190 RNN(mt\u22121, ht\u22121) ; B \u2190 STACK([GGNN(g) for all g in R] + [s])\nlogits \u2190 htBT\nxt \u223c softmax(logits)\nif xt = HALT then\n\nelse\n\nbreak {If the logit corresponding to the halt embedding is selected then we stop early}\nmt \u2190 GGNN(xt)\n\nend if\nend for\nreturn x1, x2,\u00b7\u00b7\u00b7\n\n4 Evaluation\n\nIn this section we evaluate MOLECULE CHEF in (1) its ability to generate a diverse set of valid\nmolecules; (2) how useful its learnt latent space is when optimizing product molecules for some\nproperty; and (3) whether by training a regressor back from product molecules to the latent space,\nMOLECULE CHEF can be used as part of a setup to perform retrosynthesis.\nIn order to train our model we need a dataset of reactant bags. For this we use the USPTO dataset\n[31], processed and cleaned up by Jin et al. [19]. We \ufb01lter out reagents, molecules that form context\nunder which the reaction occurs but do not contribute atoms to the \ufb01nal products, by following the\napproach of Schwaller et al. 
[43, \u00a73.1].\nWe wish to use as possible reactant molecules only popular molecules that a chemist would have easy\naccess to. To this end, we \ufb01lter our training (using Jin et al. [19]\u2019s split) dataset so that each reaction\nonly contains reactants that occur at least 15 times across different reactions in the original larger\ntraining USPTO dataset. This leaves us with a dataset of 34426 unique reactant bags for training the\nMOLECULE CHEF. In total there are 4344 unique reactants. For training the baselines, we combine\nthese 4344 unique reactants and the associated products from their different combinations, to form a\ntraining set for baselines, as even though MOLECULE CHEF has not seen the products during training,\nthe reaction predictor has.\n\n4.1 Generation\n\nWe begin by analyzing our model using the metrics favored by previous work3 [20, 30, 29, 27]:\nvalidity, uniqueness and novelty. Validity is de\ufb01ned as requiring that at least one of the molecules in\nthe bag of products can be parsed by RDKit. For a bag of products to be unique we require it to have\nat least one valid molecule that the model has not generated before in any of the previously seen bags.\nFinally, for computing novelty we require that the valid molecules not be present in the same training\nset we use for the baseline generative models.\nIn addition, we compute the Fr\u00e9chet ChemNet Distance (FCD) [36] between the valid molecules\ngenerated by each method and our baseline training set. Finally in order to try to assess the quality of\nthe molecules generated we record the (train-normalized) proportion of valid molecules that pass\nthe quality \ufb01lters proposed by Brown et al. 
[7, §3.3]; these filters aim to remove molecules that are\n"potentially unstable, reactive, laborious to synthesize, or simply unpleasant to the eye of medicinal\nchemists".\n\n³ Note that we have extended the definition of these metrics to a bag (multiset) of products, given that our\nmodel can output multiple molecules for each reaction. However, when sampling 20000 times from the prior of\nour model, we generate single product bags 97% of the time, so that in practice most of the time we are using\nthe same definition for these metrics as the previous work, which always generated single molecules.\n\nTable 1: Validity, uniqueness, novelty and normalized quality (all as %, higher is\nbetter) of the products or molecules generated by decoding 20k random samples from the\nprior p(z). Quality is the proportion of valid molecules that pass the quality filters proposed in Brown\net al. [7, §3.3], normalized such that the score on the training set is 100. FCD is the Fréchet ChemNet\nDistance [36], capturing a notion of distance between the generated valid molecules and the training\ndataset (lower is better). The uniqueness and novelty figures are also conditioned on validity. MT stands\nfor the Molecular Transformer [44].\n\nModel Name            Validity  Uniqueness  Novelty  Quality  FCD\nMOLECULE CHEF + MT    99.05     95.95       89.11    95.30    0.73\nAAE [23, 35]          85.86     98.54       93.37    94.89    1.12\nCGVAE [30]            100.00    93.51       95.88    44.45    11.73\nCVAE [14]             12.02     56.28       85.65    52.86    37.65\nGVAE [27]             12.91     70.06       87.88    46.87    29.32\nLSTM [46]             91.18     93.42       74.03    100.12   0.43\n\nFor the baselines we consider the character VAE (CVAE) [14], the grammar VAE (GVAE) [27], the\nAAE (adversarial autoencoder) [23], the constrained graph VAE (CGVAE) [30], and a stacked LSTM\ngenerator with no latent space [46]. Further details about the baselines can be found in the appendix.\nThe results are shown in Table 1. 
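For concreteness, the per-bag uniqueness and novelty bookkeeping defined above can be sketched on plain strings (a real evaluation would first check validity and canonicalize SMILES with RDKit; the molecules and counts below are arbitrary stand-ins, not results from the paper):

```python
def uniqueness_and_novelty(sampled_bags, training_set):
    """sampled_bags: list of sets of valid canonical SMILES, one per sample.

    A bag counts as unique if it contains at least one molecule not
    generated in any earlier bag, and as novel if it contains at least
    one molecule absent from the training set.
    """
    seen, unique, novel = set(), 0, 0
    train = set(training_set)
    for bag in sampled_bags:
        if bag - seen:
            unique += 1
        if bag - train:
            novel += 1
        seen |= bag
    n = len(sampled_bags)
    return 100.0 * unique / n, 100.0 * novel / n

u, nv = uniqueness_and_novelty(
    [{"CCO"}, {"CCO"}, {"CCN"}, {"c1ccccc1"}],
    ["CCO"],
)
```

In this toy run the repeated `CCO` bag lowers uniqueness, and `CCO` being in the training list lowers novelty.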
As MOLECULE CHEF decodes to a bag made up from a predefined\nset of molecules, those reactants going into the reaction predictor are valid. The validity of the final\nproduct is not 100%, as the reaction predictor can make invalid edits to these molecules, but we see\nthat in a high number of cases the products are valid too. Furthermore, it is very encouraging\nthat the molecules generated often pass the quality filters, giving evidence that the process of building\nmolecules up by combining stable reactant building blocks often leads to stable products.\n\n4.2 Local Optimization\n\nAs discussed in Section 3.2, when training MOLECULE CHEF we can simultaneously train a property\npredictor network, mapping from the latent space of MOLECULE CHEF to the QED score of the final\nproduct. In this section we look at using the gradient information obtainable from this network to do\nlocal optimization, to find a molecule created from our reactant pool that has a high QED score.\nWe evaluate the local optimization of molecular properties by taking 250 bags of reactants, encoding\nthem into the latent space of MOLECULE CHEF, and then repeatedly moving in the latent space using\nthe gradient direction of the property predictor until we have decoded ten different reactant bags. As a\ncomparison we consider instead moving in a random walk until we have also decoded ten different\nreactant bags. In Figure 4 we look at the distribution of the best QED score found when considering\nthese ten reactant bags, and how this compares to the QEDs we started with.\nWhen looking at individual optimization runs, we see that the QEDs vary a lot between different\nproducts even if made with similar reactants. However, Figure 4 shows that overall the distribution\nof the final best found QED scores is improved when purposefully optimizing for this. 
This is\nencouraging as it gives evidence of the utility of these models for the molecular search problem.\n\n4.3 Retrosynthesis\n\nA unique feature of our approach is that we learn a decoder from latent space to a bag of reactants. This\ngives us the ability to do retrosynthesis by training a model to map from products to their associated\nreactants' representation in latent space, and using this in addition to MOLECULE CHEF's decoder\nto generate a bag of reactants. This process is highlighted in Figure 5. Retrosynthesis\nis a difficult task: there are often multiple possible ways to create the same product, and current\nstate-of-the-art approaches are built using large reaction databases and are able to deal with multiple\nreaction steps [47]. Nevertheless, we believe that our model could open up new and exciting approaches to this task.\nWe therefore train a small network, based on the same graph neural network structure used for\nMOLECULE CHEF followed by four fully connected layers, to regress from products to latent space.\n\nFigure 4: KDE plot showing that the distribution\nof the best QEDs found through local optimization,\nusing our trained property predictor for QEDs, has\nhigher mass over higher QED scores compared to\nthe best found from a random walk. The starting\nlocations' distribution (sampled from the training\ndata) is shown in green. The final products, given a\nreactant bag, are predicted using the MT [44].\n\nFigure 5: Having learnt a latent space which\ncan map to products through reactants, we can\nlearn a regressor back from the suggested products\nto the latent space (orange dashed arrow shown)\nand couple this with MOLECULE CHEF's decoder\nto see if we can do retrosynthesis: the act of\ncomputing the reactants that create a particular product.\n\nFigure 6: An example of performing\nretrosynthesis prediction using a\ntrained regressor from products\nto latent space. 
This reactant-\nproduct pair has not been seen in\nthe training set of MOLECULE\nCHEF.\nFurther examples are\nshown in the appendix.\n\n(a) Reachable Products\n\n(b) Unreachable Products\n\nFigure 8: Assessing the correlation between the QED scores\nfor the original product and its reconstruction (see text for\ndetails). We assess on two portions of the test set, products\nthat are made up of only reactants in MOLECULE CHEF\u2019s\nvocabulary are called \u2018Reachable Products\u2019, those that have\nat least one reactant that is absent are called \u2018Unreachable\nProducts\u2019.\n\nA few examples of the predicted reactants corresponding to products from reactions in the USPTO\ntest set, but which can be made in one step from the prede\ufb01ned possible reactants, are shown in\nFigure 6 and the appendix. We see that often this approach, although not always able to suggest\nthe correct whole reactant bag, chooses similar reactants that on reaction produce similar structures\nto the original product we were trying to synthesize. While we would not expect this approach to\nretrosynthesis to be competitive with complex planning tools, we think this provides a promising\nnew approach, which could be used to identify bags of reactants that produce molecules similar\nto a desired target molecule. 
In practice, it would be valuable to be pointed directly to molecules\nwith similar properties to a target molecule if they are easier to make than the target, since it is the\nproperties of the molecules, and not the actual molecules themselves, that we are after.\nWith this in mind, we assess our approach in the following way: (1) we take a product and perform\nretrosynthesis on it to produce a bag of reactants, (2) we transform this bag of reactants using the\nMolecular Transformer to produce a new reconstructed product, and then finally (3) we plot the\nresulting reconstructed product molecule's QED score against the QED score of the initial product.\nWe evaluate on a filtered version of Jin et al. [19]'s test set split of USPTO, where we have filtered\nout any reactions which have the exact same reactant and product multisets as a reaction present in\nthe set used to train Molecule Chef. In addition, we further split this filtered set into two sets: (i)\n'Reachable Products', which are reactions in the test set that contain as reactants only molecules that\nare in MOLECULE CHEF's reactant vocabulary, and (ii) 'Unreachable Products', which have at least\none reactant molecule that is not in the vocabulary.\n\n[Figure 8, panels (a) and (b): scatter plots of each product's QED against its reconstruction's QED, with R² = 0.61 for Reachable Products and R² = 0.26 for Unreachable Products.]\n\nThe results are shown in Figure 8; overall we see that there is some correlation between the properties\nof products and the properties of their reconstructions. 
This is more prominent for the reachable products, which we believe is because our latent space is trained only on reachable product reactions and so is better able to model them. Furthermore, some of the unreachable products may also require reactants that are not available in our pool of easily available reactants, at least when considering one-step reactions. However, given that unreachable products have at least one reactant which is not in Molecule Chef's vocabulary, we think it is very encouraging that there is still some, albeit smaller, correlation with the true QED: it shows that our model can suggest molecules with similar properties made from reactants that are available.

4.4 Qualitative Quality of Samples

Figure 9: Random walk in latent space. See text for details.

In Figure 9 we show molecules generated from a random walk starting from the encoding of a particular molecule (shown in the left-most column). We compare the CVAE, GVAE, and MOLECULE CHEF (for MOLECULE CHEF we encode the reactant bag known to generate the same molecule). We showed all generated molecules to a domain expert and asked them to evaluate their properties in terms of their stability, toxicity, oxidizing power, and corrosiveness (the rationales are provided in more detail in the Appendix). Many molecules produced by the CVAE and GVAE show undesirable features, unlike the molecules generated by MOLECULE CHEF.

5 Discussion

In this work we have introduced MOLECULE CHEF, a model that generates synthesizable molecules by considering the products produced as a result of one-step reactions from a pool of predefined reactants. By constructing molecules through selecting reactants and running chemical reactions, while performing optimization in a continuous latent space, we can combine the strengths of previous VAE-based models and classical discrete de-novo design algorithms based on virtual reactions.
As future work, we hope to explore how to extend our approach to deal with larger reactant vocabularies and multi-step reactions. This would allow the generation of a wider range of molecules, whilst maintaining our approach's advantages of being able to suggest synthetic routes and often producing semantically valid molecules.

Acknowledgements

This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. JB also acknowledges support from an EPSRC studentship.

[Figure 9 image residue removed: the figure shows rows of starting molecules alongside samples from the CVAE (Gómez-Bombarelli et al., 2018), the GVAE (Kusner et al., 2017), and MOLECULE CHEF reactants and products, with expert annotations such as 'unstable', 'toxic', and 'oxidizing agent'.]

References

[1] Alexander Alemi, Ben Poole, Ian Fischer, Joshua Dillon, Rif A Saurous, and Kevin Murphy. Fixing a broken ELBO. In International Conference on Machine Learning, pages 159–168, 2018.

[2] Rim Assouel, Mohamed Ahmed, Marwin H Segler, Amir Saffari, and Yoshua Bengio. DEFactor: Differentiable edge factorization-based probabilistic graph generation. arXiv preprint arXiv:1811.09766, 2018.

[3] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks.
arXiv preprint arXiv:1806.01261, 2018.

[4] G Richard Bickerton, Gaia V Paolini, Jérémy Besnard, Sorel Muresan, and Andrew L Hopkins. Quantifying the chemical beauty of drugs. Nature Chemistry, 4(2):90, 2012.

[5] Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, 2016.

[6] John Bradshaw, Matt J Kusner, Brooks Paige, Marwin HS Segler, and José Miguel Hernández-Lobato. A generative model for electron paths. In International Conference on Learning Representations, 2019.

[7] Nathan Brown, Marco Fiscato, Marwin H.S. Segler, and Alain C. Vaucher. GuacaMol: Benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 59(3):1096–1108, 2019. doi: 10.1021/acs.jcim.8b00839.

[8] Florent Chevillard and Peter Kolb. Scubidoo: A large yet screenable and easily searchable database of computationally created chemical compounds optimized toward high likelihood of synthetic tractability. J. Chem. Inf. Mod., 55(9):1824–1835, 2015.

[9] Connor W Coley, Wengong Jin, Luke Rogers, Timothy F Jamison, Tommi S Jaakkola, William H Green, Regina Barzilay, and Klavs F Jensen. A graph-convolutional neural network model for the prediction of chemical reactivity. Chemical Science, 10(2):370–377, 2019.

[10] Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song. Syntax-directed variational autoencoder for structured data. In International Conference on Learning Representations, 2018.

[11] Nicola De Cao and Thomas Kipf. MolGAN: An implicit generative model for small molecular graphs. In International Conference on Machine Learning Deep Generative Models Workshop, 2018.

[12] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams.
Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232, 2015.

[13] Jacob R Gardner, Matt J Kusner, Zhixiang Eddie Xu, Kilian Q Weinberger, and John P Cunningham. Bayesian optimization with inequality constraints. In International Conference on Machine Learning, pages 937–945, 2014.

[14] Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci., 4(2):268–276, February 2018.

[15] Gabriel Lima Guimaraes, Benjamin Sanchez-Lengeling, Carlos Outeiral, Pedro Luis Cunha Farias, and Alán Aspuru-Guzik. Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843, 2017.

[16] Markus Hartenfeller and Gisbert Schneider. Enabling future drug discovery by de novo design. Wiley Interdisc. Rev. Comp. Mol. Sci., 1(5):742–759, 2011.

[17] Qiyue Hu, Zhengwei Peng, Jaroslav Kostrowicki, and Atsuo Kuki. Leap into the Pfizer global virtual library (PGVL) space: creation of readily synthesizable design ideas automatically. In Chemical Library Design, pages 253–276. Springer, 2011.

[18] David Janz, Jos van der Westhuizen, Brooks Paige, Matt J Kusner, and José Miguel Hernández-Lobato. Learning a generative model for validity in complex discrete structures. In International Conference on Learning Representations, 2018.

[19] Wengong Jin, Connor W Coley, Regina Barzilay, and Tommi Jaakkola.
Predicting organic reaction outcomes with Weisfeiler-Lehman network. In Advances in Neural Information Processing Systems, 2017.

[20] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, 2018.

[21] Wengong Jin, Kevin Yang, Regina Barzilay, and Tommi Jaakkola. Learning multimodal graph-to-graph translation for molecular optimization. In International Conference on Learning Representations, 2019.

[22] Daniel D Johnson. Learning graphical state transitions. In International Conference on Learning Representations, 2017.

[23] Artur Kadurin, Alexander Aliper, Andrey Kazennov, Polina Mamoshina, Quentin Vanhaelen, Kuzma Khrabrov, and Alex Zhavoronkov. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget, 8(7):10883, 2017.

[24] Hiroshi Kajino. Molecular hypergraph grammar with its application to molecular optimization. In International Conference on Machine Learning, pages 3183–3191, 2019.

[25] Matthew A Kayala, Chloé-Agathe Azencott, Jonathan H Chen, and Pierre Baldi. Learning to predict chemical reactions. Journal of Chemical Information and Modeling, 51(9):2209–2222, 2011.

[26] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.

[27] Matt J Kusner, Brooks Paige, and José Miguel Hernández-Lobato. Grammar variational autoencoder. In International Conference on Machine Learning, 2017.

[28] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. International Conference on Learning Representations, 2016.

[29] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs.
arXiv preprint arXiv:1803.03324, March 2018.

[30] Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander L Gaunt. Constrained graph variational autoencoders for molecule design. In Advances in Neural Information Processing Systems, 2018.

[31] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012.

[32] Christian Merkwirth and Thomas Lengauer. Automatic generation of complementary descriptors with molecular graph networks. Journal of Chemical Information and Modeling, 45(5):1159–1168, 2005.

[33] HL Morgan. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. Journal of Chemical Documentation, 5(2):107–113, 1965.

[34] Christos A Nicolaou, Ian A Watson, Hong Hu, and Jibo Wang. The proximal Lilly collection: Mapping, exploring and exploiting feasible chemical space. Journal of Chemical Information and Modeling, 56(7):1253–1266, 2016.

[35] Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Sergey Nikolenko, Alan Aspuru-Guzik, and Alex Zhavoronkov. Molecular Sets (MOSES): A benchmarking platform for molecular generation models. arXiv preprint arXiv:1811.12823, 2018.

[36] Kristina Preuer, Philipp Renz, Thomas Unterthiner, Sepp Hochreiter, and Günter Klambauer. Fréchet ChemNet distance: A metric for generative models for molecules in drug discovery. Journal of Chemical Information and Modeling, 58(9):1736–1741, 2018. doi: 10.1021/acs.jcim.8b00234.

[37] Edward O Pyzer-Knapp, Changwon Suh, Rafael Gómez-Bombarelli, Jorge Aguilera-Iparraguirre, and Alán Aspuru-Guzik. What is high-throughput virtual screening? A perspective from organic materials discovery.
Annual Review of Materials Research, 45:195–216, 2015.

[38] Matthias Rarey and Martin Stahl. Similarity searching in large combinatorial chemistry spaces. Journal of Computer-Aided Molecular Design, 15(6):497–520, 2001.

[39] RDKit, online. RDKit: Open-source cheminformatics. http://www.rdkit.org. [Online; accessed 01-February-2018].

[40] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, pages 1278–1286, 2014.

[41] Bidisha Samanta, DE Abir, Gourhari Jana, Pratim Kumar Chattaraj, Niloy Ganguly, and Manuel Gomez Rodriguez. NeVAE: A deep generative model for molecular graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1110–1117, 2019.

[42] Petra Schneider and Gisbert Schneider. De novo design at the edge of chaos: Miniperspective. J. Med. Chem., 59(9):4077–4086, 2016.

[43] Philippe Schwaller, Théophile Gaudin, Dávid Lányi, Costas Bekas, and Teodoro Laino. "Found in Translation": predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci., 9:6091–6098, 2018. doi: 10.1039/C8SC02339E.

[44] Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A. Hunter, Costas Bekas, and Alpha A. Lee. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 5(9):1572–1583, 2019. doi: 10.1021/acscentsci.9b00576.

[45] Marwin HS Segler and Mark P Waller. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry–A European Journal, 23(25):5966–5971, 2017.

[46] Marwin HS Segler, Thierry Kogej, Christian Tyrchan, and Mark P Waller. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent.
Sci., 4(1):120–131, 2017.

[47] Marwin HS Segler, Mike Preuss, and Mark P Waller. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604, 2018.

[48] Brian K Shoichet. Virtual screening of chemical libraries. Nature, 432(7019):862, 2004.

[49] Martin Simonovsky and Nikos Komodakis. GraphVAE: Towards generation of small graphs using variational autoencoders. In Věra Kůrková, Yannis Manolopoulos, Barbara Hammer, Lazaros Iliadis, and Ilias Maglogiannis, editors, Artificial Neural Networks and Machine Learning – ICANN 2018, pages 412–422, Cham, 2018. Springer International Publishing. ISBN 978-3-030-01418-6.

[50] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.

[51] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. In International Conference on Learning Representations, 2018.

[52] Niek van Hilten, Florent Chevillard, and Peter Kolb. Virtual compound libraries in computer-assisted drug discovery. Journal of Chemical Information and Modeling, 2019.

[53] Jennifer N Wei, David Duvenaud, and Alán Aspuru-Guzik. Neural networks for the prediction of organic chemistry reactions. ACS Central Science, 2(10):725–732, 2016.

[54] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988.

[55] Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, and Jure Leskovec. Graph convolutional policy network for goal-directed molecular graph generation.
In Advances in Neural Information Processing Systems, 2018.