{"title": "Prediction of Protein Topologies Using Generalized IOHMMs and RNNs", "book": "Advances in Neural Information Processing Systems", "page_first": 1473, "page_last": 1480, "abstract": "", "full_text": "Prediction of Protein Topologies Using\n\nGeneralized IOHMMs and RNNs\n\nGianluca Pollastri and Pierre Baldi\n\nDepartment of Information and Computer Science\n\nUniversity of California, Irvine\n\nIrvine, CA 92697-3425\n\ngpollast,pfbaldi@ics.uci.edu\n\nAlessandro Vullo and Paolo Frasconi\n\nDipartimento di Sistemi e Informatica\n\nUniversit(cid:18)a di Firenze\n\nVia di Santa Marta 3, 50139 Firenze, ITALY\n\nvullo,paolo@dsi.uni(cid:12).it\n\nAbstract\n\nWe develop and test new machine learning methods for the predic-\ntion of topological representations of protein structures in the form\nof coarse- or (cid:12)ne-grained contact or distance maps that are transla-\ntion and rotation invariant. The methods are based on generalized\ninput-output hidden Markov models (GIOHMMs) and generalized\nrecursive neural networks (GRNNs). The methods are used to pre-\ndict topology directly in the (cid:12)ne-grained case and, in the coarse-\ngrained case, indirectly by (cid:12)rst learning how to score candidate\ngraphs and then using the scoring function to search the space of\npossible con(cid:12)gurations. Computer simulations show that the pre-\ndictors achieve state-of-the-art performance.\n\n1\n\nIntroduction: Protein Topology Prediction\n\nPredicting the 3D structure of protein chains from the linear sequence of amino\nacids is a fundamental open problem in computational molecular biology [1]. Any\napproach to the problem must deal with the basic fact that protein structures are\ntranslation and rotation invariant. To address this invariance, we have proposed a\nmachine learning approach to protein structure prediction [4] based on the predic-\ntion of topological representations of proteins, in the form of contact or distance\nmaps. 
The contact or distance map is a 2D representation of neighborhood relationships consisting of an adjacency matrix at some distance cutoff (typically in the range of 6 to 12 Å), or a matrix of pairwise Euclidean distances. Fine-grained maps are derived at the amino acid or even atomic level. Coarse maps are obtained by looking at secondary structure elements, such as helices, and the distance between their centers of gravity or, as in the simulations below, the minimal distances between their Cα atoms. Reasonable methods for reconstructing 3D coordinates from contact/distance maps have been developed in the NMR literature and elsewhere

Figure 1: Bayesian network for bidirectional IOHMMs consisting of input units, output units, and both forward and backward Markov chains of hidden states.

[14] using distance geometry and stochastic optimization techniques. Thus the main focus here is on the more difficult task of contact map prediction.

Various algorithms for the prediction of contact maps have been developed, in particular using feedforward neural networks [6]. The best contact map predictor in the literature and at the last CASP prediction experiment reports an average precision [True Positives/(True Positives + False Positives)] of 21% for distant contacts, i.e. with a linear distance of 8 amino acids or more [6] for fine-grained amino acid maps. While this result is encouraging and well above chance level by a factor greater than 6, it is still far from providing sufficient accuracy for reliable 3D structure prediction. A key issue in this area is the amount of noise that can be tolerated in a contact map prediction without compromising the 3D-reconstruction step.
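For concreteness, the cutoff-based adjacency matrix described above can be computed directly from 3D coordinates; a minimal sketch, where the coordinates and the 8 Å cutoff are illustrative only (the paper's typical range is 6 to 12 Å):

```python
from math import dist  # Python 3.8+

def contact_map(ca_coords, cutoff=8.0):
    # Adjacency matrix: entry (i, j) is 1 when the C-alpha atoms of
    # residues i and j lie within `cutoff` angstroms of each other.
    n = len(ca_coords)
    return [[1 if dist(ca_coords[i], ca_coords[j]) <= cutoff else 0
             for j in range(n)] for i in range(n)]

# Toy chain of four "residues" spaced 4 angstroms apart along a line.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
cmap = contact_map(coords)
```

Such a matrix is symmetric and unchanged by any rigid rotation or translation of the coordinates, which is precisely the invariance the topological representation is meant to capture.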
While systematic tests in this area have not yet been published, preliminary results appear to indicate that recovery of as little as half of the distant contacts may suffice for proper reconstruction, at least for proteins up to 150 amino acids long (Rita Casadio and Piero Fariselli, private communication and oral presentation during CASP4 [10]).

It is important to realize that the input to a fine-grained contact map predictor need not be confined to the sequence of amino acids only, but may also include evolutionary information in the form of profiles derived by multiple alignment of homologue proteins, or structural feature information, such as secondary structure (alpha helices, beta strands, and coils), or solvent accessibility (surface/buried), derived by specialized predictors [12, 13]. In our approach, we use different GIOHMM and GRNN strategies to predict both structural features and contact maps.

2 GIOHMM Architectures

Loosely speaking, GIOHMMs are Bayesian networks with input, hidden, and output units that can be used to process complex data structures such as sequences, images, trees, chemical compounds and so forth, built on work in, for instance, [5, 3, 7, 2, 11]. In general, the connectivity of the graphs associated with the hidden units matches the structure of the data being processed.
Often multiple copies of the same hidden graph, but with different edge orientations, are used in the hidden layers to allow direct propagation of information in all relevant directions.

Figure 2: 2D GIOHMM Bayesian network for processing two-dimensional objects such as contact maps, with nodes regularly arranged in one input plane, one output plane, and four hidden planes (NE, NW, SW, SE). In each hidden plane, nodes are arranged on a square lattice, and all edges are oriented towards the corresponding cardinal corner. Additional directed edges run vertically in column from the input plane to each hidden plane, and from each hidden plane to the output plane.

To illustrate the general idea, a first example of a GIOHMM is provided by the bidirectional IOHMMs (Figure 1) introduced in [2] to process sequences and predict protein structural features, such as secondary structure. Unlike standard HMMs or IOHMMs used, for instance, in speech recognition, this architecture is based on two hidden Markov chains running in opposite directions to leverage the fact that biological sequences are spatial objects rather than temporal sequences. Bidirectional IOHMMs have been used to derive a suite of structural feature predictors [12, 13, 4] available through http://promoter.ics.uci.edu/BRNN-PRED/. These predictors have accuracy rates in the 75-80% range on a per amino acid basis.

2.1 Direct Prediction of Topology

To predict contact maps, we use a 2D generalization of the previous 1D Bayesian network. The basic version of this architecture (Figure 2) contains 6 layers of units: input, output, and four hidden layers, one for each cardinal corner. Within each column indexed by i and j, connections run from the input to the four hidden units, and from the four hidden units to the output unit.
In addition, the hidden units in each hidden layer are arranged on a square or triangular lattice, with all the edges oriented towards the corresponding cardinal corner. Thus the parameters of this two-dimensional GIOHMM, in the square lattice case, are the conditional probability distributions:

P(O_{i,j} | I_{i,j}, H^{NE}_{i,j}, H^{NW}_{i,j}, H^{SW}_{i,j}, H^{SE}_{i,j})
P(H^{NE}_{i,j} | I_{i,j}, H^{NE}_{i-1,j}, H^{NE}_{i,j-1})
P(H^{NW}_{i,j} | I_{i,j}, H^{NW}_{i+1,j}, H^{NW}_{i,j-1})
P(H^{SW}_{i,j} | I_{i,j}, H^{SW}_{i+1,j}, H^{SW}_{i,j+1})
P(H^{SE}_{i,j} | I_{i,j}, H^{SE}_{i-1,j}, H^{SE}_{i,j+1})     (1)

In a contact map prediction at the amino acid level, for instance, the (i, j) output represents the probability of whether amino acids i and j are in contact or not. This prediction depends directly on the (i, j) input and the four hidden units in the same column, associated with omni-directional contextual propagation in the hidden planes. In the simulations reported below, we use a more elaborate input consisting of a 20 × 20 probability matrix over amino acid pairs derived from a multiple alignment of the given protein sequence and its homologues, as well as the structural features of the corresponding amino acids, including their secondary structure classification and their relative exposure to the solvent, derived from our corresponding predictors.

It should be clear how GIOHMM ideas can be generalized to other data structures and problems in many ways. In the case of 3D data, for instance, a standard GIOHMM would have an input cube, an output cube, and up to 8 cubes of hidden units, one for each corner, with connections inside each hidden cube oriented towards the corresponding corner. In the case of data with an underlying tree structure, the hidden layers would correspond to copies of the same tree with different orientations, and so forth.
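The four orientations in Eq. (1) can be made concrete by listing, for each hidden plane, the in-plane parents of node (i, j); a small sketch (the plane names and lattice bounds follow the text, and nodes off the lattice simply contribute no parent):

```python
def in_plane_parents(plane, i, j, n):
    # In-plane parents of hidden node (i, j) on an n x n lattice,
    # following the index offsets of Eq. (1); every node additionally
    # has the input node I_{i,j} as a parent.
    deltas = {"NE": [(-1, 0), (0, -1)],
              "NW": [(+1, 0), (0, -1)],
              "SW": [(+1, 0), (0, +1)],
              "SE": [(-1, 0), (0, +1)]}
    return [(i + di, j + dj) for di, dj in deltas[plane]
            if 0 <= i + di < n and 0 <= j + dj < n]
```

Because all edges in a plane point toward a single corner, each plane is a directed acyclic graph, which is what later permits the corner-to-corner propagation order used for the GRNN reparameterization.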
Thus a fundamental advantage of GIOHMMs is that they can process a wide range of data structures of variable sizes and dimensions.

2.2 Indirect Prediction of Topology

Although GIOHMMs allow flexible integration of contextual information over ranges that often exceed what can be achieved, for instance, with fixed-input neural networks, the models described above still suffer from the fact that the connections remain local, and therefore long-ranged propagation of information during learning remains difficult. Introduction of large numbers of long-ranged connections is computationally intractable but in principle not necessary, since the number of contacts in proteins is known to grow linearly with the length of the protein, and hence connectivity is inherently sparse. The difficulty of course is that the location of the long-ranged contacts is not known.

To address this problem, we have also developed a complementary GIOHMM approach, described in Figure 3, where a candidate graph structure is proposed in the hidden layers of the GIOHMM, with the two different orientations naturally associated with a protein sequence. Thus the hidden graphs change with each protein. In principle the output ought to be a single unit (Figure 3b) which directly computes a global score for the candidate structure presented in the hidden layer. In order to cope with long-ranged dependencies, however, it is preferable to compute a set of local scores (Figure 3c), one for each vertex, and combine the local scores into a global score by averaging.

More specifically, consider a true topology represented by the undirected contact graph G* = (V, E*), and a candidate undirected prediction graph G = (V, E).
A global measure of how well E approximates E* is provided by the information-retrieval F1 score defined by the normalized edge-overlap F1 = 2|E ∩ E*|/(|E| + |E*|) = 2PR/(P + R), where P = |E ∩ E*|/|E| is the precision (or specificity) and R = |E ∩ E*|/|E*| is the recall (or sensitivity) measure. Obviously, 0 ≤ F1 ≤ 1 and F1 = 1 if and only if E = E*. The scoring function F1 has the property of being monotone in the sense that if |E| = |E'| then F1(E) < F1(E') if and only if |E ∩ E*| < |E' ∩ E*|. Furthermore, if E' = E ∪ {e} where e is an edge in E* but not in E, then F1(E') > F1(E). Monotonicity is important to guide the search in the space of possible topologies. It is easy to check that a simple search algorithm based on F1 takes on the order of O(|V|^3) steps to find E*, basically by trying all possible edges one after the other. The problem then is to learn F1, or rather a good approximation to F1.

Figure 3: Indirect prediction of contact maps. (a) target contact graph to be predicted. (b) GIOHMM with two hidden layers: the two hidden layers correspond to two copies of the same candidate graph oriented in opposite directions from one end of the protein to the other end. The single output O is the global score of how well the candidate graph approximates the true contact map. (c) Similar to (b) but with a local score O(v) at each vertex. The local scores can be averaged to produce a global score. In (b) and (c), I(v) represents the input for vertex v, and H^F(v) and H^B(v) are the corresponding hidden variables.

To approximate F1, we first consider a similar local measure F_v obtained by considering the set E_v of edges adjacent to vertex v, with F_v = 2|E_v ∩ E*_v|/(|E_v| + |E*_v|), and the global average F̄ = Σ_v F_v/|V|. If n and n* are the average degrees of G and G*, it can be shown that:

F1 = (1/|V|) Σ_v 2|E_v ∩ E*_v|/(n + n*)   and   F̄ = (1/|V|) Σ_v 2|E_v ∩ E*_v|/(n + ε_v + n* + ε*_v)     (2)

where n + ε_v (resp. n* + ε*_v) is the degree of v in G (resp. in G*). In particular, if G and G* are regular graphs, then F1(E) = F̄(E), so that F̄ is a good approximation to F1. In the contact map regime where the number of contacts grows linearly with the length of the sequence, we should have in general |E| ≈ |E*| ≈ (1 + α)|V|, so that each node on average has n = n* = 2(1 + α) edges. The value of α depends of course on the neighborhood cutoff.

As in reinforcement learning, to learn the scoring function one is faced with the problem of generating good training sets in a high dimensional space, where the states are the topologies (graphs), and the policies are algorithms for adding a single edge to a given graph. In the simulations we adopt several different strategies, including static and dynamic generation. Within dynamic generation we use three exploration strategies: random exploration (successor graph chosen at random), pure exploitation (successor graph maximizes the current scoring function), and semi-uniform exploitation to find a balance between exploration and exploitation [with probability ε (resp. 1 − ε) we choose random exploration (resp.
pure exploitation)].

3 GRNN Architectures

Inference and learning in the protein GIOHMMs we have described is computationally intensive due to the large number of undirected loops they contain. This problem can be addressed using a neural network reparameterization, assuming that: (a) all the nodes in the graphs are associated with a deterministic vector (note that in the case of the output nodes this vector can represent a probability distribution so that the overall model remains probabilistic); (b) each vector is a deterministic function of its parents; (c) each function is parameterized using a neural network (or some other class of approximators); and (d) weight-sharing or stationarity is used between similar neural networks in the model. For example, in the 2D GIOHMM contact map predictor, we can use a total of 5 neural networks to recursively compute the four hidden states and the output in each column in the form:

O_{i,j} = N_O(I_{i,j}, H^{NE}_{i,j}, H^{NW}_{i,j}, H^{SW}_{i,j}, H^{SE}_{i,j})
H^{NE}_{i,j} = N_NE(I_{i,j}, H^{NE}_{i-1,j}, H^{NE}_{i,j-1})
H^{NW}_{i,j} = N_NW(I_{i,j}, H^{NW}_{i+1,j}, H^{NW}_{i,j-1})
H^{SW}_{i,j} = N_SW(I_{i,j}, H^{SW}_{i+1,j}, H^{SW}_{i,j+1})
H^{SE}_{i,j} = N_SE(I_{i,j}, H^{SE}_{i-1,j}, H^{SE}_{i,j+1})     (3)

In the NE plane, for instance, the boundary conditions are set to H^{NE}_{i,j} = 0 for i = 0 or j = 0. The activity vector associated with the hidden unit H^{NE}_{i,j} depends on the local input I_{i,j} and the activity vectors of the units H^{NE}_{i-1,j} and H^{NE}_{i,j-1}. Activity in the NE plane can be propagated row by row, West to East, and from the first row to the last (from South to North), or column by column, South to North, and from the first column to the last.
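The NE-plane recursion of Eq. (3) and the row-by-row propagation order just described can be sketched as follows; the single tanh layer, the hidden dimension k = 3, and the random weights are illustrative stand-ins for the trained network N_NE:

```python
import math
import random

random.seed(0)

def mlp(weights, inputs):
    # One tanh layer standing in for N_NE; `weights` has one row per
    # output unit, with a trailing bias weight in each row.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs + [1.0])))
            for row in weights]

def propagate_ne(I, k=3):
    # Forward pass of the NE hidden plane, Eq. (3):
    #   H[i][j] = N_NE(I[i][j], H[i-1][j], H[i][j-1]),
    # with zero boundary vectors for i = 0 or j = 0. Rows are swept
    # South to North and, within each row, West to East.
    n = len(I)
    d = len(I[0][0])
    W = [[random.uniform(-0.1, 0.1) for _ in range(d + 2 * k + 1)]
         for _ in range(k)]  # shared (stationary) weights, random here
    zero = [0.0] * k
    H = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            south = H[i - 1][j] if i > 0 else zero  # H[i-1][j]
            west = H[i][j - 1] if j > 0 else zero   # H[i][j-1]
            H[i][j] = mlp(W, I[i][j] + south + west)
    return H

# A 4 x 4 grid with a 1-dimensional input vector at every (i, j).
I = [[[1.0] for _ in range(4)] for _ in range(4)]
H = propagate_ne(I)
```

The NW, SW and SE planes follow by symmetry with the index offsets of Eq. (3), and unfolding these recursions in space is what makes gradient-descent training possible.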
These GRNN architectures can be trained by gradient descent by unfolding the structures in space, leveraging the acyclic nature of the underlying GIOHMMs.

4 Data

Many data sets are available or can be constructed for training and testing purposes, as described in the references. The data sets used in the present simulations are extracted from the publicly available Protein Data Bank (PDB) and then redundancy reduced, or from the non-homologous subset of PDB Select (ftp://ftp.embl-heidelberg.de/pub/databases/). In addition, we typically exclude structures with poor resolution (worse than 2.5-3 Å), sequences containing less than 30 amino acids, and structures containing multiple sequences or sequences with chain breaks. For coarse contact maps, we use the DSSP program [9] (CMBI version) to assign secondary structures, and we also remove sequences for which DSSP crashes. The results we report for fine-grained contact maps are derived using 424 proteins with lengths in the 30-200 range for training and an additional non-homologous set of 48 proteins in the same length range for testing. For the coarse contact map, we use a set of 587 proteins of length less than 300. Because the average length of a secondary structure element is slightly above 7, the size of a coarse map is roughly 2% the size of the corresponding amino acid map.

5 Simulation Results and Conclusions

We have trained several 2D GIOHMM/GRNN models on the direct prediction of fine-grained contact maps. Training of a single model typically takes on the order of a week on a fast workstation. A sample of validation results is reported in Table 1 for four different distance cutoffs.

Table 1: Direct prediction of amino acid contact maps. Column 1: four distance cutoffs. Columns 2, 3, and 4: overall percentages of amino acids correctly classified as contacts, non-contacts, and in total. Column 5: precision percentage for distant contacts (|i - j| ≥ 8) with a threshold of 0.5. Single-model results, except for the last line, which corresponds to an ensemble of 5 models.

Cutoff   Contact   Non-Contact   Total   Precision (P)
 6 Å      .714       .998        .985       .594
 8 Å      .638       .998        .970       .670
10 Å      .512       .993        .931       .557
12 Å      .433       .987        .878       .549
12 Å      .445       .990        .883       .717

Overall percentages of correctly predicted contacts and non-contacts at all linear distances, as well as precision results for distant contacts (|i - j| ≥ 8), are reported for a single GIOHMM/GRNN model. The model has k = 14 hidden units in the hidden and output layers of the four hidden networks, as well as in the hidden layer of the output network. In the last row, we also report as an example the results obtained at 12 Å by an ensemble of 5 networks with k = 11, 12, 13, 14 and 15. Note that precision for distant contacts exceeds all previously reported results and is well above 50%.

For the prediction of coarse-grained contact maps, we use the indirect GIOHMM/GRNN strategy and compare different exploration/exploitation strategies: random exploration, pure exploitation, and their convex combination (semi-uniform exploitation). In the semi-uniform case we set the probability of random uniform exploration to ε = 0.4. In addition, we also try a fourth hybrid strategy in which the search proceeds greedily (i.e. the best successor is chosen at each step, as in pure exploitation), but the network is trained by randomly sub-sampling the successors of the current state.
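The F1 edge-overlap score of Section 2.2 and one step of the semi-uniform search policy can be sketched as follows; `score` is a stand-in for the learned GRNN scoring function, and the edge sets are illustrative:

```python
import random

def f1_overlap(E, E_star):
    # F1 = 2|E ∩ E*| / (|E| + |E*|); edges are (i, j) tuples with i < j.
    if not E and not E_star:
        return 1.0
    return 2.0 * len(E & E_star) / (len(E) + len(E_star))

def semi_uniform_step(E, candidates, score, eps=0.4):
    # With probability eps, explore: add a random candidate edge;
    # otherwise exploit: add the edge that maximizes the learned score.
    if random.random() < eps:
        e = random.choice(sorted(candidates))
    else:
        e = max(sorted(candidates), key=lambda c: score(E | {c}))
    return E | {e}

# Greedy (eps = 0) step scored against a known target, for illustration.
target = {(1, 2), (2, 3), (3, 4)}
step = semi_uniform_step(set(), {(1, 2), (5, 6)},
                         lambda g: f1_overlap(g, target), eps=0.0)
```

In the actual system the trained network replaces the direct use of f1_overlap, repeated steps grow the candidate graph one edge at a time, and eps = 0.4 corresponds to the semi-uniform setting used in the simulations.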
Eight numerical features encode the input label of each node: one-hot encoding of secondary structure classes; normalized linear distances from the N and C termini; average, maximum and minimum hydrophobic character of the segment (based on the Kyte-Doolittle scale with a moving window of length 7). A sample of results obtained with 5-fold cross-validation is shown in Table 2. Hidden state vectors have dimension k = 5 with no hidden layers. For each strategy we measure performance by means of several indices: micro and macro-averaged precision (mP, MP), recall (mR, MR) and F1 measure (mF1, MF1). Micro-averages are derived based on each pair of secondary structure elements in each protein, whereas macro-averages are obtained on a per-protein basis, by first computing precision and recall for each protein, and then averaging over the set of all proteins. In addition, we also measure the micro and macro averages for specificity, in the sense of percentage of correct prediction for non-contacts (mP(nc), MP(nc)). Note the tradeoffs between precision and recall across the training methods, the hybrid method achieving the best F1 results.

Table 2: Indirect prediction of coarse contact maps with dynamic sampling.

Strategy              mP    mP(nc)   mR    mF1    MP    MP(nc)   MR    MF1
Random exploration   .574    .769   .469   .518   .767    .709   .418   .715
Semi-uniform         .454    .787   .631   .526   .507    .767   .702   .588
Pure exploitation    .431    .806   .726   .539   .481    .793   .787   .596
Hybrid               .417    .834   .790   .546   .474    .821   .843   .607

We have presented two approaches, based on a very general IOHMM/RNN framework, that achieve state-of-the-art performance in the prediction of protein contact maps at fine and coarse-grained levels of resolution.
In principle both methods can be applied to both resolution levels, although the indirect prediction is computationally too demanding for fine-grained prediction of large proteins. Several extensions are currently under development, including the integration of these methods into complete 3D structure predictors. While these systems require long training periods, once trained they can rapidly sift through large proteomic data sets.

Acknowledgments

The work of PB and GP is supported by a Laurel Wilkening Faculty Innovation award and awards from NIH, BREP, Sun Microsystems, and the California Institute for Telecommunications and Information Technology. The work of PF and AV is partially supported by a MURST grant.

References

[1] D. Baker and A. Sali. Protein structure prediction and structural genomics. Science, 294:93-96, 2001.

[2] P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15(11):937-946, 1999.

[3] P. Baldi and Y. Chauvin. Hybrid modeling, HMM/NN architectures, and protein applications. Neural Computation, 8(7):1541-1565, 1996.

[4] P. Baldi and G. Pollastri. Machine learning structural and functional proteomics. IEEE Intelligent Systems. Special Issue on Intelligent Systems in Biology, 17(2), 2002.

[5] Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Trans. on Neural Networks, 7:1231-1249, 1996.

[6] P. Fariselli, O. Olmea, A. Valencia, and R. Casadio. Prediction of contact maps with neural networks and correlated mutations. Protein Engineering, 14:835-843, 2001.

[7] P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Trans. on Neural Networks, 9:768-786, 1998.

[8] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245-273, 1997.

[9] W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577-2637, 1983.

[10] A. M. Lesk, L. Lo Conte, and T. J. P. Hubbard. Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, 45, S5:98-118, 2001.

[11] G. Pollastri and P. Baldi. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Proceedings of 2002 ISMB (Intelligent Systems for Molecular Biology) Conference. Bioinformatics, 18, S1:62-70, 2002.

[12] G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47:228-235, 2002.

[13] G. Pollastri, P. Baldi, P. Fariselli, and R. Casadio. Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 47:142-153, 2002.

[14] M. Vendruscolo, E. Kussell, and E. Domany. Recovery of protein structure from contact maps. Folding and Design, 2:295-306, 1997.
", "award": [], "sourceid": 2302, "authors": [{"given_name": "Gianluca", "family_name": "Pollastri", "institution": null}, {"given_name": "Pierre", "family_name": "Baldi", "institution": null}, {"given_name": "Alessandro", "family_name": "Vullo", "institution": null}, {"given_name": "Paolo", "family_name": "Frasconi", "institution": null}]}