{"title": "Expressive power of tensor-network factorizations for probabilistic modeling", "book": "Advances in Neural Information Processing Systems", "page_first": 1498, "page_last": 1510, "abstract": "Tensor-network techniques have recently proven useful in machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. These factorizations include non-negative tensor-trains/MPS, which are in correspondence with hidden Markov models, and Born machines, which are naturally related to the probabilistic interpretation of quantum circuits. When used to model probability distributions, they exhibit tractable likelihoods and admit efficient learning algorithms. Interestingly, we prove that there exist probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. Of particular interest, using complex instead of real tensors can lead to an arbitrarily large reduction in the number of parameters of the network. Additionally, we introduce locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems, with provably better expressive power than all other representations considered. The ramifications of this result are explored through numerical experiments.", "full_text": "Expressive power of tensor-network\n\nfactorizations for probabilistic modeling\n\n1Max-Planck-Institut f\u00fcr Quantenoptik, D-85748 Garching\n\n2Munich Center for Quantum Science and Technology (MCQST), D-80799 M\u00fcnchen\n\nIvan Glasser1,2\u2217, Ryan Sweke3, Nicola Pancotti1,2, Jens Eisert3,4, J. 
Ignacio Cirac1,2\n\n3Dahlem Center for Complex Quantum Systems, Freie Universit\u00e4t Berlin, D-14195 Berlin\n\n4Department of Mathematics and Computer Science, Freie Universit\u00e4t Berlin, D-14195 Berlin\n\nAbstract\n\nTensor-network techniques have recently proven useful in machine learning, both\nas a tool for the formulation of new learning algorithms and for enhancing the\nmathematical understanding of existing methods. Inspired by these developments,\nand the natural correspondence between tensor networks and probabilistic graphical\nmodels, we provide a rigorous analysis of the expressive power of various tensor-\nnetwork factorizations of discrete multivariate probability distributions. These\nfactorizations include non-negative tensor-trains/MPS, which are in correspon-\ndence with hidden Markov models, and Born machines, which are naturally related\nto the probabilistic interpretation of quantum circuits. When used to model proba-\nbility distributions, they exhibit tractable likelihoods and admit ef\ufb01cient learning\nalgorithms. Interestingly, we prove that there exist probability distributions for\nwhich there are unbounded separations between the resource requirements of some\nof these tensor-network factorizations. Of particular interest, using complex instead\nof real tensors can lead to an arbitrarily large reduction in the number of parameters\nof the network. Additionally, we introduce locally puri\ufb01ed states (LPS), a new\nfactorization inspired by techniques for the simulation of quantum systems, with\nprovably better expressive power than all other representations considered. The\nrami\ufb01cations of this result are explored through numerical experiments.\n\n1\n\nIntroduction\n\nMany problems in diverse areas of computer science and physics involve constructing ef\ufb01cient\nrepresentations of high-dimensional functions. 
Neural networks are a particular example of such\nrepresentations that have enjoyed great empirical success, and much effort has been dedicated to\nunderstanding their expressive power - i.e. the set of functions that they can ef\ufb01ciently represent.\nAnalogously, tensor networks are a class of powerful representations of high-dimensional arrays\n(tensors), for which a variety of algorithms and methods have been developed. Examples of such\ntensor networks are tensor trains/matrix product states (MPS) [1, 2] or the hierarchical Tucker\ndecomposition [3, 4], which have found application in data compression [5\u20137], the simulation of\nphysical systems [8\u201310] and the design of machine learning algorithms [11\u201316]. In addition to\ntheir use in numerical algorithms, tensor networks enjoy a rich analytical understanding which has\nfacilitated their use as a tool for obtaining rigorous results on the expressive power of deep learning\nmodels [17\u201322], and fundamental insights into the structure of quantum mechanical systems [23].\nIn the context of probabilistic modeling, tensor networks have been shown to be in natural corre-\nspondence with probabilistic graphical models [24\u201329], as well as with Sum-Product Networks and\n\n\u2217Corresponding author, ivan.glasser@mpq.mpg.de\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fArithmetic Circuits [17, 30, 31]. Motivated by this correspondence, and with the goal of enhancing\nthe toolbox for deriving analytical results on the properties of machine learning algorithms, we\nstudy the expressive power of various tensor-network models of discrete multivariate probability\ndistributions. 
The models we consider, defined in Section 2, fall into two main categories:

• Non-negative tensor networks, which decompose a probability mass function as a network of non-negative tensors [32], as in a probabilistic graphical model [33].

• Born machines (BM), which model a probability mass function as the absolute value squared of a real or complex function, which is itself represented as a network of real or complex tensors. While Born machines have been previously employed for probabilistic modeling [34–40], they have additional potential applications in the context of quantum machine learning [41–44], since they arise naturally from the probabilistic interpretation of quantum mechanics.

These models are considered precisely because they represent non-negative tensors by construction. In this work we focus on tensor networks which are based on tensor-trains/MPS and generalizations thereof, motivated by the fact that these have tractable likelihoods, and thus efficient learning algorithms, while lending themselves to a rigorous theoretical analysis. In this setting non-negative tensor networks encompass hidden Markov models (HMM), while Born machines include models that arise from local quantum circuits of fixed depth. Our results also apply to tensor networks with a tree structure, and as such can be seen as a more general comparison of the difference between non-negative tensor networks and Born machines.

The main result of this work is a characterization of the expressive power of these tensor networks. Interestingly, we prove that there exist families of probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. This allows us to show that neither HMM nor Born machines should be preferred to the other in general. 
Moreover, we prove that using complex instead of real tensors can sometimes lead to an\narbitrarily large reduction in the number of parameters of the network.\nFurthermore, we introduce a new tensor-network model of discrete multivariate probability distri-\nbutions with provably better expressive power than the previously introduced models. This tensor\nnetwork, which retains an ef\ufb01cient learning algorithm, is referred to as a locally puri\ufb01ed state (LPS)\ndue to its origin in the classical simulation of quantum systems [45\u201348]. We demonstrate through\nnumerical experiments on both random probability distributions as well as realistic data sets that our\ntheoretical \ufb01ndings are relevant in practice - i.e. that LPS should be considered over HMM and Born\nmachines for probabilistic modeling.\nThis paper is structured as follows: The models we consider are introduced in Section 2. Their\nrelation with HMM and quantum circuits is made explicit in Section 3. The main results on expressive\npower are presented in Section 4. Section 5 then introduces learning algorithms for these tensor\nnetworks, and the results of numerical experiments are provided in Section 6.\n\n2 Tensor-network models of probability distributions\n\nConsider a multivariate probability mass function P (X1, . . . , XN ) over N discrete random variables\n{Xi} taking values in {1, . . . , d}. This probability mass function is naturally represented as a multi-\ndimensional array, or tensor, with N indices, each of which can take d values. As such, we use the\nnotation P to refer simultaneously to both the probability mass function and the equivalent tensor\nrepresentation. More speci\ufb01cally, for each con\ufb01guration X1, . . . , XN the tensor element PX1,...,XN\nstores the probability P (X1, . . . , XN ). 
Note that as P is a representation of a probability mass function, it is a tensor with non-negative entries summing to one.

Here we are interested in the case where N is large. Since the number of elements of this tensor scales exponentially with N, it quickly becomes impossible to store. In cases where there is some structure to the variables, one may use a compact representation of P which exploits this structure, such as Bayesian networks or Markov random fields defined on a graph. In the following we consider models, known as tensor networks, in which a tensor T is factorized into the contraction of multiple smaller tensors. As long as T is non-negative, one can model P as P = T/Z_T, where Z_T = \sum_{X_1,...,X_N} T_{X_1,...,X_N} is a normalization factor. For all tensor networks considered in this work, this normalization factor can be evaluated efficiently, as explained in Section 5.

In particular, we define the following tensor networks, in both algebraic and graphical notation. In the diagrams each box represents a tensor and lines emanating from these boxes represent tensor indices. Connecting two lines implies a contraction, which is a summation over the connected index.

1. Tensor-train/matrix product state (MPS_F): A tensor T, with N d-dimensional indices, admits an MPS_F representation of TT-rank_F r when the entries of T can be written as

T_{X_1,...,X_N} = A^{\alpha_1}_{1,X_1} A^{\alpha_1,\alpha_2}_{2,X_2} \cdots A^{\alpha_{N-2},\alpha_{N-1}}_{N-1,X_{N-1}} A^{\alpha_{N-1}}_{N,X_N},    (1)

[graphical notation (2): T drawn as a chain of boxes A_1, ..., A_N, with the physical indices X_1, ..., X_N hanging from the boxes and the bond indices \alpha_1, ..., \alpha_{N-1} connecting neighboring boxes]

where A_1 and A_N are d × r matrices, and A_i are order-3 tensors of dimension d × r × r, with elements in F ∈ {R_{≥0}, R, C}. The indices \alpha_i of these constituent tensors run from 1 to r and are contracted (summed over) to construct T.

2. 
Born machine (BM_F): A tensor T, with N d-dimensional indices, admits a BM_F representation of Born-rank_F r when the entries of T can be written as

T_{X_1,...,X_N} = | \sum_{\{\alpha_i\}=1}^{r} A^{\alpha_1}_{1,X_1} A^{\alpha_1,\alpha_2}_{2,X_2} \cdots A^{\alpha_{N-2},\alpha_{N-1}}_{N-1,X_{N-1}} A^{\alpha_{N-1}}_{N,X_N} |^2,    (3)

[graphical notation (4): T drawn as two copies of the MPS chain A_1, ..., A_N (one of them complex conjugated), joined through the shared physical indices X_1, ..., X_N, with independent bond indices \alpha_i and \alpha'_i]

with elements of the constituent tensors A_i in F ∈ {R, C}, i.e., when T admits a representation as the absolute-value squared (element-wise) of an MPS_F of TT-rank_F r.

3. Locally purified state (LPS_F): A tensor T, with N d-dimensional indices, admits an LPS_F representation of puri-rank_F r and purification dimension µ when the entries of T can be written as

T_{X_1,...,X_N} = \sum_{\{\alpha_i,\alpha'_i\}=1}^{r} \sum_{\{\beta_i\}=1}^{\mu} A^{\beta_1,\alpha_1}_{1,X_1} \bar{A}^{\beta_1,\alpha'_1}_{1,X_1} A^{\beta_2,\alpha_1,\alpha_2}_{2,X_2} \bar{A}^{\beta_2,\alpha'_1,\alpha'_2}_{2,X_2} \cdots A^{\beta_N,\alpha_{N-1}}_{N,X_N} \bar{A}^{\beta_N,\alpha'_{N-1}}_{N,X_N},    (5)

[graphical notation (6): T drawn as the chain A_1, ..., A_N contracted with its complex conjugate through the purification indices \beta_1, ..., \beta_N, with bond indices \alpha_i in one layer and \alpha'_i in the conjugated layer]

where A_1 and A_N are order-3 tensors of dimension d × µ × r and A_i are order-4 tensors of dimension d × µ × r × r. The indices \alpha_i run from 1 to r, the indices \beta_i run from 1 to µ, and both are contracted to construct T. 
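To make these definitions concrete, the following numpy sketch (our own illustration, not code from the paper) contracts a small MPS into the full tensor of Eq. (1), and an LPS into the full tensor of Eq. (5); for simplicity all LPS cores are taken to be order-4, with dummy boundary bonds of dimension 1, and a Born machine is recovered as the special case µ = 1:

```python
import numpy as np

def mps_to_tensor(cores):
    """Contract the MPS of Eq. (1) into the full tensor T.

    cores[0] and cores[-1] have shape (d, r); middle cores have shape (d, r, r)."""
    T = cores[0]                                      # T[X1, alpha1]
    for A in cores[1:-1]:
        T = np.tensordot(T, A, axes=([-1], [1]))      # contract alpha_i
    return np.tensordot(T, cores[-1], axes=([-1], [1]))

def lps_to_tensor(cores):
    """Contract the LPS of Eq. (5) into the full (non-negative) tensor T.

    Every core has shape (d, mu, r_left, r_right), boundary bonds have dim 1."""
    doubled = []
    for A in cores:
        # Pair the core with its conjugate, summing the purification index beta:
        # B[X, (a, a'), (b, b')] = sum_beta A[X, beta, a, b] * conj(A)[X, beta, a', b']
        B = np.einsum('xuab,xucd->xacbd', A, A.conj())
        d, _, rl, rr = A.shape
        doubled.append(B.reshape(d, rl * rl, rr * rr))
    T = doubled[0][:, 0, :]                           # left boundary bond
    for B in doubled[1:]:
        T = np.tensordot(T, B, axes=([-1], [1]))
    return T[..., 0].real                             # imaginary parts cancel

rng = np.random.default_rng(0)
d, r, mu = 2, 2, 2
mps = [rng.random((d, r)), rng.random((d, r, r)), rng.random((d, r))]
lps = [rng.standard_normal((d, mu, a, b)) + 1j * rng.standard_normal((d, mu, a, b))
       for a, b in [(1, r), (r, r), (r, 1)]]
assert mps_to_tensor(mps).shape == (d, d, d)
assert (lps_to_tensor(lps) >= -1e-12).all()           # non-negative by construction
```

Note how non-negativity of the LPS tensor falls out of the construction: for each configuration the entry is a sum of squared moduli, so no constraint needs to be imposed on the cores themselves.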
Without loss of generality we can consider only µ ≤ rd².

Note that all the representations defined above yield non-negative tensors by construction, except for MPS_{R/C}. In this work, we consider only the subset of MPS_{R/C} which represent non-negative tensors. Given a non-negative tensor T we define the TT-rank_F (Born-rank_F) of T as the minimal r such that T admits an MPS_F (BM_F) representation of TT-rank_F (Born-rank_F) r. We define the puri-rank_F of T as the minimal r such that T admits an LPS_F representation of puri-rank_F r, for some purification dimension µ. We note that if we consider tensors T with 2 d-dimensional indices (i.e., matrices) then the TT-rank_{R≥0} is the non-negative rank, i.e., the smallest k such that T can be written as T = AB with A a d × k and B a k × d matrix with real non-negative entries. The TT-rank_{R/C} is the conventional matrix rank, the Born-rank_R (Born-rank_C) is the real (complex) Hadamard square-root rank, i.e., the minimal rank of a real (complex) entry-wise square root of T, and finally the puri-rank_R (puri-rank_C) is the real (complex) positive semidefinite rank [49].

Table 1: Summary of notations for the different tensor-network representations and their ranks.

Tensor representation | MPS_{R≥0}     | MPS_{R/C}     | BM_{R/C}          | LPS_{R/C}
Tensor rank           | TT-rank_{R≥0} | TT-rank_{R/C} | Born-rank_{R/C}   | puri-rank_{R/C}
Matrix rank [49]      | rank_+        | rank          | sqrt-rank_{R/C}   | rank_{R/C,psd}

For a given rank and a given tensor network, there is a set of non-negative tensors that can be exactly represented, and as the rank is increased, this set grows. In the limit of arbitrarily large rank, all tensor networks we consider can represent any non-negative tensor. This work is concerned with the relative expressive power of these different tensor-network representations, i.e., how the representable sets compare for different tensor networks. 
This will be characterized in Section 4 in terms of the different ranks needed by different tensor networks to represent a non-negative tensor.

3 Relationship to hidden Markov models and quantum circuits

In order to provide context for the factorizations introduced in Section 2, we show here how they are related to other representations of probability distributions based on probabilistic graphical models and quantum circuits. In particular, we show that there is a mapping between hidden Markov models with a constant number of hidden units per variable and MPS_{R≥0} of constant TT-rank_{R≥0}, as well as between local quantum circuits of fixed depth and Born machines of constant Born-rank_C. These relations imply that results on the expressive power of the former directly provide results on the expressive power of the latter.

3.1 Hidden Markov models are non-negative matrix product states

Consider a hidden Markov model (HMM) with observed variables {X_i} taking values in {1, ..., d} and hidden variables {H_i} taking values in {1, ..., r} (Fig. 1). The probability of the observed variables may be expressed as

P(X_1, ..., X_N) = \sum_{H_1,...,H_N} P(X_1|H_1) \prod_{i=2}^{N} P(H_i|H_{i-1}) P(X_i|H_i).    (7)

Notice that P(H_i|H_{i-1}) and P(X_i|H_i) are matrices with non-negative elements, as depicted in the factor graph in the central diagram of Fig. 1. Now define the tensors A^j_{1,l} = P(X_1 = l | H_1 = j) and A^{jk}_{i,l} = P(H_i = k | H_{i-1} = j) P(X_i = l | H_i = k). Then the MPS with TT-rank_{R≥0} = r defined with tensors A_i defines the same probability distribution on the observed variables as the HMM.

Figure 1: Mapping between a HMM and a non-negative MPS.

Conversely, given an MPS_{R≥0} with TT-rank_{R≥0} = r, there exists an HMM, with hidden variables of dimension r' ≤ min(dr, r²), defining the same probability mass function, as shown in the supplementary material. 
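As a sanity check of this mapping (our own illustration, not code from the paper), one can build the MPS cores from random HMM parameters and verify that the contracted MPS reproduces the distribution of Eq. (7) computed by brute force:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, r = 4, 2, 3   # N observed variables with d values, hidden variables with r values

# Random HMM parameters; each conditional sums to one over its first index.
emit = rng.random((N, d, r)); emit /= emit.sum(axis=1, keepdims=True)     # P(X_i = l | H_i = k)
trans = rng.random((N, r, r)); trans /= trans.sum(axis=1, keepdims=True)  # P(H_i = k | H_{i-1} = j)

# Brute-force joint over observed variables, Eq. (7): explicit sum over all
# hidden configurations (trans[0] is unused, matching the formula).
P = np.zeros((d,) * N)
for H in np.ndindex(*((r,) * N)):
    weight = np.prod([trans[i][H[i], H[i - 1]] for i in range(1, N)])
    term = np.array(weight)
    for i in range(N):
        term = np.multiply.outer(term, emit[i][:, H[i]])   # factor P(X_i | H_i)
    P += term

# MPS cores as defined in the text: A_1[l, j] = P(X_1 = l | H_1 = j),
# A_i[l, j, k] = P(H_i = k | H_{i-1} = j) P(X_i = l | H_i = k).
cores = [emit[0]] + [np.einsum('kj,lk->ljk', trans[i], emit[i]) for i in range(1, N)]

T = cores[0]
for A in cores[1:]:
    T = np.tensordot(T, A, axes=([-1], [1]))   # contract the bond carrying H_{i-1}
T = T.sum(axis=-1)                             # the dangling bond carrying H_N is summed

assert np.allclose(T, P)   # the non-negative MPS reproduces the HMM distribution
```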
We note also that by using a different graph for the HMM, it is possible to construct an equivalent HMM with hidden variables of dimension r [28, 29]. As such, any results on expressivity derived for MPS_{R≥0} hold also for HMM.

3.2 Quantum circuits are Born machines or locally purified states

An introductory presentation of the details of the connection between quantum circuits and Born machines is contained in the supplementary material. There, we show that local quantum circuits of fixed depth D allow sampling from the probability mass function of N discrete d-dimensional random variables {X_i} which is given by the modulus squared of the amplitudes defined by the quantum circuit. For local quantum circuits of fixed depth D, these amplitudes can be written as an MPS of TT-rank_C = d^{D+1}. Therefore quantum circuits of fixed depth are in correspondence with Born machines of constant Born-rank_C, and any results on the expressive power of Born machines hold also for local quantum circuits, when considered as probabilistic models.

Furthermore, quantum circuits that include alternating ancillary (or "hidden") and visible variables allow sampling from a probability distribution that can be expressed as an LPS. As such, this correspondence implies that any results on the expressive power of LPS hold also for local quantum circuits with alternating visible and hidden variables.

4 Expressive power of tensor-network representations

In this section we present various relationships between the expressive power of all representations, which constitute the primary results of this work. The proofs of the propositions in this section can be found in the supplementary material.

Figure 2: Representation of the sets of non-negative tensors that admit a given tensor-network factorization. 
In this figure we fix the ranks of the different tensor networks to be equal.

For a given rank, there is a set of non-negative tensors that can be exactly represented by a given tensor network. These sets are represented in Fig. 2 for the case in which the ranks of the tensor networks are equal. When one set is included in another, it means that for every non-negative tensor, the rank of one of the tensor-network factorizations is always greater than or equal to the rank of the other factorization. The inclusion relationships between these sets can therefore be characterized in terms of inequalities between the ranks, as detailed in Proposition 1.

Proposition 1. For all non-negative tensors: TT-rank_{R≥0} ≥ TT-rank_R, Born-rank_R ≥ Born-rank_C, Born-rank_R ≥ puri-rank_R, Born-rank_C ≥ puri-rank_C, puri-rank_R ≥ puri-rank_C, TT-rank_{R≥0} ≥ puri-rank_R, and TT-rank_R = TT-rank_C.

Next, as detailed in Proposition 2, and summarized in Table 2, we show that all the inequalities of Proposition 1 can in fact be strict, and that for all other pairs of representations there exist probability distributions showing that neither rank can always be lower than the other. This shows that neither of the two corresponding sets of tensors can be included in the other. The main new result is the introduction of a matrix with non-negative rank strictly smaller than its complex Hadamard square-root rank, i.e., TT-rank_{R≥0} < Born-rank_C.

Proposition 2. The ranks of all introduced tensor-network representations satisfy the properties contained in Table 2. 
Specifically, denoting by r_row (r_column) the rank appearing in the row (column), < indicates that there exists a tensor satisfying r_row < r_column, and <, > indicates that there exist both a tensor satisfying r_row < r_column and another tensor satisfying r_row > r_column.

Table 2: Results of Proposition 2.

               | TT-rank_R | TT-rank_{R≥0} | Born-rank_R | Born-rank_C | puri-rank_R | puri-rank_C
TT-rank_R      |     =     |      <        |    <, >     |    <, >     |    <, >     |    <, >
TT-rank_{R≥0}  |     >     |      =        |    <, >     |    <, >     |     >       |     >
Born-rank_R    |    <, >   |     <, >      |     =       |     >       |     >       |     >
Born-rank_C    |    <, >   |     <, >      |     <       |     =       |    <, >     |     >
puri-rank_R    |    <, >   |      <        |     <       |    <, >     |     =       |     >
puri-rank_C    |    <, >   |      <        |     <       |     <       |     <       |     =

We now answer the question: By how much do we need to increase the rank of a tensor network such that the set of tensors it can represent includes the set of tensors that can be represented by a different tensor network of a different rank? More specifically, consider a tensor that has rank r according to one representation and rank r' according to another. Can we bound the rank r as a function of the rank r' only? The results of Proposition 3, presented via Table 3, indicate that in many cases there is no such function, i.e., there exists a family of non-negative tensors, describing a family of probability distributions over N binary variables, with the property that as N goes to infinity r' remains constant, while r also goes to infinity.

Proposition 3. The ranks of all introduced tensor-network representations satisfy the relationships without asterisk contained in Table 3. A function g(x) in a cell denotes that for all non-negative tensors r_row ≤ g(r_column). "No" indicates that there exists a family of probability distributions of increasing N with d = 2 and r_column constant, but such that r_row goes to infinity, i.e., that no such function can exist.

Table 3: Results of Proposition 3.

               | TT-rank_R | TT-rank_{R≥0} | Born-rank_R | Born-rank_C | puri-rank_R | puri-rank_C
TT-rank_R      |     =     |     ≤ x       |    ≤ x²     |    ≤ x²     |    ≤ x²     |    ≤ x²
TT-rank_{R≥0}  |    No     |      =        |     No      |     No      |     No      |     No
Born-rank_R    |    No     |     No        |      =      |     No      |     No      |     No
Born-rank_C    |    No     |     No*       |    ≤ x      |      =      |     No*     |     No*
puri-rank_R    |    No     |    ≤ x        |    ≤ x      |    ≤ 2x     |      =      |    ≤ 2x
puri-rank_C    |    No     |    ≤ x        |    ≤ x      |    ≤ x      |    ≤ x      |      =

We conjecture that the relationships with an asterisk in Table 3 also hold. The existence of a family of matrices with constant non-negative rank but unbounded complex Hadamard square-root rank, together with the techniques introduced in the supplementary material, would provide a proof of these conjectured results. Proposition 3 indicates the existence of various families of non-negative tensors for which the rank of one representation remains constant, while the rank of another representation grows with the number of binary variables; however, the rate of this growth is not given. The following propositions provide details of the asymptotic growth of these ranks.

Proposition 4 ([46]). There exists a family of non-negative tensors over 2N binary variables and constant TT-rank_R = 3 that have puri-rank_C = Ω(N), and hence also puri-rank_R, Born-rank_{R/C} and TT-rank_{R≥0} ≥ Ω(N).

Proposition 5. There exists a family of non-negative tensors over 2N binary variables and constant TT-rank_{R≥0} = 2 (and hence also puri-rank_{R/C} = 2) that have Born-rank_R ≥ π(2N + 1), where π(x) is the number of prime numbers up to x, which asymptotically satisfies π(x) ∼ x/log(x).

Proposition 6. There exists a family of non-negative tensors over 2N binary variables and constant Born-rank_R = 2 (and hence also constant Born-rank_C and puri-rank_{R/C}) that have TT-rank_{R≥0} ≥ N.

Proposition 7. 
There exists a family of non-negative tensors over 2N binary variables and constant Born-rank_C = 2 that have Born-rank_R ≥ N.

Some comments and observations which may aid in facilitating an intuitive understanding of these results are as follows: Cancellations between negative contributions allow an MPS_R to represent a non-negative tensor while having lower rank than an MPS_{R≥0} (this separation can also be derived from the separation between Arithmetic Circuits and Monotone Arithmetic Circuits [50]). The separations between MPS_{R≥0} and BM_{R/C} are due to the difference of rank between probability distributions and their real or complex square roots. Finally, the difference between real and complex BM is due to the way in which real and imaginary elements are combined through the modulus squared, and this is illustrated well by the fact that real LPS of purification dimension 2 include complex BM.

As the techniques via which the results of Proposition 3 have been obtained are of interest, we provide a sketch of the proof for all "No" entries here. Assume that for a given pair of representations there exists a family of non-negative matrices with the property that the rank r_column of one representation remains constant as a function of matrix dimension, while the rank r_row of the other representation grows. Now, consider such a matrix M of dimension 2^N × 2^N. The first step is to show that M can be unfolded into a tensor network of constant rank r_column, over 2N binary variables, such that M is a reshaping of the central bipartition of this tensor:

[graphical equation (8): the 2^N × 2^N matrix M equals the reshaping of a tensor over 2N binary indices across its central cut]

If the rank r_row of matrix M is large, the rank r_row of the corresponding tensor-network representation of the unfolded tensor will also be large. 
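The unfolding used in this argument (Eq. (8)) is simply a reshaping of indices; a minimal numpy illustration of ours, not taken from the paper:

```python
import numpy as np

N = 3
M = np.arange(4 ** N, dtype=float).reshape(2 ** N, 2 ** N)  # a 2^N x 2^N matrix

# Unfold M into a tensor over 2N binary variables by splitting each
# matrix index into N binary indices.
T = M.reshape((2,) * (2 * N))

# Reshaping across the central bipartition (first N vs last N binary
# indices) recovers M exactly, so a large rank of M across this cut forces
# a large rank in any tensor-network representation of T.
assert np.array_equal(T.reshape(2 ** N, 2 ** N), M)
```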
While the above unfolding requires a matrix of a particular dimension, it is in fact possible to write any n × n matrix M as a submatrix of a 2^N × 2^N matrix, to which the above unfolding strategy can then be applied as a tool for leveraging matrix rank separations [51, 52, 49, 53] into tensor rank separations [54].

Finally, in order to discuss the significance of these results, note firstly that the TT-rank_R can be arbitrarily smaller than all other ranks; however, optimizing a real MPS to represent a probability distribution presents a problem, since it is not clear how to impose positivity of the contracted tensor network [25, 48]. All other separations are relevant in practice since, as discussed in the following section, they apply to tensor networks that can be trained to represent probability distributions over many variables. Taken together, these results show that LPS should be preferred over MPS_{R≥0} or BM, since the puri-rank is always upper bounded by the other ranks. Additionally, complex BM should also be preferred to real BM, as they can lead to an arbitrarily large reduction in the number of parameters of the tensor network. Note that because of the structure of the tensor networks we consider, these results also apply to more general tensor factorizations relying on a tree structure of the tensor network. 
How these results are affected if one considers approximate as opposed to exact representations remains an interesting open problem.

5 Learning algorithms

While the primary results of this work concern the expressive power of different tensor-network representations of probability distributions, these results are relevant in practice since MPS_{R≥0}, BM_{R/C} and LPS_{R/C} admit efficient learning algorithms, as shown in this section.

First, given samples {x_i = (X^i_1, ..., X^i_N)} from a discrete multivariate distribution, these models can be trained to approximate this distribution through maximum likelihood estimation. Specifically, this can be done by minimizing the negative log-likelihood

L = -\sum_i \log(T_{x_i}/Z_T), with derivatives \partial_w L = -\sum_i ( \partial_w T_{x_i}/T_{x_i} - \partial_w Z_T/Z_T ),    (9)

where i indexes training samples and T_{x_i} is given by the contraction of one of the tensor-network models we have introduced. The negative log-likelihood can be minimized using a mini-batch gradient-descent algorithm. Note that when using complex tensors, the derivatives are replaced by Wirtinger derivatives with respect to the conjugated tensor elements. This algorithm requires the computation of T_{x_i} and \partial_w T_{x_i} for a training instance, as well as of the normalization Z_T and its derivative \partial_w Z_T. We first focus on the computation of these quantities for LPS. Since BM are LPS of purification dimension µ = 1, they can directly use the same algorithm [34]. For an LPS_C of puri-rank r, the normalization Z_T can be computed by contracting the tensor network

Z_T = [graphical notation: the LPS chain A_1, ..., A_N contracted with its conjugate copy over all physical and purification indices], with derivatives \partial Z_T / \partial \bar{A}^{j,k,l}_{i,m} = [graphical notation: the same network with \bar{A}_i removed, expressed through the boundary tensors E_{i-1} and F_{i+1}],    (10)

where the tensors E_i and F_i are intermediate tensors obtained by contracting the left part and right part of the tensor network corresponding to the norm. 
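For intuition, Z_T (Eq. (10)) can be evaluated without ever forming the exponentially large tensor, by sweeping a transfer tensor along the chain. The sketch below is our own illustration, using order-4 cores with dummy boundary bonds of dimension 1, and a contraction order chosen for clarity rather than for the optimal O(dµr³N) scaling stated in the text:

```python
import numpy as np

def lps_normalization(cores):
    """Z_T for an LPS: contract the network of Eq. (10) left to right.

    Each core has shape (d, mu, r_left, r_right); boundary bonds have dim 1."""
    E = np.ones((1, 1), dtype=complex)            # E[alpha, alpha'] for the left part
    for A in cores:
        # Sum the physical and purification indices of the core paired with
        # its conjugate, then absorb the resulting transfer tensor into E.
        transfer = np.einsum('xuab,xucd->acbd', A, A.conj())
        E = np.einsum('ac,acbd->bd', E, transfer)
    return E[0, 0].real                           # right boundary bonds have dim 1

rng = np.random.default_rng(2)
d, mu, r, N = 2, 2, 3, 6
shapes = [(d, mu, 1, r)] + [(d, mu, r, r)] * (N - 2) + [(d, mu, r, 1)]
cores = [rng.standard_normal(s) + 1j * rng.standard_normal(s) for s in shapes]
Z = lps_normalization(cores)
assert Z > 0   # Z_T is a sum of non-negative entries of T
```

The cost per site is polynomial in d, µ and r, and linear in N overall, which is what makes likelihood-based training of these models tractable.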
All these computations can be performed in O(dµr³N) operations, and a similar contraction with fixed values for X_i is used for computing T_{x_i} and its derivative at a training example. More details about this algorithm are included in the supplementary material, together with the algorithm we use for training MPS_{R≥0}, which is a variation of the one given above for LPS. MPS_{R≥0} could also be trained using the expectation-maximization (EM) algorithm, but as BM and LPS use real or complex tensors, different algorithms are required. Note that in all these models not only the likelihood can be evaluated efficiently: marginals and correlation functions can be computed in a time linear in the number of variables, while exact samples from the distribution can also be generated efficiently [55, 34].

Instead of approximating a distribution from samples, it might also be useful to compress a probability mass function P given in the form of a non-negative tensor. Since the original probability mass function has a number of parameters that is exponential in N, this is only possible for a small number of variables. It can be done by minimizing the Kullback–Leibler (KL) divergence

D(P || T/Z_T) = \sum_{X_1,...,X_N} P_{X_1,...,X_N} \log( P_{X_1,...,X_N} / (T_{X_1,...,X_N}/Z_T) ),

where T is represented by a tensor-network model. The gradient of the KL divergence can be obtained in the same way as the gradient of the log-likelihood, and gradient-based optimization algorithms can then be used to solve this optimization problem. 
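To make the compression procedure concrete, here is a small self-contained sketch of our own (not the paper's implementation): it fits a rank-2 non-negative MPS to a random probability tensor by plain gradient descent on the KL divergence, with an element-wise-squared parametrization (an assumption of ours) to keep the cores non-negative; the learning rate and step count are likewise illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 2, 2
P = rng.random((d, d, d)); P /= P.sum()            # random target probability tensor

# Non-negative MPS cores parametrized as element-wise squares of real tensors,
# so positivity is preserved under unconstrained gradient descent.
W = [rng.random((d, r)), rng.random((d, r, r)), rng.random((d, r))]

def model(W):
    A = [w * w for w in W]
    return A, np.einsum('ia,jab,kb->ijk', *A)      # contract the 3-site MPS

def kl(P, T):
    return float((P * np.log(P / (T / T.sum()))).sum())

_, T0 = model(W)
init = kl(P, T0)
lr = 0.02
for step in range(2000):
    A, T = model(W)
    G = -P / T + 1.0 / T.sum()                     # dD/dT for D = KL(P || T/Z_T)
    g = [np.einsum('ijk,jab,kb->ia', G, A[1], A[2]),
         np.einsum('ijk,ia,kb->jab', G, A[0], A[2]),
         np.einsum('ijk,ia,jab->kb', G, A[0], A[1])]
    W = [w - lr * 2.0 * w * gw for w, gw in zip(W, g)]  # chain rule through w**2

_, T = model(W)
final = kl(P, T)
assert np.isfinite(final) and final < init         # the KL divergence decreased
```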
Note that for the case of matrices and MPS_{R≥0} more specific algorithms have been developed [56], and finding more efficient algorithms for factorizing a given tensor in the form of a BM or LPS represents an interesting problem that we leave for future work.

6 Numerical experiments

Using the algorithms discussed in Section 5 we numerically investigate the extent to which the separations found in Section 4 apply in both the setting of approximating a distribution from samples, and the setting of compressing given non-negative tensors. Code, data sets and choice of hyperparameters are available in the supplementary material and the provided repository [57].

6.1 Random tensor factorizations

We first generate random probability mass functions P by generating a tensor with elements chosen uniformly in [0, 1] and normalizing it. We then minimize the KL divergence D(P || T/Z_T), where T is the tensor defined by an MPS, BM or LPS with given rank r. We choose LPS to have a purification dimension of 2. Details of the optimization are available in the supplementary material.

Figure 3: Mean of the minimum error of the approximation of 50 random tensors P with tensor networks of fixed rank, as a function of the rank or the number of (real) parameters. Left: 20 × 20 matrix. Right: tensor over 8 binary variables. The error bars represent one standard deviation, and are omitted below 10^{-12}.

Results are presented in Fig. 3 for a 20 × 20 matrix and a tensor with 8 binary variables. 
These results show that complex BM, as well as real and complex LPS, generically provide a better approximation to a tensor than an MPS or a real BM, both at fixed rank and at a fixed number of real parameters.

6.2 Maximum likelihood estimation on realistic data sets

We now investigate how well the different tensor-network representations are able to learn from realistic data sets. We train MPSℝ≥0, BMℝ, BMℂ, LPSℝ and LPSℂ (of purification dimension 2) using the algorithm of Section 5 on different data sets of categorical variables. Since we are interested in the expressive power of the different representations, we use the complete data sets and no regularization. Additional results on generalization performance are included in the supplementary material.

Figure 4: Maximum likelihood estimation with tensor networks, an HMM, and a Bayesian network without hidden units whose graph is learned from the data, on different data sets: a) biofam data set of family life states from the Swiss Household Panel biographical survey [58]; data sets from the UCI Machine Learning Repository [59]: b) Lymphography [60], c) SPECT Heart, d) Congressional Voting Records, e) Primary Tumor [60], f) Solar Flare.

The results in Fig. 4 show the best negative log-likelihood per sample obtained for each tensor network of fixed rank. As a comparison we also include the best negative log-likelihood obtained from an HMM trained using the Baum-Welch algorithm [61, 62], as well as the best possible Bayesian network without hidden variables, where the network graph is learned from the data [62]. We observe that, despite the different choice of algorithm, the performance of HMM and MPSℝ≥0 is similar, as one would expect from their relationship.
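That relationship can be made concrete: the joint distribution of an HMM is exactly a tensor-train/MPS contraction with non-negative cores built from the transition and emission matrices. A minimal sketch, with toy dimensions and variable names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)

def row_stochastic(m):
    """Normalize rows so each row is a probability distribution."""
    return m / m.sum(axis=1, keepdims=True)

H, d, N = 3, 2, 5                       # hidden states, symbols, sequence length
pi = rng.uniform(size=H)
pi /= pi.sum()                          # initial hidden-state distribution
Trans = row_stochastic(rng.uniform(size=(H, H)))   # transition matrix
Emit = row_stochastic(rng.uniform(size=(H, d)))    # emission matrix

x = rng.integers(0, d, size=N)          # an observed sequence

# Forward algorithm: P(x) under the hidden Markov model.
alpha = pi * Emit[:, x[0]]
for t in range(1, N):
    alpha = (alpha @ Trans) * Emit[:, x[t]]
p_hmm = alpha.sum()

# The same probability as a non-negative MPS contraction: the boundary vector
# is pi @ diag(Emit[:, x_0]), and the core at each later site t, evaluated at
# symbol x_t, is the non-negative matrix Trans @ diag(Emit[:, x_t]).
v = pi @ np.diag(Emit[:, x[0]])
for t in range(1, N):
    v = v @ Trans @ np.diag(Emit[:, x[t]])
p_mps = v.sum()
```

Both loops compute the same sum over hidden-state paths, so `p_hmm` and `p_mps` agree up to floating-point rounding; the MPS view simply packages the HMM parameters as a chain of non-negative matrices.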
On all data sets, BM and LPS lead to significant improvements over MPSℝ≥0 at the same rank.

7 Conclusion

We have characterized the expressive power of various tensor-network models of probability distributions, in the process enhancing the scope and applicability of the tensor-network toolbox within the broader context of learning algorithms. In particular, our analysis has concrete implications for model selection, suggesting that in generic settings LPS should be preferred over both hidden Markov models and Born machines. Furthermore, our results prove that, unexpectedly, the use of complex tensors instead of real tensors can lead to an unbounded expressive advantage in particular network architectures. Additionally, this work contributes to the growing body of rigorous results on the expressive power of learning models obtained via tensor-network techniques. A formal understanding of the expressive power of state-of-the-art learning models is often elusive; it is hoped that both the techniques and the spirit of this work can add momentum to this program. Finally, through the formal relationship of LPS and Born machines to quantum circuits, our work provides a concrete foundation for both the development and analysis of quantum machine learning algorithms for near-term quantum devices.

Acknowledgments

We would like to thank Vedran Dunjko for his comments on the manuscript and João Gouveia for his suggestion of the proof of Lemma 9 in the supplementary material. I. G., N. P. and J. I. C. are supported by an ERC Advanced Grant QENOCOBA under the EU Horizon 2020 program (grant agreement 742102) and the German Research Foundation (DFG) under Germany's Excellence Strategy through Project No. EXC-2111 - 390814868 (MCQST). R. S. acknowledges the financial support of the Alexander von Humboldt foundation. N. P. acknowledges financial support from ExQM. J. E.
acknowledges financial support by the German Research Foundation DFG (CRC 183 project B2, EI 519/7-1, CRC 1114, GRK 2433) and MATH+. This work has also received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 817482 (PASQuanS).

References

[1] Stellan Östlund and Stefan Rommer. Thermodynamic limit of density matrix renormalization. Physical Review Letters, 75:3537–3540, 1995.

[2] Ivan V. Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5):2295–2317, 2011.

[3] Wolfgang Hackbusch and Stefan Kühn. A new scheme for the tensor representation. Journal of Fourier Analysis and Applications, 15(5):706–722, 2009.

[4] Lars Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, 31(4):2029–2054, 2010.

[5] Andrzej Cichocki, Namgil Lee, Ivan Oseledets, Anh-Huy Phan, Qibin Zhao, and Danilo P. Mandic. Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions. Foundations and Trends in Machine Learning, 9(4-5):249–429, 2016.

[6] Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama, and Danilo P. Mandic. Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives. Foundations and Trends in Machine Learning, 9(6):431–673, 2017.

[7] Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, and Dmitry P. Vetrov. Tensorizing neural networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 442–450, 2015.

[8] Ulrich Schollwöck. The density-matrix renormalization group in the age of matrix product states.
Annals of Physics, 326(1):96–192, 2011.

[9] Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 349:117–158, 2014.

[10] Frank Verstraete, Valentin Murg, and J. Ignacio Cirac. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 57(2):143–224, 2008.

[11] Hanie Sedghi, Majid Janzamin, and Anima Anandkumar. Provable tensor methods for learning mixtures of generalized linear models. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016, Cadiz, Spain, May 9-11, 2016, pages 1223–1231, 2016.

[12] Daniel J. Hsu, Sham M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 78(5):1460–1480, 2012.

[13] Animashree Anandkumar, Rong Ge, Daniel J. Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 15(1):2773–2832, 2014.

[14] Edwin Miles Stoudenmire and David J. Schwab. Supervised learning with tensor networks. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4799–4807, 2016.

[15] Alexander Novikov, Mikhail Trofimov, and Ivan V. Oseledets. Exponential machines. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings, 2017.

[16] Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, and Naonori Ueda.
Polynomial networks and factorization machines: New insights and efficient training algorithms. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 850–858, 2016.

[17] Nadav Cohen, Or Sharir, and Amnon Shashua. On the expressive power of deep learning: A tensor analysis. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, pages 698–728, 2016.

[18] Nadav Cohen and Amnon Shashua. Convolutional rectifier networks as generalized tensor decompositions. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 955–963, 2016.

[19] Nadav Cohen and Amnon Shashua. Inductive bias of deep convolutional networks through pooling geometry. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.

[20] Nadav Cohen, Or Sharir, Yoav Levine, Ronen Tamari, David Yakira, and Amnon Shashua. Analysis and design of convolutional networks via hierarchical tensor decompositions. arXiv:1705.02302, 2017.

[21] Yoav Levine, David Yakira, Nadav Cohen, and Amnon Shashua. Deep learning and quantum entanglement: Fundamental connections with implications to network design. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.

[22] Valentin Khrulkov, Alexander Novikov, and Ivan V. Oseledets. Expressive power of recurrent neural networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.

[23] Jens Eisert. Entanglement and tensor network states. Modeling and Simulation, 3:520, 2013.

[24] Andrew Critch. Algebraic geometry of matrix product states.
Symmetry, Integrability and Geometry: Methods and Applications, 10:095, 2014.

[25] Martin Kliesch, David Gross, and Jens Eisert. Matrix-product operators and states: NP-hardness and undecidability. Physical Review Letters, 113:160503, 2014.

[26] Jing Chen, Song Cheng, Haidong Xie, Lei Wang, and Tao Xiang. Equivalence of restricted Boltzmann machines and tensor network states. Physical Review B, 97:085104, 2018.

[27] Ivan Glasser, Nicola Pancotti, Moritz August, Ivan D. Rodriguez, and J. Ignacio Cirac. Neural-network quantum states, string-bond states, and chiral topological states. Physical Review X, 8:011006, 2018.

[28] Elina Robeva and Anna Seigal. Duality of graphical models and tensor networks. arXiv:1710.01437, 2017.

[29] Ivan Glasser, Nicola Pancotti, and J. Ignacio Cirac. Supervised learning with generalized tensor networks. arXiv:1806.05964, 2018.

[30] Priyank Jaini, Pascal Poupart, and Yaoliang Yu. Deep homogeneous mixture models: Representation, separation, and approximation. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 7136–7145, 2018.

[31] Or Sharir, Ronen Tamari, Nadav Cohen, and Amnon Shashua. Tensorial mixture models. arXiv:1610.04167, 2016.

[32] Amnon Shashua and Tamir Hazan. Non-negative tensor factorization with applications to statistics and computer vision. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 792–799, New York, NY, USA, 2005. ACM.

[33] Brendan J. Frey. Extending factor graphs so as to unify directed and undirected graphical models. In UAI '03, Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence, Acapulco, Mexico, August 7-10 2003, pages 257–264, 2003.

[34] Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, and Pan Zhang.
Unsupervised generative modeling using matrix product states. Physical Review X, 8:031012, 2018.

[35] Song Cheng, Jing Chen, and Lei Wang. Information perspective to probabilistic modeling: Boltzmann machines versus Born machines. Entropy, 20(8), 2018.

[36] Chengran Yang, Felix C. Binder, Varun Narasimhachar, and Mile Gu. Matrix product states for quantum stochastic modeling. Physical Review Letters, 121:260602, 2018.

[37] Vasily Pestun, John Terilla, and Yiannis Vlassopoulos. Language as a matrix product state. arXiv:1711.01416, 2017.

[38] E. Miles Stoudenmire. Learning relevant features of data with multi-scale tensor networks. Quantum Science and Technology, 3(3):034003, 2018.

[39] James Stokes and John Terilla. Probabilistic modeling with matrix product states. arXiv:1902.06888, 2019.

[40] Song Cheng, Lei Wang, Tao Xiang, and Pan Zhang. Tree tensor networks for generative modeling. Physical Review B, 99:155131, 2019.

[41] Jin-Guo Liu and Lei Wang. Differentiable learning of quantum circuit Born machines. Physical Review A, 98:062324, 2018.

[42] Marcello Benedetti, Delfina Garcia-Pintos, Yunseong Nam, and Alejandro Perdomo-Ortiz. A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quantum Information, 5(45), 2018.

[43] Edward Grant, Marcello Benedetti, Shuxiang Cao, Andrew Hallam, Joshua Lockhart, Vid Stojevic, Andrew G. Green, and Simone Severini. Hierarchical quantum classifiers. npj Quantum Information, 4(1), 2018.

[44] William Huggins, Piyush Patil, Bradley Mitchell, K. Birgitta Whaley, and E. Miles Stoudenmire. Towards quantum machine learning with tensor networks. Quantum Science and Technology, 4(2):024001, 2019.

[45] Frank Verstraete, Juan José García-Ripoll, and J. Ignacio Cirac. Matrix product density operators: Simulation of finite-temperature and dissipative systems.
Physical Review Letters, 93:207204, 2004.

[46] Gemma De las Cuevas, Norbert Schuch, David Pérez-García, and J. Ignacio Cirac. Purifications of multipartite states: limitations and constructive methods. New Journal of Physics, 15(12):123021, 2013.

[47] Thomas Barthel. Precise evaluation of thermal response functions by optimized density matrix renormalization group schemes. New Journal of Physics, 15(7):073010, 2013.

[48] Albert H. Werner, Daniel Jaschke, Pietro Silvi, Martin Kliesch, Tommaso Calarco, Jens Eisert, and Simone Montangero. Positive tensor network approach for simulating open quantum many-body systems. Physical Review Letters, 116(23):237201, 2016.

[49] Hamza Fawzi, João Gouveia, Pablo A. Parrilo, Richard Z. Robinson, and Rekha R. Thomas. Positive semidefinite rank. Mathematical Programming, 153(1):133–177, 2015.

[50] Amir Shpilka and Amir Yehudayoff. Arithmetic circuits: A survey of recent results and open questions. Foundations and Trends in Theoretical Computer Science, 5(3-4):207–388, 2010.

[51] Joel E. Cohen and Uriel G. Rothblum. Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra and its Applications, 190:149–168, 1993.

[52] António Pedro Goucha, João Gouveia, and Pedro M. Silva. On ranks of regular polygons. SIAM Journal on Discrete Mathematics, 31(4):2612–2625, 2017.

[53] João Gouveia, Pablo A. Parrilo, and Rekha R. Thomas. Lifts of convex sets and cone factorizations. Mathematics of Operations Research, 38(2):248–264, 2013.

[54] Gemma De las Cuevas and Tim Netzer. Mixed states in one spatial dimension: decompositions and correspondence with nonnegative matrices. arXiv:1907.03664, 2019.

[55] Andrew J. Ferris and Guifre Vidal. Perfect sampling with unitary tensor networks. Physical Review B, 85:165146, 2012.

[56] Daniel D. Lee and H. Sebastian Seung.
Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, Denver, CO, USA, pages 556–562, 2000.

[57] https://github.com/glivan/tensor_networks_for_probabilistic_modeling.

[58] Nicolas S. Müller, Matthias Studer, and Gilbert Ritschard. Classification de parcours de vie à l'aide de l'optimal matching. In XIVe Rencontre de la Société francophone de classification, Paris (SFC 2007), pages 157–160, 2007.

[59] Dheeru Dua and Casey Graff. UCI machine learning repository, 2019.

[60] The lymphography and primary tumor domains were obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.

[61] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164–171, 1970.

[62] Jacob Schreiber. pomegranate: Fast and flexible probabilistic modeling in Python. Journal of Machine Learning Research, 18:164:1–164:6, 2017.