{"title": "Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications", "book": "Advances in Neural Information Processing Systems", "page_first": 2463, "page_last": 2472, "abstract": "Markov random fields are a popular model for high-dimensional probability distributions. Over the years, many mathematical, statistical and algorithmic problems on them have been studied. Until recently, the only known algorithms for provably learning them relied on exhaustive search, correlation decay or various incoherence assumptions. Bresler gave an algorithm for learning general Ising models on bounded degree graphs. His approach was based on a structural result about mutual information in Ising models. Here we take a more conceptual approach to proving lower bounds on the mutual information. Our proof generalizes well beyond Ising models, to arbitrary Markov random fields with higher order interactions. As an application, we obtain algorithms for learning Markov random fields on bounded degree graphs on $n$ nodes with $r$-order interactions in $n^r$ time and $\\log n$ sample complexity. Our algorithms also extend to various partial observation models.", "full_text": "Information Theoretic Properties of Markov Random\n\nFields, and their Algorithmic Applications\n\nLinus Hamilton\u2217\n\nFrederic Koehler \u2020\n\nAnkur Moitra \u2021\n\nAbstract\n\nMarkov random \ufb01elds are a popular model for high-dimensional probability distri-\nbutions. Over the years, many mathematical, statistical and algorithmic problems\non them have been studied. Until recently, the only known algorithms for provably\nlearning them relied on exhaustive search, correlation decay or various incoher-\nence assumptions. Bresler [4] gave an algorithm for learning general Ising models\non bounded degree graphs. 
His approach was based on a structural result about mutual information in Ising models.

Here we take a more conceptual approach to proving lower bounds on the mutual information. Our proof generalizes well beyond Ising models, to arbitrary Markov random fields with higher order interactions. As an application, we obtain algorithms for learning Markov random fields on bounded degree graphs on n nodes with r-order interactions in n^r time and log n sample complexity. Our algorithms also extend to various partial observation models.

1 Introduction

1.1 Background

Markov random fields are a popular model for defining high-dimensional distributions by using a graph to encode conditional dependencies among a collection of random variables. More precisely, the distribution is described by an undirected graph G = (V, E) where to each of the n nodes u ∈ V we associate a random variable X_u which takes on one of k_u different states. The crucial property is that the conditional distribution of X_u should only depend on the states of u's neighbors. It turns out that as long as every configuration has positive probability, the distribution can be written as

Pr(a_1, ···, a_n) = exp( Σ_{ℓ=1}^{r} Σ_{i_1<···<i_ℓ} θ_{i_1···i_ℓ}(a_1, ···, a_n) − C )    (1)

τ, set S := S ∪ I.

3. For each i ∈ S, if ν̂_{u,i|S\i} < τ then remove i from S.

4. Return set S as our estimate of the neighborhood of u.

Theorem 5.1. Fix ω > 0. Suppose we are given m samples from an α, β-non-degenerate Markov random field with r-order interactions where the underlying graph has maximum degree at most D and each node takes on at most K states. Suppose that

m ≥ (60 K^{2L} / (τ^2 δ^{2L})) · ( log(1/ω) + log(L + r) + (L + r) log(nK) + log 2 ).

Then with probability at least 1 − ω, MRFNBHD when run starting from each node u recovers the correct neighborhood of u, and thus recovers the underlying graph G. Furthermore, each run of the algorithm takes O(mLn^r) time.

In many situations, it is too expensive to obtain full samples from a Markov random field (e.g. this could involve needing to measure every potential symptom of a patient). Here we consider a model where we are allowed only partial observations in the form of a C-bounded query:

Definition 5.2. A C-bounded query to a Markov random field is specified by a set S with |S| ≤ C, and we observe X_S.

Our algorithm MRFNBHD can be made to work with C-bounded queries instead of full observations. We prove:

Theorem 5.3. Fix an α, β-non-degenerate Markov random field with r-order interactions where the underlying graph has maximum degree at most D and each node takes on at most K states. The bounded queries modification to the algorithm returns the correct neighborhood of every vertex u using m′Lrn^r C-bounded queries of size at most L + r, where

m′ = (60 K^{2L} / (τ^2 δ^{2L})) · ( log(Lrn^r/ω) + log(L + r) + (L + r) log(nK) + log 2 ),

with probability at least 1 − ω.

In the supplementary material, we extend our results to the setting where we observe partial samples where the state of each node is revealed independently with probability p, and the choice of which nodes to reveal is independent of the sample.

Acknowledgements: We thank Guy Bresler for valuable discussions and feedback.

References

[1] Pieter Abbeel, Daphne Koller, and Andrew Y Ng. Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7(Aug):1743–1788, 2006.

[2] Anima Anandkumar, Daniel J Hsu, Furong Huang, and Sham M Kakade. 
Learning mixtures of tree graphical models. In Advances in Neural Information Processing Systems, pages 1052–1060, 2012.

[3] Animashree Anandkumar, Vincent YF Tan, Furong Huang, and Alan S Willsky. High-dimensional structure estimation in Ising models: Local separation criterion. The Annals of Statistics, pages 1346–1375, 2012.

[4] Guy Bresler. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 771–782. ACM, 2015.

[5] Guy Bresler, Elchanan Mossel, and Allan Sly. Reconstruction of Markov random fields from samples: Some observations and algorithms. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 343–356. Springer, 2008.

[6] Stephen G Brush. History of the Lenz-Ising model. Reviews of Modern Physics, 39(4):883, 1967.

[7] C Chow and Cong Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.

[8] Imre Csiszár and Zsolt Talata. Consistent estimation of the basic neighborhood of Markov random fields. In Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, page 170. IEEE, 2004.

[9] Gautam Dasarathy, Aarti Singh, Maria-Florina Balcan, and Jong Hyuk Park. Active learning algorithms for graphical model selection. J. Mach. Learn. Res., page 199207, 2016.

[10] Sanjoy Dasgupta. Learning polytrees. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 134–141. Morgan Kaufmann Publishers Inc., 1999.

[11] Ali Jalali, Pradeep Ravikumar, Vishvas Vasuki, and Sujay Sanghavi. On learning discrete graphical models using group-sparse regularization. 
In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 378–387, 2011.

[12] Jon Kleinberg and Eva Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM (JACM), 49(5):616–639, 2002.

[13] Su-In Lee, Varun Ganapathi, and Daphne Koller. Efficient structure learning of Markov networks using ℓ1-regularization. In Proceedings of the 19th International Conference on Neural Information Processing Systems, pages 817–824. MIT Press, 2006.

[14] Fabio Martinelli and Enzo Olivieri. Approach to equilibrium of Glauber dynamics in the one phase region. Communications in Mathematical Physics, 161(3):447–486, 1994.

[15] Elchanan Mossel, Dror Weitz, and Nicholas Wormald. On the hardness of sampling independent sets beyond the tree threshold. Probability Theory and Related Fields, 143(3):401–439, 2009.

[16] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, New York, NY, USA, 2014.

[17] Pradeep Ravikumar, Martin J Wainwright, John D Lafferty, et al. High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287–1319, 2010.

[18] Narayana P Santhanam and Martin J Wainwright. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7):4117–4134, 2012.

[19] Allan Sly. Computational transition at the uniqueness threshold. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 287–296. IEEE, 2010.

[20] Allan Sly and Nike Sun. The computational hardness of counting in two-spin models on d-regular graphs. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 361–369. IEEE, 2012.

[21] Nathan Srebro. 
Maximum likelihood bounded tree-width Markov networks. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 504–511. Morgan Kaufmann Publishers Inc., 2001.

[22] Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities and juntas. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 11–20. IEEE, 2012.

[23] Marc Vuffray, Sidhant Misra, Andrey Lokhov, and Michael Chertkov. Interaction screening: Efficient and sample-optimal learning of Ising models. In Advances in Neural Information Processing Systems, pages 2595–2603, 2016.", "award": [], "sourceid": 1443, "authors": [{"given_name": "Linus", "family_name": "Hamilton", "institution": "MIT"}, {"given_name": "Frederic", "family_name": "Koehler", "institution": "MIT"}, {"given_name": "Ankur", "family_name": "Moitra", "institution": null}]}