Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Zahra Razaee, Arash Amini
Modeling dependencies in multivariate discrete data is a challenging problem, especially in high dimensions. The Potts model is a versatile choice for this task, suitable when each coordinate is a categorical variable. However, the full Potts model has too many parameters to be fit accurately when the number of categories is large. We introduce a variation on the Potts model that allows for general categorical marginals and Ising-type multivariate dependence. This reduces the number of parameters from $\Omega(d^2 K^2)$ in the full Potts model to $O(d^2 + Kd)$, where $K$ is the number of categories and $d$ is the dimension of the data. We show that the complexity of fitting this new Potts-Ising model is the same as that of an Ising model. In particular, adopting the neighborhood regression framework, the model can be fit by solving $d$ separate logistic regressions. We demonstrate the ability of the model to capture multivariate dependencies by comparing it with existing approaches.
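To make the "$d$ separate logistic regressions" idea concrete, the following is a minimal sketch of neighborhood regression for a plain binary Ising model with spins in $\{-1, +1\}$. It is not the Potts-Ising estimator itself, whose construction is not given in this abstract. The function names `fit_logistic` and `neighborhood_regression`, the ridge penalty, the gradient-descent settings, and the symmetrization by averaging are all illustrative assumptions.

```python
import numpy as np


def fit_logistic(X, y, n_iter=500, lr=0.1, l2=1e-3):
    """Ridge-penalized logistic regression fit by plain gradient descent.

    X: (n, p) feature matrix; y: (n,) labels in {0, 1}.
    Hypothetical helper for illustration, not a tuned solver.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) / n + l2 * w)
        b -= lr * np.mean(prob - y)
    return w, b


def neighborhood_regression(S):
    """Estimate Ising couplings from S, an (n, d) array with entries in {-1, +1}.

    Each coordinate j is regressed on all the others, giving d separate
    logistic regressions. Returns a symmetric (d, d) matrix with zero diagonal.
    """
    n, d = S.shape
    W = np.zeros((d, d))
    for j in range(d):
        others = np.delete(np.arange(d), j)
        y = (S[:, j] + 1) / 2  # map {-1, +1} to {0, 1}
        w, _ = fit_logistic(S[:, others], y)
        # For +/-1 spins the conditional log-odds of x_j is
        # 2 * (theta_j + sum_k theta_jk x_k), so halve the fitted weights.
        W[j, others] = w / 2.0
    # The two estimates of each coupling are averaged (one common choice).
    return (W + W.T) / 2.0
```

Each of the $d$ regressions involves only one response column, so they can be run independently. Averaging the two estimates of each coupling is only one of several symmetrization rules in use.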