Globally optimal score-based learning of directed acyclic graphs in high-dimensions

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

AuthorFeedback Bibtex MetaReview Metadata Paper Reviews Supplemental


Bryon Aragam, Arash Amini, Qing Zhou


We prove that $\Omega(s\log p)$ samples suffice to learn a sparse Gaussian directed acyclic graph (DAG) from data, where $s$ is the maximum Markov blanket size. This improves upon recent results that require $\Omega(s^{4}\log p)$ samples in the equal variance case. To prove this, we analyze a popular score-based estimator that has been the subject of extensive empirical inquiry in recent years and is known to achieve state-of-the-art results. Furthermore, the approach we study does not require strong assumptions such as faithfulness that existing theory for score-based learning crucially relies on. The resulting estimator is based around a difficult nonconvex optimization problem, and its analysis may be of independent interest given recent interest in nonconvex optimization in machine learning. Our analysis overcomes the drawbacks of existing theoretical analyses, which either fail to guarantee structure consistency in high-dimensions (i.e. learning the correct graph with high probability), or rely on restrictive assumptions. In contrast, we give explicit finite-sample bounds that are valid in the important $p\gg n$ regime.