Authors

Karthika Mohan, Judea Pearl, Jin Tian

Abstract

We address the problem of deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called `Missingness Graphs' to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we define the notion of \textit{recoverability} which ensures that, for a given missingness-graph $G$ and a given query $Q$ an algorithm exists such that in the limit of large samples, it produces an estimate of $Q$ \textit{as if} no data were missing. We further present conditions that the graph should satisfy in order for recoverability to hold and devise algorithms to detect the presence of these conditions.