Sequentially Fitting "Inclusive" Trees for Inference in Noisy-OR Networks

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)



Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran


An important class of problems can be cast as inference in noisy-OR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e.g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisy-OR networks that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network.
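The noisy-OR conditional distribution described above can be sketched in a few lines. The following is a minimal illustration, not code from the paper; the function name, parameter values, and the optional "leak" term (a common extension that lets a symptom appear with no active disease) are assumptions for the example.

```python
import numpy as np

def noisy_or_prob(parent_states, activation_probs, leak=0.0):
    """P(child = 1 | parents) under a noisy-OR model: each active parent i
    independently activates the child with probability activation_probs[i],
    and a leak term lets the child turn on with no active parents.
    (Illustrative sketch; names and values are not from the paper.)"""
    parent_states = np.asarray(parent_states, dtype=float)
    activation_probs = np.asarray(activation_probs, dtype=float)
    # The child stays off only if the leak fails AND every active
    # parent's noisy activation fails independently.
    p_off = (1.0 - leak) * np.prod((1.0 - activation_probs) ** parent_states)
    return 1.0 - p_off

# Example: two candidate diseases, only the first one present.
p = noisy_or_prob([1, 0], [0.8, 0.6], leak=0.05)
# p = 1 - 0.95 * 0.2 = 0.81: the symptom is likely but not certain,
# since the active disease may fail to activate it.
```

The "failure" factorization is what makes richly-connected networks hard: the likelihood couples all parent variables, so the posterior over diseases given symptoms does not factorize.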

1 Inclusive variational approximations

Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes.

One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are