Bernd Fischer, Johann Schumann, Wray Buntine, Alexander Gray
Machine learning has reached a point where many probabilistic meth- ods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms cus- tomized for different models. Here, we describe the AUTO BAYES sys- tem which takes a high-level statistical model speciﬁcation, uses power- ful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efﬁcient specialized algorithm for learning that model, and generates executable code implementing that algorithm. This capability is far beyond that of code collections such as Matlab tool- boxes or even tools for model-independent optimization such as BUGS for Gibbs sampling: complex new algorithms can be generated with- out new programming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efﬁcient and commented code can be generated for different languages or systems. We present automatically-derived algorithms ranging from closed-form solutions of Bayesian textbook problems to recently-proposed EM algo- rithms for clustering, regression, and a multinomial form of PCA.
1 Automatic Derivation of Statistical Algorithms
Overview. We describe a symbolic program synthesis system which works as a “statistical algorithm compiler:” it compiles a statistical model speciﬁcation into a custom algorithm design and from that further down into a working program implementing the algorithm design. This system, AUTOBAYES, can be loosely thought of as “part theorem prover, part Mathematica, part learning textbook, and part Numerical Recipes.” It provides much more ﬂexibility than a ﬁxed code repository such as a Matlab toolbox, and allows the creation of efﬁcient algorithms which have never before been implemented, or even written down. AUTOBAYES is intended to automate the more routine application of complex methods in novel contexts. For example, recent multinomial extensions to PCA [2, 4] can be derived in this way.
The algorithm design problem. Given a dataset and a task, creating a learning method can be characterized by two main questions: 1. What is the model? 2. What algorithm will optimize the model parameters? The statistical algorithm (i.e., a parameter optimization algorithm for the statistical model) can then be implemented manually. The system in this paper answers the algorithm question given that the user has chosen a model for the data,and continues through to implementation. Performing this task at the state-of-the-art level requires an intertwined meld of probability theory, computational mathematics, and software engineering. However, a number of factors unite to allow us to solve the algorithm design problem computationally: 1. The existence of fundamental building blocks (e.g., standardized probability distributions, standard optimization procedures, and generic data structures). 2. The existence of common representations (i.e., graphical models [3, 13] and program schemas). 3. The formalization of schema applicability constraints as guards.1 The challenges of algorithm design. The design problem has an inherently combinatorial nature, since subparts of a function may be optimized recursively and in different ways. It also involves the use of new data structures or approximations to gain performance. As the research in statistical algorithms advances, its creative focus should move beyond the ultimately mechanical aspects and towards extending the abstract applicability of already existing schemas (algorithmic principles like EM), improving schemas in ways that gener- alize across anything they can be applied to, and inventing radically new schemas.
2 Combining Schema-based Synthesis and Bayesian Networks
with 0 < n_points;
1 model mog as ’Mixture of Gaussians’;
with 0 < nclasses with nclasses << n_points;
with 1 = sum(I := 1..n_classes, phi(I));
7 double phi(1..nclasses) as ’weights’ 8 9 double mu(1..nclasses); 9 double sigma(1..n_classes);
2 const int npoints as ’nr. of data points’ 3 4 const int nclasses := 3 as ’nr. classes’ 5 6
Statistical Models. Externally, AUTOBAYES has the look and feel of a compiler. Users specify their model of interest in a high-level speciﬁcation language (as opposed to a program- ming language). The ﬁgure shows the speciﬁcation of the mixture of Gaus- sians example used throughout this paper.2 Note the constraint that the sum of the class probabilities must equal one (line 8) along with others (lines 3 and 5) that make optimization of the model well-deﬁned. Also note the ability to specify assumptions of the kind in line 6, which may be used by some algorithms. The last line speciﬁes the goal
10 int c(1..npoints) as ’class labels’; 11 c ˜ disc(vec(I := 1..nclasses, phi(I)));
12 data double x(1..n_points) as ’data’; 13 x(I) ˜ gauss(mu(c(I)), sigma(c(I)));
14 max pr(x| phi,mu,sigma ) wrt phi,mu,sigma ; inference task: maximize the conditional probability pr rameters