Charles Isbell, Parry Husbands
Imagine that you wish to classify data consisting of tens of thousands of examples residing in a twenty thousand dimensional space. How can one apply standard machine learning algorithms? We describe the Parallel Problems Server (PPServer) and MATLAB*P. In tandem they allow users of networked computers to work transparently on large data sets from within Matlab. This work is motivated by the desire to bring the many benefits of scientific computing algorithms and computational power to machine learning researchers. We demonstrate the usefulness of the system on a number of tasks. For example, we perform independent components analysis on very large text corpora consisting of tens of thousands of documents, making minimal changes to the original Bell and Sejnowski Matlab source (Bell and Sejnowski, 1995). Applying ML techniques to data previously beyond their reach leads to interesting analyses of both data and algorithms.