Ting liu, Andrew Moore, Alexander Gray
Computer Science Dept.
Carnegie Mellon University
Pittsburgh, PA 15213 firstname.lastname@example.org
This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classiﬁers and the prediction phase of Support Vector Machine classiﬁers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly ﬁnd the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational lee- way, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classiﬁcation and the pre- diction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 106 dimensions and 105 records, and show non-trivial speedups while giving exact answers.