Parallelizing Support Vector Machines on Distributed Computers

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)

Bibtex Metadata Paper

Authors

Kaihua Zhu, Hao Wang, Hongjie Bai, Jian Li, Zhihuan Qiu, Hang Cui, Edward Chang

Abstract

Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computational time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use through performing a row-based, approximate matrix factorization, and which loads only essential data to each machine to perform parallel computation. Let $n$ denote the number of training instances, $p$ the reduced matrix dimension after factorization ($p$ is significantly smaller than $n$), and $m$ the number of machines. PSVM reduces the memory requirement from $\MO$($n^2$) to $\MO$($np/m$), and improves computation time to $\MO$($np^2/m$). Empirical studies on up to $500$ computers shows PSVM to be effective.