PDP: Parameter-free Differentiable Pruning is All You Need

Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track


Authors

Minsik Cho, Saurabh Adya, Devang Naik

Abstract

DNN pruning is a popular way to reduce the size of a model, improve the inference latency, and minimize the power consumption on DNN accelerators. However, existing approaches may be too complex, expensive, or ineffective to apply across a variety of vision/language tasks and DNN architectures, or to honor structured pruning constraints. In this paper, we propose an efficient yet effective train-time pruning scheme, Parameter-free Differentiable Pruning (PDP), which offers state-of-the-art quality in model size, accuracy, and training cost. During training, PDP uses a dynamic function of the weights to generate soft pruning masks for the weights in a parameter-free manner for a given pruning target. While differentiable, the simplicity and efficiency of PDP make it universal enough to deliver state-of-the-art random/structured/channel pruning results on various vision and natural language tasks. For example, for MobileNet-v1, PDP achieves 68.2% top-1 ImageNet1k accuracy at 86.6% sparsity, which is 1.7% higher than the state-of-the-art algorithms. PDP also yields over 83.1% accuracy on Multi-Genre Natural Language Inference with 90% sparsity for BERT, while the next best existing technique reaches 81.5% accuracy. In addition, PDP can be applied to structured pruning, such as N:M pruning and channel pruning. For 1:4 structured pruning of ResNet18, PDP improves top-1 ImageNet1k accuracy by over 3.6% over the state-of-the-art. For channel pruning of ResNet50, PDP loses only 0.6% top-1 ImageNet1k accuracy relative to the state-of-the-art.
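To make the core idea concrete, below is a minimal PyTorch-style sketch of a parameter-free, differentiable soft pruning mask: a threshold is recomputed from the current weight magnitudes at the target-sparsity quantile (so nothing is learned for the mask itself), and a smooth gate maps each weight to a value in [0, 1]. The sigmoid gate, the temperature value, and the function/variable names are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch

def soft_pruning_mask(weight: torch.Tensor,
                      target_sparsity: float,
                      temperature: float = 1e-3) -> torch.Tensor:
    """Illustrative soft mask: weights whose squared magnitude falls below a
    sparsity-derived threshold get mask values near 0, the rest near 1.
    The gate shape and temperature are assumptions, not PDP's exact form."""
    w2 = weight.detach().pow(2).flatten()
    # Parameter-free threshold: the squared magnitude at the target-sparsity
    # quantile of the current weights (recomputed each step, never learned).
    t2 = torch.quantile(w2, target_sparsity)
    # Differentiable soft mask in [0, 1]; gradients flow through weight**2,
    # while the threshold is treated as a constant for this step.
    return torch.sigmoid((weight.pow(2) - t2) / temperature)

# Usage sketch: apply the soft mask in the forward pass during training so the
# loss reflects pruning; harden the mask (e.g., threshold at 0.5) at inference.
w = torch.randn(256, 512, requires_grad=True)
masked_w = w * soft_pruning_mask(w, target_sparsity=0.9)
```

Because the mask is a smooth function of the weights and the pruning target, training can anneal toward the desired sparsity without introducing extra learnable mask parameters, which is what keeps the scheme cheap enough to apply uniformly across random, N:M, and channel pruning settings.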