ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation

Xiaomeng Yang, Lei Lu, Qihui Fan, Changdi Yang, Juyi Lin, Yanzhi Wang, Xuan Zhang, Shangqian Gao

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images. However, their iterative denoising process incurs significant computational overhead during inference, limiting their practical deployment in resource-constrained environments. Existing acceleration methods often adopt uniform strategies that fail to capture the temporal variations during diffusion generation, while the commonly adopted sequential $\textit{pruning-then-fine-tuning strategy}$ suffers from sub-optimality due to the misalignment between pruning decisions made on pretrained weights and the model's final parameters. To address these limitations, we introduce $\textbf{ALTER}$: $\textbf{A}$ll-in-One $\textbf{L}$ayer Pruning and $\textbf{T}$emporal $\textbf{E}$xpert $\textbf{R}$outing, a unified framework that transforms diffusion models into a mixture of efficient temporal experts. ALTER achieves a single-stage optimization that unifies layer pruning, expert routing, and model fine-tuning by employing a trainable hypernetwork, which dynamically generates layer pruning decisions and manages timestep routing to specialized, pruned expert sub-networks throughout the ongoing fine-tuning of the UNet. This unified co-optimization strategy enables significant efficiency gains while preserving high generative quality. Specifically, ALTER matches the visual fidelity of the original 50-step Stable Diffusion v2.1 model while using only 25.9\% of its total MACs with just 20 inference steps, delivering a 3.64$\times$ speedup at 35\% sparsity.
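To make the routing idea concrete, the sketch below shows (in plain Python, not the authors' code) how denoising timesteps could be binned and routed to pruned "temporal expert" sub-networks, each defined by a binary keep/prune mask over the UNet's layers. All names, shapes, and the fixed masks here are illustrative assumptions; in ALTER the masks and routing are produced by a trainable hypernetwork and co-optimized with fine-tuning.

```python
# Illustrative sketch of temporal expert routing (assumptions, not ALTER's code).

def route_timestep(t, num_steps=20, num_experts=4):
    """Assign a denoising timestep to one of `num_experts` contiguous bins."""
    return min(t * num_experts // num_steps, num_experts - 1)

# Each expert is a binary keep/prune mask over the UNet's layers.
# In ALTER these masks come from a trainable hypernetwork; here they are fixed.
NUM_LAYERS = 12
expert_masks = [
    [1 if (layer + e) % 3 else 0 for layer in range(NUM_LAYERS)]
    for e in range(4)
]

def forward(x, t, layers):
    """Run only the layers kept by the expert routed for timestep t."""
    mask = expert_masks[route_timestep(t)]
    for keep, layer in zip(mask, layers):
        if keep:          # pruned layers are skipped entirely, saving MACs
            x = layer(x)
    return x

# Toy "layers": each adds 1, so the output counts the layers actually executed.
layers = [lambda x: x + 1 for _ in range(NUM_LAYERS)]
out = forward(0, t=3, layers=layers)
```

Because early and late timesteps place different demands on the network, letting each timestep bin use its own pruned sub-network is what allows aggressive sparsity without the uniform-pruning quality loss the abstract describes.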