Image Editing As Programs with Diffusion Models

Yujia Hu, Songhua Liu, Zhenxiong Tan, Xingyi Yang, Xinchao Wang

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

While diffusion models have achieved remarkable success in text-to-image generation, they encounter significant challenges with instruction-driven image editing. Our research highlights a key challenge: these models particularly struggle with structurally inconsistent edits, that is, edits that involve substantial layout changes. To address this gap, we introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. Specifically, IEAP handles complex instructions by decomposing them into a sequence of programmable atomic operations. Each atomic operation manages a specific type of structurally consistent edit; combined sequentially, these operations allow IEAP to execute arbitrary, structurally inconsistent transformations. This reductionist approach enables IEAP to robustly handle a wide spectrum of edits, encompassing both structurally consistent and structurally inconsistent changes. Extensive experiments demonstrate that IEAP significantly outperforms state-of-the-art methods on standard benchmarks across various editing scenarios. In these evaluations, our framework delivers superior accuracy and semantic fidelity, particularly for complex, multi-step instructions. Code is available at https://github.com/YujiaHu1109/IEAP.
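The decomposition idea in the abstract can be illustrated with a toy sketch. This is not the IEAP implementation: the "image" is a plain dictionary of objects rather than pixels, no diffusion model is involved, and the names AtomicOp, remove_object, add_object, set_attribute, and run_program are hypothetical, chosen only for illustration. The sketch shows one thing: a complex edit expressed as an ordered list of small operations that are applied one after another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy stand-in for an image: object name -> attributes (an assumption,
# not how IEAP represents images).
Scene = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class AtomicOp:
    """One small edit: a named function from a scene to a new scene."""
    name: str
    apply: Callable[[Scene], Scene]


def _copy(scene: Scene) -> Scene:
    # Copy so that operations never mutate their input.
    return {obj: dict(attrs) for obj, attrs in scene.items()}


def remove_object(target: str) -> AtomicOp:
    def apply(scene: Scene) -> Scene:
        out = _copy(scene)
        out.pop(target, None)
        return out
    return AtomicOp(f"remove({target})", apply)


def add_object(target: str, **attrs: str) -> AtomicOp:
    def apply(scene: Scene) -> Scene:
        out = _copy(scene)
        out[target] = dict(attrs)
        return out
    return AtomicOp(f"add({target})", apply)


def set_attribute(target: str, key: str, value: str) -> AtomicOp:
    def apply(scene: Scene) -> Scene:
        out = _copy(scene)
        out[target][key] = value  # raises KeyError if target is absent
        return out
    return AtomicOp(f"set({target}.{key}={value})", apply)


def run_program(scene: Scene, program: List[AtomicOp]) -> Scene:
    """Apply the operations in order; each sees the previous result."""
    for op in program:
        scene = op.apply(scene)
    return scene


# Hypothetical decomposition of "replace the dog with a red cat".
scene = {"dog": {"color": "brown"}, "tree": {"color": "green"}}
program = [
    remove_object("dog"),
    add_object("cat", color="gray"),
    set_attribute("cat", "color", "red"),
]
result = run_program(scene, program)
print([op.name for op in program])
print(result)
```

In this toy version each operation returns a new scene instead of modifying its input, so any prefix of the program can be run and inspected on its own. How IEAP actually parses an instruction into operations, and which operations it defines, is described in the paper and the linked repository rather than here.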