Efficient and Adaptive Diffusion Model Inference Through Lookup Table on Mobile Devices
- Qipeng Wang ,
- Shiqi Jiang ,
- Yifan Yang ,
- Ruiqi Liu ,
- Yuanchun Li ,
- Ting Cao ,
- Xuanzhe Liu
IEEE Transactions on Mobile Computing | , pp. 1-18
Diffusion models have revolutionized image synthesis applications. Many studies focus on using approximate computation such as model quantization to reduce inference costs on mobile devices. However, due to their extensive model parameters and autoregressive inference fashion, the overhead of diffusion models remains high, which is challenging for mobile devices to handle. To reduce the inference overhead of diffusion models on mobile devices, we propose LUT-Diff, an algorithm-system co-design specifically tailored for mobile device diffusion model inference optimization. LUT-Diff optimizes using lookup tables and can efficiently generate a series of lookup table candidates for diffusion models without end-to-end training. During inference, LUT-Diff adaptively selects the best inference strategy based on the application/user’s latency budget. Additionally, LUT-Diff includes a parallel inference engine that rapidly completes model inference through CPU-GPU co-scheduling. Extensive experiments demonstrate that LUT-Diff can generate images comparable to the original model, with an up to 0.012 MSE in generated images. LUT-Diff can also achieve up to 9.1× inference acceleration and reduce the inference memory footprint by up to 70.9% compared to baseline methods. Moreover, LUT-Diff can save at least 3281× the learning cost of lookup tables.