Recent diffusion methods typically derive strong generative priors by scaling diffusion transformers. However, this scaling strategy does not carry over to real-time compression scenarios, which demand lightweight models. In this paper, we explore the design of real-time, lightweight diffusion codecs by addressing two pivotal questions. First, does diffusion pre-training benefit lightweight diffusion codecs? Through systematic analysis, we find that generation-oriented pre-training is less effective at small model scales, whereas compression-oriented pre-training yields consistently better performance. Second, are transformers essential? We find that while global attention is crucial for standard generation, lightweight convolutions suffice for compression-oriented diffusion when paired with distillation. Guided by these findings, we establish a one-step lightweight convolutional diffusion codec that achieves real-time 60~FPS encoding and 42~FPS decoding at 1080p. Further enhanced by distillation and adversarial learning, the proposed codec reduces bitrate by 85\% at a comparable FID to MS-ILLM, bridging the gap between generative compression and practical real-time deployment. Code is released at https://github.com/microsoft/GenCodec/tree/main/CoD_Lite
@article{arxiv.2604.12525,
title = {CoD-Lite: Real-Time Diffusion-Based Generative Image Compression},
author = {Zhaoyang Jia and Naifu Xue and Zihan Zheng and Jiahao Li and Bin Li and Xiaoyi Zhang and Zongyu Guo and Yuan Zhang and Houqiang Li and Yan Lu},
journal = {arXiv preprint arXiv:2604.12525},
year = {2026}
}