Ctrl-Z Sampling: Scaling Diffusion Sampling with Controlled Random Zigzag Explorations
Abstract
Diffusion models generate conditional samples by progressively denoising Gaussian noise, yet the denoising trajectory can stall at visually plausible but low-quality outcomes with conditional misalignment or structural artifacts. We interpret this behavior as local optima in a surrogate quality landscape: once early denoising commits to a suboptimal global structure, later steps mainly sharpen details and seldom correct the underlying mistake. While existing inference-time approaches explore alternative diffusion states via re-noising with fixed strength or direction, they have limited capacity to escape steep quality plateaus. We propose Controlled Random Zigzag Sampling (Ctrl-Z Sampling), a scalable sampling strategy that detects plateaus in the quality landscape via a surrogate score and allocates exploration only when a plateau is detected. Upon detection, Ctrl-Z Sampling rolls back to noisier states, samples a set of alternative continuations, and updates the trajectory when a candidate improves the score; otherwise, it escalates the exploration depth to escape the current plateau. The proposed method is model-agnostic and broadly compatible with existing diffusion frameworks. Experiments show that Ctrl-Z Sampling consistently improves generation quality over other inference-time scaling samplers across different budgets of neural function evaluations (NFE), offering a scalable compute-quality trade-off.
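The control flow described in the abstract (plateau detection, rollback, candidate sampling, accept-or-escalate) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is a toy scalar, the plateau test ("the ordinary step did not raise the surrogate score") is an assumption, and the function ctrl_z_sample and the callables denoise, renoise, and score are hypothetical stand-ins for a real sampler, forward noising, and a learned quality scorer. The paper's actual plateau criterion, rollback schedule, and scoring model are not specified here.

```python
import random
from typing import Callable

# Hypothetical callable signatures (assumptions, not the paper's API):
#   denoise(x, t)                 -> state at timestep t - 1
#   renoise(x, t_from, t_to, rng) -> noisier state at timestep t_to >= t_from
#   score(x, t)                   -> surrogate quality score (higher is better)
DenoiseFn = Callable[[float, int], float]
RenoiseFn = Callable[[float, int, int, random.Random], float]
ScoreFn = Callable[[float, int], float]


def ctrl_z_sample(x_T: float, num_steps: int, denoise: DenoiseFn,
                  renoise: RenoiseFn, score: ScoreFn, num_candidates: int = 4,
                  max_depth: int = 3, tol: float = 0.0, seed: int = 0) -> float:
    """Denoise from timestep num_steps to 0, exploring only on detected plateaus."""
    rng = random.Random(seed)
    x = x_T
    for t in range(num_steps, 0, -1):
        x_next = denoise(x, t)  # ordinary step: t -> t - 1
        current = score(x_next, t - 1)
        # Assumed plateau test: the step did not raise the surrogate score.
        if current > score(x, t) + tol:
            x = x_next
            continue
        for depth in range(1, max_depth + 1):
            # Roll back `depth` steps past t - 1, clamped to the starting timestep.
            t_back = min(t - 1 + depth, num_steps)
            best_x, best_score = None, current
            for _ in range(num_candidates):
                x_c = renoise(x_next, t - 1, t_back, rng)
                for s in range(t_back, t - 1, -1):  # re-denoise down to t - 1
                    x_c = denoise(x_c, s)
                s_c = score(x_c, t - 1)
                if s_c > best_score:
                    best_x, best_score = x_c, s_c
            if best_x is not None:  # a candidate improved the score: accept it
                x_next = best_x
                break
            # No candidate improved: escalate to a deeper rollback.
        x = x_next
    return x


# Toy usage: a stalled trajectory (identity denoiser) that exploration can escape.
def toy_denoise(x: float, t: int) -> float:
    return x  # every ordinary step is a plateau


def toy_renoise(x: float, t_from: int, t_to: int, rng: random.Random) -> float:
    return x + rng.gauss(0.0, 0.2 * (t_to - t_from))  # noise grows with rollback depth


def toy_score(x: float, t: int) -> float:
    return -abs(x - 1.0)  # toy "quality": closeness to 1.0


result = ctrl_z_sample(0.0, 10, toy_denoise, toy_renoise, toy_score)
```

Because a candidate replaces the current state only when it strictly improves the score, the toy run never ends with a lower score than its starting point, and steps that already improve the score incur no extra denoiser calls. This mirrors the abstract's point that exploration compute is spent only where a plateau is detected.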
Cite
@article{arxiv.2506.20294,
title = {Ctrl-Z Sampling: Scaling Diffusion Sampling with Controlled Random Zigzag Explorations},
author = {Shunqi Mao and Wei Guo and Chaoyi Zhang and Jieting Long and Ke Xie and Weidong Cai},
journal= {arXiv preprint arXiv:2506.20294},
year = {2026}
}
Comments
43 pages, 12 figures, 10 tables