Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment

Computer Vision and Pattern Recognition | 2025-11-05 | v5 | Machine Learning | Image and Video Processing

Abstract

In this work, we rethink the approach to video super-resolution by introducing a method based on the Diffusion Posterior Sampling framework, combined with an unconditional video diffusion transformer operating in latent space. The video generation model, a diffusion transformer, functions as a space-time model. We argue that a sufficiently powerful model that learns the physics of the real world can handle diverse motion patterns as prior knowledge, eliminating the need to explicitly estimate optical flow or motion parameters for pixel alignment. Furthermore, a single instance of the proposed video diffusion transformer can adapt to different sampling conditions without re-training. Empirical results on synthetic and real-world datasets illustrate the feasibility of diffusion-based, alignment-free video super-resolution.
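
For orientation, the generic Diffusion Posterior Sampling guidance that the abstract builds on can be sketched as below. This is the standard DPS approximation written for a latent-space model, not necessarily the paper's exact formulation. The symbols are assumptions introduced here for illustration: $z_t$ is the noisy latent at step $t$, $\hat{z}_0(z_t)$ is the denoised estimate produced by the diffusion transformer, $\mathcal{D}$ is the latent decoder, $\mathcal{A}$ is the degradation operator (for example blur followed by downsampling), $y$ is the observed low-resolution video, and $\zeta_t$ is a guidance step size.

```latex
\nabla_{z_t} \log p(z_t \mid y)
  \;\approx\; \nabla_{z_t} \log p(z_t)
  \;-\; \zeta_t \, \nabla_{z_t}
  \bigl\| \, y - \mathcal{A}\bigl(\mathcal{D}(\hat{z}_0(z_t))\bigr) \bigr\|_2^2
```

Under this reading, the first term comes from the unconditional video prior and only the second term depends on $\mathcal{A}$, so a different degradation would change the guidance but not the trained model. That is consistent with the claim that a single model can serve different sampling conditions without re-training.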

Cite

@article{arxiv.2503.03355,
  title  = {Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment},
  author = {Zhihao Zhan and Wang Pang and Xiang Zhu and Yechao Bai},
  journal= {arXiv preprint arXiv:2503.03355},
  year   = {2025}
}

Comments

ICSPS 2025
