Related papers: Diffusion Action Segmentation

200 papers

Recognizing human actions from untrimmed videos is an important task in activity understanding, and poses unique challenges in modeling long-range temporal relations. Recent works adopt a predict-and-refine strategy which converts an…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-28 · Zhichao Liu, Leshan Wang, Desen Zhou, Jian Wang, Songyang Zhang, Yang Bai, Errui Ding, Rui Fan

Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow. This uncertainty becomes even larger when predicting far into the future.…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-28 · Zeyun Zhong, Chengzhi Wu, Manuel Martin, Michael Voit, Juergen Gall, Jürgen Beyerer

We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-19 · William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, Frank Wood

Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite apparent relevance and potential complementarity, these two problems have been investigated…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-06 · Dayoung Gong, Suha Kwak, Minsu Cho

We present ActionDiffusion -- a novel diffusion model for procedure planning in instructional videos that is the first to take temporal inter-dependencies between actions into account in a diffusion model for procedure planning. This…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-23 · Lei Shi, Paul Bürkner, Andreas Bulling

We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD in short. Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video. This presents a…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-17 · Sauradip Nag, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang

Temporal action segmentation is a critical task in video understanding, where the goal is to assign action labels to each frame in a video. While recent advances leverage iterative refinement-based strategies, they fail to explicitly…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-06 · Arjun Ramesh Kaushik, Nalini K. Ratha, Venu Govindaraju

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-24 · Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Recent temporal action segmentation approaches need frame annotations during training to be effective. These annotations are very expensive and time-consuming to obtain. This limits their performances when only limited annotated data is…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-04 · Sovan Biswas, Anthony Rhodes, Ramesh Manuvinakurike, Giuseppe Raffa, Richard Beckwith

Video grounding aims to localize the target moment in an untrimmed video corresponding to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-01 · Xiao Liang, Tao Shi, Yaoyuan Liang, Te Tao, Shao-Lun Huang

The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in the video. Several attempts have been made to capture frame-level salient aspects through attention but they lack the…

Computer Vision and Pattern Recognition · Computer Science · 2020-05-08 · Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes

Learning a generalist embodied agent capable of completing multiple tasks poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In contrast, a vast amount of human videos exist, capturing intricate tasks…

Machine Learning · Computer Science · 2024-10-10 · Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li

The evolution of semantic segmentation has long been dominated by learning more discriminative image representations for classifying each pixel. Despite the prominent advancements, the priors of segmentation masks themselves, e.g.,…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-23 · Zeqiang Lai, Yuchen Duan, Jifeng Dai, Ziheng Li, Ying Fu, Hongsheng Li, Yu Qiao, Wenhai Wang

In recent years, video generation has seen significant advancements. However, challenges still persist in generating complex motions and interactions. To address these challenges, we introduce ReVision, a plug-and-play framework that…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-12 · Qihao Liu, Ju He, Qihang Yu, Liang-Chieh Chen, Alan Yuille

Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, the existing denoising techniques in diffusion models are commonly based on step-by-step noise predictions, which suffers…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-15 · Hancheng Ye, Jiakang Yuan, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang

Diffusion models, as a type of generative model, have achieved impressive results in generating images and videos conditioned on textual conditions. However, the generation process of diffusion models involves denoising dozens of steps to…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-31 · Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang

We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-05 · Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han

Perceptual studies demonstrate that conditional diffusion models excel at reconstructing video content aligned with human visual perception. Building on this insight, we propose a video compression framework that leverages conditional…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-26 · Fangqiu Yi, Jingyu Xu, Jiawei Shao, Chi Zhang, Xuelong Li

Video summarization is a task of shortening a video by choosing a subset of frames while preserving its essential moments. Despite the innate subjectivity of the task, previous works have deterministically regressed to an averaged frame…

Machine Learning · Computer Science · 2025-10-10 · Kwanseok Kim, Jaehoon Hahm, Sumin Kim, Jinhwan Sul, Byunghak Kim, Joonseok Lee

Video moment retrieval and highlight detection have received attention in the current era of video content proliferation, aiming to localize moments and estimate clip relevances based on user-specific queries. Given that the video content…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-05 · Henghao Zhao, Kevin Qinghong Lin, Rui Yan, Zechao Li