Related papers: Space Time Recurrent Memory Network

200 papers

Emerging world models autoregressively generate video frames in response to actions, such as camera movements and text prompts, among other control signals. Due to limited temporal context window sizes, these models often struggle to…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-06 · Tong Wu, Shuai Yang, Ryan Po, Yinghao Xu, Ziwei Liu, Dahua Lin, Gordon Wetzstein

Transformers process images and videos by flattening space and time into long token sequences. While attention and KV caching preserve past features, their memory grows with sequence length and they lack an explicit, persistent spatial…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-28 · Kabir Swain, Sijie Han, Daniel Karl I. Weidele, Mauro Martino, Antonio Torralba

Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask on the first frame, the task of VOS is to track and segment the single or multiple objects of interest in the…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-16 · Kaihua Zhang, Long Wang, Dong Liu, Bo Liu, Qingshan Liu, Zhu Li

Recently, video object segmentation (VOS) networks typically use memory-based methods: for each query frame, the mask is predicted by space-time matching to memory frames. Despite these methods having superior performance, they suffer from…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-08 · Yadang Chen, Wentao Zhu, Zhi-Xin Yang, Enhua Wu

Video diffusion models have recently shown promise for world modeling through autoregressive frame prediction conditioned on actions. However, they struggle to maintain long-term memory due to the high computational cost associated with…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-27 · Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang

This paper proposes a Robust and Efficient Memory Network, referred to as REMN, for studying semi-supervised video object segmentation (VOS). Memory-based methods have recently achieved outstanding VOS performance by performing non-local…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-25 · Yadang Chen, Dingwei Zhang, Zhi-xin Yang, Enhua Wu

Real-time video analysis remains a challenging problem in computer vision, requiring efficient processing of both spatial and temporal information while maintaining computational efficiency. Existing approaches often struggle to balance…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-31 · Shahla John

This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video object segmentation. Unlike most existing approaches, we establish correspondences directly between frames without…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-11 · Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-08 · Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes

Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-18 · Arnab Kumar Mondal, Stefano Alletto, Denis Tome

We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, the existing methods…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-13 · Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alternative, introducing a vertically chunked…

Computation and Language · Computer Science · 2026-04-21 · Tobias Grantner, Emanuel Sallinger, Martin Flechl

Transformers have become one of the dominant architectures in the field of computer vision. However, there are still several challenges when applying such architectures to video data. Most notably, these models struggle to model the temporal…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-14 · Gabriele Prato, Yale Song, Janarthanan Rajendran, R Devon Hjelm, Neel Joshi, Sarath Chandar

Video enhancement is a challenging problem, more than that of stills, mainly due to high computational cost, larger data volumes and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are…

Image and Video Processing · Electrical Eng. & Systems · 2022-12-13 · Dario Fuoli, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool, Radu Timofte

Matching-based networks have achieved state-of-the-art performance for video object segmentation (VOS) tasks by storing every-k frames in an external memory bank for future inference. Storing the intermediate frames' predictions provides…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-15 · Ali Pourganjalikhan, Charalambos Poullis

Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, there are grand challenges to effectively utilize temporal dependency…

Image and Video Processing · Electrical Eng. & Systems · 2022-04-21 · Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

Video prediction is commonly referred to as forecasting future frames of a video sequence provided several past frames thereof. It remains a challenging domain as visual scenes evolve according to complex underlying dynamics, such as the…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-12 · Hafez Farazi, Jan Nogga, Sven Behnke

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network…

Computation and Language · Computer Science · 2022-04-14 · Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, Alborz Geramifard, Zhou Yu

Video restoration is a low-level vision task that seeks to restore clean, sharp videos from quality-degraded frames. One would use the temporal information from adjacent frames to make video restoration successful. Recently, the success of…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-25 · Fu-Jen Tsai, Yan-Tsung Peng, Chen-Yu Chang, Chan-Yu Li, Yen-Yu Lin, Chung-Chi Tsai, Chia-Wen Lin

Spatial reasoning is a critical capability for intelligent robots, yet current vision-language models (VLMs) still fall short of human-level performance in video-based spatial reasoning. This gap mainly stems from two challenges: a…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-26 · Zuntao Liu, Yi Du, Taimeng Fu, Shaoshu Su, Cherie Ho, Chen Wang