Related papers: Generative Spatiotemporal Data Augmentation

Learning Temporally Invariant and Localizable Features via Data Augmentation for Video Recognition

Deep-Learning-based video recognition has shown promising improvements along with the development of large-scale datasets and spatiotemporal network architectures. In image recognition, learning spatially invariant features is a key factor…

Computer Vision and Pattern Recognition · Computer Science 2020-08-14 Taeoh Kim, Hyeongmin Lee, MyeongAh Cho, Ho Seong Lee, Dong Heon Cho, Sangyoun Lee

A Simple Background Augmentation Method for Object Detection with Diffusion Model

In computer vision, it is well-known that a lack of data diversity will impair model performance. In this study, we address the challenges of enhancing the dataset diversity problem in order to benefit various downstream tasks such as…

Computer Vision and Pattern Recognition · Computer Science 2024-08-02 Yuhang Li, Xin Dong, Chen Chen, Weiming Zhuang, Lingjuan Lyu

Spatio-temporal Data Augmentation for Visual Surveillance

Visual surveillance aims to stably detect a foreground object using a continuous image acquired from a fixed camera. Recent deep learning methods based on supervised learning show superior performance compared to classical background…

Computer Vision and Pattern Recognition · Computer Science 2021-02-16 Jae-Yeul Kim, Jong-Eun Ha

3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing

Data augmentation plays a crucial role in deep learning, enhancing the generalization and robustness of learning-based models. Standard approaches involve simple transformations like rotations and flips for generating extra data. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Shichao Dong, Ze Yang, Guosheng Lin

Generative Temporal Models with Spatial Memory for Partially Observed Environments

In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning…

Machine Learning · Statistics 2018-07-20 Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola

Extending Temporal Data Augmentation for Video Action Recognition

Pixel space augmentation has grown in popularity in many Deep Learning areas, due to its effectiveness, simplicity, and low computational cost. Data augmentation for videos, however, still remains an under-explored research topic, as most…

Computer Vision and Pattern Recognition · Computer Science 2022-11-10 Artjoms Gorpincenko, Michal Mackiewicz

Exploring Temporally Dynamic Data Augmentation for Video Recognition

Data augmentation has recently emerged as an essential component of modern training recipes for visual recognition tasks. However, data augmentation for video recognition has been rarely explored despite its effectiveness. Few existing…

Computer Vision and Pattern Recognition · Computer Science 2022-07-01 Taeoh Kim, Jinhyung Kim, Minho Shim, Sangdoo Yun, Myunggu Kang, Dongyoon Wee, Sangyoun Lee

Cross-Modal Generative Augmentation for Visual Question Answering

Data augmentation has been shown to effectively improve the performance of multimodal machine learning models. This paper introduces a generative model for data augmentation by leveraging the correlations among multiple modalities.…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Zixu Wang, Yishu Miao, Lucia Specia

Video Diffusion Models

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Learning Representational Invariances for Data-Efficient Action Recognition

Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational…

Computer Vision and Pattern Recognition · Computer Science 2022-11-21 Yuliang Zou, Jinwoo Choi, Qitong Wang, Jia-Bin Huang

Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction

Generative models have demonstrated strong performance in conditional settings and can be viewed as a form of data compression, where the condition serves as a compact representation. However, their limited controllability and…

Machine Learning · Computer Science 2025-07-04 Xiao Li, Liangji Zhu, Anand Rangarajan, Sanjay Ranka

Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks

There is an urgent need for an effective video classification method by means of a small number of samples. The deficiency of samples could be effectively alleviated by generating samples through Generative Adversarial Networks (GAN), but…

Computer Vision and Pattern Recognition · Computer Science 2019-10-01 Yumeng Zhang, Gaoguo Jia, Li Chen, Mingrui Zhang, Junhai Yong

Semi-Supervised and Task-Driven Data Augmentation

Supervised deep learning methods for segmentation require large amounts of labelled training data, without which they are prone to overfitting, not generalizing well to unseen images. In practice, obtaining a large number of annotations…

Computer Vision and Pattern Recognition · Computer Science 2019-03-01 Krishna Chaitanya, Neerav Karani, Christian Baumgartner, Olivio Donati, Anton Becker, Ender Konukoglu

Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation

The synthesis of spatiotemporally coherent 4D content presents fundamental challenges in computer vision, requiring simultaneous modeling of high-fidelity spatial representations and physically plausible temporal dynamics. Current…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Xiaoyan Liu, Kangrui Li, Yuehao Song, Jiaxin Liu

Generative augmentations for improved cardiac ultrasound segmentation using diffusion models

One of the main challenges in current research on segmentation in cardiac ultrasound is the lack of large and varied labeled datasets and the differences in annotation conventions between datasets. This makes it difficult to design robust…

Image and Video Processing · Electrical Eng. & Systems 2025-02-28 Gilles Van De Vyver, Aksel Try Lenz, Erik Smistad, Sindre Hellum Olaisen, Bjørnar Grenne, Espen Holte, Håvard Dalen, Lasse Løvstakken

Generative Data Augmentation for Vehicle Detection in Aerial Images

Scarcity of training data is one of the prominent problems for deep networks which require large amounts of data. Data augmentation is a widely used method to increase the number of training samples and their variations. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2020-12-10 Hilmi Kumdakcı, Cihan Öngün, Alptekin Temizel

Recurrent Mixture Density Network for Spatiotemporal Visual Attention

In many computer vision tasks, the relevant information to solve the problem at hand is mixed with irrelevant, distracting information. This has motivated researchers to design attentional models that can dynamically focus on parts of images…

Computer Vision and Pattern Recognition · Computer Science 2017-02-14 Loris Bazzani, Hugo Larochelle, Lorenzo Torresani

GAUDA: Generative Adaptive Uncertainty-guided Diffusion-based Augmentation for Surgical Segmentation

Augmentation by generative modelling yields a promising alternative to the accumulation of surgical data, where ethical, organisational and regulatory aspects must be considered. Yet, the joint synthesis of (image, mask) pairs for…

Computer Vision and Pattern Recognition · Computer Science 2025-07-02 Yannik Frisch, Christina Bornberg, Moritz Fuchs, Anirban Mukhopadhyay

Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring

This work presents advancements in multi-class vehicle detection using UAV cameras through the development of spatiotemporal object detection models. The study introduces a Spatio-Temporal Vehicle Detection Dataset (STVD) containing 6,600…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Kristina Telegraph, Christos Kyrkou

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike…

Computer Vision and Pattern Recognition · Computer Science 2024-07-22 Junlin Han, Filippos Kokkinos, Philip Torr