Related papers: scatteR: Generating instance space based on scagno…

AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-20 · Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter

Object recognition and object pose estimation in robotic grasping continue to be significant challenges, since building a labelled dataset can be time consuming and financially costly in terms of data collection and annotation. In this…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-25 · Dongmyoung Lee, Wei Chen, Nicolas Rojas

Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation

Generating synthetic images is a useful method for cheaply obtaining labeled data for training computer vision models. However, obtaining accurate 3D models of relevant objects is necessary, and the resulting images often have a gap in…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-14 · Bram Vanherle, Brent Zoomers, Jeroen Put, Frank Van Reeth, Nick Michiels

Spatial Data Generators

This gem describes a standard method for generating synthetic spatial data that can be used in benchmarking and scalability tests. The goal is to improve the reproducibility and increase the trust in experiments on synthetic data by using…

Databases · Computer Science · 2021-09-28 · Tin Vu, Sara Migliorini, Ahmed Eldawy, Alberto Belussi

From few to many maps: A fast map-level emulator for extreme augmentation of CMB systematics datasets

We introduce a novel, fast, and efficient generative model built upon scattering covariances, the most recent iteration of the scattering transform statistics. This model is designed to augment by several orders of magnitude the number of…

Cosmology and Nongalactic Astrophysics · Physics · 2025-08-13 · P. Campeti, J.-M. Delouis, L. Pagano, E. Allys, M. Lattanzi, M. Gerbino

Generative models of astrophysical fields with scattering transforms on the sphere

Scattering transforms are a new type of summary statistics recently developed for the study of highly non-Gaussian processes, which have been shown to be very promising for astrophysical studies. In particular, they allow one to build…

Instrumentation and Methods for Astrophysics · Physics · 2024-11-22 · Louise Mousset, Erwan Allys, Matthew A. Price, Jonathan Aumont, Jean-Marc Delouis, Ludovic Montier, Jason D. McEwen

Accelerating small-angle scattering experiments with simulation-based machine learning

Making material experiments more efficient is a high priority for materials scientists who seek to discover new materials with desirable properties. In this paper, we investigate how to optimize the laborious sequential measurements of…

Materials Science · Physics · 2019-10-24 · Takuya Kanazawa, Akinori Asahara, Hidekazu Morita

Dataset Distillation with Probabilistic Latent Features

As deep learning models grow in complexity and the volume of training data increases, reducing storage and computational costs becomes increasingly important. Dataset distillation addresses this challenge by synthesizing a compact set of…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Zhe Li, Sarah Cechnicka, Cheng Ouyang, Katharina Breininger, Peter Schüffler, Bernhard Kainz

SSCATeR: Sparse Scatter-Based Convolution Algorithm with Temporal Data Recycling for Real-Time 3D Object Detection in LiDAR Point Clouds

This work leverages the continuous sweeping motion of LiDAR scanning to concentrate object detection efforts on specific regions that receive a change in point data from one frame to another. We achieve this by using a sliding time window…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-30 · Alexander Dow, Manduhu Manduhu, Matheus Santos, Ben Bartlett, Gerard Dooly, James Riordan

Difficulty Controlled Diffusion Model for Synthesizing Effective Training Data

Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches solely focus on aligning generated images with the target dataset distribution. As a result, they capture only the…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-08 · Zerun Wang, Jiafeng Mao, Xueting Wang, Toshihiko Yamasaki

Iterative Scene Graph Generation with Generative Transformers

Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering,…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-01 · Sanjoy Kundu, Sathyanarayanan N. Aakur

Geometry-Based Data Generation

Many generative models attempt to replicate the density of their input data. However, this approach is often undesirable, since data density is highly affected by sampling biases, noise, and artifacts. We propose a method called SUGAR…

Machine Learning · Computer Science · 2018-09-10 · Ofir Lindenbaum, Jay S. Stanley, Guy Wolf, Smita Krishnaswamy

Stable Diffusion Dataset Generation for Downstream Classification Tasks

Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating…

Machine Learning · Computer Science · 2024-05-07 · Eugenio Lomurno, Matteo D'Oria, Matteo Matteucci

CARTGen-IR: Synthetic Tabular Data Generation for Imbalanced Regression

Handling imbalanced target distributions in regression poses a persistent challenge, as the underrepresentation of relevant target values can significantly hinder model performance. Existing data-level solutions often adapt…

Machine Learning · Computer Science · 2026-03-12 · António Pedro Pinheiro, Rita P. Ribeiro

A new approach to observational cosmology using the scattering transform

Parameter estimation with non-Gaussian stochastic fields is a common challenge in astrophysics and cosmology. In this paper, we advocate performing this task using the scattering transform, a statistical tool sharing ideas with…

Cosmology and Nongalactic Astrophysics · Physics · 2024-10-07 · Sihao Cheng, Yuan-Sen Ting, Brice Ménard, Joan Bruna

A fully data-driven method for estimating the shape of a point cloud

Given a random sample of points from some unknown distribution, we propose a new data-driven method for estimating its probability support $S$. Under the mild assumption that $S$ is $r$-convex, the smallest $r$-convex set which contains the…

Statistics Theory · Mathematics · 2014-12-01 · Alberto Rodríguez-Casal, Paula Saavedra-Nieves

DTAMS: High-Capacity Generative Steganography via Dynamic Multi-Timestep Selection and Adaptive Deviation Mapping in Latent Diffusion

With the rapid development of AIGC technologies, generative image steganography has attracted increasing attention due to its high imperceptibility and flexibility. However, existing generative steganography methods often maintain…

Cryptography and Security · Computer Science · 2026-02-03 · Yuhao Xue, Jiuan Zhou, Yu Cheng, Zhaoxia Yin

Evaluation of Categorical Generative Models -- Bridging the Gap Between Real and Synthetic Data

The machine learning community has mainly relied on real data to benchmark algorithms as it provides compelling evidence of model applicability. Evaluation on synthetic datasets can be a powerful tool to provide a better understanding of a…

Machine Learning · Computer Science · 2022-11-01 · Florence Regol, Anja Kroon, Mark Coates

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference

Causal inference is essential for developing and evaluating medical interventions, yet real-world medical datasets are often difficult to access due to regulatory barriers. This makes synthetic data a potentially valuable asset that enables…

Machine Learning · Computer Science · 2025-10-22 · Harry Amad, Zhaozhi Qian, Dennis Frauen, Julianna Piskorz, Stefan Feuerriegel, Mihaela van der Schaar

Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training

Annotated datasets are critical for training neural networks for object detection, yet their manual creation is time- and labour-intensive, subject to human error, and often limited in diversity. This challenge is particularly pronounced…

Robotics · Computer Science · 2025-06-06 · Aneesh Deogan, Wout Beks, Peter Teurlings, Koen de Vos, Mark van den Brand, Rene van de Molengraft