Related papers: scatteR: Generating instance space based on scagno…
Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually…
Object recognition and object pose estimation in robotic grasping continue to be significant challenges, since building a labelled dataset can be time consuming and financially costly in terms of data collection and annotation. In this…
Generating synthetic images is a useful method for cheaply obtaining labeled data for training computer vision models. However, obtaining accurate 3D models of relevant objects is necessary, and the resulting images often have a gap in…
This gem describes a standard method for generating synthetic spatial data that can be used in benchmarking and scalability tests. The goal is to improve the reproducibility and increase the trust in experiments on synthetic data by using…
We introduce a novel, fast, and efficient generative model built upon scattering covariances, the most recent iteration of the scattering transforms statistics. This model is designed to augment by several orders of magnitude the number of…
Scattering transforms are a new type of summary statistics recently developed for the study of highly non-Gaussian processes, which have been shown to be very promising for astrophysical studies. In particular, they allow one to build…
Making material experiments more efficient is a high priority for materials scientists who seek to discover new materials with desirable properties. In this paper, we investigate how to optimize the laborious sequential measurements of…
As deep learning models grow in complexity and the volume of training data increases, reducing storage and computational costs becomes increasingly important. Dataset distillation addresses this challenge by synthesizing a compact set of…
This work leverages the continuous sweeping motion of LiDAR scanning to concentrate object detection efforts on specific regions that receive a change in point data from one frame to another. We achieve this by using a sliding time window…
Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches solely focus on aligning generated images with the target dataset distribution. As a result, they capture only the…
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering,…
Many generative models attempt to replicate the density of their input data. However, this approach is often undesirable, since data density is highly affected by sampling biases, noise, and artifacts. We propose a method called SUGAR…
Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating…
Handling imbalanced target distributions in regression poses a persistent challenge, as the underrepresentation of relevant target values can significantly hinder model performance. Existing data-level solutions often adapt…
Parameter estimation with non-Gaussian stochastic fields is a common challenge in astrophysics and cosmology. In this paper, we advocate performing this task using the scattering transform, a statistical tool sharing ideas with…
Given a random sample of points from some unknown distribution, we propose a new data-driven method for estimating its probability support $S$. Under the mild assumption that $S$ is $r$-convex, the smallest $r$-convex set which contains the…
With the rapid development of AIGC technologies, generative image steganography has attracted increasing attention due to its high imperceptibility and flexibility. However, existing generative steganography methods often maintain…
The machine learning community has mainly relied on real data to benchmark algorithms as it provides compelling evidence of model applicability. Evaluation on synthetic datasets can be a powerful tool to provide a better understanding of a…
Causal inference is essential for developing and evaluating medical interventions, yet real-world medical datasets are often difficult to access due to regulatory barriers. This makes synthetic data a potentially valuable asset that enables…
Annotated datasets are critical for training neural networks for object detection, yet their manual creation is time- and labour-intensive, subjective to human error, and often limited in diversity. This challenge is particularly pronounced…