Related papers: BinarySDG: binary sensor data generation with R

Self-Supervised Real-to-Sim Scene Generation

Synthetic data is emerging as a promising solution to the scalability issue of supervised deep learning, especially when real data are difficult to acquire or hard to annotate. Synthetic data generation, however, can itself be prohibitively…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-20 · Aayush Prakash, Shoubhik Debnath, Jean-Francois Lafleche, Eric Cameracci, Gavriel State, Stan Birchfield, Marc T. Law

Efficient Embedding-based Synthetic Data Generation for Complex Reasoning Tasks

Synthetic Data Generation (SDG), leveraging Large Language Models (LLMs), has recently been recognized and broadly adopted as an effective approach to improve the performance of smaller but more resource- and compute-efficient LLMs through…

Machine Learning · Computer Science · 2026-03-25 · Srideepika Jayaraman, Achille Fokoue, Dhaval Patel, Jayant Kalagnanam

The use of Synthetic Data to solve the scalability and data availability problems in Smart City Digital Twins

The A.I. disruption and the need to compete on innovation are impacting cities that have an increasing necessity to become innovation hotspots. However, without proven solutions, experimentation, often unsuccessful, is needed. But…

Artificial Intelligence · Computer Science · 2022-07-08 · Esteve Almirall, Davide Callegaro, Peter Bruins, Mar Santamaría, Pablo Martínez, Ulises Cortés

Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment

Synthetic data is being used lately for training deep neural networks in computer vision applications such as object detection, object segmentation and 6D object pose estimation. Domain randomization hereby plays an important role in…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-13 · Parth Rawal, Mrunal Sompura, Wolfgang Hintze

AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-20 · Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

Synthetic Data Generation using Benerator Tool

Datasets of different characteristics are needed by the research community for experimental purposes. However, real data may be difficult to obtain due to privacy concerns. Moreover, real data may not meet specific characteristics which are…

Databases · Computer Science · 2013-11-15 · Vanessa Ayala-Rivera, Patrick McDonagh, Thomas Cerqueus, Liam Murphy

Machine Learning for Synthetic Data Generation: A Review

Machine learning heavily relies on data, but real-world applications often encounter various data-related issues. These include data of poor quality, insufficient data points leading to under-fitting of machine learning models, and…

Machine Learning · Computer Science · 2025-04-07 · Yingzhou Lu, Lulu Chen, Yuanyuan Zhang, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei

Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter

Object recognition and object pose estimation in robotic grasping continue to be significant challenges, since building a labelled dataset can be time-consuming and financially costly in terms of data collection and annotation. In this…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-25 · Dongmyoung Lee, Wei Chen, Nicolas Rojas

Synthetic Data for Social Good

Data for good implies unfettered access to data. But data owners must be conservative about how, when, and why they share data or risk violating the trust of the people they aim to help, losing their funding, or breaking the law. Data…

Computers and Society · Computer Science · 2017-10-25 · Bill Howe, Julia Stoyanovich, Haoyue Ping, Bernease Herman, Matt Gee

Virtual passengers for real car solutions: synthetic datasets

Strategies that include the generation of synthetic data are beginning to be viable, as obtaining real data can be logistically complicated, very expensive or slow. Not only can the capture of the data lead to complications, but also its…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-16 · Paola Natalia Canas, Juan Diego Ortega, Marcos Nieto, Oihana Otaegui

Semantic RGB-D Image Synthesis

Collecting diverse sets of training images for RGB-D semantic image segmentation is not always possible. In particular, when robots need to operate in privacy-sensitive areas like homes, the collection is often limited to a small set of…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-20 · Shijie Li, Rong Li, Juergen Gall

Comprehensive Exploration of Synthetic Data Generation: A Survey

Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic…

Machine Learning · Computer Science · 2024-02-05 · André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster

Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-03 · Seungyeop Lee, Knut Peterson, Solmaz Arezoomandan, Bill Cai, Peihan Li, Lifeng Zhou, David Han

A Framework for Auditable Synthetic Data Generation

Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that enable the synthesis of high-dimensional datasets. However, many generation techniques do not give the data controller control over what…

Cryptography and Security · Computer Science · 2022-11-22 · Florimond Houssiau, Samuel N. Cohen, Lukasz Szpruch, Owen Daniel, Michaela G. Lawrence, Robin Mitra, Henry Wilde, Callum Mole

SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems

Data imbalance in training data often leads to biased predictions from trained models, which in turn causes ethical and social issues. A straightforward solution is to carefully curate training data, but given the enormous scale of modern…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-26 · Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh

Driving Privacy Forward: Mitigating Information Leakage within Smart Vehicles through Synthetic Data Generation

Smart vehicles produce large amounts of data, much of which is sensitive and at risk of privacy breaches. As attackers increasingly exploit anonymised metadata within these datasets to profile drivers, it's important to find solutions that…

Cryptography and Security · Computer Science · 2024-10-14 · Krish Parikh

Reasoning-Driven Synthetic Data Generation and Evaluation

Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and…

Artificial Intelligence · Computer Science · 2026-04-01 · Tim R. Davidson, Benoit Seguin, Enrico Bacis, Cesar Ilharco, Hamza Harkous

LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection

Synthetic data is useful only when the added samples fill missing parts of the training distribution that matter for the downstream task. We introduce LiBaGS, a lightweight, generator-agnostic method for targeted synthetic training data…

Machine Learning · Computer Science · 2026-05-14 · Abhishek Moturu, Anna Goldenberg, Babak Taati

Generating Synthetic but Plausible Healthcare Record Datasets

Generating datasets that "look like" given real ones is an interesting task for healthcare applications of ML and many other fields of science and engineering. In this paper we propose a new method of general application to binary datasets…

Machine Learning · Statistics · 2018-07-05 · Laura Aviñó, Matteo Ruffini, Ricard Gavaldà

Synthetic Test Data Generation Using Recurrent Neural Networks: A Position Paper

Testing in production-like test environments is an essential part of quality assurance processes in many industries. Provisioning of such test environments, for information-intensive services, involves setting up databases that are…

Software Engineering · Computer Science · 2024-07-09 · Razieh Behjati, Erik Arisholm, Chao Tan, Margrethe M. Bedregal