Related papers: High-Quality Tabular Data Generation using Post-Se…

200 papers

The rising use of machine learning in various fields requires robust methods to create synthetic tabular data. Data should preserve key characteristics while addressing data scarcity challenges. Current approaches based on Generative…

Machine Learning · Computer Science 2024-11-15 Patricia A. Apellániz, Juan Parras, Santiago Zazo

High-quality training data is critical to the performance of machine learning models, particularly Large Language Models (LLMs). However, obtaining real, high-quality data can be challenging, especially for smaller organizations and…

Machine Learning · Computer Science 2025-06-24 Cristian Del Gobbo

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science 2025-07-18 Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like…

Machine Learning · Computer Science 2025-07-17 Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee

Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular…

Tabular data synthesis is an emerging approach to circumvent strict regulations on data privacy while discovering knowledge through big data. Although state-of-the-art AI-based tabular data synthesizers, e.g., table-GAN, CTGAN, TVAE, and…

Machine Learning · Computer Science 2022-11-18 Yujin Zhu, Zilong Zhao, Robert Birke, Lydia Y. Chen

With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data…

Machine Learning · Computer Science 2024-12-05 Anton Danholt Lautrup, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp

In an era of rapidly advancing data-driven applications, there is a growing demand for data in both research and practice. Synthetic data have emerged as an alternative when no real data is available (e.g., due to privacy regulations).…

Artificial Intelligence · Computer Science 2024-06-03 Maria F. Davila R., Sven Groen, Fabian Panse, Wolfram Wingerath

Tabular data synthesis is a long-standing research topic in machine learning. Many different methods have been proposed over the past decades, ranging from statistical methods to deep generative methods. However, it has not always been…

Machine Learning · Computer Science 2023-05-30 Jayoung Kim, Chaejeong Lee, Noseong Park

The generation of synthetic data is a state-of-the-art approach to leverage when access to real data is limited or privacy regulations limit the usability of sensitive data. A fair amount of research has been conducted on synthetic data…

Machine Learning · Computer Science 2024-11-12 Wilhelm Ågren, Victorio Úbeda Sosa

Synthetic data generation has become essential for securely sharing and analyzing sensitive data sets. Traditional anonymization techniques, however, often fail to adequately preserve privacy. We introduce the Tabular Auto-Regressive…

Machine Learning · Computer Science 2025-08-12 Andrey Sidorenko, Paul Tiwald

As E-commerce platforms face surging transactions during major shopping events like Black Friday, stress testing with synthesized data is crucial for resource planning. Most recent studies use Generative Adversarial Networks (GANs) to…

Machine Learning · Computer Science 2025-03-03 Youran Zhou, Jianzhong Qi

Synthetic data is often positioned as a solution to replace sensitive fixed-size datasets with a source of unlimited matching data, freed from privacy concerns. There has been much progress in synthetic data generation over the last decade,…

Machine Learning · Computer Science 2025-06-09 Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade

The increasing adoption of synthetic data in aviation research offers a promising solution to data scarcity and confidentiality challenges. This study investigates the potential of generative models to produce realistic synthetic flight…

Machine Learning · Computer Science 2026-04-24 Karim Aly, Alexei Sharpanskykh

Synthetic data has a key role to play in data sharing by statistical agencies and other generators of statistical data products. Generative Adversarial Networks (GANs), typically applied to image synthesis, are also a promising method for…

Machine Learning · Computer Science 2024-04-17 Nian Ran, Bahrul Ilmi Nasution, Claire Little, Richard Allmendinger, Mark Elliot

Recent advances in generative AI offer promising solutions for synthetic data generation but often rely on large datasets for effective training. To address this limitation, we propose a novel generative model that learns from limited data…

Machine Learning · Statistics 2025-05-27 Michail Spitieris, Massimiliano Ruocco, Abdulmajid Murad, Alessandro Nocente

This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the…

Machine Learning · Computer Science 2023-08-29 Conor Hassan, Robert Salomone, Kerrie Mengersen

Synthetic data generation is of great interest in diverse applications, such as for privacy protection. Deep generative models, such as variational autoencoders (VAEs), are a popular approach for creating such synthetic datasets from…

Machine Learning · Statistics 2021-05-17 Kiana Farhadyar, Federico Bonofiglio, Daniela Zoeller, Harald Binder

While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., European General Data Protection Regulation (GDPR)) unfortunately limit its full effectiveness. Synthetic tabular data emerges as an…

Machine Learning · Computer Science 2021-08-24 Aditya Kunar

Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging…

Machine Learning · Computer Science 2023-10-27 Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic