Related papers: SEDGE: Structural Extrapolated Data Generation

Synthetic Embedding-based Data Generation Methods for Student Performance

Given the inherent class imbalance issue within student performance datasets, samples belonging to the edges of the target class distribution pose a challenge for predictive machine learning algorithms to learn. In this paper, we introduce…

Machine Learning · Computer Science 2021-01-05 Dom Huh

Deep Extrapolation for Attribute-Enhanced Generation

Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins,…

Machine Learning · Computer Science 2021-10-27 Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik

A supervised generative optimization approach for tabular data

Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation but reaching…

Machine Learning · Computer Science 2024-05-13 Shinpei Nakamura-Sakai, Fadi Hamad, Saheed Obitayo, Vamsi K. Potluru

Extrapolative Controlled Sequence Generation via Iterative Refinement

We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where…

Machine Learning · Computer Science 2023-06-08 Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh

A systematic dataset generation technique applied to data-driven automotive aerodynamics

A novel strategy for generating datasets is developed within the context of drag prediction for automotive geometries using neural networks. A primary challenge in this space is constructing a training database of sufficient size and…

Machine Learning · Computer Science 2024-08-15 Mark Benjamin, Gianluca Iaccarino

A generative model for sparse, evolving digraphs

Generating graphs that are similar to real ones is an open problem, while the similarity notion is quite elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs…

Data Structures and Algorithms · Computer Science 2018-07-06 Georgios Papoudakis, Philippe Preux, Martin Monperrus

Engression: Extrapolation through the Lens of Distributional Regression

Distributional regression aims to estimate the full conditional distribution of a target variable, given covariates. Popular methods include linear and tree-ensemble based quantile regression. We propose a neural network-based…

Methodology · Statistics 2024-07-08 Xinwei Shen, Nicolai Meinshausen

A theory of independent mechanisms for extrapolation in generative models

Generative models can be trained to emulate complex empirical data, but are they useful to make predictions in the context of previously unobserved environments? An intuitive idea to promote such extrapolation capabilities is to have the…

Machine Learning · Computer Science 2022-01-03 Michel Besserve, Rémy Sun, Dominik Janzing, Bernhard Schölkopf

Structured Evaluation of Synthetic Tabular Data

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science 2024-04-01 Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto

Controllable and Progressive Image Extrapolation

Image extrapolation aims at expanding the narrow field of view of a given image patch. Existing models mainly deal with natural scene images of homogeneous regions and have no control of the content generation process. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2019-12-30 Yijun Li, Lu Jiang, Ming-Hsuan Yang

SCGG: A Deep Structure-Conditioned Graph Generative Model

Deep learning-based graph generation approaches have remarkable capacities for graph data modeling, allowing them to solve a wide range of real-world problems. Making these methods able to consider different conditions during the generation…

Machine Learning · Computer Science 2023-01-11 Faezeh Faez, Negin Hashemi Dijujin, Mahdieh Soleymani Baghshah, Hamid R. Rabiee

Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling

Diffusion-based generative graph models have been proven effective in generating high-quality small graphs. However, they need to be more scalable for generating large graphs containing thousands of nodes desiring graph statistics. In this…

Machine Learning · Computer Science 2023-06-01 Xiaohui Chen, Jiaxing He, Xu Han, Li-Ping Liu

Generation and Simulation of Synthetic Datasets with Copulas

This paper proposes a new method to generate synthetic data sets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm…

Machine Learning · Computer Science 2022-04-01 Regis Houssou, Mihai-Cezar Augustin, Efstratios Rappos, Vivien Bonvin, Stephan Robert-Nicoud

Principled Knowledge Extrapolation with GANs

Humans can extrapolate well, generalize daily knowledge to unseen scenarios, and raise and answer counterfactual questions. To imitate this ability via generative models, previous works have extensively studied explicitly encoding Structural…

Machine Learning · Computer Science 2022-05-27 Ruili Feng, Jie Xiao, Kecheng Zheng, Deli Zhao, Jingren Zhou, Qibin Sun, Zheng-Jun Zha

Struct-Bench: A Benchmark for Differentially Private Structured Text Generation

Differentially private (DP) synthetic data generation is a promising technique for utilizing private datasets that otherwise cannot be exposed for model training or other analytics. While much research literature has focused on generating…

Computation and Language · Computer Science 2025-09-16 Shuaiqi Wang, Vikas Raunak, Arturs Backurs, Victor Reis, Pei Zhou, Sihao Chen, Longqi Yang, Zinan Lin, Sergey Yekhanin, Giulia Fanti

Structured Generative Adversarial Networks

We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately…

Machine Learning · Computer Science 2017-11-06 Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P. Xing

SinGAN-Seg: Synthetic training data generation for medical image segmentation

Analyzing medical data to find abnormalities is a time-consuming and costly task, particularly for rare abnormalities, requiring tremendous efforts from medical experts. Artificial intelligence has become a popular tool for the automatic…

Image and Video Processing · Electrical Eng. & Systems 2022-05-04 Vajira Thambawita, Pegah Salehi, Sajad Amouei Sheshkal, Steven A. Hicks, Hugo L. Hammer, Sravanthi Parasa, Thomas de Lange, Pål Halvorsen, Michael A. Riegler

Diffusion-based Conditional ECG Generation with Structured State Space Models

Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very…

Signal Processing · Electrical Eng. & Systems 2023-06-16 Juan Miguel Lopez Alcaraz, Nils Strodthoff

Synthetic data, real errors: how (not) to publish and use synthetic data

Generating synthetic data through generative models is gaining interest in the ML community and beyond, promising a future where datasets can be tailored to individual needs. Unfortunately, synthetic data is usually not perfect, resulting…

Machine Learning · Computer Science 2023-07-11 Boris van Breugel, Zhaozhi Qian, Mihaela van der Schaar

Generative Modeling of Complex Data

In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, such models focus on synthesizing simple columnar tables and are not useable on real-life data with complex structures. This paper…

Machine Learning · Computer Science 2022-02-07 Luca Canale, Nicolas Grislain, Grégoire Lothe, Johan Leduc