Related papers: SEDGE: Structural Extrapolated Data Generation

200 papers

Given the inherent class imbalance within student performance datasets, samples at the edges of the target class distribution are difficult for predictive machine learning algorithms to learn. In this paper, we introduce…

Machine Learning · Computer Science 2021-01-05 Dom Huh

Attribute extrapolation in sample generation is challenging for deep neural networks operating beyond the training distribution. We formulate a new task for extrapolation in sequence generation, focusing on natural language and proteins,…

Machine Learning · Computer Science 2021-10-27 Alvin Chan , Ali Madani , Ben Krause , Nikhil Naik

Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation but reaching…

Machine Learning · Computer Science 2024-05-13 Shinpei Nakamura-Sakai , Fadi Hamad , Saheed Obitayo , Vamsi K. Potluru

We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where…

Machine Learning · Computer Science 2023-06-08 Vishakh Padmakumar , Richard Yuanzhe Pang , He He , Ankur P. Parikh

A novel strategy for generating datasets is developed within the context of drag prediction for automotive geometries using neural networks. A primary challenge in this space is constructing a training database of sufficient size and…

Machine Learning · Computer Science 2024-08-15 Mark Benjamin , Gianluca Iaccarino

Generating graphs that are similar to real ones is an open problem, while the similarity notion is quite elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs…

Data Structures and Algorithms · Computer Science 2018-07-06 Georgios Papoudakis , Philippe Preux , Martin Monperrus

Distributional regression aims to estimate the full conditional distribution of a target variable, given covariates. Popular methods include linear and tree-ensemble based quantile regression. We propose a neural network-based…

Methodology · Statistics 2024-07-08 Xinwei Shen , Nicolai Meinshausen

Generative models can be trained to emulate complex empirical data, but are they useful to make predictions in the context of previously unobserved environments? An intuitive idea to promote such extrapolation capabilities is to have the…

Machine Learning · Computer Science 2022-01-03 Michel Besserve , Rémy Sun , Dominik Janzing , Bernhard Schölkopf

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science 2024-04-01 Scott Cheng-Hsin Yang , Baxter Eaves , Michael Schmidt , Ken Swanson , Patrick Shafto

Image extrapolation aims at expanding the narrow field of view of a given image patch. Existing models mainly deal with natural scene images of homogeneous regions and have no control over the content generation process. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2019-12-30 Yijun Li , Lu Jiang , Ming-Hsuan Yang

Deep learning-based graph generation approaches have remarkable capacities for graph data modeling, allowing them to solve a wide range of real-world problems. Making these methods able to consider different conditions during the generation…

Machine Learning · Computer Science 2023-01-11 Faezeh Faez , Negin Hashemi Dijujin , Mahdieh Soleymani Baghshah , Hamid R. Rabiee

Diffusion-based generative graph models have been proven effective in generating high-quality small graphs. However, they need to be more scalable to generate large graphs containing thousands of nodes with desired graph statistics. In this…

Machine Learning · Computer Science 2023-06-01 Xiaohui Chen , Jiaxing He , Xu Han , Li-Ping Liu

This paper proposes a new method to generate synthetic data sets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm…

Machine Learning · Computer Science 2022-04-01 Regis Houssou , Mihai-Cezar Augustin , Efstratios Rappos , Vivien Bonvin , Stephan Robert-Nicoud

Humans can extrapolate well, generalize everyday knowledge to unseen scenarios, and raise and answer counterfactual questions. To imitate this ability via generative models, previous works have extensively studied explicitly encoding Structural…

Machine Learning · Computer Science 2022-05-27 Ruili Feng , Jie Xiao , Kecheng Zheng , Deli Zhao , Jingren Zhou , Qibin Sun , Zheng-Jun Zha

Differentially private (DP) synthetic data generation is a promising technique for utilizing private datasets that otherwise cannot be exposed for model training or other analytics. While much research literature has focused on generating…

Computation and Language · Computer Science 2025-09-16 Shuaiqi Wang , Vikas Raunak , Arturs Backurs , Victor Reis , Pei Zhou , Sihao Chen , Longqi Yang , Zinan Lin , Sergey Yekhanin , Giulia Fanti

We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately…

Machine Learning · Computer Science 2017-11-06 Zhijie Deng , Hao Zhang , Xiaodan Liang , Luona Yang , Shizhen Xu , Jun Zhu , Eric P. Xing

Analyzing medical data to find abnormalities is a time-consuming and costly task, particularly for rare abnormalities, requiring tremendous efforts from medical experts. Artificial intelligence has become a popular tool for the automatic…

Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very…

Signal Processing · Electrical Eng. & Systems 2023-06-16 Juan Miguel Lopez Alcaraz , Nils Strodthoff

Generating synthetic data through generative models is gaining interest in the ML community and beyond, promising a future where datasets can be tailored to individual needs. Unfortunately, synthetic data is usually not perfect, resulting…

Machine Learning · Computer Science 2023-07-11 Boris van Breugel , Zhaozhi Qian , Mihaela van der Schaar

In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, such models focus on synthesizing simple columnar tables and are not usable on real-life data with complex structures. This paper…

Machine Learning · Computer Science 2022-02-07 Luca Canale , Nicolas Grislain , Grégoire Lothe , Johan Leduc