Related papers: Spatial Data Generators

200 papers

The machine learning community has mainly relied on real data to benchmark algorithms as it provides compelling evidence of model applicability. Evaluation on synthetic datasets can be a powerful tool to provide a better understanding of a…

Machine Learning · Computer Science 2022-11-01 Florence Regol, Anja Kroon, Mark Coates

Recent advances in generative modelling have led many to see synthetic data as the go-to solution for a range of problems around data access, scarcity, and under-representation. In this paper, we study three prominent use cases: (1) Sharing…

Machine Learning · Computer Science 2026-02-04 Bogdan Kulynych, Theresa Stadler, Jean Louis Raisaro, Carmela Troncoso

Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data. Unfortunately, many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available…

Machine Learning · Computer Science 2020-12-09 James Jordon, Alan Wilson, Mihaela van der Schaar

Private synthetic data sharing is preferred as it keeps the distribution and nuances of original data compared to summary statistics. The state-of-the-art methods adopt a select-measure-generate paradigm, but measuring large domain…

Cryptography and Security · Computer Science 2023-10-11 Meifan Zhang, Dihang Deng, Lihua Yin

Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that enable the synthesis of high-dimensional datasets. However, many generation techniques do not give the data controller control over what…

Cryptography and Security · Computer Science 2022-11-22 Florimond Houssiau, Samuel N. Cohen, Lukasz Szpruch, Owen Daniel, Michaela G. Lawrence, Robin Mitra, Henry Wilde, Callum Mole

Synthetic datasets are important for evaluating and testing machine learning models. When evaluating real-life recommender systems, high-dimensional categorical (and sparse) datasets are often considered. Unfortunately, there are not many…

Information Retrieval · Computer Science 2024-12-11 Miha Malenšek, Blaž Škrlj, Blaž Mramor, Jure Demšar

Recent advances in deep generative models have greatly expanded the potential to create realistic synthetic health datasets. These synthetic datasets aim to preserve the characteristics, patterns, and overall scientific conclusions derived…

Machine Learning · Computer Science 2024-07-04 Jennifer A Bartell, Sander Boisen Valentin, Anders Krogh, Henning Langberg, Martin Bøgsted

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose…

Computer Vision and Pattern Recognition · Computer Science 2019-04-29 Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

This work presents a systematic benchmark of differentially private synthetic data generation algorithms that can generate tabular data. Utility of the synthetic data is evaluated by measuring whether the synthetic data preserve the…

Cryptography and Security · Computer Science 2022-02-16 Yuchao Tao, Ryan McKenna, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

As more tech companies engage in rigorous economic analyses, we are confronted with a data problem: in-house papers cannot be replicated due to use of sensitive, proprietary, or private data. Readers are left to assume that the obscured…

General Economics · Economics 2020-11-10 Allison Koenecke, Hal Varian

This paper proposes three different data generators, tailored to transactional datasets, based on existing itemset-based generative models. All these generators are intuitive and easy to implement and show satisfactory performance. The…

Databases · Computer Science 2020-07-15 Christian Lezcano, Marta Arias

Feature selection is an important and active field of research in machine learning and data science. Our goal in this paper is to propose a collection of synthetic datasets that can be used as a common reference point for feature selection…

Machine Learning · Computer Science 2022-11-08 Firuz Kamalov, Hana Sulieman, Aswani Kumar Cherukuri

Synthetic data is being used lately for training deep neural networks in computer vision applications such as object detection, object segmentation and 6D object pose estimation. Domain randomization hereby plays an important role in…

Computer Vision and Pattern Recognition · Computer Science 2024-05-13 Parth Rawal, Mrunal Sompura, Wolfgang Hintze

Machine learning heavily relies on data, but real-world applications often encounter various data-related issues. These include data of poor quality, insufficient data points leading to under-fitting of machine learning models, and…

Machine Learning · Computer Science 2025-04-07 Yingzhou Lu, Lulu Chen, Yuanyuan Zhang, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei

Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to…

Machine Learning · Computer Science 2022-12-13 Angeela Acharya, Siddhartha Sikdar, Sanmay Das, Huzefa Rangwala

Accurately evaluating model performance is crucial for deploying machine learning systems in real-world applications. Traditional methods often require a sufficiently large labeled test set to ensure a reliable evaluation. However, in many…

Machine Learning · Computer Science 2025-11-04 Hai Hoang Thanh, Duy-Tung Nguyen, Hung The Tran, Khoat Than

Multi-spectral satellite imagery provides valuable data at global scale for many environmental and socio-economic applications. Building supervised machine learning models based on this imagery, however, may require ground reference labels…

Computer Vision and Pattern Recognition · Computer Science 2020-12-08 Tharun Mohandoss, Aditya Kulkarni, Daniel Northrup, Ernest Mwebaze, Hamed Alemohammad

Satellite imagery and remote sensing provide explanatory variables at relatively high resolutions for modeling geospatial phenomena, yet regional summaries are often desirable for analysis and actionable insight. In this paper, we propose a…

Machine Learning · Statistics 2017-12-15 Sam Kriegman, Marcin Szubert, Josh C. Bongard, Christian Skalka

Testing in production-like test environments is an essential part of quality assurance processes in many industries. Provisioning of such test environments, for information-intensive services, involves setting up databases that are…

Software Engineering · Computer Science 2024-07-09 Razieh Behjati, Erik Arisholm, Chao Tan, Margrethe M. Bedregal

Due to the increasing volume, volatility, and diversity of data in virtually all areas of our lives, the ability to detect duplicates in potentially linked data sources is more important than ever before. However, while research is already…

Databases · Computer Science 2024-01-01 Fabian Panse, Wolfram Wingerath, Benjamin Wollmer