Related papers: Relationship-aware Multivariate Sampling Strategy …

Multivariate Pointwise Information-Driven Data Sampling and Visualization

With increasing computing capabilities of modern supercomputers, the size of the data generated from the scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can…

Human-Computer Interaction · Computer Science 2019-07-30 Soumya Dutta, Ayan Biswas, James Ahrens

Multi-resolution subsampling for large-scale linear classification

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Novel Subsampling Strategies for Heavily Censored Reliability Data

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

Scientific Data Compression and Super-Resolution Sampling

Modern scientific simulations, observations, and large-scale experiments generate data at volumes that often exceed the limits of storage, processing, and analysis. This challenge drives the development of data reduction methods that…

Machine Learning · Computer Science 2025-11-18 Minh Vu, Andrey Lokhov

Data Sampling Strategies in Stochastic Algorithms for Empirical Risk Minimization

Gradient descent methods and especially their stochastic variants have become highly popular in the last decade due to their efficiency on big data optimization problems. In this thesis we present the development of data sampling strategies…

Optimization and Control · Mathematics 2018-04-03 Dominik Csiba

STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data

Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are…

Databases · Computer Science 2020-09-01 Guizhen Wang, Jingjing Guo, Mingjie Tang, José Florencio de Queiroz Neto, Calvin Yau, Anas Daghistani, Morteza Karimzadeh, Walid G. Aref, David S. Ebert

Predictive Subsampling for Scalable Inference in Networks

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics 2026-02-19 Arpan Kumar, Minh Tang, Srijan Sengupta

Uniform-in-Phase-Space Data Selection with Iterative Normalizing Flows

Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that is routinely generated. In applications that are constrained by memory and computational intensity, excessively large…

Machine Learning · Computer Science 2023-02-28 Malik Hassanaly, Bruce A. Perry, Michael E. Mueller, Shashank Yellapantula

Enhanced sampling methods for molecular dynamics simulations

Enhanced sampling algorithms have emerged as powerful methods to extend the utility of molecular dynamics simulations and allow the sampling of larger portions of the configuration space of complex systems in a given amount of simulation…

Statistical Mechanics · Physics 2022-12-19 Jérôme Hénin, Tony Lelièvre, Michael R. Shirts, Omar Valsson, Lucie Delemotte

Multivariate Spatial Data Visualization: A Survey

Multivariate spatial data plays an important role in computational science and engineering simulations. The potential features and hidden relationships in multivariate data can assist scientists in gaining an in-depth understanding of a…

Human-Computer Interaction · Computer Science 2019-08-30 Xiangyang He, Yubo Tao, Qirui Wang, Hai Lin

Sampling and Update Frequencies in Proximal Variance-Reduced Stochastic Gradient Methods

Variance-reduced stochastic gradient methods have gained popularity in recent times. Several variants exist with different strategies for the storing and sampling of gradients and this work concerns the interactions between these two…

Optimization and Control · Mathematics 2022-10-19 Martin Morin, Pontus Giselsson

An extensive simulation study evaluating the interaction of resampling techniques across multiple causal discovery contexts

Despite the accelerating presence of exploratory causal analysis in modern science and medicine, the available non-experimental methods for validating causal models are not well characterized. One of the most popular methods is to evaluate…

Methodology · Statistics 2025-03-20 Ritwick Banerjee, Bryan Andrews, Erich Kummerfeld

A maximin optimal approach for sampling designs in two-phase studies

Data collection costs can vary widely across variables in data science tasks. Two-phase designs can be employed to save data collection costs. This paper considers the two-phase studies where inexpensive variables are collected for all…

Methodology · Statistics 2025-12-04 Ruoyu Wang, Qihua Wang, Wang Miao

Maximum-Variance-Reduction Stratification for Improved Subsampling

Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize…

Methodology · Statistics 2026-05-07 Dingyi Wang, Haiying Wang, Qingpei Hu

A Novel Approach for Data Integration with Multiple Heterogeneous Data Sources

The integration of data from multiple sources is increasingly used to achieve larger sample sizes and enhance population diversity. Our previous work established that, under random sampling from the same underlying population, integrating…

Methodology · Statistics 2026-01-01 Farimah Shamsi, Andriy Derkach

Variable Selection Methods for Multivariate, Functional, and Complex Biomedical Data in the AI Age

Many problems within personalized medicine and digital health rely on the analysis of continuous-time functional biomarkers and other complex data structures emerging from high-resolution patient monitoring. In this context, this work…

Machine Learning · Statistics 2025-01-14 Marcos Matabuena

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

Local Uncertainty Sampling for Large-Scale Multi-Class Logistic Regression

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…

Computation · Statistics 2018-09-14 Lei Han, Kean Ming Tan, Ting Yang, Tong Zhang

Decomposition Sampling applied to Parallelization of Metropolis-Hastings

This paper presents an algorithm for sampling random variables that allows the sampling process to be separated into subproblems by dividing the sample space into overlapping parts. The subproblems can be solved independently of each other…

Computation · Statistics 2016-01-26 Jonas Hallgren, Timo Koski

Multivariate Scalar on Multidimensional Distribution Regression

We develop a new method for multivariate scalar on multidimensional distribution regression. Traditional approaches typically analyze isolated univariate scalar outcomes or consider unidimensional distributional representations as…

Methodology · Statistics 2023-10-17 Rahul Ghosal, Marcos Matabuena