Related papers: Relationship-aware Multivariate Sampling Strategy …

200 papers

With increasing computing capabilities of modern supercomputers, the size of the data generated from the scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can…

Human-Computer Interaction · Computer Science 2019-07-30 Soumya Dutta, Ayan Biswas, James Ahrens

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

Modern scientific simulations, observations, and large-scale experiments generate data at volumes that often exceed the limits of storage, processing, and analysis. This challenge drives the development of data reduction methods that…

Machine Learning · Computer Science 2025-11-18 Minh Vu, Andrey Lokhov

Gradient descent methods and especially their stochastic variants have become highly popular in the last decade due to their efficiency on big data optimization problems. In this thesis we present the development of data sampling strategies…

Optimization and Control · Mathematics 2018-04-03 Dominik Csiba

Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are…

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics 2026-02-19 Arpan Kumar, Minh Tang, Srijan Sengupta

Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that is routinely generated. In applications that are constrained by memory and computational intensity, excessively large…

Machine Learning · Computer Science 2023-02-28 Malik Hassanaly, Bruce A. Perry, Michael E. Mueller, Shashank Yellapantula

Enhanced sampling algorithms have emerged as powerful methods to extend the utility of molecular dynamics simulations and allow the sampling of larger portions of the configuration space of complex systems in a given amount of simulation…

Statistical Mechanics · Physics 2022-12-19 Jérôme Hénin, Tony Lelièvre, Michael R. Shirts, Omar Valsson, Lucie Delemotte

Multivariate spatial data plays an important role in computational science and engineering simulations. The potential features and hidden relationships in multivariate data can assist scientists to gain an in-depth understanding of a…

Human-Computer Interaction · Computer Science 2019-08-30 Xiangyang He, Yubo Tao, Qirui Wang, Hai Lin

Variance-reduced stochastic gradient methods have gained popularity in recent times. Several variants exist with different strategies for the storing and sampling of gradients and this work concerns the interactions between these two…

Optimization and Control · Mathematics 2022-10-19 Martin Morin, Pontus Giselsson

Despite the accelerating presence of exploratory causal analysis in modern science and medicine, the available non-experimental methods for validating causal models are not well characterized. One of the most popular methods is to evaluate…

Methodology · Statistics 2025-03-20 Ritwick Banerjee, Bryan Andrews, Erich Kummerfeld

Data collection costs can vary widely across variables in data science tasks. Two-phase designs can be employed to save data collection costs. This paper considers the two-phase studies where inexpensive variables are collected for all…

Methodology · Statistics 2025-12-04 Ruoyu Wang, Qihua Wang, Wang Miao

Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize…

Methodology · Statistics 2026-05-07 Dingyi Wang, Haiying Wang, Qingpei Hu

The integration of data from multiple sources is increasingly used to achieve larger sample sizes and enhance population diversity. Our previous work established that, under random sampling from the same underlying population, integrating…

Methodology · Statistics 2026-01-01 Farimah Shamsi, Andriy Derkach

Many problems within personalized medicine and digital health rely on the analysis of continuous-time functional biomarkers and other complex data structures emerging from high-resolution patient monitoring. In this context, this work…

Machine Learning · Statistics 2025-01-14 Marcos Matabuena

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…

Computation · Statistics 2018-09-14 Lei Han, Kean Ming Tan, Ting Yang, Tong Zhang

This paper presents an algorithm for sampling random variables that allows the sampling process to be separated into subproblems by dividing the sample space into overlapping parts. The subproblems can be solved independently of each other…

Computation · Statistics 2016-01-26 Jonas Hallgren, Timo Koski

We develop a new method for multivariate scalar on multidimensional distribution regression. Traditional approaches typically analyze isolated univariate scalar outcomes or consider unidimensional distributional representations as…

Methodology · Statistics 2023-10-17 Rahul Ghosal, Marcos Matabuena