Related papers: Maximum-Variance-Reduction Stratification for Impr…

200 papers

Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…

Methodology · Statistics 2025-12-25 Jasper B. Yang , Thomas Lumley , Bryan E. Shepherd , Pamela A. Shaw

Adaptive importance sampling for stochastic optimization is a promising approach that offers improved convergence through variance reduction. In this work, we propose a new framework for variance reduction that enables the use of mixtures…

Machine Learning · Computer Science 2019-04-01 Zalán Borsos , Sebastian Curi , Kfir Y. Levy , Andreas Krause

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen , Holger Dette , Jun Yu

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu , Ping Ma , Michael W. Mahoney , Bin Yu

This paper investigates the use of stratified sampling as a variance reduction technique for approximating integrals over large dimensional spaces. The accuracy of this method critically depends on the choice of the space partition, the…

Probability · Mathematics 2009-09-15 Pierre Etoré , Gersende Fort , Benjamin Jourdain , Eric Moulines

Stochastic Gradient Boosting (SGB) is a widely used approach to regularization of boosting models based on decision trees. It was shown that, in many cases, random sampling at each iteration can lead to better generalization performance of…

Machine Learning · Statistics 2019-10-30 Bulat Ibragimov , Gleb Gusev

We study a class of nonconvex nonsmooth optimization problems in which the objective is a sum of two functions: One function is the average of a large number of differentiable functions, while the other function is proper, lower…

Optimization and Control · Mathematics 2023-05-12 Duy-Nhat Phan , Sedi Bartz , Nilabja Guha , Hung M. Phan

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang , Rong Zhu , Ping Ma

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju , Mingqiu Wang , Shengli Zhao

Maximum likelihood estimation is computationally demanding for large datasets, particularly when the likelihood function includes integrals. Subsampling can reduce the computational burden, but it often results in efficiency loss. This…

Methodology · Statistics 2026-04-27 Miaomiao Su , Qihua Wang , Ruoyu Wang

This article introduces a subbagging (subsample aggregating) approach for variable selection in regression within the context of big data. The proposed subbagging approach not only ensures that variable selection is scalable given the…

Methodology · Statistics 2025-03-10 Xian Li , Xuan Liang , Tao Zou

A variance reduction technique in nonparametric smoothing is proposed: at each point of estimation, form a linear combination of a preliminary estimator evaluated at nearby points with the coefficients specified so that the asymptotic bias…

Statistics Theory · Mathematics 2007-08-22 Ming-Yen Cheng , Liang Peng , Jyh-Shyang Wu

Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of…

Machine Learning · Statistics 2023-05-23 Mengmeng Wu , Ruoxi Jia , Changle Lin , Wei Huang , Xiangyu Chang

Nonuniform subsampling methods are effective at reducing computational burden while maintaining estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…

Methodology · Statistics 2021-07-06 Jun Yu , HaiYing Wang , Mingyao Ai , Huiming Zhang

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…

Methodology · Statistics 2025-10-08 Amalan Mahendran , Helen Thompson , James M. McGree

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan , Zan Li , Zhaohui Li , Dennis K. J. Lin , Qingpei Hu , Dan Yu

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma , Chenlei Leng , Hansheng Wang

Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in…

Optimization and Control · Mathematics 2025-04-30 Michał Dereziński

Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…

Numerical Analysis · Mathematics 2022-05-06 Qian Yan , Hanyu Li , Chengmei Niu

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu , Xiyu Zhai , Suvrit Sra