Related papers: Orthogonal Subsampling for Big Data Linear Regress…

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Group-Orthogonal Subsampling for Hierarchical Data Based on Linear Mixed Models

Hierarchical data analysis is crucial in various fields for making discoveries. The linear mixed model is often used for training hierarchical data, but its parameter estimation is computationally expensive, especially with big data.…

Methodology · Statistics 2023-10-17 Jiaqing Zhu, Lin Wang, Fasheng Sun

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinary large data sets due to computational limitations. A critical step in big data analysis is…

Methodology · Statistics 2019-06-27 HaiYing Wang, Min Yang, John Stufken

Subdata selection for big data regression: an improved approach

In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

On best subset regression

In this paper we discuss the variable selection method from $\ell_0$-norm constrained regression, which is equivalent to the problem of finding the best subset of a fixed size. Our study focuses on two aspects, consistency and computation. We…

Methodology · Statistics 2013-03-20 Shifeng Xiong

D-optimal Subsampling Design for Massive Data Linear Regression

Data reduction is a fundamental challenge of modern technology, where classical statistical methods are not applicable because of computational limitations. We consider multiple linear regression for an extraordinarily large number of…

Methodology · Statistics 2025-05-30 Torsten Glemser, Rainer Schwabe

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data

Large sample size brings the computation bottleneck for modern data analysis. Subsampling is one of efficient strategies to handle this problem. In previous studies, researchers make more focus on subsampling with replacement (SSR) than…

Machine Learning · Statistics 2015-11-24 Rong Zhu

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao

Orthogonal Subspace Clustering: Enhancing High-Dimensional Data Analysis through Adaptive Dimensionality Reduction and Efficient Clustering

This paper presents Orthogonal Subspace Clustering (OSC), an innovative method for high-dimensional data clustering. We first establish a theoretical theorem proving that high-dimensional data can be decomposed into orthogonal subspaces in…

Machine Learning · Computer Science 2026-03-17 Qing-Yuan Wen, Da-Qing Zhang

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…

Methodology · Statistics 2021-05-05 Tao Li, Cheng Meng

Sparse Linear Regression via Generalized Orthogonal Least-Squares

Sparse linear regression, which entails finding a sparse solution to an underdetermined system of linear equations, can formally be expressed as an $l_0$-constrained least-squares problem. The Orthogonal Least-Squares (OLS) algorithm…

Machine Learning · Statistics 2016-08-01 Abolfazl Hashemi, Haris Vikalo

Multi-resolution subsampling for large-scale linear classification

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Robust and Efficient Subspace Segmentation via Least Squares Regression

This paper studies the subspace segmentation problem which aims to segment data drawn from a union of multiple linear subspaces. Recent works by using sparse representation, low rank representation and their extensions attract much…

Computer Vision and Pattern Recognition · Computer Science 2014-04-29 Can-Yi Lu, Hai Min, Zhong-Qiu Zhao, Lin Zhu, De-Shuang Huang, Shuicheng Yan

COMBSS: Best Subset Selection via Continuous Optimization

The problem of best subset selection in linear regression is considered with the aim to find a fixed size subset of features that best fits the response. This is particularly challenging when the total available number of features is very…

Methodology · Statistics 2023-11-28 Sarat Moka, Benoit Liquet, Houying Zhu, Samuel Muller

High Performance Out-of-sample Embedding Techniques for Multidimensional Scaling

The recent rapid growth of the dimension of many datasets means that many approaches to dimension reduction (DR) have gained significant attention. High-performance DR algorithms are required to make data analysis feasible for big and fast…

Machine Learning · Computer Science 2021-11-09 Samudra Herath, Matthew Roughan, Gary Glonek

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma, Chenlei Leng, Hansheng Wang

SOFAR: large-scale association network learning

Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network…

Methodology · Statistics 2017-04-28 Yoshimasa Uematsu, Yingying Fan, Kun Chen, Jinchi Lv, Wei Lin

Large-scale linear regression: Development of high-performance routines

In statistics, series of ordinary least squares problems (OLS) are used to study the linear correlation among sets of variables of interest; in many studies, the number of such variables is at least in the millions, and the corresponding…

Computational Engineering, Finance, and Science · Computer Science 2015-04-30 Alvaro Frank, Diego Fabregat-Traver, Paolo Bientinesi

Balanced Subsampling for Big Data with Categorical Covariates

Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…

Methodology · Statistics 2025-03-19 Lin Wang