Related papers: Algorithmic subsampling under multiway clustering

Asymptotic results under multiway clustering

While multiway cluster-robust standard errors are used routinely in applied economics, surprisingly few theoretical results justify this practice. This paper aims to fill this gap. We first prove, under nearly the same conditions as with…

Econometrics · Economics 2018-08-06 Laurent Davezies, Xavier D'Haultfoeuille, Yannick Guyonvarch

Subsampling Under Two-way Clustering with Serial Correlation

We prove the validity of subsampling for inference in a two-way clustered panel in which the time effects are serially correlated. Subsamples should be drawn without replacement from randomly partitioned individual index set…

Econometrics · Economics 2026-05-01 Haonan Miao

A parallel sampling based clustering

Automatically clustering data is an age-old problem, and numerous algorithms have been created to tackle it. The execution time of any of these algorithms grows with the number of input points and the number of cluster…

Machine Learning · Computer Science 2014-12-08 Aditya AV Sastry, Kalyan Netti

Multi-resolution subsampling for large-scale linear classification

Subsampling is a popular method for balancing statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Novel Subsampling Strategies for Heavily Censored Reliability Data

Computational capability often falls short when confronted with massive data, posing a common challenge in building statistical models or inference methods for big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

Statistical properties of sketching algorithms

Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…

Methodology · Statistics 2019-04-04 Daniel Ahfock, William J. Astle, Sylvia Richardson

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle in analyzing large-sample data is the lack of effective statistical computing and inference methods. An emerging and powerful approach to analyzing large-sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

New summing algorithm using ensemble computing

We propose an ensemble algorithm, which provides a new approach for evaluating and summing up a set of function samples. The proposed algorithm is not a quantum algorithm, insofar as it does not involve quantum entanglement. The query…

Quantum Physics · Physics 2009-11-07 C. D'Helon, V. Protopopescu

Adaptive Cluster-Based Synthetic Minority Oversampling Technique for Traffic Mode Choice Prediction with Imbalanced Dataset

Urban datasets, such as records of citizens' transportation modes, often contain disproportionately distributed classes, posing significant challenges to classifying under-represented samples with data-driven models. In the literature, various…

Machine Learning · Computer Science 2025-04-15 Guang An Ooi, Shehab Ahmed

Optimal Subsampling for Large Sample Logistic Regression

For massive data, subsampling algorithms are a popular way to downsize the data volume and reduce the computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

Optimal Subsampling Algorithms for Big Data Regressions

To quickly approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the…

Methodology · Statistics 2021-06-15 Mingyao Ai, Jun Yu, Huiming Zhang, HaiYing Wang

Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data

A large sample size creates a computational bottleneck for modern data analysis. Subsampling is one efficient strategy for handling this problem. Previous studies have focused more on subsampling with replacement (SSR) than…

Machine Learning · Statistics 2015-11-24 Rong Zhu

Optimal subsampling for large scale Elastic-net regression

Datasets of sheer volume have been generated in fields including computer vision, medical imaging, and astronomy, and their large scale and high dimensionality hamper the implementation of classical statistical models. To tackle…

Statistics Theory · Mathematics 2023-05-30 Hang Yu, Zhenxing Dou, Zhiwei Chen, Xiaomeng Yan

Sketched Subspace Clustering

The immense amount of data generated and communicated daily presents unique processing challenges. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…

Machine Learning · Statistics 2018-02-08 Panagiotis A. Traganitis, Georgios B. Giannakis

Multiway Cluster Robust Double/Debiased Machine Learning

This paper investigates double/debiased machine learning (DML) under multiway clustered sampling environments. We propose a novel multiway cross fitting algorithm and a multiway DML estimator based on this algorithm. We also develop a…

Econometrics · Economics 2020-03-05 Harold D. Chiang, Kengo Kato, Yukun Ma, Yuya Sasaki

Optimal subsampling designs

Subsampling is commonly used to overcome computational and economic bottlenecks in the analysis of finite populations and massive datasets. Existing methods are often limited in scope and use optimality criteria (e.g., A-optimality) with…

Statistics Theory · Mathematics 2023-04-07 Henrik Imberg, Marina Axelson-Fisk, Johan Jonasson

Local Uncertainty Sampling for Large-Scale Multi-Class Logistic Regression

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…

Computation · Statistics 2018-09-14 Lei Han, Kean Ming Tan, Ting Yang, Tong Zhang

Subspace Clustering through Sub-Clusters

The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular, we…

Machine Learning · Statistics 2020-06-12 Weiwei Li, Jan Hannig, Sayan Mukherjee

Optimal subsampling algorithm for the marginal model with large longitudinal data

Big data is ubiquitous in practice, and it has also led to a heavy computational burden. To reduce the computational cost while ensuring the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han, Liya Fu

Compressing Large Sample Data for Discriminant Analysis

Large-sample data became prevalent as data acquisition became cheaper and easier. While a large sample size has theoretical advantages for many statistical methods, it presents computational challenges. Sketching, or compression, is a…

Machine Learning · Statistics 2020-05-11 Alexander F. Lapanowski, Irina Gaynanova